On October 1, 2020, the Tokyo Stock Exchange (TSE) was halted for a complete day due to an issue IT professionals tend to overlook: a piece of hardware.
A crucial data storage and distribution device in the exchange's “Arrowhead” trading system malfunctioned, and the automatic backup failed to initiate. Arrowhead is the heartbeat of the TSE, sending and receiving commands, routing data and, most importantly, monitoring trades. Without it, the exchange was forced to shut down for a full day, its longest sustained outage since 1999.
The Arrowhead system is a hardware and software suite developed by Fujitsu that includes two shared disk devices. On the day in question, the primary device, “Number 1 shared disk,” encountered a memory error. When this occurred, the secondary device, “Number 2 shared disk,” should have automatically taken over in a failover procedure – essentially a handshake – to keep processes functioning as normal. That automatic failover never happened. A forced manual failover would have required a restart of the entire system, which was out of the question because orders, trades and data were already beginning to pile up. TSE officials decided to halt trading and resume operations the next day.
What could have prevented this?
Testing recovery strategies – Testing recovery strategies and documenting the results makes it easier to recover when time is of the essence. Testing exposes critical recovery gaps, such as a failover that does not trigger automatically, so they can be addressed and resolved before a real incident. Running through test scenarios better prepares an organization and can greatly reduce the impact of downtime during a real-life disruption.
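To make the idea of a failover test concrete, here is a minimal sketch in Python. It is purely illustrative and is not how the TSE or Fujitsu implemented Arrowhead: the class names, the `auto_failover_enabled` setting and the `run_failover_drill` function are all hypothetical. The drill injects a fault into the primary device and records whether the secondary took over without manual intervention.

```python
class StorageDevice:
    """Hypothetical stand-in for one device in a redundant pair."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy


class RedundantPair:
    """Hypothetical primary/secondary pair with an automatic failover setting."""

    def __init__(self, primary, secondary, auto_failover_enabled=True):
        self.primary = primary
        self.secondary = secondary
        self.auto_failover_enabled = auto_failover_enabled
        self.active = primary  # the primary serves requests until it fails

    def health_check(self):
        """Promote the secondary if the active device is unhealthy."""
        if self.active.healthy:
            return
        if self.auto_failover_enabled and self.secondary.healthy:
            self.active = self.secondary


def run_failover_drill(pair):
    """Simulate a primary failure and report whether service continued."""
    pair.primary.healthy = False  # inject the fault
    pair.health_check()           # give the system a chance to react
    return {
        "active_device": pair.active.name,
        "service_available": pair.active.healthy,
    }


if __name__ == "__main__":
    working = RedundantPair(StorageDevice("disk-1"), StorageDevice("disk-2"))
    print(run_failover_drill(working))

    misconfigured = RedundantPair(
        StorageDevice("disk-1"),
        StorageDevice("disk-2"),
        auto_failover_enabled=False,  # the gap a drill is meant to expose
    )
    print(run_failover_drill(misconfigured))
```

In the second case the drill reports that service is unavailable, which is the kind of gap that is far cheaper to discover in a scheduled test than during live operations.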
Additional Backup and Recovery Considerations
Configuring backups – Configuring backup jobs is vital to all types of businesses. Backing up device configurations and data allows organizations to be resilient to disruptive events. The ability to recover quickly and completely is invaluable.
Monitoring backups – While the configuration of backups is critical, what good is a backup job if it fails to complete successfully? Monitoring backup jobs is equally important, as it ensures data is in a healthy state and ready for recovery.
Restoration testing of backups – A crucial part of successful data backups is testing the ability to restore that data. Restoration testing helps ensure data and systems can be recovered as intended. Businesses should conduct backup restoration testing frequently to confirm that they are prepared for an unscheduled outage or disaster-type event similar to what the TSE faced.
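The monitoring and restoration checks above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a replacement for a backup product's own reporting: the function names, the file names and the one-hour freshness threshold are hypothetical, and a plain file copy stands in for a real restore procedure.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_fresh(backup, max_age_seconds):
    """Monitoring check: the backup exists, is non-empty and is recent."""
    if not backup.is_file() or backup.stat().st_size == 0:
        return False
    return (time.time() - backup.stat().st_mtime) <= max_age_seconds


def restore_matches_source(source, backup, restore_dir):
    """Restoration test: restore to a scratch location and compare checksums."""
    restore_dir.mkdir(parents=True, exist_ok=True)
    restored = restore_dir / backup.name
    shutil.copy2(backup, restored)  # stand-in for a real restore procedure
    return sha256_of(restored) == sha256_of(source)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        source = root / "orders.db"          # hypothetical data file
        source.write_bytes(b"example order data")
        backup = root / "orders.db.bak"
        shutil.copy2(source, backup)         # stand-in for the backup job

        print("fresh:", backup_is_fresh(backup, max_age_seconds=3600))
        print("restorable:", restore_matches_source(source, backup, root / "restore"))
```

Comparing checksums after a trial restore catches the case monitoring alone can miss: a backup job that reports success but produces data that cannot be restored intact.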
If you have questions about your backup and recovery strategies or your disaster recovery plan, please connect with us. We would welcome a discussion.
You’ve heard our thoughts… We’d like to hear yours
The Schneider Downs Our Thoughts On blog exists to create a dialogue on issues that are important to organizations and individuals. While we enjoy sharing our ideas and insights, we’re especially interested in what you may have to say. If you have a question or a comment about this article – or any article from the Our Thoughts On blog – we hope you’ll share it with us. After all, a dialogue is an exchange of ideas, and we’d like to hear from you.
Material discussed is meant for informational purposes only, and it is not to be construed as investment, tax, or legal advice. Please note that individual situations can vary. Therefore, this information should be relied upon when coordinated with individual professional advice.