Can you afford unplanned downtime in your data centre?
The “always-on” digital world is driving the need for “always-available” data centres – data centres that ensure uninterrupted business operations and provide steady uptime for mission-critical applications.
However, according to the 2018 Uptime Institute survey, nearly one-third of all data centres had an outage in 2017, up from 25 percent the year before. This is mainly due to the rapidly increasing volume of business-critical applications, which are adding more complexity to modern-day data centre operations.
While redundancies are built into data centre design, there will always be outages due to unforeseen circumstances. When problems do occur, recovery times can be lengthy and fault diagnosis complicated, resulting in significant costs.
The Real Cost of Downtime
A single hour of downtime can cost large enterprises over $100,000.
The average cost of an unplanned outage is $9,000 per minute, and the maximum downtime cost is $2,409,991.
A data centre outage results in more than just financial loss. It can:
- Damage data and critical equipment
- Impact productivity
- Affect the credibility and reputation of your organisation
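To put the per-minute figure in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: the function name and the flat per-minute rate are assumptions made for the example, and real outage costs vary widely between organisations.

```python
# Average cost per minute of an unplanned outage, as quoted above (USD).
AVERAGE_COST_PER_MINUTE_USD = 9_000


def estimated_outage_cost(duration_minutes, cost_per_minute=AVERAGE_COST_PER_MINUTE_USD):
    """Estimate the cost of an outage, assuming a flat per-minute rate."""
    if duration_minutes < 0:
        raise ValueError("duration_minutes must not be negative")
    return duration_minutes * cost_per_minute


# A one-hour outage at the average rate: 60 * 9,000 = 540,000 USD.
print(estimated_outage_cost(60))
```

At the average rate, a single hour of unplanned downtime would come to $540,000, well above the $100,000 floor quoted for large enterprises.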
Tackling the Causes of Data Centre Downtime
There are several possible causes of data centre outages, including skill shortages, higher densities being deployed in a facility that is not fit for purpose, and, in some cases, over-engineered solutions that add further complexity. Many data centres are also showing their age but have not been upgraded or modernised, because their importance is often overlooked.
A 2016 Ponemon study points to four main reasons for data centre downtime:
- UPS failures: 25 percent of all outages result from UPS equipment failure or from power being drawn beyond the UPS's capacity. UPS systems rely on battery backups in case of a power outage, but batteries do not last forever.
- Human error: Even the most experienced people make mistakes when weary, rushed, or simply distracted. Human error accounts for 22 percent of all such incidents.
- Poor infrastructure capacity management: Data centres are becoming denser to offer higher performance and computing power. As they draw more power at each rack, data centre managers need data and insights to understand current system capacity and accurately predict future needs.
- Poor maintenance and lifecycle strategy: Poorly maintained equipment is more likely to fail, so regular installation and visual inspections, along with capacity checks, are essential.
As increasing complexity continues to create even more operational challenges, organisations must consider ways to minimise costly disruptions. Early operational threat detection can help prevent or minimise potential outages for a more resilient data centre.
Data centre managers today need new tools and proactive approaches that can facilitate early threat detection. This means deploying solutions that can consolidate and analyse equipment performance data in real time, and use the insights gained from previous analyses to reduce the risk of future oversights. Such tools will dramatically increase the data centre’s ability to detect, pre-empt, and respond to disruptions quickly.
Predictive Analytics: Preventing Data Centre Downtime Through Early Threat Detection
Sophisticated predictive algorithms in advanced analytics with machine learning (ML) can help detect anomalies and predict operational threats in data centres before they occur.
Forecasting is the basis of anomaly detection. Information is gathered from millions of real-time and historical data sets across a wide range of data centre equipment and sensors. This data is analysed and used to predict or infer what should happen at a given point in time. Predicted values are then compared with actual values; a significant difference between the two indicates an unusual event.
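The predict-then-compare idea can be illustrated with a minimal Python sketch. It is not a description of any particular product: the forecast here is simply the average of the most recent readings, standing in for a trained ML model, and the function name, window size, threshold, and sample temperatures are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def detect_anomalies(readings, window=5, threshold=3.0, min_spread=0.05):
    """Return the indices of readings that deviate strongly from a naive forecast.

    The forecast for each reading is the mean of the previous `window` readings.
    A reading is flagged when it differs from the forecast by more than
    `threshold` times the recent standard deviation. `min_spread` is a
    hypothetical floor that stops perfectly flat data from flagging noise.
    """
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        predicted = mean(history)
        spread = max(stdev(history), min_spread)
        if abs(readings[i] - predicted) > threshold * spread:
            anomalies.append(i)
    return anomalies


# Hypothetical rack inlet temperatures in degrees Celsius, with one spike.
inlet_temps = [22.0, 22.1, 21.9, 22.0, 22.2, 22.1, 27.5, 22.0, 22.1]
print(detect_anomalies(inlet_temps))  # [6]
```

Only the 27.5 degree reading is flagged. A simple rolling forecast like this has a known weakness: once a spike enters the history window it inflates the tolerance for the next few readings, which is one reason production systems tend to use more robust models.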
Using ML technologies, data models can indicate why and when equipment might fail, based on trends in machine data. These models give data centre teams the predictive insight to identify when a piece of equipment is showing signs of wear, and to act before any breakdown or outage occurs. They can also determine the circumstances that tend to cause equipment to fail, helping to prevent those circumstances from recurring. Data-driven anomaly detection can reduce downtime by 30 to 50 percent and increase equipment life by 20 to 40 percent.
Predictive algorithms and ML work together to help you do the following:
- Anticipate operational threats and anomalies to mitigate disruptions: Algorithms that combine machine-learning techniques with mathematical and statistical analysis help establish behavioural baselines and uncover hidden patterns that deviate from the norm. Real-time alerts about impending issues enable you to respond quickly to factors that could cause downtime – before they cause significant disruptions to the business.
- Analyse root causes of failures: Deep learning and data mining techniques can accurately analyse the critical moments of an event to determine cause-and-effect relationships. These techniques enable meaningful insights to determine the root cause of an event and establish the right actions to be taken to prevent recurrence.
- Manage the reliability of critical assets: Machine learning algorithms can predict factors that cause equipment failure based on the equipment’s current behavioural pattern. Such a capability can enable proactive understanding of an issue relating to a machine’s performance and health, thus averting possible disaster.
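As a simplified illustration of the asset-reliability point, the Python sketch below fits a straight line to a degrading health metric and estimates how long remains before it reaches a service threshold. The function name, the sample battery figures, and the assumption of linear degradation are hypothetical; a real ML model would draw on far more signals than a single trend line.

```python
def periods_until_threshold(values, threshold):
    """Estimate how many more periods remain until a declining metric hits `threshold`.

    Fits a least-squares line to equally spaced readings and extrapolates it.
    Returns None when the metric is not trending downward.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two readings are required")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    variance = sum((x - x_mean) ** 2 for x in range(n))
    slope = covariance / variance
    if slope >= 0:
        return None
    intercept = y_mean - slope * x_mean
    crossing = (threshold - intercept) / slope  # period at which the line meets the threshold
    return max(0.0, crossing - (n - 1))


# Hypothetical monthly UPS battery capacity readings, as a percentage of rated capacity.
battery_capacity = [100, 98, 96, 94, 92]
print(periods_until_threshold(battery_capacity, threshold=80))  # 6.0
```

In this example the battery loses two percentage points a month, so the fitted line reaches the 80 percent threshold six months after the latest reading, giving a window in which to schedule a replacement.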
At ENGIE, we use ML-powered predictive analytics to proactively detect unseen anomalies and anticipate operational threats that need immediate attention. Our data centre experts investigate the first signs of impending threats, use data to determine underlying causes, and initiate corrective actions to mitigate the risk of downtime.
ENGIE data centre experts provide a fact-based, data-driven approach with predictive capabilities that eradicate problems and reduce the risk of critical events and unplanned data centre outages.
To find out more about how ENGIE can help you anticipate operational threats and reduce data centre operational complexities, contact us today at engieDC@avrildigital.com.