Proactive 'always-on' incident management - critical in cloud environments
From user error and accidental file deletion to a system crash, incident management forms the last line of defence when it comes to recovering information. Given the importance of data to today’s business, a reactive approach that is only available during business hours is simply no longer enough. In all data-driven environments today, on-premises and particularly in the cloud, an always-on, proactive approach to incident management is key to success.
The conventional approach
The typical approach to incident management involves sending alerts, which can be system or user-generated, through to a support centre. System-generated alerts are configured to send when certain errors occur, while users are also able to log issues when they experience a problem.
These alerts are logged into an incident management system, which creates a ticket that is then assigned to a team lead to allocate a level of severity and appropriate resources. This reactive approach only assigns resources once a problem has already occurred, which means that the problem is addressed only when the system is already down. As a consequence, data may already be missing and user productivity affected. Often, support resources are also available only during office hours, causing further delays in resolution.
But what about a more pre-emptive approach?
As businesses increasingly migrate to the cloud, it is essential for incident management to be available 24/7/365 and to become more proactive, identifying issues before they can cause downtime. Incident management should offer proactive alerting of anomalies so that potential problems can be identified before they turn into incidents.
Support teams can then engage with customers to investigate and provide remedial action, which can drastically reduce the number of actual incidents. Some simple examples include licensing that is about to expire, licenses that are on the verge of reaching full capacity, and storage space that has almost run out. Issues like these can easily be resolved proactively, before they cause downtime and lost productivity.
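The three examples above lend themselves to simple threshold checks. The following Python sketch is purely illustrative: the `Snapshot` fields, the `proactive_warnings` function, the 30-day expiry window and the 90% capacity threshold are all assumptions made for this example, not part of any particular incident management product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Snapshot:
    """Hypothetical point-in-time view of one customer environment."""
    license_expiry: date
    licenses_used: int
    licenses_total: int
    storage_used_gb: float
    storage_total_gb: float


def proactive_warnings(snap, today, expiry_window_days=30, capacity_threshold=0.9):
    """Return warnings for conditions that have not yet caused an incident.

    The window and threshold defaults are illustrative assumptions.
    """
    warnings = []

    # License that expires within the window (or has already expired).
    if snap.license_expiry - today <= timedelta(days=expiry_window_days):
        warnings.append(f"license expires on {snap.license_expiry.isoformat()}")

    # License pool close to full capacity.
    if snap.licenses_used >= capacity_threshold * snap.licenses_total:
        warnings.append(
            f"licenses at {snap.licenses_used}/{snap.licenses_total}"
        )

    # Storage that has almost run out.
    if snap.storage_used_gb >= capacity_threshold * snap.storage_total_gb:
        warnings.append(
            f"storage at {snap.storage_used_gb:.0f}/{snap.storage_total_gb:.0f} GB"
        )

    return warnings


if __name__ == "__main__":
    snap = Snapshot(
        license_expiry=date(2024, 1, 15),
        licenses_used=95,
        licenses_total=100,
        storage_used_gb=500.0,
        storage_total_gb=1000.0,
    )
    for warning in proactive_warnings(snap, today=date(2024, 1, 5)):
        print(warning)
```

In practice a check like this would run on a schedule around the clock and feed its warnings to the support team, so that someone can contact the customer before the license lapses or the storage fills up.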
Communication is key
Aside from always-available, proactive incident management, communication and change control processes need to be updated for the cloud. Customers need to be aware of the status of any incidents at all times and should be proactively updated when this status changes. It is also important to have the right collaboration tools in place, to prevent misinterpretation and misunderstanding by either party.
This will help to ensure incidents can be resolved as quickly as possible. Change processes must become more agile and flexible to enable 24/7 decision-making capability, so that incident resolution is not delayed unnecessarily.
From managing your data to managing your dAIta
Artificial Intelligence (AI) offers a number of potential opportunities for incident management. For example, AI can be used to more effectively categorise manually generated incidents to ensure that the most appropriate resources are assigned from the start. AI also has applications within the data management space, including identifying the best network routes for data migrations, determining the optimal mix of storage tiers, and automatically resolving certain issues. While these solutions do not currently exist, many vendors are looking to build AI into their systems.
Incident management starts and ends with availability and agility
As data becomes increasingly important, and the prevalence of cloud-based solutions grows, incident management must evolve to meet changing demand. Businesses can no longer afford any downtime at all, and data must be effectively managed at all times. Incident resolution cannot wait for business hours or Monday morning, especially once you have moved into the cloud.
Data management specialists should be agile enough to assist any customer at any time, regardless of the issue at hand. In addition, they should deliver proactive, always-on incident management, including after hours and on weekends.