By Ronald van Loon, Director, Adversitement
This article is by Featured Blogger Ronald van Loon from his LinkedIn page. Republished with the author's permission.
To state that DevOps and IT operations teams will face new challenges in the coming years sounds a bit redundant, as their core responsibility is to solve problems and overcome challenges. However, the landscape of processes, technologies, and tools is changing at such a dramatic pace that keeping up with it has become difficult. Moreover, the pressure business users put on DevOps and IT operations teams is staggering; they expect everything to be solved with a tap on an app. At the backend, however, handling issues is a different ball game; users can’t even imagine how difficult it is to find a problem and solve it.
One of the biggest challenges IT operations and DevOps teams face nowadays is pinpointing the small yet potentially harmful issues in the large streams of Big Data being logged in their environment. Put simply, it is like finding a needle in a haystack.
If you work in the IT department of a company with an online presence that boasts 24/7 availability, here is a scenario that may sound familiar. You get a call in the middle of the night from an angry customer or your boss complaining about a failed credit card transaction or an application crash. You go to your laptop right away and open the log management system. You see that more than a hundred thousand messages were logged in the relevant timeframe, a data set impossible for a human being to review line by line.
So what do you do in such a situation?
It is the story of every IT operations and DevOps professional: they spend many sleepless nights navigating through a sea of log entries to find the critical events that triggered a specific incident. This is where real-time, centralized log analytics comes to the rescue. It helps them understand the essential aspects of their log data and easily identify the main issues. With this, troubleshooting becomes shorter and more effective, and experts can even start to predict future problems.
AI and Its Effect on IT Operations and DevOps
While Artificial Intelligence (AI) was just a buzzword a few decades ago, it is now commonly applied across different industries for a diverse range of purposes. By combining big data, AI, and human domain knowledge, technologists and scientists have been able to create astounding breakthroughs and opportunities that used to be possible only in science fiction novels and movies.
As IT operations become agile and dynamic, they are also getting immensely complex. The human mind is no longer capable of keeping up with the velocity, volume, and variety of Big Data streaming through daily operations, making AI a powerful and essential tool for optimizing analysis and decision-making. AI helps fill the gap between humans and Big Data, giving teams the operational intelligence and speed they need to significantly lighten the burden of troubleshooting and real-time decision-making.
Addressing the Elephant in the Room – How AI Can Help
In all the above situations, one thing is common: teams need a solution, as discussed in the beginning, that helps IT and DevOps quickly find problems in the mountain of log data entries. To identify the single log entry that is putting cracks in the environment and crashing your applications, wouldn’t it be easier if you knew what kind of error you were looking for and could filter your log data accordingly? Of course, it would cut down the amount of work by half.
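To make the filtering idea concrete, here is a minimal sketch in Python. It assumes a hypothetical log format of `<ISO timestamp> <LEVEL> <message>`; the function names and sample entries are invented for illustration and do not come from any particular log management product.

```python
from datetime import datetime

# Assumed (hypothetical) log format: "<ISO timestamp> <LEVEL> <message>"
SAMPLE_LOGS = [
    "2017-03-01T02:14:07 INFO user login ok",
    "2017-03-01T02:15:31 ERROR payment gateway timeout for card transaction",
    "2017-03-01T02:15:45 WARN retrying request",
    "2017-03-01T04:00:00 ERROR disk quota exceeded",
]


def parse_line(line):
    """Split one log line into (timestamp, level, message)."""
    timestamp, level, message = line.split(" ", 2)
    return datetime.fromisoformat(timestamp), level, message


def filter_logs(lines, start, end, level=None, keyword=None):
    """Keep only lines inside [start, end] that match the level and keyword."""
    matches = []
    for line in lines:
        ts, lvl, msg = parse_line(line)
        if not (start <= ts <= end):
            continue  # outside the time window of the incident
        if level is not None and lvl != level:
            continue  # not the severity we are looking for
        if keyword is not None and keyword.lower() not in msg.lower():
            continue  # message does not mention the suspected problem
        matches.append(line)
    return matches


if __name__ == "__main__":
    window_start = datetime(2017, 3, 1, 2, 0)
    window_end = datetime(2017, 3, 1, 3, 0)
    for hit in filter_logs(SAMPLE_LOGS, window_start, window_end, level="ERROR"):
        print(hit)
```

Knowing the kind of error in advance is what makes the `level` and `keyword` arguments useful; without that knowledge, every line in the window remains a candidate.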
One solution is a platform that has collected data from the internet about all kinds of related incidents, observed how people using similar setups resolved them in their systems, and scans your system to identify potential problems. One way to achieve this is to design a system that mimics how a user investigates, monitors, and troubleshoots events, allowing it to develop an understanding of how humans interact with data instead of trying to analyze the data itself. Such a technology would be similar to Amazon’s product recommendation system and Google’s PageRank algorithm, but focused on log data.
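As a rough illustration of ranking log entries by how people have interacted with similar ones, the sketch below counts how often a message pattern was investigated in the past and surfaces new entries that match. This is a toy approach under stated assumptions, not how any vendor's product works; the function names and the idea of an "investigated messages" history are hypothetical.

```python
import re
from collections import Counter


def to_pattern(message):
    """Collapse variable parts (numbers, hex ids) so similar messages share a pattern."""
    return re.sub(r"\b(0x[0-9a-fA-F]+|\d+)\b", "<N>", message)


def build_interest_scores(investigated_messages):
    """Count how often each message pattern was investigated by people in the past."""
    return Counter(to_pattern(m) for m in investigated_messages)


def rank_entries(messages, scores):
    """Return only messages whose pattern was investigated before, most-investigated first."""
    scored = [(scores.get(to_pattern(m), 0), m) for m in messages]
    flagged = [(s, m) for s, m in scored if s > 0]
    return [m for s, m in sorted(flagged, key=lambda pair: -pair[0])]


if __name__ == "__main__":
    # Hypothetical history of messages that engineers opened while troubleshooting.
    history = ["timeout after 30 ms", "timeout after 45 ms", "disk 3 full"]
    scores = build_interest_scores(history)
    incoming = ["user 7 logged in", "disk 9 full", "timeout after 120 ms"]
    for message in rank_entries(incoming, scores):
        print(message)
```

The design choice here mirrors the paragraph above: the score comes from human behavior around the data (what was investigated), not from analyzing the log content itself.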
Introducing Cognitive Insights
A recent technology implements the solution envisioned in this post. The technology, which has been generating quite a lot of buzz lately, is called Cognitive Insights. This groundbreaking technology uses machine-learning algorithms to match human domain knowledge with log data, drawing on open source repositories, discussion forums, and social threads. Using all this information, it builds a data reservoir of relevant insights that may contain solutions to a wide range of critical issues faced by IT operations and DevOps teams on a daily basis.
The Real-Time Obstacles
DevOps engineers, IT operations managers, CTOs, VPs of engineering, and CISOs face numerous challenges that can be mitigated effectively by integrating AI into log analysis and related operations. While there are several applications of Cognitive Insights, the two main use cases are security and IT operations.
Distributed Denial of Service (DDoS) attacks are becoming increasingly common. What used to be limited to governments, high-profile websites, and multinational organizations now also targets prominent individuals, SMBs, and mid-sized enterprises.
To ward off such attacks, it is essential to have a centralized logging architecture that identifies suspicious activity and pinpoints potential threats among thousands of entries. For this, anti-DDoS mitigation through Cognitive Insights has proven to be highly effective. Leading names such as Dyn and British Airways, which sustained significant damage from DDoS attacks in the past, now have a full-fledged, ELK-based anti-DDoS mitigation strategy in place to keep hackers at bay and secure their operations against future attacks.
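One simple way to spot suspicious activity in centralized logs is to count requests per source address in fixed time windows and flag the sources that exceed a threshold. The sketch below assumes access-log entries have already been reduced to `(epoch_seconds, source_ip)` pairs; the window size, the threshold, and the function name are illustrative assumptions, and real DDoS mitigation involves far more than this.

```python
from collections import Counter


def suspicious_sources(entries, window_seconds=60, threshold=100):
    """Return the set of source IPs that exceed `threshold` requests in any fixed window.

    `entries` is an iterable of (epoch_seconds, source_ip) pairs.
    """
    counts = Counter()
    for ts, ip in entries:
        bucket = int(ts) // window_seconds  # index of the fixed window this request falls in
        counts[(bucket, ip)] += 1
    return {ip for (bucket, ip), n in counts.items() if n > threshold}


if __name__ == "__main__":
    # Synthetic traffic using documentation-only IP ranges.
    flood = [(i * 0.1, "203.0.113.9") for i in range(500)]   # 500 requests in 50 seconds
    normal = [(i * 10, "198.51.100.7") for i in range(20)]   # 1 request every 10 seconds
    print(suspicious_sources(flood + normal))
```

Fixed windows keep the example short; a sliding window would catch bursts that straddle a window boundary, at the cost of more bookkeeping.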
Wouldn’t it be great to have all your logs compiled in a single place, with each entry carefully monitored and registered? Certainly. You would be able to view the process flow clearly and run queries against the logs of different applications all from one place, dramatically increasing the efficiency of your IT operations. As noted earlier, one of the biggest challenges IT operations and DevOps teams face is pinpointing the small yet potentially harmful issues in the large streams of log data in their environment. This is precisely what Cognitive Insights does. Since the core of this program is based on the ELK stack, it sorts and simplifies the data and makes it easy to get a clear picture of your IT operations. Asurion and Performance Gateway are examples of companies that have benefited from Cognitive Insights and taken their IT game up a notch.
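The benefit of having logs in one place can be sketched in a few lines of Python: merge per-application streams into a single time-ordered view, then query across all of them at once. The application names, entries, and helper functions below are hypothetical, and a production setup would rely on a log pipeline such as the ELK stack rather than in-memory lists.

```python
import heapq


def merge_streams(streams):
    """Merge per-application logs into one time-ordered list of (timestamp, app, message).

    `streams` maps an application name to a list of (timestamp, message)
    pairs that is already sorted by timestamp.
    """
    tagged = [
        [(ts, app, msg) for ts, msg in entries]
        for app, entries in streams.items()
    ]
    # heapq.merge lazily combines already-sorted inputs; key compares timestamps only.
    return list(heapq.merge(*tagged, key=lambda entry: entry[0]))


def query(merged, keyword):
    """Return every merged entry whose message contains the keyword."""
    return [entry for entry in merged if keyword in entry[2]]


if __name__ == "__main__":
    streams = {
        "web": [(1, "GET /checkout"), (4, "500 on /checkout")],
        "payments": [(2, "card declined"), (3, "gateway timeout")],
    }
    merged = merge_streams(streams)
    for entry in merged:
        print(entry)
    print(query(merged, "checkout"))
```

Seen in one timeline, the declined card and the gateway timeout sit right before the failing checkout request, which is the kind of process flow that is hard to reconstruct from separate log files.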
The Good AI Integration Can Yield
Using AI-driven log analytics systems, it becomes considerably easier to find the needle in the haystack and solve issues efficiently. Such a system will have a considerable impact on the management and operations of the entire organization. For companies facing the problems discussed above, integrating AI with a log management system brings benefits such as:
- Improved customer success
- Monitoring and customer support
- Risk reduction and resource optimization
- Maximized efficiency through accessible logging data
In other words, Cognitive Insights and other similar systems can be of great help in data log management and troubleshooting.
Rent-A-Center (RAC) is a Texas-based Fortune 1000 company that offers a wide range of rent-to-own products and services. It has over 3,000 stores and 2,000 kiosks spread across Mexico, Puerto Rico, Canada, and the United States. The company tried integrating two different ELK stacks, but handling 100 GB of data every day was too much of a hassle, not to mention the exorbitant cost and time spent every day on disk management, memory tuning, additional data input capabilities, and other technical issues. RAC transitioned to Cognitive Insights, which gave them confidence that they would be able to detect future anomalies and made it easy to scale with the constantly growing volume of data. They also benefited from a dedicated IT team managing on-premise and off-premise ELK stacks.
The Role of Open Source in Data Log Management
Many reputable vendors are actively researching and testing AI in different areas to enhance the efficiency of data log management systems. These vendors include Logz.io, Splunk, Sumo Logic, and Loggly.
It is no surprise that ELK is fast becoming part of the trend, with more and more vendors offering logging solutions. This is because it has become a great way for companies to set up logging without incurring a staggering upfront cost. It also provides basic graphing and searching capabilities, and organizations that need to recognize the issues in their haystack of log data can opt for newer technologies, like Cognitive Insights, to quickly find the needle and eliminate the main problems.
Originally published on LinkedIn