Learn about anomaly detection in machine learning, including types of anomalies, various anomaly detection techniques, and industry applications.
![[Featured image] A group of doctors meets in a conference room to discuss how anomaly detection in machine learning can help them detect diseases in patients.](https://d3njjcbhbojbot.cloudfront.net/api/utilities/v1/imageproxy/https://images.ctfassets.net/wp1lcwdav1p1/6FAJmZ2MFELyAIyUzUuRuZ/95be511c57cce58b877bcd354c473da7/GettyImages-533768683.jpg?w=1500&h=680&q=60&fit=fill&f=faces&fm=jpg&fl=progressive&auto=format%2Ccompress&dpr=1&w=1000)
Anomaly detection in machine learning is a data-related task where algorithms work to identify outliers.
Identifying any outliers or anomalies is important for maintaining high-quality data, ultimately leading to reliable outputs in your analysis.
Anomaly detection is useful in a range of areas, including cybersecurity, e-commerce, and fraud detection.
You can use supervised and unsupervised learning techniques to detect anomalies, as well as a combination of both with semi-supervised learning.
Learn how anomaly detection techniques in machine learning can help you improve the quality of your data.
Anomaly detection in machine learning is the process of using machine learning models to identify anomalies quickly. It serves several purposes, such as keeping the data you process clean and high quality or meeting specific business needs. By ensuring data quality, organizations can trust their analysis, which leads to better decision-making.
Anomaly detection techniques existed before machine learning, but modern approaches that use machine learning can detect outliers automatically. The main advantages include the ability to handle large volumes of high-dimensional data from various sources, a high success rate in identifying anomalies, and real-time detection.
The three basic approaches to anomaly detection are the same as the three basic approaches to machine learning: supervised, unsupervised, and semi-supervised. A supervised anomaly detection algorithm uses pre-labeled data to understand patterns, while an unsupervised algorithm determines patterns from unlabeled data. Semi-supervised anomaly detection algorithms start with labeled data and build a model of behavior, which they can then use to determine whether new activity falls into typical or abnormal categories.
Anomalies can present themselves in different ways. While some anomalies are merely outliers that happen suddenly, other anomalies are less obvious. In some cases, anomalies can appear as a gradual change over time, slowly altering the data.
One way to classify anomalies is as intentional or unintentional:
Intentional anomalies occur because of a specific event, such as a cyber attack on a company's network.
Unintentional anomalies instead arise due to an error at some point during data collection, such as a human error or miscalculation, damaging the quality of a data set.
Looking more closely at the relationships within data reveals three more types of anomalies: contextual anomalies, point anomalies, and collective anomalies.
Contextual anomalies are data points that are unusual only in a particular context, such as a temperature reading that is normal in summer but abnormal in winter. Like intentional anomalies, they can occur due to specific events.
Point anomalies are outliers that stand out from other data points within a data set.
Collective anomalies are groups of data points that occur in sequence and look unusual together, even if each point seems normal on its own. They may need closer investigation to determine if they’re a cause for concern.
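To make the difference between point and contextual anomalies concrete, here is a minimal sketch using only the Python standard library. It assumes a robust "modified z-score" (based on the median and the median absolute deviation) as the scoring rule and a conventional cutoff of 3.5. Both are illustrative choices rather than something the article prescribes, and the function names and sample data are hypothetical. Collective anomalies are not covered here.

```python
from statistics import median

def modified_z_scores(values):
    """Robust z-scores based on the median and median absolute deviation (MAD)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # More than half the values are identical; this simple rule cannot score them.
        return [0.0 for _ in values]
    return [0.6745 * (v - center) / mad for v in values]

def point_anomalies(values, threshold=3.5):
    """Indices of values that stand out from the whole data set."""
    scores = modified_z_scores(values)
    return [i for i, s in enumerate(scores) if abs(s) > threshold]

def contextual_anomalies(records, threshold=3.5):
    """Indices of (context, value) records that stand out within their own context."""
    by_context = {}
    for index, (context, value) in enumerate(records):
        by_context.setdefault(context, []).append((index, value))
    flagged = []
    for items in by_context.values():
        indices = [i for i, _ in items]
        values = [v for _, v in items]
        flagged.extend(indices[pos] for pos in point_anomalies(values, threshold))
    return sorted(flagged)

# Point anomaly: 50 stands out from every other reading.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(point_anomalies(readings))  # [9]

# Contextual anomaly: 30 degrees is ordinary in summer but unusual in winter.
temps = [("summer", 28), ("summer", 30), ("summer", 29),
         ("summer", 31), ("summer", 30), ("summer", 29),
         ("winter", 2), ("winter", 3), ("winter", 1),
         ("winter", 4), ("winter", 2), ("winter", 30)]
print(contextual_anomalies(temps))  # [11]
```

Scoring each context separately is what lets the winter reading of 30 be flagged while the identical summer readings are not.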
Anomaly detection techniques fall into one of three categories: unsupervised anomaly detection, supervised anomaly detection, and semi-supervised anomaly detection. The right anomaly detection technique depends greatly on the type of data you're working with and how much of the data is labeled versus unlabeled.
Unsupervised anomaly detection is a popular approach to anomaly detection in machine learning. This is because unlabeled data is far more common than labeled data, and an unsupervised algorithm can make discoveries on its own, with no need for labels. Algorithms used for this technique include artificial neural networks (a deep learning approach), isolation forests, and one-class support vector machines. You can see unsupervised anomaly detection used in areas such as fraud detection and detecting medical anomalies.
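As an illustration of how an unsupervised method can work without labels, the following is a simplified, standard-library-only sketch of the isolation forest idea: points that random splits isolate quickly receive scores closer to 1. It is a teaching sketch under assumed defaults (100 trees, up to 256 sampled points, a fixed seed), not a production implementation, and every function name here is hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    low = min(p[feature] for p in points)
    high = max(p[feature] for p in points)
    if low == high:
        return {"size": len(points)}
    split = rng.uniform(low, high)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(point, node, depth=0):
    """Number of splits needed to reach a leaf, adjusted for unsplit leaf size."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    go_left = point[node["feature"]] < node["split"]
    return path_length(point, node["left"] if go_left else node["right"], depth + 1)

def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] per point; higher means easier to isolate."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(sample_size)
    return [2.0 ** (-(sum(path_length(p, t) for t in trees) / n_trees) / norm)
            for p in points]

# Unlabeled data: 60 points clustered near the origin plus one distant point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
data.append((8.0, 8.0))

scores = isolation_scores(data)
most_anomalous = max(range(len(data)), key=lambda i: scores[i])
print(most_anomalous, round(scores[most_anomalous], 2))
```

No labels are supplied at any point; the distant point is expected to stand out simply because fewer random splits are needed to separate it from the rest.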
Supervised anomaly detection requires labeled data, unlike unsupervised learning methods. The downside is that the algorithm can only detect the kinds of anomalies it has seen in its training data, so you need to provide enough examples of both anomalous and normal data. Examples of supervised anomaly detection algorithms include random forests and k-nearest neighbors. Industry applications for these algorithms include detecting fraudulent transactions and detecting defects that occur during manufacturing.
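A small k-nearest neighbors classifier shows both the supervised workflow and its limitation. This is a sketch with made-up, already-scaled two-feature data and a hypothetical `knn_predict` helper; a real project would more likely use an established library.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest labeled neighbors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: (features, label).
train = [
    ((1.0, 1.0), "normal"), ((1.2, 0.8), "normal"), ((0.9, 1.1), "normal"),
    ((1.1, 1.3), "normal"), ((0.8, 0.9), "normal"),
    ((5.0, 5.0), "anomaly"), ((5.2, 4.8), "anomaly"), ((4.9, 5.3), "anomaly"),
]

print(knn_predict(train, (4.8, 5.1)))   # anomaly (resembles known anomalies)
print(knn_predict(train, (1.1, 0.9)))   # normal
# A new kind of outlier, unlike anything labeled "anomaly" in training,
# is still assigned the label of its nearest neighbors:
print(knn_predict(train, (-4.0, 1.0)))  # normal
```

The last query sits far from every training point, yet it is labeled normal because the model only knows the anomaly pattern it was shown.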
Semi-supervised anomaly detection blends facets of both supervised and unsupervised anomaly detection methods, handling some labeled data in addition to large amounts of unlabeled data. Using labeled data gives you more control over the training process, potentially leading to better outcomes. Linear regression is sometimes given as an example: a model fit to labeled data predicts expected values, and new points that deviate sharply from those predictions are flagged as potential anomalies. Use cases for these algorithms include highly complex and industry-specific systems, as well as fraud detection.
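One way to read the linear regression example is as follows: fit a line to data labeled as normal, then flag new, unlabeled points whose distance from the line is unusually large. The sketch below takes that interpretation. The data, the `fit_line` and `is_anomaly` names, and the cutoff of three residual standard deviations are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Labeled portion: observations known to be normal (hypothetical values).
normal_x = [1, 2, 3, 4, 5, 6, 7, 8]
normal_y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9]

slope, intercept = fit_line(normal_x, normal_y)
residuals = [y - (slope * x + intercept) for x, y in zip(normal_x, normal_y)]
threshold = 3 * pstdev(residuals)  # assumed cutoff: 3 residual standard deviations

def is_anomaly(x, y):
    """Flag an unlabeled point that falls far from the fitted model of normal behavior."""
    return abs(y - (slope * x + intercept)) > threshold

print(is_anomaly(9, 19.1))   # False: close to the learned trend
print(is_anomaly(10, 30.0))  # True: far above the learned trend
```

Only the normal examples need labels here; everything scored afterward can be unlabeled.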
Anomaly detection in machine learning does come with certain challenges. Unsupervised and supervised approaches can sometimes return too many false positives. Extra effort then has to go into identifying the false positives and developing a better model. The results of anomaly detection also aren’t always simple to interpret, so employees need the skills to understand what they’re reviewing.
Anomaly detection places some specific requirements on the data as well. The data you use to train an algorithm needs to be clean, with no duplicate information or incomplete records. Additionally, the size of the data set you use for training matters. If you don’t have a big enough training set, the algorithm can’t develop an accurate model. To make up for a lack of data, one option is to use synthetic data sets.
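The cleaning step described above, removing duplicates and incomplete records before training, can be sketched in a few lines. The record layout and field names below are hypothetical, and treating two records as duplicates when all required fields match is an assumption.

```python
def clean_records(records, required_fields):
    """Drop records with missing required fields and exact duplicates on those fields."""
    seen = set()
    cleaned = []
    for record in records:
        # Incomplete: a required field is absent or None.
        if any(record.get(field) is None for field in required_fields):
            continue
        # Duplicate: the same required-field values were already kept.
        key = tuple(record[field] for field in required_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(record)
    return cleaned

raw = [
    {"id": 1, "amount": 20.0, "hour": 9},
    {"id": 1, "amount": 20.0, "hour": 9},     # duplicate
    {"id": 2, "amount": None, "hour": 14},    # incomplete
    {"id": 3, "amount": 75.5},                # missing "hour"
    {"id": 4, "amount": 12.3, "hour": 22},
]

training_set = clean_records(raw, ["id", "amount", "hour"])
print([r["id"] for r in training_set])  # [1, 4]
```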
Implementing anomaly detection is useful across a wide range of industries. Here’s a look at specific use cases:
In e-commerce, anomaly detection allows businesses to measure changes in conversion rate. This allows you to quickly spot the issues causing these changes, whether due to seasonal changes, website issues, or other technical problems.
When businesses partner with content creators on social media, it’s important to ensure they’re working with real people and not fraudulent accounts. Anomaly detection algorithms can identify behavior that is concerning in this context.
The cybersecurity industry benefits greatly from anomaly detection, which can distinguish potentially malicious activity that could damage a system from standard online actions.
Anomaly detection helps with the monitoring of information technology systems, observing metrics that provide context to the overall performance of a system.
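Monitoring use cases like the e-commerce conversion rate and IT system metrics above often come down to comparing each new value against recent history. The following sketch flags a value when it differs from the mean of the previous seven observations by more than three standard deviations. The window size, the threshold, and the sample figures are illustrative assumptions.

```python
from statistics import mean, stdev

def rolling_anomalies(series, window=7, threshold=3.0):
    """Indices whose value deviates sharply from the preceding `window` values."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: this simple rule has no scale to compare against
        if abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily conversion rates; day 8 shows a sudden drop.
conversion_rates = [0.031, 0.030, 0.032, 0.029, 0.031,
                    0.030, 0.032, 0.031, 0.012, 0.030]
print(rolling_anomalies(conversion_rates))  # [8]
```

A rule like this would not separate seasonal shifts from technical problems on its own; it only points to the day that deserves a closer look.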
Editorial Team