MIT AI² Predicts 85% of Cyber Attacks

MIT's new AI² artificial intelligence platform combs through data and detects suspicious activity using unsupervised machine-learning. It then presents this activity to human analysts, who confirm which events are actual attacks, and incorporate that feedback into its models for the next set of data.


Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the machine-learning startup PatternEx demonstrate an artificial intelligence platform called AI² that predicts cyber-attacks significantly better than existing systems by continuously incorporating input from human experts. (The name comes from merging artificial intelligence with what the researchers call “analyst intuition.”)

The team showed that AI² can detect 85 percent of attacks, which is roughly three times better than previous benchmarks, while also reducing the number of false positives by a factor of 5. The system was tested on 3.6 billion pieces of data known as “log lines,” which were generated by millions of users over a period of three months.

To predict attacks, AI² combs through data and detects suspicious activity by clustering the data into meaningful patterns using unsupervised machine-learning. It then presents this activity to human analysts who confirm which events are actual attacks, and incorporates that feedback into its models for the next set of data.

“You can think about the system as a virtual analyst,” says CSAIL research scientist Kalyan Veeramachaneni, who developed AI² with Ignacio Arnaldo, a chief data scientist at PatternEx and a former CSAIL postdoc. “It continuously generates new models that it can refine in as little as a few hours, meaning it can improve its detection rates significantly and rapidly.”

Veeramachaneni presented a paper about the system (PDF) at last week’s IEEE International Conference on Big Data Security in New York City.

Creating cybersecurity systems that merge human- and computer-based approaches is tricky, partly because of the challenge of manually labeling cybersecurity data for the algorithms.

For example, let’s say you want to develop a computer-vision algorithm that can identify objects with high accuracy. Labeling data for that is simple: Just enlist a few human volunteers to label photos as either “objects” or “non-objects,” and feed that data into the algorithm.


More MIT Robotics Coverage

MIT 3D Prints Hydraulic Robots

MIT’s Chronos Could Lead to Safer Drones

DeepDrumpf Twitterbot Uses AI to Imitate Donald Trump

MIT ‘Eyeriss’ Neural Chip Powers Mobile AI

MIT Drone Autonomously Navigates Obstacle Course

Complete MIT Coverage


But for a cybersecurity task, the average person on a crowdsourcing site like Amazon Mechanical Turk simply doesn’t have the skillset to apply labels like “DDOS” or “exfiltration attacks,” says Veeramachaneni. “You need security experts.”

That opens up another problem: Experts are busy, and they can’t spend all day reviewing reams of data that have been flagged as suspicious. Companies have been known to give up on platforms that are too much work, so an effective machine-learning system has to be able to improve itself without overwhelming its human overlords.

AI²’s secret weapon is that it fuses together three different unsupervised-learning methods, and then shows the top events to analysts for them to label. It then builds a supervised model that it can constantly refine through what the team calls a “continuous active learning system.”

Specifically, on day one of its training, AI² picks the 200 most abnormal events and gives them to the expert. As it improves over time, it identifies more and more of the events as actual attacks, meaning that in a matter of days the analyst may only be looking at 30 or 40 events a day.

“This paper brings together the strengths of analyst intuition and machine learning, and ultimately drives down both false positives and false negatives,” says Nitesh Chawla, the Frank M. Freimann Professor of Computer Science at the University of Notre Dame. “This research has the potential to become a line of defense against attacks such as fraud, service abuse and account takeover, which are major challenges faced by consumer-facing systems.”

The team says that AI² can scale to billions of log lines per day, transforming the pieces of data on a minute-by-minute basis into different “features”, or discrete types of behavior that are eventually deemed “normal” or “abnormal.”

“The more attacks the system detects, the more analyst feedback it receives, which, in turn, improves the accuracy of future predictions,” Veeramachaneni says. “That human-machine interaction creates a beautiful, cascading effect.”

This article was republished with permission from MIT News.




About the Author

MIT News · MIT News is dedicated to communicating to the media and the public the news and achievements of the students at the Massachusetts Institute of Technology.
Contact MIT News: newsoffice@mit.edu  ·  View More by MIT News.
Follow MIT on Twitter. Follow on FaceBook



Comments



Log in to leave a Comment



Editors’ Picks

FAA’s Recreational Drone Registration Struck Down in Court
A federal court rules the FAA's mandatory recreational drone registration violates Section 336...

Self-Driving Cars Approved for Public Tests in Germany
Germany passed a law that allows self-driving car tests on public roads....

Smart Exoskeleton Prevents Elderly Falls
The Active Pelvis Orthosis is a smart exoskeleton that recognizes in just 350...

EduExo DIY Kit Lets You Build Exoskeletons
EduExo is a 3D-printable, Arduino-powered kit for students, hobbyists and educators that...