The Transform Technology Summits start October 13th with Low-Code/No Code: Enabling Enterprise Agility. Register now!
The last decade’s growing interest in deep learning was triggered by the proven capacity of neural networks in computer vision tasks. If you train a neural network with enough labeled photos of cats and dogs, it will be able to find recurring patterns in each category and classify unseen images with decent accuracy.
What else can you do with an image classifier?
In 2019, a group of cybersecurity researchers wondered if they could treat security threat detection as an image classification problem. Their intuition proved to be well-placed, and they were able to create a machine learning model that could detect malware based on images created from the content of application files. A year later, the same technique was used to develop a machine learning system that detects phishing websites.
The combination of binary visualization and machine learning is a powerful technique that can provide new solutions to old problems. It is showing promise in cybersecurity, but it could also be applied to other domains.
Detecting malware with deep learning
The traditional way to detect malware is to search files for known signatures of malicious payloads. Malware detectors maintain a database of virus definitions which include opcode sequences or code snippets, and they search new files for the presence of these signatures. Unfortunately, malware developers can easily circumvent such detection methods using different techniques such as obfuscating their code or using polymorphism techniques to mutate their code at runtime.
Dynamic analysis tools try to detect malicious behavior during runtime, but they are slow and require the setup of a sandbox environment to test suspicious programs.
In recent years, researchers have also tried a range of machine learning techniques to detect malware. These ML models have managed to make progress on some of the challenges of malware detection, including code obfuscation. But they present new challenges, including the need to learn too many features and a virtual environment to analyze the target samples.
Binary visualization can redefine malware detection by turning it into a computer vision problem. In this methodology, files are run through algorithms that transform binary and ASCII values to color codes.
In a paper published in 2019, researchers at the University of Plymouth and the University of Peloponnese showed that when benign and malicious files were visualized using this method, new patterns emerge that separate malicious and safe files. These differences would have gone unnoticed using classic malware detection methods.