Command-line Obfuscation Detection Using Large Language Models

conf 2023-11-03 11:45 – 12:30 La Marive EN

Command-line obfuscation is one of the most common methods adversaries use to avoid detection. Typical techniques include changing the case of letters in paths to binaries, inserting symbols that the command-line interpreter ignores, substituting homoglyphs, and storing arguments in variables and re-ordering them on the command line. Most security solutions rely on signatures to detect state-of-the-art malware, which requires threat analysts to enumerate signatures exhaustively for every obfuscation technique. Instead of signatures, we use an NLP approach that can generalize to and detect previously unseen obfuscation techniques.
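To make the signature problem concrete, here is a minimal Python sketch of two of the techniques named above (case changes and inserted symbols) applied to a binary name, checked against a naive exact-match signature. The function names, the signature string, and the use of the cmd.exe caret (`^`) as the ignored symbol are illustrative assumptions, not material from the talk.

```python
import re

# Hypothetical exact-match signature, for illustration only.
SIGNATURE = "rundll32.exe"


def mix_case(command: str) -> str:
    """Alternate upper/lower case; Windows resolves paths case-insensitively."""
    return "".join(
        ch.upper() if i % 2 == 0 else ch.lower()
        for i, ch in enumerate(command)
    )


def insert_carets(command: str) -> str:
    """Insert a caret between adjacent letters; cmd.exe treats ^ as an
    escape character and drops it before ordinary characters."""
    return re.sub(r"(?<=[A-Za-z])(?=[A-Za-z])", "^", command)


def naive_signature_match(command: str) -> bool:
    """Case-sensitive substring check, standing in for a simple signature."""
    return SIGNATURE in command


def normalize(command: str) -> str:
    """Undo only the two tricks above; every new trick needs another rule."""
    return command.replace("^", "").lower()


if __name__ == "__main__":
    for variant in (SIGNATURE, mix_case(SIGNATURE), insert_carets(SIGNATURE)):
        print(variant, naive_signature_match(variant),
              naive_signature_match(normalize(variant)))
```

Each variant still names the same binary, yet the exact-match check only fires on the unmodified string. The `normalize` helper recovers the two variants shown, but it has to be extended by hand for every further technique, which is the enumeration burden described above.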

The proposed NLP method consists of two components: a tokenizer and a classifier. The tokenizer augments the command lines and transforms them into a low-dimensional representation without losing information about the underlying obfuscation technique. Because command lines are structured differently from natural language, the pre-trained classification model is fine-tuned on samples observed in the wild.
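The abstract does not specify how the tokenizer works. As one possible illustration of a low-dimensional representation that still preserves obfuscation cues, the Python sketch below maps each character to a small class alphabet and counts case switches. The class scheme, the function names, and the case-switch feature are all assumptions made for this example; they are not the authors' tokenizer or classifier.

```python
def char_class(ch: str) -> str:
    """Map a character to a coarse class (assumed scheme, not from the talk)."""
    if ch.isupper():
        return "U"
    if ch.islower():
        return "l"
    if ch.isdigit():
        return "d"
    if ch.isspace():
        return "_"
    return ch  # keep symbols such as ^ " % verbatim; they carry signal


def tokenize(command: str) -> str:
    """Reduce a command line to its character-class pattern."""
    return "".join(char_class(ch) for ch in command)


def case_switches(pattern: str) -> int:
    """Count adjacent upper/lower transitions in a class pattern."""
    return sum(
        1
        for a, b in zip(pattern, pattern[1:])
        if {a, b} == {"U", "l"}
    )


if __name__ == "__main__":
    for cmd in ("rundll32.exe", "RuNdLl32.eXe"):
        pattern = tokenize(cmd)
        print(cmd, pattern, case_switches(pattern))
```

The representation discards which binary is named but keeps the shape of the string, so the mixed-case variant stands out from the plain one. In a setup like the one described above, a sequence of this kind would be fed to the fine-tuned classifier rather than to a hand-written threshold.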

The experiments demonstrate that the approach yields high precision and recall with a small number of false positives. Additionally, it uncovered new hard-to-detect obfuscation techniques that rely on software pre-installed on the operating system. The novel detections include new strains of the Raspberry Robin worm on Windows 11, which use a highly obfuscated execution of wt.exe, and Gamarue, which uses rundll32.exe to execute its obfuscated payload.