Abstract
Binary diffing is the problem of determining whether two binary programs originate from the same source code. Binary diffing tools are used to identify malware, plagiarism, or code theft. Many instances of binary diffing assume an adversarial setting, in which a malicious actor transforms binary code by changing the compiler. Traditional diffing techniques rely on statistical similarity analysis, often leveraging stochastic models. However, recent studies have shown that these techniques perform poorly in the face of adversarial code transformations. To address this weakness, this paper introduces a new diffing technique that is resilient against current obfuscation approaches. We propose comparing executables by matching their library signatures (libsig). A program’s library signature is the sequence of calls it makes to functions outside its .text section. The proposed classifier, LibSIG, is faster than off-the-shelf alternatives such as ltrace and, like ltrace, works on stripped binaries and on binaries running with address space layout randomization (ASLR) enabled. Unlike ltrace, LibSIG can also be engineered to detect library calls that bypass conventional application binary interface (ABI) patterns. Our experiments on the GNU Core Utilities demonstrate that LibSIG remains robust against obfuscators like Khaos and ollvm, as well as typical optimization patterns, outperforming binary diffing approaches such as SAFE, BinDiff, and Asm2Vec.
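
To make the notion of a library signature concrete, the following minimal sketch compares two signatures represented as ordered lists of library-call names. The abstract does not state how LibSIG matches signatures, so the use of Python's `difflib.SequenceMatcher` here is an assumption chosen for illustration, and the function name `signature_similarity` and the sample call sequences are hypothetical.

```python
from difflib import SequenceMatcher


def signature_similarity(sig_a, sig_b):
    """Return a similarity ratio in [0, 1] between two call sequences.

    Assumption: a signature is an ordered list of library function names.
    SequenceMatcher is an illustrative stand-in, not the paper's matcher.
    """
    # autojunk=False disables the popularity heuristic, which could
    # otherwise discard frequently repeated calls in long traces.
    return SequenceMatcher(None, sig_a, sig_b, autojunk=False).ratio()


# Hypothetical signatures: the same program built two ways should make
# the same external calls in the same order, whatever its .text looks like.
original = ["setlocale", "fopen", "fread", "fwrite", "fclose", "exit"]
transformed = ["setlocale", "fopen", "fread", "fwrite", "fclose", "exit"]
unrelated = ["socket", "connect", "send", "recv", "close"]

print(signature_similarity(original, transformed))
print(signature_similarity(original, unrelated))
```

The intuition this sketch captures is that transformations confined to a program's own code need not change which external functions it calls, or in what order, so two builds of the same source can yield matching signatures.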