Knowledge and identification of Malware binaries is a crucial part of detection and incident response. There was a time when using MD5s was sufficient to ID binaries. The reverse engineering analysis conducted once would be useful anytime that same MD5 hash was seen again. This has rapidly changed in recent years. Polymorphic samples of the same specimen change the file hash (MD5, SHAx etc) without much effort by the attacker. Also, cyber criminals and advanced adversaries reuse their codebase to create newer versions of their malware, but changes in the file hash disallow any opportunity to connect and leverage previous analyses of similar samples by defenders. This gives them an asymmetric advantage. In recent years, there has been research into “similarity metrics”― methods that can identify whether, or to what degree, two malware binaries are similar to each other. Imphash, ssdeep and sdhash are examples of such techniques. In this talk, Bhavna will review which of these techniques is more suitable for evaluating similarities in code for APT related samples. This presentation will take a data analytics approach. We will look at binary samples from APT events from Jan- Mar 2015 and create clusters of similar binaries based on each of the three similarity metrics under consideration. We will then evaluate the accuracy of the clusters and examine their implications on the effectiveness of each technique in identifying provenance of an APT related binary. This can aid Incident responders in connecting otherwise disparate infections in their environment to a single threat group and apply past analyses of the abilities and motivations of that adversary to conduct more effective response.