Wrangle 2016: Malware Tracking at Scale

By Michael Bentley, Lookout

Historically, mobile-device malware detection has required security researchers to write a heuristic, then scan binaries for a match. Rinse, recycle, and repeat until the entire malware family can be detected. This approach has been effective, but it does not scale to Lookout’s challenge of analyzing more than 30 million applications. In this session, Michael explains how Lookout took an entirely different approach: using graph data modeling techniques. One significant outcome of this approach is a new data model that has the powerful ability to track variants of malware that are under active development. This model also allows Lookout to extract more metadata about malware families through the discovery of relationships that were previously unknown.
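
The heuristic-then-rescan loop described above can be illustrated with a minimal Python sketch. It assumes the heuristic is an exact match on a null-delimited ASCII string (the kind of detection slide 7 lists). The function names, the three-app corpus, and the class path `com/example/evil/Payload` are hypothetical and are not Lookout's tooling.

```python
import re

# Runs of 4+ printable ASCII bytes. Any other byte (including the null
# bytes that delimit strings in many binaries) ends a run.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(blob: bytes) -> set:
    """Return the set of printable ASCII strings found in a binary blob."""
    return set(ASCII_RUN.findall(blob))


def scan_corpus(corpus: dict, heuristic: bytes) -> list:
    """Return names of binaries that contain `heuristic` as an exact string."""
    return sorted(name for name, blob in corpus.items()
                  if heuristic in extract_strings(blob))


# Hypothetical corpus; real input would be dex files or other APK members.
corpus = {
    "app_a": b"\x00com/example/evil/Payload\x00onCreate\x00",
    "app_b": b"\x00com/example/game/Main\x00onCreate\x00",
    "app_c": b"\x01\x02com/example/evil/Payload\x00\xff",
}

matches = scan_corpus(corpus, b"com/example/evil/Payload")
print(matches)
```

Because the match is exact, a variant that renames or obfuscates the class path would not be found, and a researcher would have to write a new heuristic and rescan. That is the repetition the abstract describes.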

  1. © 2016 Cloudera, Inc. All rights reserved. Malware Tracking at Scale
  2. About me • Michael Bentley • Formerly Director of Research and Response @ Lookout • Currently working on data mining projects • KK6WCN • michael@setnorth.com
  3. Agenda • What we are trying to accomplish • How basic heuristics work • Where basic heuristics don’t work • Tracking with pairwise similarity and EMR • Visualizations to help extract more information • Mistakes and caveats
  4. What are we trying to accomplish • Searching for major versions of software (malware) • Find ways to detect it with simple heuristics • Find ways to track it • Dataset discovery
  5. Simple heuristics • Detect on static data • Detect on metadata created by the analysis stack [Diagram labels: applications, analysis, acquisition; hashes, strings, who signed it / certificate]
  6. Simple heuristics - hashes [Diagram labels: APK file, hashes, icon, dex file]
  7. Simple heuristics - string detection • Nice ASCII string delimited by null bytes • Malicious class path • Byte code • Exact match in one or both directions of string • Ctrl + F [Diagram label: null byte]
  8. Simple heuristics - certificates • Same malware • Different certificates
  9. Where simple heuristics are good • Good for things that don’t change • Computationally cheap • About the same scenario for network (IDS) or application inspection (malware detection)
  10. Where it’s problematic • Anything with funding/making money • Malware created in Eastern Europe, Asia, Italy (Hacking Team) • Mass creation of certificates • Code taken from Stack Overflow • Anything with basic string obfuscation • Hunting for new major versions
  11. Enter pairwise similarity • You’re about to see a spreadsheet at a big data conference • http://gunshowcomic.com/648
  12. Application pairwise similarity
  13. Go from pick one app and rescan corpus
  14. Pick one application – Rescan corpus • Examine one app • Find heuristic • Rescan corpus • Rinse, repeat ad infinitum • Throw people at the problem • http://bit.ly/2a0zcZR
  15. Decoding what you already have • Pairwise similarity defines the relationships for us • Dots represent unique (SHA1) applications • Colors represent major versions of malware • Each color is within ~85% match of code distance
  16. Clustering and intelligence • APKs are nodes and edges • Clusters are neighborhoods [Diagram: seven APK nodes; nearest neighbor 95% similar; Cluster 1, 85% similar; Cluster 2, 85% similar; Cluster 0, < 85% similar]
  17. Clustering and intelligence
  18. Clustering versus heuristics
  19. Evolution of malware over time • By taking the clustering data and overlaying it with the “packaged at” data, we can watch malware evolve over time • Color represents major version • Time is a 4-month sliding window • Shows iterations from malware writers
  20. Pairwise problems and options • Comparing 3,500 applications is 12,250,000 operations • As you bring more applications in, expect to scale the EMR cluster or reduce n • You can overmatch on similarity – outlier issue
  21. Tripping over the bar • Pairwise similarity for 7k apps is about 5 GB • So is S3 • Things go bad when you don’t respect the bucket size • Troubleshooting CSV sizes is a thing • Doesn’t work well on small applications • Temporary files on your local machine that are 70 GB cause problems
  22. Knowledge • I had never used NetworkX before ~2014 • I had no idea how to go from what we had into a decent format for visualizing this (GraphML) • Almost no experience in graph theory before ~2014 • Gilad Lotan had a great PyCon talk which got me started. I still reference his talks. • Gephi is a great shortcut for visualizing in 2D if you aren’t familiar with D3 • Seth Hardy who gave tons of amazing feedback while I was learning • Jack Urban who proved that it was possible to track applications as a network • Gensim library is a great way to get started in doing comparisons of applications • Lots of inspiration from the Defcon 22 OpenDNS talk (theirs is better)
  23. Thank you.
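
The pairwise approach from slides 15, 16 and 20 can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the method used in the talk: Jaccard similarity over hypothetical feature sets stands in for the unspecified code-distance measure, a small union-find stands in for a graph library such as NetworkX, and the app names and features are invented. Only the 85% threshold comes from the slides.

```python
from itertools import combinations

THRESHOLD = 0.85  # similarity cutoff quoted on slide 16


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity; an assumed stand-in for the talk's code distance."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def cluster(features: dict, threshold: float = THRESHOLD) -> list:
    """Link apps whose similarity meets the threshold and return the
    connected components (neighborhoods) as sorted lists of app names."""
    parent = {name: name for name in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Every unordered pair is compared once: n * (n - 1) / 2 comparisons.
    for u, v in combinations(features, 2):
        if jaccard(features[u], features[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for name in features:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical feature sets (for example, method names pulled from each APK).
base = {f"m{i}" for i in range(20)}
apps = {
    "apk_1": base,
    "apk_2": base | {"extra_a"},             # 20/21 similar to apk_1
    "apk_3": base | {"extra_a", "extra_b"},  # 20/22 similar to apk_1
    "apk_4": {f"n{i}" for i in range(20)},   # shares nothing with the others
}

print(cluster(apps))

n = 3500
print(n * n, n * (n - 1) // 2)
```

Slide 20's figure of 12,250,000 operations for 3,500 applications corresponds to n² comparisons. Counting each unordered pair once gives n(n-1)/2 = 6,123,250, which still grows quadratically as the corpus grows.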
