© 2016 Cloudera, Inc. All rights reserved.
Wrangle 2016: Malware Tracking at Scale
About me
• Michael Bentley
• Formerly Director of Research and Response @ Lookout
• Currently working on data mining projects
• KK6WCN
• michael@setnorth.com
Agenda
• What we are trying to accomplish
• How basic heuristics work
• Where basic heuristics don’t work
• Tracking with pairwise similarity and EMR
• Visualizations to help extract more information
• Mistakes and caveats
What are we trying to accomplish
• Searching for major versions of software (malware)
• Find ways to detect it with simple heuristics
• Find ways to track it
• Dataset discovery
Simple heuristics
• Detect on static data
• Detect on metadata created by the analysis stack
[Diagram labels: applications, analysis, acquisition; hashes, strings, who signed it / certificate]
Simple heuristics - hashes
[Diagram labels: APK file, hashes, icon, Dex file]
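A minimal sketch of hash-based matching, assuming only that an APK is a ZIP archive and that SHA-1 is the digest of interest. The helper names and the icon path used in the usage note are hypothetical, and the toy archive stands in for a real APK.

```python
import hashlib
import io
import zipfile


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()


def apk_hashes(apk_bytes: bytes, members=("classes.dex",)) -> dict:
    """Hash the whole APK plus selected members of the archive.

    An APK is a ZIP archive, so zipfile can read its entries directly.
    Members that are missing from the archive are skipped.
    """
    hashes = {"apk": sha1_hex(apk_bytes)}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        names = set(archive.namelist())
        for member in members:
            if member in names:
                hashes[member] = sha1_hex(archive.read(member))
    return hashes


def build_toy_apk(entries: dict) -> bytes:
    """Build an in-memory ZIP standing in for an APK (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    return buffer.getvalue()
```

Calling `apk_hashes(apk, members=("classes.dex", "res/drawable/icon.png"))` on two repackaged samples would show which components still share a hash even when the whole-file hash differs.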
Simple heuristics - string detection
• Nice ASCII string delimited by null bytes
• Malicious class path
• Byte code
• Exact match in one or both directions of string
• Ctrl + F
[Figure label: Null byte]
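The string heuristic above can be sketched as follows. This is an illustration, not the scanner used in the talk: the function names are hypothetical, and treating "both directions" as "forwards or reversed" is an interpretation, so it is off by default.

```python
def null_delimited_strings(blob: bytes, min_len: int = 4) -> list:
    """Return printable ASCII runs that sit between null bytes."""
    strings = []
    for chunk in blob.split(b"\x00"):
        # Keep only chunks made entirely of printable ASCII characters.
        if len(chunk) >= min_len and all(0x20 <= byte <= 0x7E for byte in chunk):
            strings.append(chunk.decode("ascii"))
    return strings


def contains_signature(blob: bytes, signature: str,
                       check_reversed: bool = False) -> bool:
    """Exact substring match (the Ctrl + F approach) against extracted strings."""
    candidates = [signature]
    if check_reversed:
        # Assumption: also look for the signature written backwards.
        candidates.append(signature[::-1])
    strings = null_delimited_strings(blob)
    return any(candidate in s for candidate in candidates for s in strings)
```

For example, `contains_signature(dex_bytes, "com/example/evil")` flags any sample whose string table carries that (made-up) class path, and misses it as soon as the path is renamed or obfuscated.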
Simple heuristics - certificates
• Same malware
• Different certificates
Where simple heuristics are good
• Good for things that don’t change
• Computationally cheap
• About the same scenario for network (IDS) or application inspection (malware detection)
Where it’s problematic
• Anything with funding/making money.
• Malware created in Eastern Europe, Asia, Italy (Hacking Team)
• Mass creation of certificates
• Code taken from Stack Overflow
• Anything with basic string obfuscation
• Hunting for new major versions
Enter pairwise similarity
You’re about to see a spreadsheet at a big data conference
http://gunshowcomic.com/648
Application pairwise similarity
Go from pick one app and rescan corpus
Pick one application – Rescan corpus
• Examine one app
• Find heuristic
• Rescan corpus
• Rinse and repeat ad infinitum
• Throw people at the problem
http://bit.ly/2a0zcZR
Decoding what you already have
• Pairwise similarity defines the relationships for us
• Dots represent unique (SHA1) applications
• Colors represent major versions of malware
• Apps sharing a color match within ~85% by code distance
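A rough sketch of how pairwise scores and the ~85% cut-off fit together. The slides do not say which similarity measure was used, so Jaccard similarity over feature sets is swapped in here purely as a stand-in. The feature sets and function names are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_similarity(features: dict) -> dict:
    """Score every unordered pair of apps; keys are (app_x, app_y) tuples."""
    return {
        (x, y): jaccard(features[x], features[y])
        for x, y in combinations(sorted(features), 2)
    }


def edges_above(scores: dict, threshold: float = 0.85) -> list:
    """Keep only the pairs that meet the similarity threshold."""
    return [pair for pair, score in scores.items() if score >= threshold]
```

Given a dict that maps each SHA1 to a set of features extracted from the app, `edges_above(pairwise_similarity(features))` yields the relationships that a plot like the one described above would draw.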
Clustering and intelligence
[Diagram: seven APK nodes. Labels: nearest neighbor, 95% similar; Cluster 1, 85% similar; Cluster 2, 85% similar; Cluster 0, < 85% similar]
• APKs are nodes and edges
• Clusters are neighborhoods
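One simple way to turn nodes and edges into neighborhoods is to take connected components of the thresholded similarity graph. This is a sketch under that assumption; the talk does not state which clustering method was used, and the function name is hypothetical. It uses only the standard library, although a graph library such as NetworkX offers the same operation.

```python
from collections import defaultdict


def clusters_from_edges(nodes, edges):
    """Group nodes into connected components.

    nodes: iterable of app identifiers.
    edges: iterable of (a, b) pairs that met the similarity threshold.
    Apps with no qualifying edge come back as single-member clusters.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    clusters = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from start.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(component)
    return clusters
```

Note that components chain: if A matches B and B matches C, all three land in one cluster even when A and C fall below the threshold on their own.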
Clustering and intelligence
Clustering versus heuristics
Evolution of malware over time
• By taking the clustering data and overlaying it with the packaged-at data, we can watch malware evolve over time.
• Color represents major version
• Time is a 4-month sliding window
• Shows iterations from malware writers
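The sliding window above can be sketched as a per-cluster count over month-aligned windows. The input shape (a list of `(packaged_date, cluster_label)` tuples), the one-month step, and the function names are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month number so windows are easy to compare."""
    return d.year * 12 + (d.month - 1)


def window_counts(samples, start: date, months: int = 4) -> Counter:
    """Count samples per cluster packaged within `months` months of start."""
    low = month_index(start)
    high = low + months
    return Counter(
        cluster
        for packaged, cluster in samples
        if low <= month_index(packaged) < high
    )


def slide(samples, first: date, steps: int, months: int = 4) -> list:
    """Advance the window one month at a time and collect the counts."""
    frames = []
    for step in range(steps):
        idx = month_index(first) + step
        window_start = date(idx // 12, idx % 12 + 1, 1)
        frames.append((window_start, window_counts(samples, window_start, months)))
    return frames
```

Each frame returned by `slide` corresponds to one step of the animation: the clusters present in that window and how many samples each contributes.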
Pairwise problems and options
• Comparing 3500 applications is 12,250,000 operations
• As you bring more applications in, expect to scale the EMR cluster or reduce n.
• You can overmatch on similarity – outlier issue
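The 12,250,000 figure above is 3500 squared, meaning every app is compared against every app, itself included. As a worked check, and assuming the similarity measure is symmetric (an assumption, not something the slides state), only the unordered pairs need computing:

```python
from math import comb


def full_matrix_cells(n: int) -> int:
    """Comparisons when every app is scored against every app (n * n)."""
    return n * n


def unique_pairs(n: int) -> int:
    """Comparisons when each unordered pair is scored once: n * (n - 1) / 2."""
    return comb(n, 2)


for n in (3500, 7000):
    print(n, full_matrix_cells(n), unique_pairs(n))
```

Either way the cost grows quadratically, so doubling the corpus roughly quadruples the work.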
Tripping over the bar
• Pairwise similarity for 7k apps is about 5 GB.
• So is S3
• Things go bad when you don’t respect the bucket size
• Troubleshooting CSV sizes is a thing
• Doesn’t work well on small applications
• Temporary files on your local machine that are 70 GB cause problems
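A back-of-the-envelope size estimate helps before writing pairwise output anywhere. The row layout assumed here (two 40-character SHA1 hex strings, a short score, commas and a newline) is hypothetical and not the format used in the talk, but it lands in the same ballpark as the figure above.

```python
def estimated_csv_bytes(n_apps: int, score_chars: int = 6,
                        sha1_chars: int = 40) -> int:
    """Estimate the size of a full n x n pairwise CSV.

    Assumed row layout: sha1_a,sha1_b,score followed by a newline.
    """
    row_bytes = sha1_chars + 1 + sha1_chars + 1 + score_chars + 1
    return n_apps * n_apps * row_bytes


size = estimated_csv_bytes(7000)
print(f"{size / 1e9:.2f} GB")  # decimal gigabytes
```

Running the same estimate for the next corpus size shows whether the output will still fit within whatever per-file or per-object limit applies.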
Knowledge
• I had never used NetworkX before ~2014
• I had no idea how to go from what we had into a decent format for visualizing this (GraphML).
• Almost no experience in graph theory before ~2014
• Gilad Lotan had a great PyCon talk which got me started. I still reference his talks.
• Gephi is a great shortcut for visualizing in 2D if you aren’t familiar with D3
• Seth Hardy gave tons of amazing feedback while I was learning
• Jack Urban proved that it was possible to track applications as a network
• The Gensim library is a great way to get started in doing comparisons of applications
• Lots of inspiration from the Defcon 22 OpenDNS talk (theirs is better)
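For the GraphML step mentioned above, here is a minimal sketch that writes nodes and weighted edges with only the standard library. The `cluster` and `weight` attribute names and the function name are hypothetical, and libraries such as NetworkX can write GraphML as well.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes: dict, edges: list) -> str:
    """Serialize a graph to a GraphML string.

    nodes: maps node id -> cluster label.
    edges: list of (source, target, weight) tuples.
    """
    ET.register_namespace("", GRAPHML_NS)

    def tag(name: str) -> str:
        return f"{{{GRAPHML_NS}}}{name}"

    root = ET.Element(tag("graphml"))
    # Declare the node and edge attributes before they are used.
    ET.SubElement(root, tag("key"), {
        "id": "cluster", "for": "node",
        "attr.name": "cluster", "attr.type": "string",
    })
    ET.SubElement(root, tag("key"), {
        "id": "weight", "for": "edge",
        "attr.name": "weight", "attr.type": "double",
    })
    graph = ET.SubElement(root, tag("graph"),
                          {"id": "G", "edgedefault": "undirected"})
    for node_id, cluster in nodes.items():
        node = ET.SubElement(graph, tag("node"), {"id": node_id})
        ET.SubElement(node, tag("data"), {"key": "cluster"}).text = str(cluster)
    for source, target, weight in edges:
        edge = ET.SubElement(graph, tag("edge"),
                             {"source": source, "target": target})
        ET.SubElement(edge, tag("data"), {"key": "weight"}).text = repr(float(weight))
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a `.graphml` file gives something a tool like Gephi can open, with the cluster label available for coloring.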
Thank you.

[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Wrangle 2016: Malware Tracking at Scale

  • 1. © 2016 Cloudera, Inc. All rights reserved. 1 Malware Tracking at Scale
  • 2. About me • Michael Bentley • Formerly Director of Research and Response @ Lookout • Currently working on data mining projects • KK6WCN • michael@setnorth.com
  • 3. Agenda • What we are trying to accomplish • How basic heuristics work • Where basic heuristics don’t work • Tracking with pairwise similarity and EMR • Visualizations to help extract more information • Mistakes and caveats
  • 4. What are we trying to accomplish • Searching for major versions of software (malware) • Find ways to detect it with simple heuristics • Find ways to track it • Dataset discovery
  • 5. Simple heuristics • Detect on static data • Detect on analysis stack created metadata • Diagram labels: applications, analysis, acquisition; hashes, strings, who signed it / certificate
  • 6. Simple heuristics - hashes • Diagram labels: APK file, hashes, icon, dex file
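The hash heuristic on this slide can be sketched as follows. This is a minimal illustration, not the speaker's tooling: it assumes an APK is treated as a ZIP archive and hashed both as a whole and per entry (dex file, icon), using SHA-1 because the deck later identifies applications by SHA1. The helper names and the in-memory demo archives are hypothetical.

```python
import hashlib
import io
import zipfile


def sha1_hex(data: bytes) -> str:
    """Hex SHA-1 digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def apk_hashes(apk_bytes: bytes) -> dict:
    """Hash the whole archive plus every entry inside it."""
    hashes = {"<apk>": sha1_hex(apk_bytes)}
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        for name in archive.namelist():
            hashes[name] = sha1_hex(archive.read(name))
    return hashes


def build_demo_apk(dex: bytes, icon: bytes) -> bytes:
    """Hypothetical stand-in for a real APK: a ZIP with a dex file and an icon."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("classes.dex", dex)
        archive.writestr("res/drawable/icon.png", icon)
    return buf.getvalue()


# Two repackaged samples: same code, different icon.
hashes_a = apk_hashes(build_demo_apk(b"dex-bytes", b"icon-A"))
hashes_b = apk_hashes(build_demo_apk(b"dex-bytes", b"icon-B"))

shared = sorted(
    name for name in hashes_a
    if name != "<apk>" and hashes_a[name] == hashes_b.get(name)
)
print(shared)
```

In this toy case the whole-file hashes differ while the dex hash still matches, which is why per-component hashes can catch simple repackaging that a whole-file hash misses.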
  • 7. Simple heuristics - string detection • Nice ASCII string delimited by null bytes • Malicious class path • Byte code • Exact match in one or both directions of string • Ctrl + F • Diagram label: null byte
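A rough sketch of the string heuristic, under one reading of the slide: "exact match in one or both directions" is taken here to mean that the signature must be bounded by a null byte on its left, its right, or both sides. That reading, the function name, and the example class path are all assumptions made for illustration.

```python
NULL = b"\x00"


def contains_signature(blob: bytes, signature: str,
                       anchor_start: bool = True,
                       anchor_end: bool = True) -> bool:
    """Search raw bytes for an ASCII signature, optionally requiring a
    null byte immediately before and/or after it."""
    needle = signature.encode("ascii")
    if anchor_start:
        needle = NULL + needle
    if anchor_end:
        needle = needle + NULL
    return needle in blob


# Hypothetical bytes containing two null-delimited class paths.
sample = (
    b"\x01\x02\x00Lcom/example/evil/Payload;\x00"
    b"\x07\x00Lcom/example/evil/PayloadLoader;\x00"
)

exact = contains_signature(sample, "Lcom/example/evil/Payload;")
prefix_both = contains_signature(sample, "Lcom/example/evil/Payload")
prefix_left = contains_signature(sample, "Lcom/example/evil/Payload",
                                 anchor_end=False)
print(exact, prefix_both, prefix_left)
```

Anchoring on both sides behaves like a whole-string match, while anchoring on one side behaves like a prefix or suffix match. Either way it is little more than Ctrl + F, so any basic string obfuscation defeats it.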
  • 8. Simple heuristics - certificates • Same malware • Different certificates
  • 9. Where simple heuristics are good • Good for things that don’t change • Computationally cheap • About the same scenario for network (IDS) or application inspection (malware detection)
  • 10. Where it’s problematic • Anything with funding/making money • Malware created in Eastern Europe, Asia, Italy (Hacking Team) • Mass creation of certificates • Code taken from Stack Overflow • Anything with basic string obfuscation • Hunting for new major versions
  • 11. Enter pairwise similarity • You’re about to see a spreadsheet at a big data conference • http://gunshowcomic.com/648
  • 12. Application pairwise similarity
  • 13. Go from pick one app and rescan corpus
  • 14. Pick one application – Rescan corpus • Examine one app • Find heuristic • Rescan corpus • Rinse and repeat ad infinitum • Throw people at the problem • http://bit.ly/2a0zcZR
  • 15. Decoding what you already have • Pairwise similarity defines the relationships for us • Dots represent unique (SHA1) applications • Colors represent major versions of malware • Each color is within ~85% match of code distance
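The slides do not say which similarity measure was used (Gensim is mentioned later only as a starting point), so the following is an illustrative sketch that swaps in plain Jaccard similarity over sets of class paths. The feature sets, app names, and function names are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of features common to both sets; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_similarity(features: dict) -> dict:
    """Score every unordered pair of applications exactly once."""
    return {
        (x, y): jaccard(features[x], features[y])
        for x, y in combinations(sorted(features), 2)
    }


# Hypothetical per-application feature sets (class paths).
features = {
    "app1": {"Lcom/x/A;", "Lcom/x/B;", "Lcom/x/C;", "Lcom/x/D;"},
    "app2": {"Lcom/x/A;", "Lcom/x/B;", "Lcom/x/C;", "Lcom/x/E;"},
    "app3": {"Lorg/y/Z;"},
}

scores = pairwise_similarity(features)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
```

The resulting table of pair scores is the "spreadsheet" view: each row relates two applications, and a threshold such as ~85% decides which pairs count as the same major version.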
  • 16. Clustering and intelligence • APKs are nodes and edges • Clusters are neighborhoods • Diagram labels: seven APK nodes; nearest neighbor, 95% similar; Cluster 1, 85% similar; Cluster 2, 85% similar; Cluster 0, < 85% similar
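One way to read this slide as code: link two APKs whenever their similarity is at least 85%, then treat each connected group as a cluster and everything left unlinked as "Cluster 0". The talk credits NetworkX for the real work. This sketch uses a small union-find instead so it runs with no dependencies, and the grouping rule itself is an assumption rather than the speaker's confirmed method.

```python
from collections import defaultdict


def cluster_by_threshold(apps, similarity, threshold=0.85):
    """Group apps linked by any chain of pairs scoring >= threshold."""
    parent = {app: app for app in apps}

    def find(app):
        # Follow parent links to the group's root, shortening the path.
        while parent[app] != app:
            parent[app] = parent[parent[app]]
            app = parent[app]
        return app

    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for app in apps:
        groups[find(app)].add(app)

    clusters = sorted(
        (g for g in groups.values() if len(g) > 1),
        key=lambda g: sorted(g),
    )
    unclustered = {app for g in groups.values() if len(g) == 1 for app in g}
    return clusters, unclustered


# Hypothetical pairwise scores between five applications.
apps = ["a", "b", "c", "d", "e"]
similarity = {
    ("a", "b"): 0.95,
    ("b", "c"): 0.86,
    ("a", "c"): 0.80,
    ("d", "e"): 0.40,
    ("c", "d"): 0.10,
}

clusters, unclustered = cluster_by_threshold(apps, similarity)
print(clusters, sorted(unclustered))
```

Note that "a" and "c" land in the same cluster even though their direct score is 0.80, because both are linked through "b". Chaining of this kind is one way a threshold graph can pull in applications that match less closely than the threshold suggests.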
  • 17. Clustering and intelligence
  • 18. Clustering versus heuristics
  • 19. Evolution of malware over time • By taking the clustering data and overlaying it with the “packaged at” data, we can watch malware evolve over time • Color represents major version • Time is a 4-month sliding window • Shows iterations from malware writers
  • 20. Pairwise problems and options • Comparing 3500 applications is 12,250,000 operations • As you bring more applications in, expect to scale the EMR cluster or reduce n • You can overmatch on similarity – outlier issue
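The 12,250,000 figure equals 3500 × 3500, that is, every cell of a full similarity matrix. The short sketch below reproduces that arithmetic and shows how the count grows with n. The unique-pairs variant assumes a symmetric similarity measure, which the slides do not state, and the function names are hypothetical.

```python
def full_matrix_cells(n: int) -> int:
    """Comparisons when every app is scored against every app (n * n)."""
    return n * n


def unique_pairs(n: int) -> int:
    """Comparisons when each unordered pair is scored once (assumes symmetry)."""
    return n * (n - 1) // 2


for n in (3500, 7000):
    print(n, full_matrix_cells(n), unique_pairs(n))
```

Doubling the corpus from 3500 to 7000 applications quadruples the full-matrix count to 49,000,000, which is why the options on the slide are to scale the cluster or reduce n.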
  • 21. Tripping over the bar • Pairwise similarity for 7k apps is about 5 GB • So is S3 • Things go bad when you don’t respect the bucket size • Troubleshooting CSV sizes is a thing • Doesn’t work well on small applications • Temporary files on your local machine that are 70 GB cause problems
  • 22. Knowledge • I had never used NetworkX before ~2014 • I had no idea how to go from what we had into a decent format for visualizing this (GraphML) • Almost no experience in graph theory before ~2014 • Gilad Lotan had a great PyCon talk which got me started; I still reference his talks • Gephi is a great shortcut for visualizing in 2D if you aren’t familiar with D3 • Seth Hardy gave tons of amazing feedback while I was learning • Jack Urban proved that it was possible to track applications as a network • The Gensim library is a great way to get started comparing applications • Lots of inspiration from the Defcon 22 OpenDNS talk (theirs is better)