BinaryPig: Scalable Binary Data Extraction in Hadoop
Created By:

Jason Trost, Telvis Calhoun, Zach Hanif
Bringing data science to cyber security, allowing you to sense, analyze and act in real time.
Agenda
•  The Problem
•  BinaryPig Architecture
•  Code and Implementation Details
•  Analysis and Results
•  Demo
•  Wrap-Up
Background
•  2.5 years
•  20M samples
•  9.5TB of malware
Malware data mining is useful
•  Threat intel feeds
•  Contextual enrichment on events
•  Machine learning models
Pre-BinaryPig: Architecture
Pre-BinaryPig: Storage Issues
•  We kept running out of disk!
•  We lost samples when NFS nodes failed.
Pre-BinaryPig: Processing Issues
•  No data locality.
•  Node failures were catastrophic.
•  Hard to add new analysis scripts.
Pre-BinaryPig: Data Exploration Issues
•  How can I share my findings for greater fame and glory?
•  Create a table schema for every analysis script?
•  RDBMS failure is worse than zombie apocalypse.
We needed a system that...
•  Scales to our historical data
•  Recovers from failures
•  Grows through scripting
•  Supports dynamic schemas
•  Is searchable via the web
BinaryPig

Framework for processing small binary files using Apache Hadoop and Apache Pig
BinaryPig
•  Simple DSL
•  Pluggable analytics
•  Plays nice with existing tools
•  Enables rapid iteration
BinaryPig - Storage
•  HDFS - scalable, replicated
•  Aggregate malware samples into sequence files
BinaryPig - Processing
•  Hadoop - robust, distributed, with data locality
•  Apache Pig - extensible, simple
BinaryPig - Results Exploration
•  UI - turns your grandma into a data scientist
•  Elasticsearch - schemaless, replicated, awesome
Yet Another Framework?
•  Malware tools didn't scale
•  Hadoop does not play well with small binary files
•  Hadoop did not integrate existing malware analysis tools
Code and Implementation Details
BinaryPig is easy to use!
BinaryPig Ingest Tools
•  Generate sequence file from directory containing malware samples

./bin/dir_to_sequencefile <malwareDir> <hdfsOutputFile>
•  Generate sequence file from archive

./bin/archive_to_sequencefile <archive> <hdfsOutputFile>
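The idea behind the ingest tools, packing many small samples into one large keyed file, can be sketched in a few lines. This is only an illustration: the record layout below is hypothetical and is not the Hadoop SequenceFile format, and the function names `pack_directory` and `iter_records` are made up for this sketch rather than taken from BinaryPig.

```python
import os
import struct

# Hypothetical record header: 4-byte key length, 8-byte value length (big-endian).
HEADER = ">IQ"


def pack_directory(src_dir, out_path):
    """Write every regular file in src_dir to out_path as a (name, bytes) record."""
    count = 0
    with open(out_path, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories and other non-files
            with open(path, "rb") as handle:
                data = handle.read()
            key = name.encode("utf-8")
            out.write(struct.pack(HEADER, len(key), len(data)))
            out.write(key)
            out.write(data)
            count += 1
    return count


def iter_records(container_path):
    """Yield (name, bytes) pairs back out of a container written by pack_directory."""
    header_size = struct.calcsize(HEADER)
    with open(container_path, "rb") as handle:
        while True:
            header = handle.read(header_size)
            if len(header) < header_size:
                break  # end of container
            key_len, value_len = struct.unpack(HEADER, header)
            key = handle.read(key_len).decode("utf-8")
            yield key, handle.read(value_len)
```

One large container gives the cluster a few big files to split and schedule instead of millions of tiny ones, which is the problem the slides attribute to Hadoop and small binary files.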
BinaryPig Loaders
•  Converts raw data to a tuple
•  Executing Loader
o  Executes a specific script/program on a file written to a logical path
o  Example: Hashing
•  Daemon Loader
o  Writes binaries to a path, and provides those paths to an already running analysis process
o  Example: Clamd
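The executing-loader pattern described above (write the binary to a path, run a program on it, turn the output into a tuple) might look roughly like this. It is a minimal sketch, not BinaryPig's Java implementation; `run_on_sample`, its parameters, and the 60-second timeout are assumptions made for illustration.

```python
import os
import subprocess
import tempfile


def run_on_sample(key, data, command, workdir=None):
    """Write data to a temp file, run command on that path, return (key, output).

    command is a list such as ["./strings.sh"]; the temp file path is
    appended as the final argument.
    """
    fd, path = tempfile.mkstemp(dir=workdir)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        result = subprocess.run(
            command + [path],
            capture_output=True,
            text=True,
            timeout=60,  # assumed limit so one bad sample cannot hang the task
        )
        return (key, result.stdout.strip())
    finally:
        os.remove(path)  # never leave samples behind on the worker
```

A daemon loader would differ mainly in the middle step: instead of spawning a process per sample, it would hand the path to an analysis process that is already running.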
  
	
  
Optimizations in BinaryPig
•  To leverage pre-existing tools, we had to write malware binaries to the local filesystem on the worker nodes
o  Note: local copy, not network copy
o  We optimized this to use /dev/shm/ instead
•  Quick scripts are great for rapid iteration, but...
o  Interpreter startup time can dominate execution time
o  Creating small, long-running analytic daemons provides a huge speedup for frequently used tasks
o  i.e. the clamd model of execution
	
  
BinaryPig: Loader Implementations
•  Generic Script Loader
•  Generic Daemon Loader
•  ClamAV Loader
•  Yara Loader
•  Hashing Loader
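BinaryPig: Scripting
strings.sh:

#!/bin/bash
# Print the printable strings found in the file(s) passed as arguments.
strings "$@"
strings.pig:

-- Alias the loader that runs an external script on each binary.
define Loader com.endgame.binarypig.loaders.ExecutingTextLoader;
-- Run strings.sh against every sample in $INPUT.
data = LOAD '$INPUT' USING Loader('strings.sh');
DUMP data;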
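Any executable that accepts file paths as arguments could stand in for strings.sh. As a rough sketch of what such a script does, the following approximates `strings` in Python by reporting runs of at least four printable ASCII bytes. The real `strings` utility has more options and slightly different rules about which characters count as printable, so treat this as an approximation; the function names are made up for this example.

```python
import re
import sys

# Runs of four or more printable ASCII bytes (space through tilde).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data):
    """Return the printable ASCII runs found in a bytes object, in file order."""
    return [match.decode("ascii") for match in PRINTABLE_RUN.findall(data)]


def main(paths):
    """Print the strings of each file, one per line, like strings.sh does."""
    for path in paths:
        with open(path, "rb") as handle:
            for text in extract_strings(handle.read()):
                print(text)
```

To use it as a script, call `main(sys.argv[1:])` at the bottom of the file so the paths arrive the same way `"$@"` delivers them to strings.sh.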
BinaryPig supports non-PE32 files
•  Handles more than just malware...
o  Image analysis
o  PDF data extraction
o  APK extraction
o  Any small binary files
Web Interface
Analysis and Results
Malware Census
•  20 million unique binaries
o  ~94% PE format
o  ~6% are mostly Android APKs
•  5 hours to run historical set
General Findings
Feature Extraction
•  Our core motivation was to drastically improve the experience of validating research.
•  Packer identification
o  Overall and sectional entropy
o  Kolmogorov complexity
o  Section and resource names
o  Section flags
•  Import tables
•  Function calls
•  Resource hashes and subfeatures
Feature Depth
•  PE headers are shallow
o  Easy to manipulate
o  Less resolution than reverse engineering features
o  File metadata is also low resolution
•  Headers provide excellent fast features
•  Headers are often ignored
•  Work the analysis around the feature resolution
o  Ignore tight clusters, go for wide ones
o  Triage, not true classification
Clustering Results
•  Triage for dynamic analysis winnowing
•  Largest cluster: 377,882 samples
o  Three malware families contained within
o  Second largest: 124,894 samples
•  Validation is tricky
o  Manual validation cannot be entirely avoided
o  Cluster meanings change with feature sets
o  Cannot just go off of AV results
ICO Extraction
Icon Features
•  Pixel-based features
o  Brightness
o  Color values
o  Pixel density
•  Cryptographic and fuzzy hashing
o  Perceptual hashes
•  Edge detection
Icon Results
•  Icon clustering
o  Groups do not just include family lines
o  Copycat malware is shown as well
o  Clear indications of malicious intent
•  Method of infection can be extrapolated
o  Phishing
o  Obfuscated executables
o  Adware (more than we expected)
o  False positives - popular software detection
Lessons Learned
•  Feature Selection
o  Over 500 features in PE header alone
o  Abundance of features requires pruning
•  Interpretation and Validation of Results
o  Manual validation is an unfortunate reality
o  Care has to be taken to ensure that unsupervised learning provides meaningful results
DEMO
WRAP UP
So What?
•  Rapid iteration
•  Feature extraction
•  Clustering analysis for rapid malware triage
•  Enables weekly AV scans with latest signatures over previous malware.
•  Created binary classifier to improve sample collection and categorize new samples
Future work
•  Compatibility with Pig 0.10.* and 0.11.*
•  EC2 tutorial
•  More examples/starter scripts
o  Inclusion of some of our Mahout tasks
o  Open source process for that is moving forward
•  Better error logging and handling
o  Messages should be stored in a separate DB
•  Easier deployments
o  Analytic daemons
o  Dependency libs
o  Fabric/Salt/Puppet/Chef
BinaryPig is Open Source!
https://github.com/endgameinc/binarypig

Apache 2 License
We are hiring!
http://endgame.com/careers
QUESTIONS

Connecting the Dots for Information Discovery.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 

BinaryPig - Scalable Malware Analytics in Hadoop

  • 1. BinaryPig: Scalable Binary Data Extraction in Hadoop Created By:
 Jason Trost, Telvis Calhoun, Zach Hanif
  • 2. Bringing data science to cyber security, allowing you to sense, analyze and act in real time.
  • 3. Agenda
      • The Problem
      • BinaryPig Architecture
      • Code and Implementation Details
      • Analysis and Results
      • Demo
      • Wrap-Up
  • 4. Background
      • 2.5 years
      • 20M samples
      • 9.5 TB of malware
  • 6. Malware data mining is useful
      • Threat intel feeds
      • Contextual enrichment on events
      • Machine learning models
  • 8. Pre-BinaryPig: Storage Issues
      • We kept running out of disk!
      • We lost samples when NFS nodes failed.
  • 9. Pre-BinaryPig: Processing Issues
      • No data locality.
      • Node failures were catastrophic.
      • Hard to add new analysis scripts.
  • 10. Pre-BinaryPig: Data Exploration Issues
      • How can I share my findings for greater fame and glory?
      • Create a table schema for every analysis script?
      • RDBMS failure is worse than zombie apocalypse.
  • 11. We needed a system that...
      • Scales to our historical data
      • Recovers from failures
      • Grows through scripting
      • Supports dynamic schemas
      • Is searchable via the web
  • 12. BinaryPig: framework for processing small binary files using Apache Hadoop and Apache Pig
  • 13. BinaryPig
      • Simple DSL
      • Pluggable analytics
      • Plays nice with existing tools
      • Enables rapid iteration
  • 15. BinaryPig - Storage
      • HDFS: scalable, replicated
      • Aggregate malware samples into sequence files
  • 16. BinaryPig - Processing
      • Hadoop: robust, distributed, with data locality
      • Apache Pig: extensible, simple
  • 17. BinaryPig - Results Exploration
      • UI: turns your grandma into a data scientist
      • Elasticsearch: schemaless, replicated, awesome
  • 18. Yet Another Framework?
      • Malware tools didn't scale
      • Hadoop does not play well with small binary files
      • Hadoop did not integrate existing malware analysis tools
  • 20. BinaryPig is easy to use!
  • 21. BinaryPig Ingest Tools
      • Generate sequence file from directory containing malware samples
          ./bin/dir_to_sequencefile <malwareDir> <hdfsOutputFile>
      • Generate sequence file from archive
          ./bin/archive_to_sequencefile <archive> <hdfsOutputFile>
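The idea behind the ingest tools, packing many small samples into one large container of key/value records so Hadoop handles a few big files instead of millions of tiny ones, can be sketched in a few lines. This is only an illustration of the record layout concept: the length-prefixed format and the names `pack_directory` and `iter_records` are hypothetical, and the output is NOT a Hadoop SequenceFile. The real tools write actual sequence files to HDFS.

```python
import os
import struct


def pack_directory(src_dir, out_path):
    """Write every regular file in src_dir to out_path as a (name, bytes) record.

    Illustrative layout per record (an assumption, not the SequenceFile format):
    4-byte key length, 4-byte payload length, key bytes, payload bytes.
    Returns the number of records written.
    """
    count = 0
    with open(out_path, "wb") as out:
        for name in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                payload = handle.read()
            key = name.encode("utf-8")
            out.write(struct.pack(">II", len(key), len(payload)))
            out.write(key)
            out.write(payload)
            count += 1
    return count


def iter_records(packed_path):
    """Yield (name, bytes) pairs back out of a file written by pack_directory."""
    with open(packed_path, "rb") as packed:
        while True:
            header = packed.read(8)
            if len(header) < 8:
                break
            key_len, payload_len = struct.unpack(">II", header)
            key = packed.read(key_len).decode("utf-8")
            yield key, packed.read(payload_len)
```

Keying each record by file name (or by a hash of the content) lets a downstream job emit results per sample while still reading one large, splittable input.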
  • 22. BinaryPig Loaders
      • Converts raw data to a tuple
      • Executing Loader
          o Executes a specific script/program on a file written to a logical path
          o Example: Hashing
      • Daemon Loader
          o Writes binaries to a path, and provides those paths to an already running analysis process
          o Example: Clamd
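As a rough sketch of the two loader styles, the script below hashes a file given its path. `analyze` mirrors the executing-loader case (one path in, one result out per process launch), while `serve` mirrors the daemon case (a resident process that answers one path per input line). The function names, the JSON output shape, and the line-per-path protocol are assumptions made for illustration; they are not taken from the BinaryPig source.

```python
import hashlib
import json
import sys


def hash_file(path, chunk_size=1 << 20):
    """Return md5, sha1 and sha256 hex digests of the file at path."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not have to fit in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def analyze(path):
    """Executing-loader style: one path in, one JSON line out."""
    return json.dumps({"path": path, **hash_file(path)}, sort_keys=True)


def serve(paths_in=sys.stdin, results_out=sys.stdout):
    """Daemon style: stay resident and answer one path per input line."""
    for line in paths_in:
        path = line.strip()
        if path:
            results_out.write(analyze(path) + "\n")
            results_out.flush()
```

Emitting one self-describing JSON object per sample fits the deck's goal of dynamic schemas, since a new analytic can add fields without a table definition.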
  • 23. Optimizations in BinaryPig
      • To leverage pre-existing tools, we had to write malware binaries to the local filesystem on the worker nodes
          o Note: local copy, not network copy
          o We optimized this to use /dev/shm/ instead
      • Quick scripts are great for rapid iteration, but...
          o Interpreter startup time can dominate execution time
          o Creating small, long-running analytic daemons provides a huge speedup for frequently used tasks
          o i.e. the clamd model of execution
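The staging step described above, writing a sample's bytes to a RAM-backed path so an existing command-line tool can read it, might look roughly like this. It is a minimal sketch: `pick_staging_dir` and `run_tool_on_bytes` are hypothetical helper names, and the fallback to the default temp directory when `/dev/shm` is unavailable is an assumption, not something the slides describe.

```python
import os
import subprocess
import tempfile


def pick_staging_dir(preferred="/dev/shm"):
    """Return the RAM-backed directory if usable, else the default temp dir."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    return tempfile.gettempdir()


def run_tool_on_bytes(data, argv, staging_dir=None):
    """Write data to a temp file, run argv + [path], and return the tool's stdout."""
    staging_dir = staging_dir or pick_staging_dir()
    fd, path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        result = subprocess.run(
            argv + [path], capture_output=True, text=True, check=True
        )
        return result.stdout
    finally:
        # Always remove the staged copy so samples do not pile up in memory.
        os.remove(path)
```

Each call here still pays for a process launch, which is the cost the slide attributes to quick scripts; the daemon approach avoids it by keeping the analytic resident and handing it paths.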
  • 24. BinaryPig: Loader Implementations
      • Generic Script Loader
      • Generic Daemon Loader
      • ClamAV Loader
      • Yara Loader
      • Hashing Loader
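  • 25. BinaryPig: Scripting
      strings.sh:
          #!/bin/bash
          # Print the printable strings found in each file passed as an argument
          strings "$@"
      strings.pig:
          -- Alias the loader that runs a script on each sample and returns its text output
          define Loader com.endgame.binarypig.loaders.ExecutingTextLoader;
          -- Run strings.sh over every sample in the input sequence file
          data = LOAD '$INPUT' USING Loader('strings.sh');
          DUMP data;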
  • 26. BinaryPig supports non-PE32 files
      • Handles more than just malware...
          o Image analysis
          o PDF data extraction
          o APK extraction
          o Any small binary files
  • 29. Malware Census
      • 20 million unique binaries
          o ~94% PE format
          o ~6% are mostly Android APKs
      • 5 hours to run historical set
  • 31. Feature Extraction
      • Our core motivation was to drastically improve the experience of validating research.
      • Packer identification
          o Overall and sectional entropy
          o Kolmogorov complexity
          o Section and resource names
          o Section flags
      • Import tables
      • Function calls
      • Resource hashes and subfeatures
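Two of the packer-identification features above can be sketched directly. Shannon entropy over byte frequencies, H = -Σ p_i log2(p_i), ranges from 0 bits per byte for constant data to 8 for uniformly distributed bytes, and packed or encrypted sections tend toward the high end. Kolmogorov complexity itself is not computable, so the sketch uses a zlib compression ratio as a stand-in; that proxy is an assumption here, since the slides do not say how the authors estimated it.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; an assumed complexity proxy."""
    if not data:
        return 0.0
    return len(zlib.compress(data, 9)) / len(data)
```

Sectional entropy is the same calculation applied to each section's raw bytes instead of the whole file, which needs a PE parser to locate the sections.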
  • 32. Feature Depth
      • PE headers are shallow
          o Easy to manipulate
          o Less resolution than reverse engineering features
          o File metadata is also low resolution
      • Headers provide excellent fast features
      • Headers are often ignored
      • Work the analysis around the feature resolution
          o Ignore tight clusters, go for wide ones
          o Triage, not true classification
  • 33. Clustering Results
      • Triage for dynamic analysis winnowing
      • Largest cluster: 377,882 samples
          o Three malware families contained within
          o Second largest: 124,894 samples
      • Validation is tricky
          o Manual validation cannot be entirely avoided
          o Cluster meanings change with feature sets
          o Cannot just go off of AV results
  • 35. Icon Features
      • Pixel based features
          o Brightness
          o Color values
          o Pixel density
      • Cryptographic and fuzzy hashing
          o Perceptual hashes
      • Edge detection
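To make the brightness and perceptual-hash features concrete, here is a minimal sketch of an average hash, one simple kind of perceptual hash. The slides do not name the algorithm the authors used, so this choice is an assumption. The sketch also assumes the icon has already been decoded and downscaled to a small grayscale grid (for example 8x8, values 0 to 255) by an image library, which is outside its scope.

```python
def mean_brightness(pixels):
    """Average grayscale value of a 2D grid of pixel intensities."""
    flat = [value for row in pixels for value in row]
    return sum(flat) / len(flat)


def average_hash(pixels):
    """Build an integer hash: one bit per pixel, set when brighter than the mean."""
    threshold = mean_brightness(pixels)
    bits = 0
    for row in pixels:
        for value in row:
            bits = (bits << 1) | (1 if value > threshold else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of differing bits; small distances suggest visually similar icons."""
    return bin(hash_a ^ hash_b).count("1")
```

Unlike a cryptographic hash, two icons that differ only slightly in shading produce the same or nearby values, which is what makes this kind of feature usable for grouping look-alike samples.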
  • 36. Icon Results
      • Icon clustering
          o Groups do not just include family lines
          o Copycat malware is shown as well
          o Clear indications of malicious intent
      • Method of infection can be extrapolated
          o Phishing
          o Obfuscated executables
          o Adware (more than we expected)
          o False positives - popular software detection
  • 37. Lessons Learned
      • Feature Selection
          o Over 500 features in PE header alone
          o Abundance of features requires pruning
      • Interpretation and Validation of Results
          o Manual validation is an unfortunate reality
          o Care has to be taken to ensure that unsupervised learning provides meaningful results
  • 40. So What?
      • Rapid iteration
      • Feature extraction
      • Clustering analysis for rapid malware triage
      • Enables weekly AV scans with latest signatures over previous malware
      • Created binary classifier to improve sample collection and categorize new samples
  • 41. Future work
      • Compatibility with Pig 0.10.* and 0.11.*
      • EC2 tutorial
      • More examples/starter scripts
          o Inclusion of some of our Mahout tasks
          o Open source process for that is moving forward
      • Better error logging and handling
          o Messages should be stored in a separate DB
      • Easier deployments
          o Analytic daemons
          o Dependency libs
          o Fabric/Salt/Puppet/Chef
  • 42. BinaryPig is Open Source! https://github.com/endgameinc/binarypig
 Apache 2 License
  • 43. We are hiring! http://endgame.com/careers