SlideShare ist ein Scribd-Unternehmen logo
1 von 69
Downloaden Sie, um offline zu lesen
Drilling Security Data
Charles S. Givre, CISSP

charles.givre@gtkcyber.com

www.thedataist.com
• Lead Data Scientist at Deutsche Bank
• PMC Member of Apache Drill Project
• Co-founder GTK Cyber
• Senior Lead Data Scientist @ Booz Allen
• 5 Years @ CIA
• Working on Apache Drill Book
• Masters Degree in Middle Eastern Studies
• Undergraduate in Comp.Sci & Music
Charles Givre
The Problem: Analyzing
Security Data is Hard
gtkcyber.com
Security Data Analysis is Hard…Really Hard
Why?
Security Data comes in Many
Forms
Security Data Comes in Many Forms
• Standard Types such as JSON/CSV/XML

• Log Files: Event Logs, Database, Web Server etc.

• Syslog

• PCAP / PCAP-NG: Binary, raw network traffic
Security Data Lives In Many Places
• File systems

• Cloud Storage: Azure / S3

• Databases

• Real Time Event Streams

• Other sources
Few tools can effectively analyze
these data types effectively
Even fewer tools can effectively analyze
ALL these data types effectively
Splunk and ELK are solid SIEM
platforms
Wireshark is good for PCAP
analysis
gtkcyber.com
Security data is not arranged in an
optimal way for ad-hoc analysis
gtkcyber.com
Security data is not arranged in an optimal way
for ad-hoc analysis
ETL Data Warehouse
ETL is expensive and wasteful
gtkcyber.com
Analytics teams spend
between 50%-90% of their
time preparing their data.
gtkcyber.com
76% of Data Scientists say
this is the least enjoyable
part of their job.
http://visit.crowdflower.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf
gtkcyber.com
The ETL Process consumes the
most time and contributes almost
no value to the end product.
gtkcyber.com https://goleansixsigma.com/8-wastes/
So where does Drill fit in?
Apache Drill is an SQL Engine
for self-describing data.
Drill lets you query anything*,
wherever it is*, no matter its size**
using standard SQL.
* well.. almost anything
** within reason
Drill acts as a universal
translator for data.
gtkcyber.com
ETL Data Warehouse
gtkcyber.com
gtkcyber.com
SELECT src_ip, COUNT(*)

FROM dfs.test.`bad.pcap`
GROUP BY src_ip
Table
Whoa! I can query PCAP as if it
was a database!!
gtkcyber.com
So let's take a look...
gtkcyber.com
For instructions on how to
use Apache Drill with Apache
Superset (Incubating)
http://thedataist.com/visualize-anything-with-superset-and-drill/
gtkcyber.com
158.222.5.157 - - [25/Oct/2015:04:24:37 +0100] "GET /acl_users/
credentials_cookie_auth/require_login?came_from=http%3A//
howto.basjes.nl/join_form HTTP/1.1" 200 10716 "http://
howto.basjes.nl/join_form" "Mozilla/5.0 (Windows NT 6.3; WOW64;
rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21"
158.222.5.157 - - [25/Oct/2015:04:24:39 +0100] "GET /login_form
HTTP/1.1" 200 10543 "http://howto.basjes.nl/" "Mozilla/5.0
(Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0
AlexaToolbar/alxf-2.21"
gtkcyber.com https://github.com/nielsbasjes/logparser/tree/master/examples/demolog
gtkcyber.com https://github.com/nielsbasjes/logparser/tree/master/examples/demolog
Who is Trying to Hack This Site?
• The url /join_form is not public so anyone attempting to access this site,
so almost anyone trying to access this probably a hacker... 

• Let's see who is looking...
https://github.com/nielsbasjes/logparser/tree/master/examples/demolog
Who is Trying to Hack This Site?
SELECT request_receive_time,
connection_client_host,
request_useragent
FROM dfs.test.`hackers-access.httpd`
WHERE request_firstline_uri = '/join_form'
https://github.com/nielsbasjes/logparser/tree/master/examples/demolog
Who is Trying to Hack This Site?
SELECT request_receive_time,
connection_client_host,
request_useragent
FROM dfs.test.`hackers-access.httpd`
WHERE request_firstline_uri = '/join_form'
Who is Trying to Hack This Site?
Let's take a look at those IP
addresses shall we?
Who is Trying to Hack This Site?
SELECT request_receive_time,
connection_client_host,
request_useragent
FROM dfs.test.`hackers-access.httpd`
WHERE request_firstline_uri = '/join_form'
Drill's Flexible UDF Interface Allows
you to write your own functions.
A collection of GeoIP Functions
is available on GitHub.
https://github.com/cgivre/drill-geoip-functions
Drill GeoIP Functions
• getCountryName( <ip> ): This function returns the country name of the IP address, "Unknown" if the IP is unknown or invalid.

• getCountryConfidence( <ip> ): This function returns the confidence score of the country ISO code of the IP address.

• getCountryISOCode( <ip> ): This function returns the country ISO code of the IP address, "Unknown" if the IP is unknown or invalid.

• getCityName( <ip> ): This function returns the city name of the IP address, "Unknown" if the IP is unknown or invalid.

• getCityConfidence( <ip> ): This function returns confidence score of the city name of the IP address.

• getLatitude( <ip> ): This function returns the latitude associated with the IP address.

• getLongitude( <ip> ): This function returns the longitude associated with the IP address.

• getTimezone( <ip> ): This function returns the timezone associated with the IP address.

• getAccuracyRadius( <ip> ): This function returns the accuracy radius associated with the IP address, 0 if unknown.

• getAverageIncome( <ip> ): This function returns the average income of the region associated with the IP address, 0 if unknown.

• getMetroCode( <ip> ): This function returns the metro code of the region associated with the IP address, 0 if unknown.

• getPopulationDensity( <ip> ): This function returns the population density associated with the IP address.

• getPostalCode( <ip> ): This function returns the postal code associated with the IP address.

• getCoordPoint( <ip> ): This function returns a point for use in GIS functions of the lat/long of associated with the IP address.

• getASN( <ip> ): This function returns the autonomous system of the IP address, "Unknown" if the IP is unknown or invalid.

• getASNOrganization( <ip> ): This function returns the autonomous system organization of the IP address, "Unknown" if the IP is
unknown or invalid.

• isEU( <ip> ), isEuropeanUnion( <ip> ): This function returns true if the ip address is located in the European Union, false if not.

• isAnonymous( <ip> ): This function returns true if the ip address is anonymous, false if not.

• isAnonymousVPN( <ip> ): This function returns true if the ip address is an anonymous virtual private network (VPN), false if not.

• isHostingProvider( <ip> ): This function returns true if the ip address is a hosting provider, false if not.

• isPublciProxy( <ip> ): This function returns true if the ip address is a public proxy, false if not.

• isTORExitNode( <ip> ): This function returns true if the ip address is a known TOR exit node, false if not.
https://github.com/cgivre/drill-geoip-functions
Who is Trying to Hack This Site?
SELECT request_receive_time, connection_client_host,
request_useragent,
getCountryName(connection_client_host ) as countryName,
getCityName(connection_client_host ) as cityName
FROM dfs.test.`hackers-access.httpd`
WHERE request_firstline_uri = '/join_form'
Who is Trying to Hack This Site?
Who is Trying to Hack This Site?
SELECT request_receive_time, connection_client_host, request_useragent,
getCountryName(connection_client_host ) as countryName,
getCityName(connection_client_host ) as cityName,
getLatitude(connection_client_host ) as latitude,
getLongitude(connection_client_host ) as longitude
FROM dfs.test.`hackers-access.httpd`
WHERE request_firstline_uri = '/join_form'
Who is Trying to Hack This Site?
What equipment are they using?
What equipment are they using?
• Drill has support for multidimensional data structures including KV Pairs
and Lists.

• There is a pre-existing UDF (https://github.com/nielsbasjes/yauaa) for
Apache Drill which can parse User Agent Strings and get you a lot of
useful information from the UA string.
Who is Trying to Hack This Site?
SELECT parse_user_agent(request_useragent) AS ua
FROM dfs.test.`hackers-access.httpd`
WHERE request_firstline_uri = '/join_form'
Who is Trying to Hack This Site?
SELECT table1.ua.OperatingSystemNameVersion as os,
table1.ua.AgentNameVersionMajor as browser
FROM
(
SELECT
parse_user_agent(request_useragent) AS ua
FROM dfs.test.`hackers-access.httpd`
WHERE request_firstline_uri = '/join_form'
) AS table1
Questions so far?
Let's have some fun with PCAP
Let's have some fun with PCAP
• PCAP is short for Packet Capture and represent raw network traffic. 

• Files are encoded in binary format, but various tools exist to analyze
PCAP files or convert them into more easily accessible formats, such as
JSON.
PCAP Analysis
PCAP Analysis
• Drill has a comprehensive collection of networking functions to facilitate
network analysis

• is_private_ip(), is_valid_ip(), in_network() and many others. 

• inet_aton() and inet_ntoa() exist to facilitate sorting IP ranges.
PCAP Analysis
Let's look for who is talking to
whom...
PCAP Analysis
SELECT DISTINCT src_ip, dst_ip
FROM dfs.test.`slowdownload.pcap`
WHERE src_ip is not null
SynFlood Detection
SYN
SYN/ACK
ACK
SynFlood Detection
SYN
SYN/ACK
SYN
Drill can automate SYN Flood
detection with a custom UDF.
https://github.com/cgivre/drill-synflood-udf
SYN Flood Detection
SELECT tcp_session
FROM dfs.test.`synscan.pcap`
GROUP BY tcp_session
HAVING is_syn_flood(tcp_session, tcp_flags_syn, tcp_flags_ack)
SYN Flood Detection
SELECT *
FROM dfs.test.`synscan.pcap`
WHERE tcp_session IN
(
SELECT tcp_session
FROM dfs.test.`synscan.pcap`
GROUP BY tcp_session
HAVING is_syn_flood(tcp_session, tcp_flags_syn, tcp_flags_ack)
)
Questions so far?
How about log files?
Dec 12 2018 06:50:25 sshd[36669]: Failed password for root from 61.160.251.136 port 1313 ssh2
Dec 12 2018 06:50:25 sshd[36669]: Failed password for root from 61.160.251.136 port 1313 ssh2
Dec 12 2018 06:50:24 sshd[36669]: Failed password for root from 61.160.251.136 port 1313 ssh2
Dec 12 2018 03:36:23 sshd[41875]: Failed password for root from 222.189.239.10 port 1350 ssh2
Dec 12 2018 03:36:22 sshd[41875]: Failed password for root from 222.189.239.10 port 1350 ssh2
Dec 12 2018 03:36:22 sshlockout[15383]: Locking out 222.189.239.10 after 15 invalid attempts
Dec 12 2018 03:36:22 sshd[41875]: Failed password for root from 222.189.239.10 port 1350 ssh2
Dec 12 2018 03:36:22 sshlockout[15383]: Locking out 222.189.239.10 after 15 invalid attempts
Dec 12 2018 03:36:22 sshd[42419]: Failed password for root from 222.189.239.10 port 2646 ssh2
Drill can natively query Syslog
formatted data*
* This feature will be available in Drill version 1.16
Questions?
https://amzn.to/2HDwE92
Thank You!!
Charles S. Givre, CISSP

charles.givre@gtkcyber.com

www.thedataist.com

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
An Introduction to Elastic Search.
An Introduction to Elastic Search.An Introduction to Elastic Search.
An Introduction to Elastic Search.
 
Query DSL In Elasticsearch
Query DSL In ElasticsearchQuery DSL In Elasticsearch
Query DSL In Elasticsearch
 
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
Deep Dive on PostgreSQL Databases on Amazon RDS (DAT324) - AWS re:Invent 2018
 
Elasticsearch presentation 1
Elasticsearch presentation 1Elasticsearch presentation 1
Elasticsearch presentation 1
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
Building an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPABuilding an Authorization Solution for Microservices Using Neo4j and OPA
Building an Authorization Solution for Microservices Using Neo4j and OPA
 
The Impala Cookbook
The Impala CookbookThe Impala Cookbook
The Impala Cookbook
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Minio Cloud Storage
Minio Cloud StorageMinio Cloud Storage
Minio Cloud Storage
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
Introduction to elasticsearch
Introduction to elasticsearchIntroduction to elasticsearch
Introduction to elasticsearch
 
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
A Practical Introduction to Handling Log Data in ClickHouse, by Robert Hodges...
 
Workshop: Learning Elasticsearch
Workshop: Learning ElasticsearchWorkshop: Learning Elasticsearch
Workshop: Learning Elasticsearch
 
Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)Prometheus (Prometheus London, 2016)
Prometheus (Prometheus London, 2016)
 

Ähnlich wie Drilling Cyber Security Data With Apache Drill

How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
琛琳 饶
 

Ähnlich wie Drilling Cyber Security Data With Apache Drill (20)

Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
 
Apache Spark v3.0.0
Apache Spark v3.0.0Apache Spark v3.0.0
Apache Spark v3.0.0
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web Applications
 
Who pulls the strings?
Who pulls the strings?Who pulls the strings?
Who pulls the strings?
 
The basics of hacking and penetration testing 이제 시작이야 해킹과 침투 테스트 kenneth.s.kwon
The basics of hacking and penetration testing 이제 시작이야 해킹과 침투 테스트 kenneth.s.kwonThe basics of hacking and penetration testing 이제 시작이야 해킹과 침투 테스트 kenneth.s.kwon
The basics of hacking and penetration testing 이제 시작이야 해킹과 침투 테스트 kenneth.s.kwon
 
Logstash
LogstashLogstash
Logstash
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
Exploring the Final Frontier of Data Center Orchestration: Network Elements -...
 
System insight without Interference
System insight without InterferenceSystem insight without Interference
System insight without Interference
 
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
BIG DATA ANALYSIS
BIG DATA ANALYSISBIG DATA ANALYSIS
BIG DATA ANALYSIS
 
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
(APP307) Leverage the Cloud with a Blue/Green Deployment Architecture | AWS r...
 
Running High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclioRunning High-Speed Serverless with nuclio
Running High-Speed Serverless with nuclio
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015
 
introduction to node.js
introduction to node.jsintroduction to node.js
introduction to node.js
 
SwampDragon presentation: The Copenhagen Django Meetup Group
SwampDragon presentation: The Copenhagen Django Meetup GroupSwampDragon presentation: The Copenhagen Django Meetup Group
SwampDragon presentation: The Copenhagen Django Meetup Group
 
Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
Being HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on PurposeBeing HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on Purpose
 

Mehr von Charles Givre

Mehr von Charles Givre (9)

Blockchain and UDFs
Blockchain and UDFsBlockchain and UDFs
Blockchain and UDFs
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2
 
Data Exploration with Apache Drill: Day 1
Data Exploration with Apache Drill:  Day 1Data Exploration with Apache Drill:  Day 1
Data Exploration with Apache Drill: Day 1
 
Killing ETL with Apache Drill
Killing ETL with Apache DrillKilling ETL with Apache Drill
Killing ETL with Apache Drill
 
What Does Your Smart Car Know About You? Strata London 2016
What Does Your Smart Car Know About You?  Strata London 2016What Does Your Smart Car Know About You?  Strata London 2016
What Does Your Smart Car Know About You? Strata London 2016
 
Apache Drill Workshop
Apache Drill WorkshopApache Drill Workshop
Apache Drill Workshop
 
Merlin: The Ultimate Data Science Environment
Merlin: The Ultimate Data Science EnvironmentMerlin: The Ultimate Data Science Environment
Merlin: The Ultimate Data Science Environment
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
 
Strata NYC 2015 What does your smart device know about you?
Strata NYC 2015 What does your smart device know about you?Strata NYC 2015 What does your smart device know about you?
Strata NYC 2015 What does your smart device know about you?
 

Kürzlich hochgeladen

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
anilsa9823
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Kürzlich hochgeladen (20)

Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 

Drilling Cyber Security Data With Apache Drill

  • 1. Drilling Security Data Charles S. Givre, CISSP charles.givre@gtkcyber.com www.thedataist.com
  • 2. • Lead Data Scientist at Deutsche Bank • PMC Member of Apache Drill Project • Co-founder GTK Cyber • Senior Lead Data Scientist @ Booz Allen • 5 Years @ CIA • Working on Apache Drill Book • Masters Degree in Middle Eastern Studies • Undergraduate in Comp.Sci & Music Charles Givre
  • 4. gtkcyber.com Security Data Analysis is Hard…Really Hard
  • 6. Security Data comes in Many Forms
  • 7. Security Data Comes in Many Forms • Standard Types such as JSON/CSV/XML • Log Files: Event Logs, Database, Web Server etc. • Syslog • PCAP / PCAP-NG: Binary, raw network traffic
  • 8. Security Data Lives In Many Places • File systems • Cloud Storage: Azure / S3 • Databases • Real Time Event Streams • Other sources
  • 9. Few tools can effectively analyze these data types effectively
  • 10. Even fewer tools can effectively analyze ALL these data types effectively
  • 11. Splunk and ELK are solid SIEM platforms
  • 12. Wireshark is good for PCAP analysis
  • 13. gtkcyber.com Security data is not arranged in an optimal way for ad-hoc analysis
  • 14. gtkcyber.com Security data is not arranged in an optimal way for ad-hoc analysis ETL Data Warehouse
  • 15. ETL is expensive and wasteful
  • 16. gtkcyber.com Analytics teams spend between 50%-90% of their time preparing their data.
  • 17. gtkcyber.com 76% of Data Scientists say this is the least enjoyable part of their job. http://visit.crowdflower.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf
  • 18. gtkcyber.com The ETL Process consumes the most time and contributes almost no value to the end product.
  • 20. So where does Drill fit in?
  • 21. Apache Drill is an SQL Engine for self-describing data.
  • 22. Drill lets you query anything*, wherever it is*, no matter its size** using standard SQL. * well.. almost anything ** within reason
  • 23. Drill acts as a universal translator for data.
  • 26. gtkcyber.com SELECT src_ip, COUNT(*)
 FROM dfs.test.`bad.pcap` GROUP BY src_ip Table Whoa! I can query PCAP as if it was a database!!
  • 28. gtkcyber.com For instructions on how to use Apache Drill with Apache Superset (Incubating) http://thedataist.com/visualize-anything-with-superset-and-drill/
  • 29. gtkcyber.com 158.222.5.157 - - [25/Oct/2015:04:24:37 +0100] "GET /acl_users/ credentials_cookie_auth/require_login?came_from=http%3A// howto.basjes.nl/join_form HTTP/1.1" 200 10716 "http:// howto.basjes.nl/join_form" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21" 158.222.5.157 - - [25/Oct/2015:04:24:39 +0100] "GET /login_form HTTP/1.1" 200 10543 "http://howto.basjes.nl/" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21"
  • 32. Who is Trying to Hack This Site? • The url /join_form is not public so anyone attempting to access this site, so almost anyone trying to access this probably a hacker... • Let's see who is looking... https://github.com/nielsbasjes/logparser/tree/master/examples/demolog
  • 33. Who is Trying to Hack This Site? SELECT request_receive_time, connection_client_host, request_useragent FROM dfs.test.`hackers-access.httpd` WHERE request_firstline_uri = '/join_form' https://github.com/nielsbasjes/logparser/tree/master/examples/demolog
  • 34. Who is Trying to Hack This Site? SELECT request_receive_time, connection_client_host, request_useragent FROM dfs.test.`hackers-access.httpd` WHERE request_firstline_uri = '/join_form'
  • 35. Who is Trying to Hack This Site?
  • 36. Let's take a look at those IP addresses shall we?
  • 37. Who is Trying to Hack This Site? SELECT request_receive_time, connection_client_host, request_useragent FROM dfs.test.`hackers-access.httpd` WHERE request_firstline_uri = '/join_form'
  • 38. Drill's Flexible UDF Interface Allows you to write your own functions.
  • 39. A collection of GeoIP Functions is available on GitHub. https://github.com/cgivre/drill-geoip-functions
  • 40. Drill GeoIP Functions • getCountryName( <ip> ): This function returns the country name of the IP address, "Unknown" if the IP is unknown or invalid. • getCountryConfidence( <ip> ): This function returns the confidence score of the country ISO code of the IP address. • getCountryISOCode( <ip> ): This function returns the country ISO code of the IP address, "Unknown" if the IP is unknown or invalid. • getCityName( <ip> ): This function returns the city name of the IP address, "Unknown" if the IP is unknown or invalid. • getCityConfidence( <ip> ): This function returns confidence score of the city name of the IP address. • getLatitude( <ip> ): This function returns the latitude associated with the IP address. • getLongitude( <ip> ): This function returns the longitude associated with the IP address. • getTimezone( <ip> ): This function returns the timezone associated with the IP address. • getAccuracyRadius( <ip> ): This function returns the accuracy radius associated with the IP address, 0 if unknown. • getAverageIncome( <ip> ): This function returns the average income of the region associated with the IP address, 0 if unknown. • getMetroCode( <ip> ): This function returns the metro code of the region associated with the IP address, 0 if unknown. • getPopulationDensity( <ip> ): This function returns the population density associated with the IP address. • getPostalCode( <ip> ): This function returns the postal code associated with the IP address. • getCoordPoint( <ip> ): This function returns a point for use in GIS functions of the lat/long of associated with the IP address. • getASN( <ip> ): This function returns the autonomous system of the IP address, "Unknown" if the IP is unknown or invalid. • getASNOrganization( <ip> ): This function returns the autonomous system organization of the IP address, "Unknown" if the IP is unknown or invalid. • isEU( <ip> ), isEuropeanUnion( <ip> ): This function returns true if the ip address is located in the European Union, false if not. • isAnonymous( <ip> ): This function returns true if the ip address is anonymous, false if not. • isAnonymousVPN( <ip> ): This function returns true if the ip address is an anonymous virtual private network (VPN), false if not. • isHostingProvider( <ip> ): This function returns true if the ip address is a hosting provider, false if not. • isPublciProxy( <ip> ): This function returns true if the ip address is a public proxy, false if not. • isTORExitNode( <ip> ): This function returns true if the ip address is a known TOR exit node, false if not. https://github.com/cgivre/drill-geoip-functions
  • 41. Who is Trying to Hack This Site? SELECT request_receive_time, connection_client_host, request_useragent, getCountryName(connection_client_host ) as countryName, getCityName(connection_client_host ) as cityName FROM dfs.test.`hackers-access.httpd` WHERE request_firstline_uri = '/join_form'
  • 42. Who is Trying to Hack This Site?
  • 43. Who is Trying to Hack This Site? SELECT request_receive_time, connection_client_host, request_useragent, getCountryName(connection_client_host ) as countryName, getCityName(connection_client_host ) as cityName, getLatitude(connection_client_host ) as latitude, getLongitude(connection_client_host ) as longitude FROM dfs.test.`hackers-access.httpd` WHERE request_firstline_uri = '/join_form'
  • 44. Who is Trying to Hack This Site?
  • 45. What equipment are they using?
  • 46. What equipment are they using? • Drill has support for multidimensional data structures including KV Pairs and Lists. • There is a pre-existing UDF (https://github.com/nielsbasjes/yauaa) for Apache Drill which can parse User Agent Strings and get you a lot of useful information from the UA string.
  • 47. Who is Trying to Hack This Site? SELECT parse_user_agent(request_useragent) AS ua FROM dfs.test.`hackers-access.httpd` WHERE request_firstline_uri = '/join_form'
  • 48. Who is Trying to Hack This Site? SELECT table1.ua.OperatingSystemNameVersion as os, table1.ua.AgentNameVersionMajor as browser FROM ( SELECT parse_user_agent(request_useragent) AS ua FROM dfs.test.`hackers-access.httpd` WHERE request_firstline_uri = '/join_form' ) AS table1
  • 50. Let's have some fun with PCAP
  • 51. Let's have some fun with PCAP • PCAP is short for Packet Capture and represent raw network traffic. • Files are encoded in binary format, but various tools exist to analyze PCAP files or convert them into more easily accessible formats, such as JSON.
  • 53. PCAP Analysis • Drill has a comprehensive collection of networking functions to facilitate network analysis • is_private_ip(), is_valid_ip(), in_network() and many others. • inet_aton() and inet_ntoa() exist to facilitate sorting IP ranges.
  • 55. Let's look for who is talking to whom...
  • 56. PCAP Analysis SELECT DISTINCT src_ip, dst_ip FROM dfs.test.`slowdownload.pcap` WHERE src_ip is not null
  • 59. Drill can automate SYN Flood detection with a custom UDF. https://github.com/cgivre/drill-synflood-udf
  • 60. SYN Flood Detection SELECT tcp_session FROM dfs.test.`synscan.pcap` GROUP BY tcp_session HAVING is_syn_flood(tcp_session, tcp_flags_syn, tcp_flags_ack)
  • 61. SYN Flood Detection SELECT * FROM dfs.test.`synscan.pcap` WHERE tcp_session IN ( SELECT tcp_session FROM dfs.test.`synscan.pcap` GROUP BY tcp_session HAVING is_syn_flood(tcp_session, tcp_flags_syn, tcp_flags_ack) )
  • 63. How about log files?
  • 64. Dec 12 2018 06:50:25 sshd[36669]: Failed password for root from 61.160.251.136 port 1313 ssh2 Dec 12 2018 06:50:25 sshd[36669]: Failed password for root from 61.160.251.136 port 1313 ssh2 Dec 12 2018 06:50:24 sshd[36669]: Failed password for root from 61.160.251.136 port 1313 ssh2 Dec 12 2018 03:36:23 sshd[41875]: Failed password for root from 222.189.239.10 port 1350 ssh2 Dec 12 2018 03:36:22 sshd[41875]: Failed password for root from 222.189.239.10 port 1350 ssh2 Dec 12 2018 03:36:22 sshlockout[15383]: Locking out 222.189.239.10 after 15 invalid attempts Dec 12 2018 03:36:22 sshd[41875]: Failed password for root from 222.189.239.10 port 1350 ssh2 Dec 12 2018 03:36:22 sshlockout[15383]: Locking out 222.189.239.10 after 15 invalid attempts Dec 12 2018 03:36:22 sshd[42419]: Failed password for root from 222.189.239.10 port 2646 ssh2
  • 65.
  • 66. Drill can natively query Syslog formatted data* * This feature will be available in Drill version 1.16
  • 69. Thank You!! Charles S. Givre, CISSP charles.givre@gtkcyber.com www.thedataist.com