SlideShare a Scribd company logo
1 of 12
Download to read offline
Building	
  enterprise	
  analy0cs	
  from	
  	
  
Open	
  Source	
  
Lecole	
  Cole,	
  Founder	
  &	
  CEO	
  	
  
1	
  
A little about me.

• 
• 
• 
• 
• 
• 

My	
  name	
  is	
  Lecole	
  Cole	
  
Worked	
  in	
  data	
  analysis	
  for	
  +15	
  years	
  
TwiEer:	
  @lecole	
  
Email:	
  lecole@skydera.com	
  
Company	
  Skydera	
  Inc.	
  
Projects:	
  Chartleaf	
  

(2)	
  
Interes0ng	
  Packages
	
  
•  Real-­‐0me	
  Stream	
  
–  AWS	
  Kinesis	
  
–  TwiEer	
  Storm	
  
–  Hadoop	
  v2?	
  

•  Analy0cs	
  package	
  
–  Apache	
  Mahout	
  
–  R-­‐project	
  
–  Pandas	
  (Python)	
  
–  pyBrain	
  (Python)	
  
–  Custom	
  Python	
  

•  Visualiza0on	
  
–  D3.js	
  
–  R-­‐project	
  
(#)	
  
Interes0ng	
  Packages
	
  
•  Batch	
  Processors	
  
–  Hadoop	
  V1	
  
–  EMR	
  (AWS)	
  

•  	
  NoSQL:	
  
–  MongoDB	
  
–  DynamoDB	
  (AWS)	
  
–  BigTable	
  (Google)	
  
–  Cassandra	
  
(#)	
  
Example	
  Stack
	
  
•  Compute:	
  
–  Google	
  Compute	
  

•  Database	
  
–  MySQL	
  
–  BigTable	
  

•  Analy0cs	
  

•  Language:	
  
–  Java	
  applica0on	
  for	
  
Hadoop	
  

•  Data	
  access	
  
–  Apache	
  Pig	
  
–  Apache	
  Hive	
  

–  Hadoop	
  	
  	
  

(#)	
  
Example	
  Stack
	
  
•  Compute:	
  
–  AWS	
  EC2	
  

•  Database	
  
–  MySQL	
  
–  DynamoDB	
  

•  Analy0cs	
  Batch	
  
–  EMR	
  	
  

•  Analy0cs	
  Real-­‐0me:	
  
–  AWS	
  Kinesis	
  

•  Data	
  warehouse	
  
–  Redshib	
  

(#)	
  
Screen	
  Shots
	
  

(#)	
  
Screen	
  Shots
	
  

(#)	
  
Screen	
  Shots
	
  

(#)	
  
Screen	
  Shots
	
  

(#)	
  
Screen	
  Shots
	
  

(#)	
  
Screen	
  Shots
	
  

(#)	
  

More Related Content

What's hot

Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache Spark
DataWorks Summit
 

What's hot (19)

Storm at Spotify
Storm at SpotifyStorm at Spotify
Storm at Spotify
 
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYCSpotify in the Cloud - An evolution of data infrastructure - Strata NYC
Spotify in the Cloud - An evolution of data infrastructure - Strata NYC
 
Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with Dask
 
Big data ecosystem
Big data ecosystemBig data ecosystem
Big data ecosystem
 
Presto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte ScalePresto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte Scale
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at Spotify
 
Graph Processing with Apache TinkerPop and Gremlin
Graph Processing with Apache TinkerPop and GremlinGraph Processing with Apache TinkerPop and Gremlin
Graph Processing with Apache TinkerPop and Gremlin
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
 
IBM Open by Design: Graph Technology
IBM Open by Design: Graph TechnologyIBM Open by Design: Graph Technology
IBM Open by Design: Graph Technology
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
 
hotdog a TD tool for DD
hotdog a TD tool for DDhotdog a TD tool for DD
hotdog a TD tool for DD
 
Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan Wisely Chen Spark Talk At Spark Gathering in Taiwan
Wisely Chen Spark Talk At Spark Gathering in Taiwan
 
Cassandra eu
Cassandra euCassandra eu
Cassandra eu
 
Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetup
 
Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache Spark
 
Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018Dataflow in 104corp - DataConTW2018
Dataflow in 104corp - DataConTW2018
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 

Similar to How to Build a Real Time Analytics Enterprise with Open Source

Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
Jesus Rodriguez
 

Similar to How to Build a Real Time Analytics Enterprise with Open Source (20)

Apache hadoop
Apache hadoopApache hadoop
Apache hadoop
 
How Serverless Computing Enables Microservices and Micropayment
How Serverless Computing Enables Microservices and Micropayment  How Serverless Computing Enables Microservices and Micropayment
How Serverless Computing Enables Microservices and Micropayment
 
DneprPy #0: Openstack
DneprPy #0: OpenstackDneprPy #0: Openstack
DneprPy #0: Openstack
 
How Serverless Computing Enables Microservices and Micropayment
How Serverless Computing Enables Microservices and Micropayment  How Serverless Computing Enables Microservices and Micropayment
How Serverless Computing Enables Microservices and Micropayment
 
Python Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the FuturePython Data Ecosystem: Thoughts on Building for the Future
Python Data Ecosystem: Thoughts on Building for the Future
 
The Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL ServerThe Polyglot Data Scientist - Exploring R, Python, and SQL Server
The Polyglot Data Scientist - Exploring R, Python, and SQL Server
 
How Serverless Computing Enables Microservices and Micropayment 
How Serverless Computing Enables Microservices and Micropayment  How Serverless Computing Enables Microservices and Micropayment 
How Serverless Computing Enables Microservices and Micropayment 
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
 
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
Apache Spark™ + IBM Watson + Twitter DataPalooza SF 2015
 
Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda Matt Chung (Independent) - Serverless application with AWS Lambda
Matt Chung (Independent) - Serverless application with AWS Lambda
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
 
Cloudsolutionday 2016: DevOps workflow with Docker on AWS
Cloudsolutionday 2016: DevOps workflow with Docker on AWSCloudsolutionday 2016: DevOps workflow with Docker on AWS
Cloudsolutionday 2016: DevOps workflow with Docker on AWS
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
16 months @ SoundCloud
16 months @ SoundCloud16 months @ SoundCloud
16 months @ SoundCloud
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
How Web APIs and Data Centric Tools Power the Materials Project (PyData SV 2013)
How Web APIs and Data Centric Tools Power the Materials Project (PyData SV 2013)How Web APIs and Data Centric Tools Power the Materials Project (PyData SV 2013)
How Web APIs and Data Centric Tools Power the Materials Project (PyData SV 2013)
 
Latest Developments in H2O
Latest Developments in H2OLatest Developments in H2O
Latest Developments in H2O
 

More from ecobold

Partnerships & Is Now a Great Time to Buy Real Estate?
Partnerships & Is Now a Great Time to Buy Real Estate?Partnerships & Is Now a Great Time to Buy Real Estate?
Partnerships & Is Now a Great Time to Buy Real Estate?
ecobold
 
Learn CSS From Scratch
Learn CSS From ScratchLearn CSS From Scratch
Learn CSS From Scratch
ecobold
 
Startup 102
Startup 102Startup 102
Startup 102
ecobold
 

More from ecobold (9)

Partnerships & Is Now a Great Time to Buy Real Estate?
Partnerships & Is Now a Great Time to Buy Real Estate?Partnerships & Is Now a Great Time to Buy Real Estate?
Partnerships & Is Now a Great Time to Buy Real Estate?
 
How to Run the Perfect Demo for Founders
How to Run the Perfect Demo for FoundersHow to Run the Perfect Demo for Founders
How to Run the Perfect Demo for Founders
 
How to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless EditionHow to Build a Big Data Application: Serverless Edition
How to Build a Big Data Application: Serverless Edition
 
Is Open Source a Good Strategy for your Startup?
Is Open Source a Good Strategy for your Startup?Is Open Source a Good Strategy for your Startup?
Is Open Source a Good Strategy for your Startup?
 
SaaS Metrics for Startups
SaaS Metrics for StartupsSaaS Metrics for Startups
SaaS Metrics for Startups
 
Learn CSS From Scratch
Learn CSS From ScratchLearn CSS From Scratch
Learn CSS From Scratch
 
Startup 102
Startup 102Startup 102
Startup 102
 
Ecobold entrepreneurs roundtable ert
Ecobold entrepreneurs roundtable ertEcobold entrepreneurs roundtable ert
Ecobold entrepreneurs roundtable ert
 
Startup 101
Startup 101Startup 101
Startup 101
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

How to Build a Real Time Analytics Enterprise with Open Source

  • 1. Building  enterprise  analy0cs  from     Open  Source   Lecole  Cole,  Founder  &  CEO     1  
  • 2. A little about me. •  •  •  •  •  •  My  name  is  Lecole  Cole   Worked  in  data  analysis  for  +15  years   TwiEer:  @lecole   Email:  lecole@skydera.com   Company  Skydera  Inc.   Projects:  Chartleaf   (2)  
  • 3. Interes0ng  Packages   •  Real-­‐0me  Stream   –  AWS  Kinesis   –  TwiEer  Storm   –  Hadoop  v2?   •  Analy0cs  package   –  Apache  Mahout   –  R-­‐project   –  Pandas  (Python)   –  pyBrain  (Python)   –  Custom  Python   •  Visualiza0on   –  D3.js   –  R-­‐project   (#)  
  • 4. Interes0ng  Packages   •  Batch  Processors   –  Hadoop  V1   –  EMR  (AWS)   •   NoSQL:   –  MongoDB   –  DynamoDB  (AWS)   –  BigTable  (Google)   –  Cassandra   (#)  
  • 5. Example  Stack   •  Compute:   –  Google  Compute   •  Database   –  MySQL   –  BigTable   •  Analy0cs   •  Language:   –  Java  applica0on  for   Hadoop   •  Data  access   –  Apache  Pig   –  Apache  Hive   –  Hadoop       (#)  
  • 6. Example  Stack   •  Compute:   –  AWS  EC2   •  Database   –  MySQL   –  DynamoDB   •  Analy0cs  Batch   –  EMR     •  Analy0cs  Real-­‐0me:   –  AWS  Kinesis   •  Data  warehouse   –  Redshib   (#)