Big Data Analytics 
to the masses 
Why it has failed and how we can fix it 
Jose Luis Lopez Pino @jllopezpino
Who am I? 
BI Consultant 
Large-Scale & Distributed 
Founding 
Data Engineer
Big Data is like Tourism 
It seems easy to do 
But if you aren’t an expert, 
you can’t make the most of it
Struggle to analyze Big Data 
Harlan Harris, Sean Murphy, and Marck Vaisman. Analyzing the Analyzers: An Introspective Survey of Data 
Scientists and Their Work. O’Reilly Media, Inc., 2013 
Also: Sean Kandel, Andreas Paepcke, Joseph M Hellerstein, and Jeffrey Heer. Enterprise data analysis and 
visualization: An interview study. Visualization and Computer Graphics, IEEE Transactions
Tools 
Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. 
Proceedings of the VLDB Endowment, 7(13), 2014
Tools (Now) 
Original: Volker Markl. Breaking the chains: On declarative data analysis and data independence in the 
big data era. Proceedings of the VLDB Endowment, 7(13), 2014
Deep analytics
We need libraries... 
Libraries! 
Query languages 
Write your own 
MR/RDD/Transformations
… comprehensive ones!
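To make the gap between "write your own MR" and "use a library" concrete, here is a minimal sketch in plain Python. It is illustrative only: the data, the partitioning, and the helper names (map_partition, reduce_pairs) are assumptions made for this example, and it does not use MapReduce, Spark, or Flink themselves.

```python
from functools import reduce
from statistics import fmean

# Hypothetical data set, already split into partitions as a cluster would hold it.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# "Write your own" style: the analyst hand-codes the map and reduce steps.
def map_partition(values):
    # Emit a partial (sum, count) pair for one partition.
    return (sum(values), len(values))

def reduce_pairs(a, b):
    # Merge two partial (sum, count) pairs into one.
    return (a[0] + b[0], a[1] + b[1])

total, count = reduce(reduce_pairs, map(map_partition, partitions))
handwritten_mean = total / count

# Library style: a single call, which is what an analyst used to R expects.
flat = [v for part in partitions for v in part]
library_mean = fmean(flat)

print(handwritten_mean, library_mean)
```

Both paths give the same answer; the difference is how much distributed-systems thinking the analyst has to do for even a simple statistic.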
Say it with memes! 
When you do 
Deep analytics in small data 
using R and CRAN packages 
When you do 
deep analytics in BIG data 
using R and CRAN packages
When you try to program it 
using MapReduce 
When you try to program it 
using Apache Spark / 
Apache Flink 
When you try to use a library 
scalable to large data sets
Can’t we do it better? 
- Make it similar to normal R 
programs. 
- Hide complexity. 
- Make file manipulation easier. 
- Run part of the computation in the 
cluster and part on the client.
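One way to picture these goals is a wrapper that looks like an ordinary vector to the analyst while splitting the work behind the scenes. The sketch below is in Python, not R, and is not the implementation presented in this talk: the DistributedVector class is hypothetical, and a local thread pool stands in for the cluster.

```python
from concurrent.futures import ThreadPoolExecutor

class DistributedVector:
    """Hypothetical wrapper: used like a plain vector, stored as partitions."""

    def __init__(self, values, n_partitions=4):
        # A round-robin split stands in for data living on cluster nodes.
        self._parts = [values[i::n_partitions] for i in range(n_partitions)]

    def _run_on_cluster(self, fn):
        # Assumption: a thread pool plays the role of the cluster here.
        # A real system would ship fn to remote workers instead.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(fn, self._parts))

    def sum(self):
        # Cluster side: one partial sum per partition.
        # Client side: combine the few partial results.
        return sum(self._run_on_cluster(sum))

    def __len__(self):
        return sum(self._run_on_cluster(len))

    def mean(self):
        return self.sum() / len(self)

# The analyst's code stays short and familiar.
v = DistributedVector(list(range(1, 101)))
print(v.mean())
```

The caller never sees partitions or workers; only small partial results travel back to the client for the final step.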
Our approach
Our approach
Behind the scenes: Before
Behind the scenes: After
Without writing significantly different code
Competitive with, or even faster than, native R code on small data
Competitive even for highly iterative programs on small data
And it scales
Some relevant findings 
- Transmission time was not significant. 
- Stratosphere/Flink was competitive even on 
small datasets. 
- Changes in the code were required. 
- Ensemble scenarios are the most exciting 
ones.
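As a rough illustration of an ensemble scenario, the sketch below fits one small model per partition and combines them on the client. It assumes "ensemble" means combining independently fitted models, which the slides do not spell out. The data, the through-the-origin model, and the names fit_slope and predict are all hypothetical, and everything runs locally in plain Python.

```python
from statistics import fmean

def fit_slope(points):
    # Least-squares slope through the origin for one partition:
    # sum(x * y) / sum(x * x).
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx

# Hypothetical partitions of (x, y) pairs that follow y = 2x.
partitions = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
    [(5.0, 10.0), (6.0, 12.0)],
]

# "Cluster" step: fit one small model per partition.
slopes = [fit_slope(p) for p in partitions]

# "Client" step: only the fitted slopes travel back; the ensemble
# prediction is the average of the individual model predictions.
def predict(x):
    return fmean([s * x for s in slopes])

print(predict(10.0))
```

In this arrangement the heavy work stays next to the data and the client handles only a handful of numbers.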
4 Takeaways from this talk 
- We still need to bring Big Data to the right 
people in the right place. 
- We need comprehensive libraries. 
- We need to move data back and forth. 
- Use a syntax that the users are familiar with.
That’s all! 
- Have you found this talk interesting? 
- Follow me: @jllopezpino 
- Looking for a job? (SEM Data Analyst, 
Senior Analyst) 
- GYG is hiring: 
- Are you interested in Data + Energy? 
- Keep in touch:

BDS14 Big Data Analytics to the masses
