BDS14 Big Data Analytics to the masses

Big Data Analytics to the masses
Why it has failed and how we can fix it
Jose Luis Lopez Pino @jllopezpino

Who am I?
- BI Consultant
- Large-Scale & Distributed
- Founding Data Engineer

Big Data is like Tourism
It seems easy to do
But if you aren’t an expert, you can’t make the most of it

Struggle to analyze Big Data
Harlan Harris, Sean Murphy, and Marck Vaisman. Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. O’Reilly Media, Inc., 2013
Also: Sean Kandel, Andreas Paepcke, Joseph M Hellerstein, and Jeffrey Heer. Enterprise data analysis and visualization: An interview study. Visualization and Computer Graphics, IEEE Transactions

Tools
Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. Proceedings of the VLDB Endowment, 7(13), 2014

Tools (Now)
Original: Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. Proceedings of the VLDB Endowment, 7(13), 2014

We need libraries...
- Libraries!
- Query languages
- Write your own MR/RDD/Transformations
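
To make the gap between the levels concrete, here is a minimal sketch in plain Python (not code from the talk, and not tied to any real cluster framework). The function names `mean_with_map_reduce` and `mean_with_library` are hypothetical; the first mimics what "write your own" map/reduce transformations look like, the second is the one-line library call an analyst would rather write.

```python
from functools import reduce
import statistics


def mean_with_map_reduce(values):
    """Hand-written map/reduce style mean (assumes a non-empty input)."""
    # Map step: turn each value into a (partial_sum, partial_count) pair.
    pairs = map(lambda v: (v, 1), values)
    # Reduce step: merge pairs; the merge is associative, which is what
    # would let a distributed engine combine partial results in any order.
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), pairs)
    return total / count


def mean_with_library(values):
    """The same result through a library call that hides the plumbing."""
    return statistics.mean(values)


data = [2.0, 4.0, 6.0, 8.0]
print(mean_with_map_reduce(data))
print(mean_with_library(data))
```

Even for a mean, the hand-written version forces the analyst to think about partial aggregates; the library version keeps the familiar one-call syntax.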

Say it with memes!
- When you do deep analytics in small data using R and CRAN packages
- When you do deep analytics in BIG data using R and CRAN packages

- When you try to program it using MapReduce
- When you try to program it using Apache Spark / Apache Flink
- When you try to use a library scalable to large data sets

Can’t we do it better?
- Make it similar to normal R programs.
- Hide complexity.
- Make file manipulation easier.
- Run part of the computation in the cluster and part of it in the client.

Without writing significantly different code

Competitive or even faster than R native code in small data

Competitive even in highly iterative programs in small data

Some relevant findings
- Transmission time was not significant.
- Stratosphere/Flink was competitive even on small datasets.
- Changes in the code were required.
- Ensemble scenarios are the most exciting ones.

4 Takeaways from this talk
- We still need to bring Big Data to the right people in the right place.
- We need comprehensive libraries.
- We need to move data back and forth.
- Use a syntax that the users are familiar with.

That’s all!
- Have you found this talk interesting?
- Follow me: @jllopezpino
- Looking for a job? (SEM Data Analyst, Senior Analyst)
- GYG is hiring:
- Are you interested in Data + Energy?
- Keep in touch:
