Roman Nikitchenko, 22.02.2015
SUBJECTIVE
BIG DATA & FRAMEWORKS: NO BOOK FOR YOU ANYMORE
2
FRAMEWORKS: WHAT WE WANT
CHEAPER: no more reinventing the wheel
FASTER time to market: part of the job is already done
BETTER: the quality of proven approaches
3
FRAMEWORKS: WHAT WE OFTEN GET
4
CAN CHIMPS
DO BIG DATA?
A real book with this shocking title is available for pre-order.
This is exactly what is happening now in the Big Data industry.
Roses are red.
Violets are blue.
We do Hadoop
What about YOU?
5
BIG DATA IS ABOUT... SCALE
GET CHIMPS OUT OF THE DATACENTER
6
SO HOW TO DO FRAMEWORKING...
WHEN YOU DO BIG DATA
7
YARN
we do Big Data
with Hadoop
8
FRAMEWORK
Is an essential supporting
structure of a building, vehicle, or
object.
In computer programming, a
software framework is an
abstraction in which software
providing generic functionality can
be selectively changed by
additional user-written code, thus
providing application-specific
software.
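As a minimal sketch of the definition above (not taken from the talk), the Python snippet below shows generic functionality being selectively changed by user-written code. The names `BatchJob`, `transform` and `UppercaseJob` are hypothetical and exist only for this illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class BatchJob(ABC):
    """Hypothetical mini-framework: it owns the generic processing loop."""

    def run(self, records: Iterable[str]) -> List[str]:
        # Generic functionality supplied by the framework:
        # iterate over records, skip empty ones, collect the results.
        results = []
        for record in records:
            if not record:
                continue
            results.append(self.transform(record))
        return results

    @abstractmethod
    def transform(self, record: str) -> str:
        """Extension point: application-specific code goes here."""


class UppercaseJob(BatchJob):
    """User-written code that specializes the framework."""

    def transform(self, record: str) -> str:
        return record.upper()


output = UppercaseJob().run(["hadoop", "", "yarn"])
print(output)
```

The framework calls the user code, not the other way around. That is also why a framework dictates the approach: the loop in `run` is fixed, and only `transform` is open for change.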
9
FRAMEWORKS
DICTATE APPROACH
Frameworks exist to reduce the amount of work through reuse. The more you can reuse, the better.
But complex frameworks are too massive to be flexible. They limit your solutions.
In Big Data you usually build a unique solution.
10
SO DO I NEED UNIQUE
FRAMEWORKS
FOR EVERY BIG DATA PROJECT?
11
[Figure: BIG DATA blocks combined in an equation-style diagram (x MAX, +, =)]
HADOOP as INFRASTRUCTURE
12
LOOKS LIKE THIS
13
OPEN SOURCE framework for big data, covering both distributed storage and processing.
Provides RELIABILITY and fault tolerance by SOFTWARE design. Example: the file system uses a replication factor of 3 by default.
Horizontal scalability from a single computer up to thousands of nodes.
INFRASTRUCTURE
3 SIMPLE HADOOP PRINCIPLES
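Reliability by software has a price in raw disk space. The rough arithmetic below assumes plain full replication, and the 100 TB figure is purely illustrative. Both helper names are hypothetical and are not part of any Hadoop API.

```python
def raw_capacity_tb(logical_tb: float, replication_factor: int = 3) -> float:
    """Raw disk space needed to hold `logical_tb` of data when every
    block is stored `replication_factor` times."""
    return logical_tb * replication_factor


def replica_losses_tolerated(replication_factor: int = 3) -> int:
    """How many copies of a block can be lost while one copy survives."""
    return replication_factor - 1


# Illustrative figure: 100 TB of logical data with the default factor of 3.
print(raw_capacity_tb(100))
print(replica_losses_tolerated())
```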
14
HADOOP
INFRASTRUCTURE AS
A FRAMEWORK
● Is formed from a large number of uniform nodes.
● Nodes are replaceable.
● Simple hardware without
sophisticated I/O.
● Reliability by software.
● Horizontal scalability.
15
FRAMEWORKS: APPROACH, LIMITATIONS
INFRASTRUCTURE: COMPLEXITY, OVERHEAD
16
How everyone (who usually sells something) depicts Hadoop complexity
GREAT BIG INFRASTRUCTURE AROUND SMALL CUTE CORE
YOUR APPLICATION
SAFE and FRIENDLY
17
How it looks from a real user's point of view
FEAR OF COMPLETELY UNKNOWN INFRASTRUCTURE
Feeling of something wrong
CORE HADOOP
SOMETHING YOU UNDERSTAND
YOUR APPLICATION
18
But... imagine we have BIG DATA bricks. What should they look like?
19
WHAT BRICKS SHOULD WE TAKE
TO BUILD BIG DATA SOLUTION?
● We should build unique solutions using the same approaches.
● So the bricks have to be flexible.
20
WHAT BRICKS SHOULD WE TAKE
TO BUILD BIG DATA SOLUTION?
● We should build a robust solution with high reliability.
● Bricks have to be simple and replaceable.
21
WHAT BRICKS SHOULD WE TAKE
TO BUILD BIG DATA SOLUTION?
● We should be able to change our solution over time.
● Bricks have to be small.
22
WHAT BRICKS SHOULD WE TAKE
TO BUILD BIG DATA SOLUTION?
● As flexible as possible.
● Focused on a specific aspect, without requiring a large infrastructure.
● Simple and interchangeable.
23
HADOOP 2.x CORE AS A FRAMEWORK
BASIC BLOCKS
● ZooKeeper as coordination service.
● HDFS as file system layer.
● YARN as resource management.
● MapReduce as basic distributed processing option.
24
HADOOP HAS LAYERS
RESOURCE MANAGEMENT
DISTRIBUTED PROCESSING
FILE SYSTEM
COORDINATION
HADOOP
2.x CORE
25
PACKAGING ... RUBIK's CUBE STYLE
● Hadoop packaging is a non-trivial task.
● It gets more complex when you add Apache Spark, Solr or the HBase indexer.
26
Hadoop: don't do it yourself
REUSE AS IS
● The BASIC infrastructure is reusable enough to build on as is, at least until you know it well.
● Do you have the manpower to re-implement it? In that case you'd better contribute.
27
WHERE TO GO FROM HERE?
28
HERE PEOPLE START TO ADD
EVERY FRAMEWORK THEY
KNOW ABOUT...
29
YARN
AT LEAST WE DO IT ONE BY ONE
30
WHAT DO WE USUALLY EXPECT FROM A NEW FRAMEWORK?
BETTER: top framework contributors are usually top engineers
CHEAPER: some part of the work is already done
FASTER: frameworks provide a higher layer of abstraction, so coding goes faster
31
OOOPS...
BETTER: top framework contributors are usually top engineers. BUT: lots of defects due to lack of experience with the new framework.
CHEAPER: some part of the work is already done. BUT: additional cost of maintaining the new framework.
FASTER: frameworks provide a higher layer of abstraction, so coding goes faster. BUT: additional time spent learning the new approach.
32
NONEXISTENT
ONLY TWO?
33
JUST A FEW EXAMPLES
● Spring Batch: the main thread that started the Spring context forgot to check the task completion status.
● Apache Spark: persistence to disk was limited to 2 GB due to the ByteBuffer int limitation.
● Apache HBase so far has no effective guard against client RPC timeouts.
● What about binary data like hashes? No effective out-of-the-box support so far.
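The 2 GB figure in the Spark example follows from Java's `ByteBuffer` being sized and indexed with a 32-bit signed `int`. The sketch below only reproduces that arithmetic in Python. `fits_in_byte_buffer` is a hypothetical helper for illustration, not a Spark or JDK API.

```python
# Largest value of a 32-bit signed integer (Integer.MAX_VALUE in Java).
JAVA_INT_MAX = 2**31 - 1
GIB = 1024**3


def fits_in_byte_buffer(size_bytes: int) -> bool:
    """True if a block of `size_bytes` can be addressed by a single
    buffer whose capacity is a 32-bit signed int."""
    return 0 <= size_bytes <= JAVA_INT_MAX


print(JAVA_INT_MAX / GIB)            # just under 2 GiB
print(fits_in_byte_buffer(1 * GIB))  # a 1 GiB block fits
print(fits_in_byte_buffer(3 * GIB))  # a 3 GiB block does not
```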
ONLY
REAL
EXPERIENCE
NEW FRAMEWORKS ARE ALWAYS A HEADACHE
34
%^#@#^&@#&#%@ !!!
35
JUST LONGER
PERSPECTIVE?
When you use the same approach for a long time, you become more and more effective at it.
36
JAVA MESSAGE SERVICE
1.0.2b (June 25, 2001)
1.1 (April 12, 2002)
2.0 (May 21, 2013)
APACHE SPARK
0.9.0 (Feb 2, 2014)
1.0 (May 30, 2014)
1.1 (Sep 11, 2014)
1.2 (Dec 18, 2014)
JUST FEEL SPEED DIFFERENCE
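To put a number on that difference, the sketch below computes the average gap between the releases listed on this slide. It uses only those dates, and `average_gap_days` is a hypothetical helper name.

```python
from datetime import date
from typing import Dict

# Release dates exactly as listed on the slide.
jms_releases: Dict[str, date] = {
    "1.0.2b": date(2001, 6, 25),
    "1.1": date(2002, 4, 12),
    "2.0": date(2013, 5, 21),
}
spark_releases: Dict[str, date] = {
    "0.9.0": date(2014, 2, 2),
    "1.0": date(2014, 5, 30),
    "1.1": date(2014, 9, 11),
    "1.2": date(2014, 12, 18),
}


def average_gap_days(releases: Dict[str, date]) -> float:
    """Mean number of days between consecutive releases."""
    dates = sorted(releases.values())
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return sum(gaps) / len(gaps)


print(f"JMS:   {average_gap_days(jms_releases):.0f} days between releases")
print(f"Spark: {average_gap_days(spark_releases):.0f} days between releases")
```

On these dates the JMS releases are years apart on average, while the Spark releases are only a few months apart.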
BUT
37
FULL DATA PROCESSING
PLATFORM SUPPORTING YARN
38
SO BIG DATA TECHNOLOGY BOOKS ARE ALWAYS OUTDATED
Great books, but by the time they are printed they are already out of date. Read the original e-books, which receive updates.
39
DO NOT HIDE YOUR
EXPERIENCE
40
FRAMEWORKS IN BIG DATA
HAMSTERS vs HIPSTERS
We hate frameworks! Only hardcore, only JDK!
Give me a framework for every step!
41
FRAMEWORKS IN BIG DATA
HAMSTERS vs HIPSTERS
Significant overhead, even compared to MapReduce access.
The simplest way to access your HBase data for analytics.
Apache HBase is the top OLTP solution for Hadoop. Hive can provide an SQL connector to it.
Use direct HBase RPC for OLTP, MapReduce or Spark when you need performance, and Hive when you need faster implementation.
Crazy idea: Hive running over HBase table snapshots.
42
OUR BIG DATA FRAMEWORKS CRITERIA
FAST FEATURE DEVELOPMENT
ACTIVE COMMUNITY
STABLE REUSABLE ARCHITECTURE
43
ETL: FRAMEWORKS COST
● We do object transformations when we do ETL from SQL to NoSQL objects.
● Practically any ORM framework eats at least 10% of CPU resources.
● Is that a small or a big amount? It depends on who pays...
[Figure: an SQL server JOIN over Table1 to Table4 feeds several ETL streams into BIG DATA shards]
44
10% overhead...
● Single desktop application: computers usually have unused CPU power. A 10% overhead is barely noticeable, so the user accepts it.
● The user pays for electricity and hardware.
45
10% overhead...
● Lots of mobile clients. They can tolerate a 10% performance degradation; the application still works.
● All users pay for your 10% performance overhead.
46
10% overhead...
● Single server solution. OK, usually you have 10% to spare.
● So you pay for the overhead but do not notice it until that capacity is needed. You still have the same single server.
47
10% overhead...
● A 10% overhead on 1000 servers with a properly distributed workload means up to 100 additional servers are needed.
● These are your direct maintenance costs.
IN CLUSTERS YOU PAY FOR OVERHEAD DIRECTLY, WITH ADDITIONAL CLUSTER NODES.
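The cluster arithmetic can be sketched as follows. How the overhead is modelled is an assumption: either it adds extra work on top of the baseline load (which reproduces the slide's figure), or it eats a share of every server's capacity (which costs slightly more). Both function names are hypothetical.

```python
def extra_servers_added_load(baseline_servers: int, overhead_percent: int) -> int:
    """Overhead modelled as extra work on top of the baseline load.
    Integer ceiling division avoids floating-point rounding surprises."""
    return (baseline_servers * overhead_percent + 99) // 100


def extra_servers_lost_capacity(baseline_servers: int, overhead_percent: int) -> int:
    """Overhead modelled as a share of every server's capacity that is lost,
    so more servers are needed to deliver the same useful work."""
    useful_percent = 100 - overhead_percent
    total = (baseline_servers * 100 + useful_percent - 1) // useful_percent
    return total - baseline_servers


print(extra_servers_added_load(1000, 10))     # extra nodes under the first model
print(extra_servers_lost_capacity(1000, 10))  # extra nodes under the second model
```

Under either model the extra nodes grow linearly with cluster size, which is why an overhead that goes unnoticed on a single machine becomes a visible line in a cluster budget.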
48
WHAT
FRAMEWORK
IS REALLY
GOOD FOR
YOU?
● If you know the amount (and cost) of work needed to replace a framework, that framework is really good for you.
49
MAKING YOUR OWN
FRAMEWORK
● The most common reason for building your own framework is ... growing complexity and support cost.
● Developing a new framework and migrating to it can be cheaper than supporting existing solutions.
● You don't want to depend on the development of an existing framework.
50
MAKING FRAMEWORK
LAZY STYLE
● First build multiple solutions, then integrate them into a single approach.
● GOOD: you only integrate what is already used, so there is less unused work.
● BAD: you act reactively.
51
MAKING FRAMEWORK
PROACTIVE STYLE
● You improve the framework before the actual need arises.
● GOOD: you are guided by approach, not need, so you usually get a clearer design.
● BAD: you are more likely to build things that are not needed.
52
OUTSIDE YOUR TEAM
● Great, you have additional workforce. But from now on you have external support tickets.
● Usually you can control your users, so major changes are still possible, but harder.
● Pay more attention to documentation and training for other teams. It pays back.
53
OUTSIDE YOUR COMPANY
● You receive additional workforce: people start contributing to your framework. Don't be too optimistic.
● Community support is good, but you need to support community applications.
● You are no longer flexible. You don't control the users of your framework.
54
LESSONS LEARNED
CORE
● Avoid inventing a unique approach for every Big Data solution. It is critical to have a good, relatively stable foundation.
● Your Big Data CORE architecture should be a layered infrastructure built from small, simple, unified, replaceable components (the UNIX way).
● Be ready for packaging issues, but try to reuse as much as possible at the CORE layer.
55
LESSONS LEARNED
BEYOND THE CORE
● When selecting frameworks to extend your big data core, prefer solutions with a stable approach, flexible functionality and a healthy community. Revise your approaches regularly, as the world changes fast.
● Prefer contributing to a good existing solution over starting your own.
● The more frequently you change something, the higher-level the tool you need for it. But in big data you pay directly for any performance overhead.
● If you have started your own framework, the more popular it becomes, the less freedom you have to modify it, so flexibility alone is a bad reason to start one.
56
Questions and discussion

8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 

Big data & frameworks: no book for you anymore

  • 1. Roman Nikitchenko, 22.02.2015 SUBJECTIVE BIG DATA NO BOOK FOR YOU ANYMORE FRAMEWORKS
  • 2. FRAMEWORKS: WHAT WE WANT. CHEAPER: no more reinventing the wheel. FASTER time to market: part of the job is already done. BETTER: quality of proven approaches.
  • 4. CAN CHIMPS DO BIG DATA? A real book with this shocking title is available for pre-order. This is exactly what is happening in the Big Data industry now. Roses are red. Violets are blue. We do Hadoop. What about YOU?
  • 5. BIG DATA IS ABOUT... SCALE. GET CHIMPS OUT OF THE DATACENTER
  • 6. SO HOW TO DO FRAMEWORKING... WHEN YOU DO BIG DATA
  • 7. YARN we do Big Data with Hadoop
  • 8. FRAMEWORK: an essential supporting structure of a building, vehicle, or object. In computer programming, a software framework is an abstraction in which software providing generic functionality can be selectively changed by additional user-written code, thus providing application-specific software.
  • 9. FRAMEWORKS DICTATE APPROACH. Frameworks exist to reduce the amount of work through reuse. The more you can reuse, the better. But complex frameworks are too massive to be flexible. They limit your solutions. When doing Big Data you usually build a unique solution.
  • 10. SO DO I NEED UNIQUE FRAMEWORKS FOR EVERY BIG DATA PROJECT?
  • 13. INFRASTRUCTURE: 3 SIMPLE HADOOP PRINCIPLES. OPEN SOURCE framework for big data. Both distributed storage and processing. Provides RELIABILITY and fault tolerance by SOFTWARE design. Example: the file system uses replication factor 3 by default. Horizontal scalability from a single computer up to thousands of nodes.
  • 14. HADOOP INFRASTRUCTURE AS A FRAMEWORK ● Is formed from a large number of unified nodes. ● Nodes are replaceable. ● Simple hardware without sophisticated I/O. ● Reliability by software. ● Horizontal scalability.
  • 16. How everyone (who usually sells something) depicts Hadoop complexity: GREAT BIG INFRASTRUCTURE AROUND SMALL CUTE CORE. YOUR APPLICATION. SAFE and FRIENDLY
  • 17. How it looks from the real user's point of view: Feeling of something wrong. CORE HADOOP. COMPLETELY UNKNOWN INFRASTRUCTURE. SOMETHING YOU UNDERSTAND. YOUR APPLICATION. FEAR OF
  • 18. But... imagine we have BIG DATA bricks. What should they look like?
  • 19. WHAT BRICKS SHOULD WE TAKE TO BUILD A BIG DATA SOLUTION? ● We should build unique solutions using the same approaches. ● So bricks have to be flexible.
  • 20. WHAT BRICKS SHOULD WE TAKE TO BUILD A BIG DATA SOLUTION? ● We should build a robust solution with high reliability. ● Bricks have to be simple and replaceable.
  • 21. WHAT BRICKS SHOULD WE TAKE TO BUILD A BIG DATA SOLUTION? ● We should be able to change our solution over time. ● Bricks have to be small.
  • 22. WHAT BRICKS SHOULD WE TAKE TO BUILD A BIG DATA SOLUTION? ● As flexible as possible. ● Focused on a specific aspect without requiring a large infrastructure. ● Simple and interchangeable.
  • 23. HADOOP 2.x CORE AS A FRAMEWORK: BASIC BLOCKS ● ZooKeeper as coordination service. ● HDFS as file system layer. ● YARN as resource management. ● MapReduce as basic distributed processing option.
  • 24. HADOOP HAS LAYERS. HADOOP 2.x CORE: RESOURCE MANAGEMENT, DISTRIBUTED PROCESSING, FILE SYSTEM, COORDINATION
  • 25. PACKAGING ... RUBIK'S CUBE STYLE ● Hadoop packaging is a non-trivial task. ● It gets more complex when you add Apache Spark, Solr or the HBase indexer.
  • 26. Hadoop: don't do it yourself. REUSE AS IS ● The BASIC infrastructure is reusable enough to build on as is, at least until you know it well. ● Do you have the manpower to re-implement it? You'd better contribute in this case.
  • 27. WHERE TO GO FROM HERE?
  • 28. HERE PEOPLE START TO ADD EVERY FRAMEWORK THEY KNOW ABOUT...
  • 29. YARN AT LEAST WE DO IT ONE BY ONE
  • 30. WHAT DO WE USUALLY EXPECT FROM A NEW FRAMEWORK? BETTER, CHEAPER, FASTER: frameworks provide a higher layer of abstraction so coding goes faster; some part of the work is already done; top framework contributors are usually top engineers.
  • 31. OOOPS... BETTER, CHEAPER, FASTER: frameworks provide a higher layer of abstraction so coding goes faster; some part of the work is already done; top framework contributors are usually top engineers. Additional cost of maintaining the new framework. Additional time to learn the new approach. Lots of defects due to lack of experience with the new framework.
  • 32. BETTER, CHEAPER, FASTER: frameworks provide a higher layer of abstraction so coding goes faster; some part of the work is already done; top framework contributors are usually top engineers. Additional cost of maintaining the new framework. Additional time to learn the new approach. Lots of defects due to lack of experience with the new framework. NONEXISTENT ONLY TWO?
  • 33. JUST A FEW EXAMPLES ● Spring Batch: the main thread that started the Spring context forgot to check the task completion status. ● Apache Spark: persistence to disk was limited to 2GB due to the ByteBuffer int limitation. ● Apache HBase so far has no effective guard against client RPC timeout. ● What about binary data like hashes? No effective out-of-the-box support so far. ONLY REAL EXPERIENCE. NEW FRAMEWORKS ARE ALWAYS A HEADACHE
  • 35. JUST A LONGER PERSPECTIVE? When you use the same approach for a long time you become more and more effective at it.
  • 36. BUT JUST FEEL THE SPEED DIFFERENCE. JAVA MESSAGE SERVICE: 1.0.2b (June 25, 2001), 1.1 (April 12, 2002), 2.0 (May 21, 2013). APACHE SPARK: 0.9.0 (Feb 2, 2014), 1.0 (May 30, 2014), 1.1 (Sep 11, 2014), 1.2 (Dec 18, 2014).
  • 38. SO BIG DATA TECHNOLOGY BOOKS ARE ALWAYS OUTDATED. Great books, but by the time they are printed they are already old. Read original e-books with updates.
  • 39. DO NOT HIDE YOUR EXPERIENCE
  • 40. FRAMEWORKS IN BIG DATA: HAMSTERS vs HIPSTERS. "We hate frameworks! Only hardcore, only JDK!" "Give me a framework for every step!"
  • 41. FRAMEWORKS IN BIG DATA: HAMSTERS vs HIPSTERS. Significant overhead even compared to MapReduce access. The simplest way to access your HBase data for analytics. Apache HBase is the top OLTP solution for Hadoop. Hive can provide an SQL connector to it. Use HBase direct RPC for OLTP, MapReduce or Spark when you need performance, and Hive when you need faster implementation. Crazy idea: Hive running over HBase table snapshots.
  • 43. ETL: FRAMEWORKS COST ● We do object transformations when we do ETL from SQL to NoSQL objects. ● Practically any ORM framework eats at least 10% of CPU resources. ● Is that a small or a big amount? Depends on who pays... [Diagram: SQL server JOIN over Table1, Table2, Table3, Table4, feeding ETL streams into BIG DATA shards]
  • 44. 10% overhead... ● Single desktop application: computers usually have unused CPU power. A 10% overhead is barely noticeable, so the user accepts it. ● The user pays for electricity and hardware.
  • 45. 10% overhead... ● Lots of mobile clients. They can tolerate a 10% performance degradation. The application still works. ● All users pay for your 10% performance overhead.
  • 46. 10% overhead... ● Single server solution. OK, usually you have 10% to spare. ● So you pay for the overhead but you don't notice it until that capacity is needed. You still have the same 1 server.
  • 47. 10% overhead... ● A 10% overhead on 1000 servers with a properly distributed job means up to 100 additional servers are needed. ● These are your direct maintenance costs. IN CLUSTERS YOU DIRECTLY PAY FOR OVERHEAD WITH ADDITIONAL CLUSTER NODES.
  • 48. WHAT FRAMEWORK IS REALLY GOOD FOR YOU? ● If you know the amount (and cost) of work needed to replace a framework, it is really good for you.
  • 49. MAKING YOUR OWN FRAMEWORK ● The most common reason for building your own framework is … growing complexity and support cost. ● Developing a new framework and migrating to it can be cheaper than supporting existing solutions. ● You don't want to depend on the development of an existing framework.
  • 50. MAKING FRAMEWORK LAZY STYLE ● First build multiple solutions, then integrate them into a single approach. ● GOOD You only integrate what is already used, so there is less unused work. ● BAD You act reactively.
  • 51. MAKING FRAMEWORK PROACTIVE STYLE ● You improve the framework before the actual need arises. ● GOOD You are guided by approach, not need, so you usually get a clearer design. ● BAD You are more likely to build things nobody needs.
  • 52. OUTSIDE YOUR TEAM ● Great, you have additional workforce. But from now on you have external support tickets. ● Usually you can control your users, so major changes are still possible but harder. ● Pay more attention to documentation and training for other teams. It pays back.
  • 53. OUTSIDE YOUR COMPANY ● You receive additional workforce. People start contributing to your framework. Don't be so optimistic. ● Community support is good but you need to support community applications. ● You are no longer flexible. You don't control the users of your framework.
  • 54. LESSONS LEARNED: CORE ● Avoid inventing a unique approach for every Big Data solution. It is critical to have good, relatively stable ground. ● Your Big Data CORE architecture should be a layered infrastructure constructed from small, simple, unified, replaceable components (the UNIX way). ● Be ready for packaging issues but try to reuse as much as possible on the CORE layer.
  • 55. LESSONS LEARNED: BEYOND THE CORE ● When selecting frameworks to extend your big data core, prefer solutions with a stable approach, flexible functionality and a healthy community. Revise your approaches, as the world changes fast. ● Prefer contributing to a good existing solution over starting your own. ● The more frequently you change something, the higher-level tool you need for it. But in big data you directly pay for any performance overhead. ● If you have started your own framework, the more popular it is, the less freedom you have to modify it, so flexibility alone is a bad reason to start one.
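
The "10% overhead" slides come down to simple arithmetic, and a minimal sketch may make the cluster case concrete. It assumes the simplified model from the slides: work is evenly distributed and CPU-bound, so a framework's overhead has to be covered by extra nodes. The function name `extra_nodes` and its parameters are hypothetical, chosen only for illustration.

```python
def extra_nodes(cluster_nodes: int, overhead_percent: int) -> int:
    """Estimate how many additional nodes cover a framework's CPU overhead.

    Assumption (not from the slides): the overhead is an integer percentage
    of the cluster's capacity and the result is rounded up, because a
    fraction of a node cannot be bought.
    """
    if cluster_nodes < 0 or overhead_percent < 0:
        raise ValueError("arguments must be non-negative")
    # Ceiling division in integer arithmetic avoids floating-point rounding.
    return (cluster_nodes * overhead_percent + 99) // 100


if __name__ == "__main__":
    # Compare a small cluster with the 1000-server case from the slides.
    for nodes in (10, 100, 1000):
        print(f"{nodes} nodes at 10% overhead -> {extra_nodes(nodes, 10)} extra")
```

On a single server the same 10% is usually absorbed by spare capacity, as the slides note. The sketch only applies once the cluster is sized to its workload.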