SlideShare ist ein Scribd-Unternehmen logo
1 von 37
THE FUTURE OF
HADOOP: CHOOSING
THE RIGHT OPTIONS

Subash D’Souza
Hadoop Innovation Summit
2014
WHO AM I?
 Recognized as a Champion of Big Data by Cloudera
 Co-Organizer - Los Angeles Hadoop User Group
 Organizer - Los Angeles HBase User Group
 Organizer – Los Angeles Big Data Users Group
 Organizer - Big Data Camp LA
 Speaker – Big Data Camp LA 2013
 Leading a BOF Session at Hadoop Summit Europe 2014
 Author – HBase Developer’s Cookbook (Out Fall 2014)
 Technical Reviewer – Apache Flume: Distributed Log Collection for Hadoop
HADOOP: OLD & NEW
 Hadoop first released in 2006.
 Based on the GFS and MapReduce papers released by Google
 Ever since adoption has been massive and rapid
 Companies like Facebook, Netflix, EBay, Yahoo, Expedia, Spotify and even the
Social Security Administration are adopting Hadoop
 Hadoop 2.0 AKA YARN went GA in September of 2013
 Is backwards compatible with Hadoop 1.0 API’s
 Replaced Jobtracker and Tasktrackers with Application Master, Resource Manager
and Node Managers
A BRIEF HISTORY
Google
releases GFS
paper

2002

2003

Google
releases
MapReduce
paper

2004

Nutch adds
distributed
file system

Doug Cutting
launches
Nutch project

MapR
founded

2005

Hortonworks
founded

Cloudera
founded

2006

2007

Hadoop spun
out of Nutch
project at
Yahoo

MapReduce
implemented
in Nutch

Stinger/ Tez
to be
released

Hadoop 2.0
w/HA
available

2008

2009

2010

2011

Hadoop
breaks
Terasort
world record

2012

2013

2014

YARN goes
GA

HBase, Zookee
per, Flume and
more added to
CDH

Impala
(SQL on
Hadoop)
launched
PREVIOUSLY, THE STATE OF
DATA
As a data analyst, previously, you were not able to
ask questions you wanted to ask because you did
not have the data points available
Corollary, you couldn’t think of questions to ask of
your data because you didn’t know you had access
to those data points
BIG DATA IMPACT
FOCUS
 No standard way to get to the data
 This is a plus and minus, plus because there is variety to choose from, minus because the
no. of tools to pull the data is huge and evermore expanding

As a company what do you choose?
What do you focus on?
Question – Do you replace your current data
infrastructure or do you augment it?
HADOOP TECHNOLOGIES
DISTRIBUTIONS OF HADOOP
Apache
Hortonworks
Cloudera
MapR
Intel
IBM
Pivotal
HORTONWORKS HDP 2.0

Source: hortonworks.com
CLOUDERA ENTERPRISE
DATA HUB

Source: cloudera.com & techweekly.com
MAPR M7 ENTERPRISE

Source: business-software.com & wn.com
INTEL DISTRIBUTION FOR
APACHE HADOOP

Source: gigaom.com
IBM BIGINSIGHTS
ENTERPRISE EDITION

Source: ndm.net
PIVOTAL HD

Source: infoq.com
CHOICES
 Hortonworks – Completely Open Source – Everything on their platform is available
from Apache Hadoop Distribution. Available as a free download or with paid
support.
 Cloudera – Offers the open source Apache Hadoop Distribution as well as
management tools built for the Cloudera Distribution. Available as a free download
or with paid support with the additional tools
 MapR – Offers a version of Hadoop that replaces the HDFS with a proprietary
MFS(MapR File System). Everything else on their stack is based on the open
source Apache distribution. Offers a free M3 version along with paid M5 and M7
versions.
ADVANTAGES OF YARN
Ability to handle multi tenant clients, i.e. running
multiple
applications
atop
the
same
framework(multi-tenancy)
Splits the work of Job tracker into Resource
Manager and Application master so Job tracker
does not have to allocate resources as well as
manage the tasks
Ability to restart Jobs from the place where they
failed
Scales well beyond the limitations of MR1(4000
SQL-ON-HADOOP
The different
available
Hive
Impala
Drill
Stinger/Tez
HAWQ
Hadapt
Presto
Shark

SQL-On-Hadoop

tools

currently
SQL-ON-HADOOP
BENCHMARK - SCAN

Source:
SQL-ON-HADOOP
BENCHMARK - AGGREGATE

Source:
SQL-ON-HADOOP
BENCHMARK - JOIN

Source:
SQL ON HADOOP VS.
TRADITIONAL RDBMS
Data on Hadoop is not as responsive as a RDBMS
Data in Hadoop can scale much better than an
RDBMS
Data in Hadoop can be accessed using a variety of
mechanisms such as Hive, Imapala, Drill, etc. i.e.
the query engines are abstracted from the
Hadoop(HDFS) storage layer. The same cannot be
said of RDBMS where you would need between
one system to another example, Oracle cannot pull
from SQL Server and vice versa
QUESTION?
Do we augment or replace our current data
infrastructure?
Answer – Augment
Why? – combine the best of both worlds, use
aggregated data in your data stores and all the
detail data and lifetime in Hadoop
Of course, you will different SLA’s based on the
query you ask.
CHALLENGES
Data Protection
Security
SLA’s – Service Level Agreements
Integration w/ applications
Services and support
Training
Performance
Scaling and Administration
STARTUPS VS. MATURE
Startups that are in data should make the
consideration of going with YARN to gain the
advantages of YARN
Mature companies tend to be conservative and
hence will look to the more established use cases of
MR1
Startups and Mature companies should look at the
advantages of YARN as well as applying more near
real-time sql-on-hadoop
GETTING STARTED WITH
HADOOP VS. ESTABLISHED
HADOOP PRACTICES
Getting started with Hadoop – Opportunity to get off
the ground running YARN plus bleeding edge
technologies.
Established companies with a Hadoop practice tend
to be conservative but that shouldn’t prevent them
from coming with a migration plan to YARN
REAL TIME ANALYTICS
 Kiji
 HBase
 Storm
 Shark
 Redshift
 Impala
 Stinger
 Drill
 Accumolo
 Presto
 Hawq
 IBM BigSQL
REAL TIME STREAMING
Flume
Kafka
Scribe
HBase
SECURITY
Kerberos with ACL’s
Cloudera Sentry
Project Knox
Accumolo(BigTable clone)
HBase w/Cell Security
DEVELOPERS TOOLSET
Cloudera CDK renamed to Kite
Java M/R
Spring for Hadoop
Hive
Pig
Scalding
Impala
Others
MANAGEMENT, GUI, MACHIN
E
LEARNING, MONITORING, SC
HEDULING & GRAPH DB
Ambari
Cloudera Manager
HUE
Mahout
Giraph
Zookeeper
Oozie
FUTURE OF HADOOP: YARN &
NEAR REAL TIME SQL-ONHADOOP
Multi Tenancy
HA(High Availability)
Tools for SQL-On-Hadoop
Impala
Stinger/Tez
Drill
Shark
WHAT DO YOU CHOOSE?
The choices are huge
The toolsets are varied
First focus on the problems you are trying to solve. Don’t
choose Hadoop because it is the latest buzz word. Make
sure there is a real need to solve
Focus on developers and administrators and ensure that
whatever toolset you choose, they have the relevant
skillset or training will be provided or relevant resources will
be brought in from outside( whether through hiring or
consulting)
REMEMBER PROBLEMSET!!! i.e what you are trying to
CAVEATS
Work still being done on bringing real time sql-onhadoop to YARN.
Impala has Llama for this.
Stinger for Hive Preview is currently available
HBase on YARN(HOYA) is also actively being
worked on.
Since YARN is a low level API, some abstraction is
needed which is available with tools such as Samza
and Weave
BIG DATA = BIG IMPACT
Ken Rudin, Director of Analytics, Facebook
“You need to go the last mile and evangelize your
insights so that people actually act on them and
there is impact."
“It doesn’t matter how brilliant our analyses are. If
nothing changes we have made no impact”
GIVING BACK
Hadoop is an open source project
Work done on this and the ecosystem tools are by
committers and contributors, some of whom do this in
their own personal time, in reporting and fixing bugs as
well as new functionality.
Please
give
back
either
by
becoming
a
contributor(Testing, filing bugs) or getting out your use
case for Hadoop(at meetups and/or conferences such
as this one) so others can make use of the issues you
have faced as well see the rapid adoption of the
THANKS
Subash D’Souza
Twitter: @sawjd22
Linkedin: www.linkedin.com/in/sawjd/
Email: subashdsouza@gmail.com

Weitere ähnliche Inhalte

Was ist angesagt?

Apache hive essentials
Apache hive essentialsApache hive essentials
Apache hive essentialsSteve Tran
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UK
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UKSUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UK
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UKhuguk
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkAgnihotriGhosh2
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialDaniel Abadi
 
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!Nicolas Morales
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedDouglas Bernardini
 
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudBest Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudLeons Petražickis
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Hortonworks
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystemJakub Stransky
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Cloudera, Inc.
 

Was ist angesagt? (20)

Apache hive essentials
Apache hive essentialsApache hive essentials
Apache hive essentials
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
SAP HORTONWORKS
SAP HORTONWORKSSAP HORTONWORKS
SAP HORTONWORKS
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UK
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UKSUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UK
SUSE, Hadoop and Big Data Update. Stephen Mogg, SUSE UK
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and spark
 
SQL-on-Hadoop Tutorial
SQL-on-Hadoop TutorialSQL-on-Hadoop Tutorial
SQL-on-Hadoop Tutorial
 
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
Big SQL 3.0: Datawarehouse-grade Performance on Hadoop - At last!
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integrated
 
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the CloudBest Practices for Deploying Hadoop (BigInsights) in the Cloud
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
SQL Server 2012 and Big Data
SQL Server 2012 and Big DataSQL Server 2012 and Big Data
SQL Server 2012 and Big Data
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
Hadoop World 2011: Hadoop and RDBMS with Sqoop and Other Tools - Guy Harrison...
 

Andere mochten auch

Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on HadoopDataWorks Summit
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Data Con LA
 
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...Data Con LA
 
Yarn cloudera-kathleenting061414 kate-ting
Yarn cloudera-kathleenting061414 kate-tingYarn cloudera-kathleenting061414 kate-ting
Yarn cloudera-kathleenting061414 kate-tingData Con LA
 
Aziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jhaAziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jhaData Con LA
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamskyData Con LA
 
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...Data Con LA
 
Kiji cassandra la june 2014 - v02 clint-kelly
Kiji cassandra la   june 2014 - v02 clint-kellyKiji cassandra la   june 2014 - v02 clint-kelly
Kiji cassandra la june 2014 - v02 clint-kellyData Con LA
 
20140614 introduction to spark-ben white
20140614 introduction to spark-ben white20140614 introduction to spark-ben white
20140614 introduction to spark-ben whiteData Con LA
 
Big datacamp june14_alex_liu
Big datacamp june14_alex_liuBig datacamp june14_alex_liu
Big datacamp june14_alex_liuData Con LA
 
Ag big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopalAg big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopalData Con LA
 
Summit v4 dave wolcott
Summit v4 dave wolcottSummit v4 dave wolcott
Summit v4 dave wolcottData Con LA
 
140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsieh140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsiehData Con LA
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitData Con LA
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRData Con LA
 
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Data Con LA
 
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...Data Con LA
 
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...Data Con LA
 
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Data Con LA
 

Andere mochten auch (20)

Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
 
Apache Hadoop: DFS and Map Reduce
Apache Hadoop: DFS and Map ReduceApache Hadoop: DFS and Map Reduce
Apache Hadoop: DFS and Map Reduce
 
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
Big Data Day LA 2015 - NoSQL: Doing it wrong before getting it right by Lawre...
 
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
Big Data Day LA 2015 - HBase at Factual: Real time and Batch Uses by Molly O'...
 
Yarn cloudera-kathleenting061414 kate-ting
Yarn cloudera-kathleenting061414 kate-tingYarn cloudera-kathleenting061414 kate-ting
Yarn cloudera-kathleenting061414 kate-ting
 
Aziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jhaAziksa hadoop for buisness users2 santosh jha
Aziksa hadoop for buisness users2 santosh jha
 
2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky2014 bigdatacamp asya_kamsky
2014 bigdatacamp asya_kamsky
 
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
Big Data Day LA 2015 - Solr Search with Spark for Big Data Analytics in Actio...
 
Kiji cassandra la june 2014 - v02 clint-kelly
Kiji cassandra la   june 2014 - v02 clint-kellyKiji cassandra la   june 2014 - v02 clint-kelly
Kiji cassandra la june 2014 - v02 clint-kelly
 
20140614 introduction to spark-ben white
20140614 introduction to spark-ben white20140614 introduction to spark-ben white
20140614 introduction to spark-ben white
 
Big datacamp june14_alex_liu
Big datacamp june14_alex_liuBig datacamp june14_alex_liu
Big datacamp june14_alex_liu
 
Ag big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopalAg big datacampla-06-14-2014-ajay_gopal
Ag big datacampla-06-14-2014-ajay_gopal
 
Summit v4 dave wolcott
Summit v4 dave wolcottSummit v4 dave wolcott
Summit v4 dave wolcott
 
140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsieh140614 bigdatacamp-la-keynote-jon hsieh
140614 bigdatacamp-la-keynote-jon hsieh
 
La big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixitLa big datacamp2014_vikram_dixit
La big datacamp2014_vikram_dixit
 
Hadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapRHadoop and NoSQL joining forces by Dale Kim of MapR
Hadoop and NoSQL joining forces by Dale Kim of MapR
 
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
Big Data Day LA 2015 - Lessons Learned from Designing Data Ingest Systems by ...
 
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
 
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
Big Data Day LA 2015 - Deep Learning Human Vocalized Animal Sounds by Sabri S...
 
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
Big Data Day LA 2016/ Data Science Track - Decision Making and Lambda Archite...
 

Ähnlich wie Hadoop Innovation Summit 2014

Hadoop training kit from lcc infotech
Hadoop   training kit from lcc infotechHadoop   training kit from lcc infotech
Hadoop training kit from lcc infotechlccinfotech
 
Hadoop essentials by shiva achari - sample chapter
Hadoop essentials by shiva achari - sample chapterHadoop essentials by shiva achari - sample chapter
Hadoop essentials by shiva achari - sample chapterShiva Achari
 
Hadoop Training in Delhi
Hadoop Training in DelhiHadoop Training in Delhi
Hadoop Training in DelhiAPTRON
 
Big Data Hadoop Technology
Big Data Hadoop TechnologyBig Data Hadoop Technology
Big Data Hadoop TechnologyRahul Sharma
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersMrigendra Sharma
 
Hadoop administrator certification training
Hadoop administrator certification trainingHadoop administrator certification training
Hadoop administrator certification trainingXoom Trainings
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paperSupratim Ray
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchHortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsDavid Portnoy
 
Big Data Training in Amritsar
Big Data Training in AmritsarBig Data Training in Amritsar
Big Data Training in AmritsarE2MATRIX
 
Big Data Training in Mohali
Big Data Training in MohaliBig Data Training in Mohali
Big Data Training in MohaliE2MATRIX
 
Big Data Training in Ludhiana
Big Data Training in LudhianaBig Data Training in Ludhiana
Big Data Training in LudhianaE2MATRIX
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksHortonworks
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaSkillspeed
 

Ähnlich wie Hadoop Innovation Summit 2014 (20)

Hadoop training kit from lcc infotech
Hadoop   training kit from lcc infotechHadoop   training kit from lcc infotech
Hadoop training kit from lcc infotech
 
Hadoop essentials by shiva achari - sample chapter
Hadoop essentials by shiva achari - sample chapterHadoop essentials by shiva achari - sample chapter
Hadoop essentials by shiva achari - sample chapter
 
Hadoop Training in Delhi
Hadoop Training in DelhiHadoop Training in Delhi
Hadoop Training in Delhi
 
View on big data technologies
View on big data technologiesView on big data technologies
View on big data technologies
 
Big Data Hadoop Technology
Big Data Hadoop TechnologyBig Data Hadoop Technology
Big Data Hadoop Technology
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, Providers
 
Hadoop administrator certification training
Hadoop administrator certification trainingHadoop administrator certification training
Hadoop administrator certification training
 
Why Hadoop as a Service?
Why Hadoop as a Service?Why Hadoop as a Service?
Why Hadoop as a Service?
 
Hadoop data-lake-white-paper
Hadoop data-lake-white-paperHadoop data-lake-white-paper
Hadoop data-lake-white-paper
 
Architecting the Future of Big Data and Search
Architecting the Future of Big Data and SearchArchitecting the Future of Big Data and Search
Architecting the Future of Big Data and Search
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop Implementations
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data Training in Amritsar
Big Data Training in AmritsarBig Data Training in Amritsar
Big Data Training in Amritsar
 
Big Data Training in Mohali
Big Data Training in MohaliBig Data Training in Mohali
Big Data Training in Mohali
 
Big Data Training in Ludhiana
Big Data Training in LudhianaBig Data Training in Ludhiana
Big Data Training in Ludhiana
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Transform You Business with Big Data and Hortonworks
Transform You Business with Big Data and HortonworksTransform You Business with Big Data and Hortonworks
Transform You Business with Big Data and Hortonworks
 
BIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social MediaBIG Data & Hadoop Applications in Social Media
BIG Data & Hadoop Applications in Social Media
 

Mehr von Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

Mehr von Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Kürzlich hochgeladen

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Kürzlich hochgeladen (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Hadoop Innovation Summit 2014

  • 1. THE FUTURE OF HADOOP: CHOOSING THE RIGHT OPTIONS Subash D’Souza Hadoop Innovation Summit 2014
  • 2. WHO AM I?  Recognized as a Champion of Big Data by Cloudera  Co-Organizer - Los Angeles Hadoop User Group  Organizer - Los Angeles HBase User Group  Organizer – Los Angeles Big Data Users Group  Organizer - Big Data Camp LA  Speaker – Big Data Camp LA 2013  Leading a BOF Session at Hadoop Summit Europe 2014  Author – HBase Developer’s Cookbook (Out Fall 2014)  Technical Reviewer – Apache Flume: Distributed Log Collection for Hadoop
  • 3. HADOOP: OLD & NEW  Hadoop first released in 2006.  Based on the GFS and MapReduce papers released by Google  Ever since adoption has been massive and rapid  Companies like Facebook, Netflix, EBay, Yahoo, Expedia, Spotify and even the Social Security Administration are adopting Hadoop  Hadoop 2.0 AKA YARN went GA in September of 2013  Is backwards compatible with Hadoop 1.0 API’s  Replaced Jobtracker and Tasktrackers with Application Master, Resource Manager and Node Managers
  • 4. A BRIEF HISTORY Google releases GFS paper 2002 2003 Google releases MapReduce paper 2004 Nutch adds distributed file system Doug Cutting launches Nutch project MapR founded 2005 Hortonworks founded Cloudera founded 2006 2007 Hadoop spun out of Nutch project at Yahoo MapReduce implemented in Nutch Stinger/ Tez to be released Hadoop 2.0 w/HA available 2008 2009 2010 2011 Hadoop breaks Terasort world record 2012 2013 2014 YARN goes GA HBase, Zookee per, Flume and more added to CDH Impala (SQL on Hadoop) launched
  • 5. PREVIOUSLY, THE STATE OF DATA As a data analyst, previously, you were not able to ask questions you wanted to ask because you did not have the data points available Corollary, you couldn’t think of questions to ask of your data because you didn’t know you had access to those data points
  • 7. FOCUS  No standard way to get to the data  This is a plus and minus, plus because there is variety to choose from, minus because the no. of tools to pull the data is huge and evermore expanding As a company what do you choose? What do you focus on? Question – Do you replace your current data infrastructure or do you augment it?
  • 10. HORTONWORKS HDP 2.0 Source: hortonworks.com
  • 11. CLOUDERA ENTERPRISE DATA HUB Source: cloudera.com & techweekly.com
  • 12. MAPR M7 ENTERPRISE Source: business-software.com & wn.com
  • 13. INTEL DISTRIBUTION FOR APACHE HADOOP Source: gigaom.com
  • 16. CHOICES  Hortonworks – Completely Open Source – Everything on their platform is available from Apache Hadoop Distribution. Available as a free download or with paid support.  Cloudera – Offers the open source Apache Hadoop Distribution as well as management tools built for the Cloudera Distribution. Available as a free download or with paid support with the additional tools  MapR – Offers a version of Hadoop that replaces the HDFS with a proprietary MFS(MapR File System). Everything else on their stack is based on the open source Apache distribution. Offers a free M3 version along with paid M5 and M7 versions.
  • 17. ADVANTAGES OF YARN Ability to handle multi tenant clients, i.e. running multiple applications atop the same framework(multi-tenancy) Splits the work of Job tracker into Resource Manager and Application master so Job tracker does not have to allocate resources as well as manage the tasks Ability to restart Jobs from the place where they failed Scales well beyond the limitations of MR1(4000
  • 22. SQL ON HADOOP VS. TRADITIONAL RDBMS Data on Hadoop is not as responsive as a RDBMS Data in Hadoop can scale much better than an RDBMS Data in Hadoop can be accessed using a variety of mechanisms such as Hive, Imapala, Drill, etc. i.e. the query engines are abstracted from the Hadoop(HDFS) storage layer. The same cannot be said of RDBMS where you would need between one system to another example, Oracle cannot pull from SQL Server and vice versa
  • 23. QUESTION? Do we augment or replace our current data infrastructure? Answer – Augment Why? – combine the best of both worlds, use aggregated data in your data stores and all the detail data and lifetime in Hadoop Of course, you will different SLA’s based on the query you ask.
  • 24. CHALLENGES Data Protection Security SLA’s – Service Level Agreements Integration w/ applications Services and support Training Performance Scaling and Administration
  • 25. STARTUPS VS. MATURE Startups that are in data should make the consideration of going with YARN to gain the advantages of YARN Mature companies tend to be conservative and hence will look to the more established use cases of MR1 Startups and Mature companies should look at the advantages of YARN as well as applying more near real-time sql-on-hadoop
  • 26. GETTING STARTED WITH HADOOP VS. ESTABLISHED HADOOP PRACTICES Getting started with Hadoop – Opportunity to get off the ground running YARN plus bleeding edge technologies. Established companies with a Hadoop practice tend to be conservative but that shouldn’t prevent them from coming with a migration plan to YARN
  • 27. REAL TIME ANALYTICS  Kiji  HBase  Storm  Shark  Redshift  Impala  Stinger  Drill  Accumolo  Presto  Hawq  IBM BigSQL
  • 29. SECURITY Kerberos with ACL’s Cloudera Sentry Project Knox Accumolo(BigTable clone) HBase w/Cell Security
  • 30. DEVELOPERS TOOLSET Cloudera CDK renamed to Kite Java M/R Spring for Hadoop Hive Pig Scalding Impala Others
  • 31. MANAGEMENT, GUI, MACHIN E LEARNING, MONITORING, SC HEDULING & GRAPH DB Ambari Cloudera Manager HUE Mahout Giraph Zookeeper Oozie
  • 32. FUTURE OF HADOOP: YARN & NEAR REAL TIME SQL-ONHADOOP Multi Tenancy HA(High Availability) Tools for SQL-On-Hadoop Impala Stinger/Tez Drill Shark
  • 33. WHAT DO YOU CHOOSE? The choices are huge The toolsets are varied First focus on the problems you are trying to solve. Don’t choose Hadoop because it is the latest buzz word. Make sure there is a real need to solve Focus on developers and administrators and ensure that whatever toolset you choose, they have the relevant skillset or training will be provided or relevant resources will be brought in from outside( whether through hiring or consulting) REMEMBER PROBLEMSET!!! i.e what you are trying to
  • 34. CAVEATS Work still being done on bringing real time sql-onhadoop to YARN. Impala has Llama for this. Stinger for Hive Preview is currently available HBase on YARN(HOYA) is also actively being worked on. Since YARN is a low level API, some abstraction is needed which is available with tools such as Samza and Weave
  • 35. BIG DATA = BIG IMPACT Ken Rudin, Director of Analytics, Facebook “You need to go the last mile and evangelize your insights so that people actually act on them and there is impact." “It doesn’t matter how brilliant our analyses are. If nothing changes we have made no impact”
  • 36. GIVING BACK Hadoop is an open source project Work done on this and the ecosystem tools are by committers and contributors, some of whom do this in their own personal time, in reporting and fixing bugs as well as new functionality. Please give back either by becoming a contributor(Testing, filing bugs) or getting out your use case for Hadoop(at meetups and/or conferences such as this one) so others can make use of the issues you have faced as well see the rapid adoption of the
  • 37. THANKS Subash D’Souza Twitter: @sawjd22 Linkedin: www.linkedin.com/in/sawjd/ Email: subashdsouza@gmail.com