YARN: The Key to Overcoming the Challenges to Broad-Based Hadoop Adoption
Overview - What is Hadoop/Hadoop 2.0
• Lower cost scaling
• No need for structure
• Ease of data capture
Hadoop 1.0
• All operations based on MapReduce
• Intrinsic inconsistency of code-based solutions
• Highly skilled and expensive resources needed
• 3rd-party applications constrained by the need to generate code
Hadoop 2.0
• Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
• Mature applications can now operate directly on Hadoop
• Reduced skill requirements and increased consistency
RedPoint Data Management on Hadoop
[Architecture diagram: a Parallel Section designed in the UI goes through Key/Split Analysis, then runs as Partitioning and Execution Application Masters/Tasks with direct Data I/O, executing on YARN rather than through MapReduce.]
Top Challenges to Adoption

Skills Gap
• Severe shortage of MapReduce-skilled resources
• Very expensive resources that are hard to retain
• Inconsistent skills lead to inconsistent results
• Underutilizes existing resources
• Prevents broad leverage of investments across the enterprise

Maturity & Governance
• A nascent technology ecosystem around Hadoop
• Emerging technologies only address narrow slivers of functionality
• New applications are not enterprise class
• Legacy applications have built short-term capabilities

Data Into Information
• Data is not useful in its raw state; it must be turned into information
• The benefit of Hadoop is that the same data can be used from many perspectives
• Analysts must now structure the data based on its intended use
RedPoint Overcomes Challenges
First YARN-compliant ETL/data quality toolset on the market – brings together both Big Data and traditional data to create Big Information!
RANKED #1 in:
• Customer or Party Data
• Processing Speed
• Match Quality
• Ease of Use
The power to make your data the biggest asset your organization has.
Key features of RedPoint Data Management

ETL & ELT
• Profiling, reads/writes, transformations
• Single project for all jobs

Data Quality
• Cleanse data
• Parsing, correction
• Geo-spatial analysis

Integration & Matching
• Grouping
• Fuzzy match

Master Key Management
• Create keys
• Track changes
• Maintain matches over time

Web Services Integration
• Consume and publish
• HTTP/HTTPS protocols
• XML/JSON/SOAP formats

Process Automation & Operations
• Job scheduling, monitoring, notifications
• Central point of control

All functions can be used on both TRADITIONAL and BIG DATA. Creates clean, integrated, actionable data – quickly, reliably and at low cost.
RedPoint Functional Footprint
[Reference-architecture diagram: source data (sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory) from data sources (RDBMS, EDW, DBs, JMS queues, files) is loaded into the cluster via Sqoop, Hive, WebHDFS, Flume and NFS; on top of HDFS (nodes 1…n), YARN hosts data refinement (MapReduce, Pig, Hive) plus STREAM, STRUCTURE and INTERACTIVE (Hive Server2) engines with HCatalog metadata services; query/visualization/reporting/analytical tools and apps connect over REST and HTTP; Ambari provides monitoring and management tools.]
No Coding Necessary

PREVIOUS OPTIONS
• Use MapReduce: complex, requires new skills, inefficient execution
• Move data out of Hadoop: extra time and effort, extra storage (expensive), defeats the purpose of Hadoop

WITH REDPOINT – the only pure YARN data management platform
For data management in Hadoop:
• Easy-to-use interface
• Leverages existing skills
• Executes in Hadoop 2.0 (using the YARN architecture)
• Fast – no MapReduce
• Can combine Big Data with traditional data
• Data becomes actionable by RedPoint Interaction
Makes Hadoop data management easy, fast and low-cost. Makes Big Data clean, integrated and usable. You get more out of your Big Data investment.
RedPoint DM for Hadoop: Processing Flow
[Processing-flow diagram: (1) the Data Management Designer sends a Parallel Section to the DM Execution Server; (2) the YARN Resource Manager launches the DM App Master on a Node Manager; (3) the DM App Master launches DM Tasks on the cluster's Node Managers, which run the Parallel Section.]
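To make step 2 concrete, here is a minimal, hypothetical sketch (not RedPoint code) of how any YARN application, the DM App Master included, is submitted to the Resource Manager using the stock Hadoop 2.x client API. The application name, AM command and class names are illustrative assumptions; a real client would also localize the AM jar and set up its environment.

import java.util.Collections;
import org.apache.hadoop.yarn.api.ApplicationConstants;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

/** Minimal YARN submission sketch: ask the ResourceManager to launch an Application Master. */
public class SubmitAppMaster {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the ResourceManager for a new application id and submission context.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("dm-parallel-section");   // illustrative name

    // Describe the container that will run the Application Master.
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList(
        "$JAVA_HOME/bin/java com.example.DmAppMaster"        // hypothetical AM class
        + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
        + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
    appContext.setAMContainerSpec(amContainer);

    // Resources the AM container needs.
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(1024);
    capability.setVirtualCores(1);
    appContext.setResource(capability);

    // Step 2 of the flow: the ResourceManager schedules and launches the AM,
    // which then requests containers for its tasks (step 3).
    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted application " + appId);
  }
}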
The Data Management designer
DM Parallel Section on Hadoop
DM Hadoop Settings
RedPoint Benchmarks – Project Gutenberg: MapReduce vs. Pig vs. RedPoint DM
Sample MapReduce (a small subset of the entire code, which totals nearly 150 lines):

public static class MapClass
    extends Mapper<WordOffset, Text, Text, IntWritable> {

  private final static String delimiters =
      "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(WordOffset key, Text value, Context context)
      throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line, delimiters);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}
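The rest of the roughly 150-line program is mostly boilerplate. Purely as a rough, hypothetical sketch (not the benchmark's actual code), the reducer and driver for a word-count job of this shape typically look like the following; the WordCount driver class, MyInputFormat and the WordOffset key are assumptions standing in for code not shown on the slide.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {                        // hypothetical driver class

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();                       // add up the 1s emitted by the mapper
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "gutenberg word count");
    job.setJarByClass(WordCount.class);
    job.setInputFormatClass(MyInputFormat.class); // assumed custom format producing WordOffset keys
    job.setMapperClass(MapClass.class);           // the mapper shown above
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}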
Sample Pig script, without the UDF:

SET pig.maxCombinedSplitSize 67108864;
SET pig.splitCombination true;
A = LOAD '/testdata/pg/*/*/*';
B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
C = FOREACH B GENERATE UPPER(word) AS word;
D = GROUP C BY word;
E = FOREACH D GENERATE COUNT(C) AS occurrences, group AS word;
F = ORDER E BY occurrences DESC;
STORE F INTO '/user/cleonardi/pg/pig-count';
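The benchmark summary below notes that User Defined Functions were required before the Pig script could run. The deck does not show that UDF; purely as an illustration, a hypothetical tokenizer UDF that applies the same delimiter set as the MapReduce mapper could look roughly like this (class name and behavior are assumptions):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

/** Splits a chararray into a bag of word tuples using an explicit delimiter set. */
public class CustomTokenize extends EvalFunc<DataBag> {
  private static final String DELIMITERS =
      "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";
  private static final TupleFactory TUPLES = TupleFactory.getInstance();
  private static final BagFactory BAGS = BagFactory.getInstance();

  @Override
  public DataBag exec(Tuple input) throws IOException {
    DataBag out = BAGS.newDefaultBag();
    if (input == null || input.size() == 0 || input.get(0) == null) {
      return out;                                  // emit an empty bag for null rows
    }
    StringTokenizer itr = new StringTokenizer((String) input.get(0), DELIMITERS);
    while (itr.hasMoreTokens()) {
      out.add(TUPLES.newTuple(itr.nextToken()));   // one single-field tuple per word
    }
    return out;
  }
}

In the script, such a UDF would be registered with REGISTER and used in place of the built-in TOKENIZE.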
MapReduce:   >150 lines of MR code, 6 hours of development, 6 minutes runtime, extensive optimization needed
Pig:         ~50 lines of script code, 3 hours of development, 15 minutes runtime, User Defined Functions required prior to running the script
RedPoint DM: 0 lines of code, 15 min. of development, 3 minutes runtime, no tuning or optimization required
RedPoint in a modern data architecture

APPLICATIONS: Data Quality, Data Integration, Identity Resolution
ELT, ETL, Cleanse, Match, De-dupe, Merge/Purge, Household, Partition, Parse, Append, Standardize, Key, Automate, Monitor, Notify

DATA SYSTEMS and DATA SOURCES
• Traditional Sources (RDBMS, OLTP, OLAP)
• New Sources (web logs, email, sensors, social media)

Pure YARN application: no MapReduce needed, no in-cluster installation.
One application, one graphical user interface for traditional and Big Data.
Pre-built native adapters; works with any analytics, any reporting, any other application.

[Diagram: RedPoint spans traditional repositories and Hadoop (YARN on HDFS, nodes 1…n), plus other platforms.]
Who Should Care
• Companies interested in exploring the promise of Big Data analytics that need an easy way to get started.
• Companies already investing heavily in Big Data analytics technologies but stuck due to the shortage of skilled resources.
• Large organizations focused on “operational offloading” that need to achieve it cost-effectively.
• Companies that recognize that much of the data landing in Hadoop is external to the organization and needs data quality and proper data governance applied to it.
RedPoint benefits and value

FEATURES
• Pure YARN, no MapReduce
• Graphical UI, not code-based
• All DQ/DI functions available
• Executes in Hadoop, no data movement
• Zero-footprint install, nothing in the cluster
• Same product for Hadoop and database
• Top rated for ease of use

BENEFITS
• Users can work across any/all data
• Easy to integrate data from any source
• No need for extra storage
• No time wasted moving data
• Minimizes extra computing resources
• No compromises in quality or integration for data in Hadoop
• Overcomes the skills gap
• Existing staff can start working now

VALUE
Makes Hadoop data management:
• Faster
• Easier
• Less expensive
• More effective
For More Information on RedPoint
Visit us in booth P13
Download the YARN article here: http://bit.ly/YARN-Article
Email: contact.us@redpoint.net
Presenter notes

1. I want to take a minute and highlight a new offering that RedPoint recently announced: RedPoint Data Management for Hadoop. Simply put, it's data management for Big Data – it allows users to perform the same kind of data management functions on Big Data as they already do with traditional data: integrate data, clean it, append it, reformat it, etc.

[CLICK MOUSE] Previously, if someone wanted to perform these data management functions on data stored in a Hadoop cluster, they had two options. They could use MapReduce, the programming model used with Hadoop. But programming data management processes with MapReduce is complex – it's real coding, as you can see in this little snippet of MapReduce code. So it requires new skills that many companies don't have on staff. And MapReduce, while able to scale and process large volumes of data, isn't actually a very efficient way to execute data management processes, so it winds up either being slow or being fast while consuming vast computing resources as it executes. So MapReduce hasn't been a great option for data management in Hadoop.

The other option was to move data out of Hadoop into a more traditional data store and perform data management procedures there. But this takes extra time and effort, and is expensive because you need to buy the extra (often more expensive) storage on top of what you've already spent on Hadoop. Really, this approach defeats the entire purpose of Hadoop, which is to keep the data in Hadoop where it's the most economical.

[CLICK MOUSE] But now, with the advent of Hadoop 2.0 and RedPoint Data Management for Hadoop, there's another option. With RedPoint, you get an easy-to-use interface to perform your data management functions – the same user interface already used and appreciated by many RedPoint Data Management customers. This allows you to leverage your existing data management and data analyst skills, rather than investing in new MapReduce skills. All your data management processes execute right in Hadoop, using the YARN infrastructure that's part of Hadoop 2.0. And it's fast and efficient, since there's no MapReduce involved.

Even more valuable, it's possible to use RedPoint Data Management for Hadoop to combine the Big Data in Hadoop with your traditional data to create a more complete view of your customers, to increase customer insight and make targeted marketing more relevant and effective. And by using RedPoint Data Management for Hadoop, the data immediately becomes actionable, because RedPoint's Data Management functionality is connected to RedPoint Interact, our campaign and interaction management software. All these benefits are only available from RedPoint because RedPoint Data Management for Hadoop is the only pure YARN data management platform.

[CLICK MOUSE] In summary, RedPoint Data Management for Hadoop makes Hadoop data management easy, fast and low-cost. And it makes Big Data clean, integrated and usable.