© comScore, Inc. Proprietary.
Syncsort & MapR @ comScore
Michael Brown, CTO | July 9th, 2014
The comScore Story
Analytics for a Digital World™
The Digital World is Complex
comScore’s Mission
Be the Leader in Digital Media Analytics.
Measure all forms of media (content and advertising) at scale, across all platforms, in real time, globally.
comScore Brings it Together
Tablet, PC/Mac, TV, Smartphone, Gaming
comScore is a leading internet technology company that
provides Analytics for a Digital World™
NASDAQ: SCOR
Clients: 2,400+ Worldwide
Employees: 1,200+
Headquarters: Reston, Virginia, USA
Global Coverage: Measurement from 172 Countries; 44 Markets Reported
Local Presence: 32 Locations in 23 Countries
Providing Analytics For More Than 2,400 Clients Globally
Media, Agencies, Telecom/Mobile, Financial, Retail, Travel, CPG, Health, Technology
Turning Big Data into Powerful Insight
Collection
• Census: Tags & Data Feeds
• Panels: PC, iOS, Android
• Survey: Non-behavioral elements
Calibration
• Methods: Aggregation, Dictionaries, Taxonomies
• Models: Weighting, Projection, De-Duplication, Attribution
Delivery
• Syndicated Data Platform: Media Metrix, vCE
• Consulting: Analysis
• Client Analytics Platform: Digital Analytix
Panel Heat Map
Average Records Captured per Day (2005-2009)
[Chart: records captured per day, y-axis from 0 to 1,800,000,000; x-axis monthly from 9/26/2005 to 3/26/2009]
Unified Digital Measurement™ (UDM) Establishes Platform For Panel + Census Data Integration
PANEL: Global PERSON Measurement
CENSUS: Global DEVICE Measurement
Unified Digital Measurement (UDM): Patent-Pending Methodology
Adopted by 90% of Top 100 U.S. Media Properties
Beacon Heat Map
Monthly Records Collection
[Chart: number of records per month, y-axis from 0 to 2,000 billion; series: Beacon Records and Panel Records]
Total records collected in June 2014 = 1,726,563,202,649
Total records collected YTD 2014 = 10,037,131,368,475
DMX @ comScore
DMX use at comScore
Purchased our first 4 licenses in 2000!
We use DMX from Syncsort across hundreds of servers for efficient data
processing and aggregation.
We currently run more than 100 unique jobs every day.
With these jobs we process over 150 billion rows of data through DMX!
Connect, Design, Process, Accelerate
Compression w/Sorting
Compress log files when processing large volumes of log data
Sorting the data first has several advantages:
• Reduces the size of the data
• Improves application performance
Example:
• One hour of one data source is 2,315 GB raw (2.9 billion rows)
• Standard compression of the time-ordered data yields 509 GB (22% of original)
• Standard compression of the sorted data yields 324 GB (14% of original)
When applied to all our sources we save:
• 5.0 TB per day
• 155 TB per month
• 460 TB per quarter
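The effect of sorting before compression can be reproduced on a small scale. The following is a minimal sketch, not comScore's pipeline: the record layout (a cookie id and a URL path), the record counts, and the use of `zlib` as the "standard compression" are all assumptions made for illustration.

```python
import random
import zlib


def compressed_sizes(num_records=20000, num_cookies=200, num_urls=50, seed=7):
    """Compress the same synthetic log twice: in arrival order and sorted."""
    rng = random.Random(seed)
    # Hypothetical record layout: cookie id, then URL path, tab-separated.
    records = [
        f"{rng.randrange(num_cookies):08d}\t/page/{rng.randrange(num_urls):04d}\n"
        for _ in range(num_records)
    ]
    time_ordered = "".join(records).encode("ascii")
    sorted_bytes = "".join(sorted(records)).encode("ascii")
    return {
        "raw": len(time_ordered),
        "time_ordered": len(zlib.compress(time_ordered)),
        "sorted": len(zlib.compress(sorted_bytes)),
    }


if __name__ == "__main__":
    sizes = compressed_sizes()
    for name, size in sizes.items():
        print(f"{name:>12}: {size:>8} bytes ({size / sizes['raw']:.1%} of raw)")
```

Sorting places similar records next to each other, so the compressor finds longer repeated runs. The exact ratios depend on the data and will differ from the figures quoted on the slide.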
Hadoop @ comScore
Why Hadoop?
• comScore built its own distributed computing stack in 2002.
• In 2009 we decided it was better to leverage the efforts of the Hadoop community than to keep building our own stack.
• We recognized that switching to Hadoop would allow seamless scaling of our infrastructure to meet the needs of the business.
• Hadoop lets us add compute, storage and memory linearly and process data at tremendous scale.
• Partnered with Syncsort on their Hadoop efforts starting in October 2010
• Evaluated the beta of MapR in the fall of 2011
90 Days of Data
[Chart: yearly values on a y-axis from 0 to 6,000 trillion; x-axis labels 2009, 2010, 2011, 2012, 2013, 2014, 2016; data labels 1,148; 1,919; 3,049; 4,862; 5,084]
High Level Data Flow
Panel
Census
Custom Code +
ADW
EDW
Delivery
Our Cluster
Production Hadoop Cluster
• 400+ nodes: mix of Dell R720xd, R710 and R510 servers
• Each R720xd has 24 x 1.2 TB drives, 128 GB RAM and 32 cores
• 13,800+ total CPUs
• 31.6 TB total memory
• 8.2 PB total disk space
• Our distro is MapR M5 2.1.3
Leveraging Partitions from MapR
Validation Funnel & Target Effectiveness
Our growth
As our volume has grown, we see the following stats:
• Over 683 billion events per month
• Daily aggregate: 1.8 billion records
• 160 billion aggregate records for 92 days
• 146K campaigns
• Over 50 countries
• We see 15 billion distinct cookies in a month
• We only need to output 26 million rows
Solution to reduce the shuffle
The Problem:
• Most aggregations within comScore cannot take advantage of combiners, leading to large shuffles and job performance issues
The Idea:
• Partition and sort the data by cookie on a daily basis
• Create a custom InputFormat to merge daily partitions for monthly aggregations
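The merge step of the idea above can be sketched outside Hadoop. This is a minimal illustration in plain Python, not the custom InputFormat itself: the `(cookie, count)` record shape and the function name `merge_daily_partitions` are hypothetical, and the sketch assumes every daily partition is already sorted by cookie.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_daily_partitions(daily_partitions):
    """Merge cookie-sorted daily partitions and sum event counts per cookie.

    Each input is an iterable of (cookie, count) pairs already sorted by
    cookie. heapq.merge streams them in key order, so only one cookie's
    records need to be held in memory at a time.
    """
    merged = heapq.merge(*daily_partitions, key=itemgetter(0))
    for cookie, group in groupby(merged, key=itemgetter(0)):
        yield cookie, sum(count for _, count in group)


if __name__ == "__main__":
    day1 = [("a", 2), ("c", 1)]
    day2 = [("a", 1), ("b", 4)]
    day3 = [("b", 1), ("c", 5)]
    print(list(merge_daily_partitions([day1, day2, day3])))
```

Because all records for a cookie arrive together after the merge, the aggregation can finish on the reading side and nothing has to be regrouped by a later shuffle.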
Custom Input Format with Map Side Aggregation
[Diagram: Map, Combiner and Reduce stages processing keys A, B and C]
Risks for Partitioning
Data locality
• Custom InputFormat requires reading blocks of the partitioned data over the network
• This was solved using a feature of the MapR file system: we created volumes and set the chunk size to zero, which guarantees that the data written to a volume stays on one node
Map failures might result in long run times
• Size of the map inputs is no longer set by block size
• This was solved by creating a large number (10K) of volumes to limit the size of data processed by each mapper
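One way to spread cookies over a fixed number of volumes is a stable hash modulo the volume count. The sketch below is an assumption about how such an assignment could look: the slides do not say how cookies map to volumes, and the path layout and function names are hypothetical.

```python
import zlib
from collections import Counter

NUM_VOLUMES = 10_000  # matches the "10K volumes" figure on the slide


def volume_for_cookie(cookie: str, num_volumes: int = NUM_VOLUMES) -> int:
    """Map a cookie to a volume index with a hash that is stable across runs.

    zlib.crc32 is used instead of the built-in hash(), which is randomized
    per process for strings and would scatter a cookie across volumes.
    """
    return zlib.crc32(cookie.encode("utf-8")) % num_volumes


def volume_path(cookie: str, day: str) -> str:
    """Hypothetical path layout: one directory per volume, one file per day."""
    return f"/data/vol{volume_for_cookie(cookie):05d}/{day}.sorted"


if __name__ == "__main__":
    cookies = [f"cookie-{i}" for i in range(200_000)]
    load = Counter(volume_for_cookie(c) for c in cookies)
    print("volumes used:", len(load))
    print("largest volume holds", max(load.values()), "of", len(cookies), "cookies")
    print(volume_path("cookie-42", "2014-06-30"))
```

With many small volumes, each mapper's input stays bounded, so a failed map task only repeats a small slice of the work.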
Partitioning Summary
Benefits:
• A large portion of the aggregation can be completed in the map phase
• Applications can now take advantage of combiners
• Shuffle sizes are minimal
Results:
• Took a job from 35 hours to 3 hours with no hardware changes
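One way to see why cookie partitioning helps is to look at a distinct count. The sketch below is illustrative only and uses made-up event data: partial distinct counts over arbitrary splits overcount when added together, while partial counts over cookie-based partitions add up exactly, so they can be combined early.

```python
import zlib


def distinct_by_arbitrary_splits(events, num_splits):
    """Sum per-split distinct counts when splits ignore the cookie (overcounts)."""
    splits = [set() for _ in range(num_splits)]
    for position, cookie in enumerate(events):
        splits[position % num_splits].add(cookie)
    return sum(len(s) for s in splits)


def distinct_by_cookie_partitions(events, num_partitions):
    """Sum per-partition distinct counts when each cookie maps to one partition."""
    partitions = [set() for _ in range(num_partitions)]
    for cookie in events:
        partitions[zlib.crc32(cookie.encode("utf-8")) % num_partitions].add(cookie)
    return sum(len(p) for p in partitions)


if __name__ == "__main__":
    events = ["a", "b", "a", "c", "b", "a", "d", "a"]
    print("true distinct cookies:", len(set(events)))
    print("arbitrary splits:", distinct_by_arbitrary_splits(events, 3))
    print("cookie partitions:", distinct_by_cookie_partitions(events, 3))
```

In the arbitrary case the same cookie appears in several splits and is counted more than once, which is why such aggregations normally have to ship raw records through the shuffle.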
DMX-h @ comScore
Reasons for comScore selecting DMX-h
Performance
• DMX-h as the pluggable sort in Hadoop allows us to increase throughput on our existing platform; this reduces capital and ongoing operational expenses.
• The increase in throughput also allows us to deliver our data to customers more quickly, which makes the data more valuable to them.
Speed of Development
• The ability to quickly build out applications in the DMX-h GUI allows us to iterate and respond more quickly to the needs of the business.
• The ease of development also allows us to democratize access to the Hadoop platform through a point-and-click GUI.
Performance - DMX-h Pluggable Sort Testing Results
First comparison run on our dev cluster
Pig scripts called with the Syncsort plug-in
GroupBy / Distinct Operations
• Counting uniques
• These have large shuffle steps, which lead to more data to sort.
• Observed up to a 20% decrease in job runtime
Filter Operations
• Searching for a specific value
• Observed a 5% – 10% decrease in job runtime
• Dependent on type of filter and size of job output
40 GB compressed data; base run is 86 min, test run is 68 min; savings of roughly 20% (18 of 86 minutes)
Results from 7 Nodes; 56 cores; 433 GB RAM; 28 TB disk; MapR M5 3.0.2; DMX-h 7.12
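The quoted saving can be checked with a short calculation. This is a small sketch using only the two runtimes stated above; the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the runtime reduction as a percentage of the baseline."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # Figures quoted on the slide: base run 86 min, test run 68 min.
    print(f"{percent_reduction(86, 68):.1f}% shorter runtime")  # prints "20.9% shorter runtime"
```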
Speed of Development - POC
We took an existing process that runs in our Hadoop cluster and converted it to DMX-h to validate the new capabilities.
The existing process:
• Written in 75 lines of Pig with 3 Java UDFs
• Developed in about 25 hours
• Processes 3.5 billion input rows per day
• Takes 35 minutes to run on a daily basis
DMX-h Process
Speed of Development - POC
The new process in DMX-h:
• Developed a new job with 13 tasks
• No Java UDF required
• Runs on the same data and in the same environment.
• Developed in 12 hours.
• Runs in 11 minutes, about 1/3 of the time of the Pig & Java code!
Useful Factoids
Visit www.comscoredatamine.com or follow @datagems for the latest gems.
Colorful, bite-sized graphical representations of the best discoveries we unearth.
Thank You!
Michael Brown
CTO
comScore, Inc.
mbrown@comscore.com
© 2014 MapR Technologies
Today’s Presenters
Steve Wooledge
VP - Product Marketing
@swooledge
Jorge Lopez
Director - Product Marketing
@zanilli
Mike Brown
CTO
Leveraging MapR and Syncsort
Big Data is Overwhelming Traditional Systems
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery
• Interactive SQL
• Rich analytics
• Workload management
• Data governance
• Backup and recovery
Enterprise
Data
Architecture
TREND 1
ENTERPRISE
USERS
OPERATIONAL
SYSTEMS
ANALYTICAL
SYSTEMS
PRODUCTION
REQUIREMENTS
PRODUCTION
REQUIREMENTS
OUTSIDE SOURCES
Hadoop: The Disruptive Technology at the Core of Big Data
TREND 2
[Chart: job trends from Indeed.com, Jan ‘06 to Jan ‘14]
OPERATIONAL
SYSTEMS
ANALYTICAL
SYSTEMS
ENTERPRISE
USERS
REALITY 1
• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming,
interactions
Hadoop Relieves the Pressure from Enterprise Systems
Keys for Production Success
1 Reliability and DR
2 Interoperability
3 High performance
4 Supports operations and analytics
FOUNDATION
Architecture Matters for Success
REALITY 2
Data protection
& security
High performance
Multi-tenancy
Operational &
Analytical Workloads
Open standards
for integration
NEW APPLICATIONS | SLAs | TRUSTED INFORMATION | LOWER TCO
The Power of the Open Source Community
Management
MapR Data Platform
APACHE HADOOP AND OSS ECOSYSTEM
Security
YARN
Pig
Cascading
Spark
Batch
Spark
Streaming
Storm*
Streaming
HBase
Solr
NoSQL &
Search
Juju
Provisioning
&
coordination
Savannah*
Mahout
MLLib
ML, Graph
GraphX
MapReduce
v1 & v2
EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS
Workflow
& Data
Governance
Tez*
Accumulo*
Hive
Impala
Shark
Drill*
SQL
Sentry*, Oozie, ZooKeeper, Sqoop
Knox*, Whirr, Falcon*, Flume
Data
Integration
& Access
HttpFS
Hue
* Certification/support planned for 2014
MapR Distribution for Hadoop
Enterprise-grade
• High availability
• Data protection
• Disaster recovery
Interoperability
• Standard file access
• Standard database access
• Pluggable services
• Broad developer support
Security
• Enterprise security authorization
• Wire-level authentication
• Data governance
Operational
• Ability to support predictive analytics, real-time database operations, and high arrival rate data
Multi-tenancy
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators
Performance
• 2X to 7X higher performance
• Consistent, low latency
MapR: Best Solution for Customer Success
Top Ranked
Exponential
Growth
500+
Customers
Premier
Investors
3X bookings Q1 ‘13 – Q1 ‘14
80% of accounts expand 3X
90% software licenses
<1% lifetime churn
>$1B in incremental revenue generated by 1 customer
MapR and Syncsort Reference Architecture
Sources
RELATIONAL,
SAAS,
MAINFRAME
DOCUMENTS,
EMAILS
LOG FILES,
CLICKSTREAMS
BLOGS,
TWEETS,
LINK DATA
DATA MARTS DATA WAREHOUSE
MapR Data Platform
Business
Intelligence /
Visualization
MapR-DB MapR-FS
Batch
(MR, Spark, Hive, Pig,
…)
Interactive
(Impala, Drill, …)
Streaming
(Spark Streaming,
Storm…)
MAPR DISTRIBUTION FOR HADOOP
Do You Know Syncsort?
• Syncsort provides fast, secure, enterprise-grade software spanning “Big Iron to Big Data”
• Fastest sort technology in the market
• Powering 50% of mainframes’ sort
• A history of innovation
• 25+ issued & pending patents
• Large global customer base
• 12,000+ deployments in 80 countries, serving 87 of the Fortune 100
• First-to-market, fully integrated approach to Hadoop ETL
• Top 7 contributor to Hadoop, based on number of lines of code changed in 2013
Our customers are achieving the impossible, every day!
Key Partners
The Hadoop Challenge
PROCESS
Sort
JoinAggregate Copy
Merge
DISTRIBUTECOLLECT
Most organizations use Hadoop to…
Extract
Transform
Load
Turning Hadoop into a Feature-rich ETL Solution
Collect
• Broad based connectivity with automated parallelism 
• Best in class mainframe data access & translation
Process & Distribute
• No manual coding. GUI for developing & maintaining MR jobs
• No code generation. Engine runs natively on each node
• Develop & test locally in Windows; run natively on Hadoop
Optimize & Secure
• Faster throughput per node
• Full support for Kerberos & LDAP
• Web‐based monitoring console
• Sort‐work compression for storage savings
DMX-h ETL: Collect, Process & Distribute, Optimize & Secure
A Roadmap to Hadoop Success
Agile Data Exploration & Visualization
Next-gen Analytics
Cheap Storage
Offload Data Warehouse
Enabling The Data-driven Organization
Solving The Intractable IT Problem
MapR + Syncsort Solutions
• Data Warehouse Optimization: Shift ELT Workloads to Hadoop
• Mainframe Offload: Access, Translate & Analyze Mainframe Data with Hadoop
• Click-stream Analysis: Collect, Process & Analyze More Data from Your Website
Q&A: Engage with us!
1. Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox
2. Try Syncsort’s Hadoop ETL in the MapR Sandbox: www.syncsort.com/mapr
3. Learn best practices for Hadoop ETL: www.mapr.com/EDH

Más contenido relacionado

Was ist angesagt?

Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...Matt Stubbs
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Distributed graph mining
Distributed graph miningDistributed graph mining
Distributed graph miningSayeed Mahmud
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataMathieu Dumoulin
 
Big data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryBig data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryThuyen Ho
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016Mathieu Dumoulin
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Mathieu Dumoulin
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Modern real-time streaming architectures
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architecturesArun Kejariwal
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Innovating to Create a Brighter Future for AI, HPC, and Big Data
Innovating to Create a Brighter Future for AI, HPC, and Big DataInnovating to Create a Brighter Future for AI, HPC, and Big Data
Innovating to Create a Brighter Future for AI, HPC, and Big Datainside-BigData.com
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Costing your Bug Data Operations
Costing your Bug Data OperationsCosting your Bug Data Operations
Costing your Bug Data OperationsDataWorks Summit
 

Was ist angesagt? (17)

Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Distributed graph mining
Distributed graph miningDistributed graph mining
Distributed graph mining
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
 
Big data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQueryBig data processing with PubSub, Dataflow, and BigQuery
Big data processing with PubSub, Dataflow, and BigQuery
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
 
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea...
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Modern real-time streaming architectures
Modern real-time streaming architecturesModern real-time streaming architectures
Modern real-time streaming architectures
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Innovating to Create a Brighter Future for AI, HPC, and Big Data
Innovating to Create a Brighter Future for AI, HPC, and Big DataInnovating to Create a Brighter Future for AI, HPC, and Big Data
Innovating to Create a Brighter Future for AI, HPC, and Big Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Costing your Bug Data Operations
Costing your Bug Data OperationsCosting your Bug Data Operations
Costing your Bug Data Operations
 

Ähnlich wie How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying Hadoop for Deeper Consumer Insights

November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridYahoo Developer Network
 
Control m customers using big data
Control m customers using big dataControl m customers using big data
Control m customers using big dataJuliette Smit
 
Initiative Based Technology Consulting Case Studies
Initiative Based Technology Consulting Case StudiesInitiative Based Technology Consulting Case Studies
Initiative Based Technology Consulting Case Studieschanderdw
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsYong Feng
 
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Precisely
 
IMS01 IMS Keynote
IMS01   IMS KeynoteIMS01   IMS Keynote
IMS01 IMS KeynoteRobert Hain
 
Mainframe Optimization in 2017
Mainframe Optimization in 2017Mainframe Optimization in 2017
Mainframe Optimization in 2017Precisely
 
Solving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalSolving enterprise challenges through scale out storage &amp; big compute final
Solving enterprise challenges through scale out storage &amp; big compute finalAvere Systems
 
RightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale Roadtrip Boston: Accelerate to Cloud
RightScale Roadtrip Boston: Accelerate to CloudRightScale
 
Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013Real-time analysis using an in-memory data grid - Cloud Expo 2013
Real-time analysis using an in-memory data grid - Cloud Expo 2013ScaleOut Software
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data AnalyticsAmazon Web Services
 
Mainframe Optimization in 2017
Mainframe Optimization in 2017Mainframe Optimization in 2017
Mainframe Optimization in 2017Precisely
 
From Disaster to Recovery: Preparing Your IT for the Unexpected
From Disaster to Recovery: Preparing Your IT for the UnexpectedFrom Disaster to Recovery: Preparing Your IT for the Unexpected
From Disaster to Recovery: Preparing Your IT for the UnexpectedDataCore Software
 
MongoDB World 2018: Breaking the Mold - Redesigning Dell's E-Commerce Platform
MongoDB World 2018: Breaking the Mold - Redesigning Dell's E-Commerce PlatformMongoDB World 2018: Breaking the Mold - Redesigning Dell's E-Commerce Platform
MongoDB World 2018: Breaking the Mold - Redesigning Dell's E-Commerce PlatformMongoDB
 
Learn the new rules of cloud storage
Learn the new rules of cloud storageLearn the new rules of cloud storage
Learn the new rules of cloud storageBuurst
 
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo AquinoFInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo Aquino
FInal Project - USMx CC605x Cloud Computing for Enterprises - Hugo AquinoHugo Aquino
 
Going Remote: Running VFX Virtual Workstations
Going Remote: Running VFX Virtual WorkstationsGoing Remote: Running VFX Virtual Workstations
Going Remote: Running VFX Virtual WorkstationsAmazon Web Services
 
1 Billion Events per Day, Israel 3rd Java Technology Day, June 22, 2009
1 Billion Events per Day, Israel 3rd Java Technology Day, June 22, 20091 Billion Events per Day, Israel 3rd Java Technology Day, June 22, 2009
1 Billion Events per Day, Israel 3rd Java Technology Day, June 22, 2009Moshe Kaplan
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysMongoDB APAC
 
Processing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processProcessing 19 billion messages in real time and NOT dying in the process
Processing 19 billion messages in real time and NOT dying in the processJampp
 

Ähnlich wie How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying Hadoop for Deeper Consumer Insights (20)

November 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory gridNovember 2013 HUG: Real-time analytics with in-memory grid
November 2013 HUG: Real-time analytics with in-memory grid
 
Control m customers using big data
Control m customers using big dataControl m customers using big data
Control m customers using big data
 
Initiative Based Technology Consulting Case Studies
Initiative Based Technology Consulting Case StudiesInitiative Based Technology Consulting Case Studies
Initiative Based Technology Consulting Case Studies
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...Using Mainframe Data in the Cloud: Design Once, Deploy Anywhere in a Hybrid W...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying Hadoop for Deeper Consumer Insights

  • 1. © comScore, Inc. Proprietary. Syncsort & MapR @ comScore Michael Brown, CTO | July 9th, 2014
  • 2. The comScore Story: Analytics for a Digital World™
  • 3. The Digital World is Complex
  • 4. comScore’s Mission: Be the Leader in Digital Media Analytics. Measure all forms of media (content and advertising) at scale, across all platforms, in real time, globally.
  • 5. comScore Brings it Together: Tablet, PC/Mac, TV, Smartphone, Gaming
  • 6. comScore is a leading internet technology company that provides Analytics for a Digital World™
      • NASDAQ: SCOR
      • Clients: 2,400+ worldwide
      • Employees: 1,200+
      • Headquarters: Reston, Virginia, USA
      • Global coverage: measurement from 172 countries; 44 markets reported
      • Local presence: 32 locations in 23 countries
  • 7. Providing Analytics for More Than 2,400 Clients Globally: Media, Agencies, Telecom/Mobile, Financial, Retail, Travel, CPG, Health, Technology
  • 8. Turning Big Data into Powerful Insight (diagram)
      • Stages: Collection, Calibration, Delivery
      • Census (tags & data feeds); Panels (PC, iOS, Android); Survey (non-behavioral elements)
      • Other labels: Methods; Aggregation; Dictionaries; Taxonomies; Models; Weighting; Projection; De-Duplication; Attribution; Consulting; Analysis
      • Syndicated Data Platform (Media Metrix, vCE); Client Analytics Platform (Digital Analytix)
  • 9.
  • 10. Panel Heat Map
  • 11. Average Records Captured per Day (2005-2009). Chart: vertical axis from 0 to 1,800,000,000 records; horizontal axis in monthly steps from 9/26/2005 to 3/26/2009.
  • 12. Unified Digital Measurement™ (UDM) Establishes Platform for Panel + Census Data Integration
      • PANEL: global PERSON measurement
      • CENSUS: global DEVICE measurement
      • Unified Digital Measurement (UDM): patent-pending methodology
      • Adopted by 90% of Top 100 U.S. media properties
  • 13. Beacon Heat Map
  • 14. Monthly Records Collection. Chart: beacon records and panel records per month; vertical axis (# of records) from 0 to 2,000 billion.
      • Total records collected in June 2014 = 1,726,563,202,649
      • Total records collected YTD 2014 = 10,037,131,368,475
  • 15. DMX @ comScore
  • 16. DMX use at comScore
      • Purchased our first 4 licenses in 2000!
      • We use DMX from Syncsort across hundreds of servers for efficient data processing and aggregation.
      • We currently run more than 100 unique jobs every day.
      • With these jobs we process over 150 billion rows of data through DMX!
      • Connect, Design, Process, Accelerate
  • 17. Compression w/Sorting: compress log files when processing large volumes of log data
      • Several advantages to sorting data first:
          • Reduces the size of the data
          • Improves application performance
      • Example: 1 hour of one source of our data
          • 2,315 GB raw (2.9 billion rows)
          • Standard compression of time-ordered data: 509 GB (22% of original)
          • Standard compression of a sorted set: 324 GB (14% of original)
      • When applied to all our sources we save:
          • 5.0 TB per day
          • 155 TB per month
          • 460 TB per quarter
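
The effect described on slide 17 can be reproduced on a small scale. The sketch below is not comScore's pipeline or DMX: it assumes a synthetic log of `(timestamp, cookie, url)` rows with made-up cookie IDs and URLs, and uses Python's `zlib` as a stand-in for "standard compression". It compresses the same rows once in arrival order and once sorted by cookie, then recomputes the slide's percentages from its GB figures. The slide's monthly and quarterly savings are also consistent with 5.0 TB per day over 31 and 92 days.

```python
import random
import zlib


def build_log(n_cookies=2000, rows_per_cookie=10, seed=7):
    """Build synthetic (timestamp, cookie, url) records in arrival order.

    Cookie IDs, URLs and the starting timestamp are invented for illustration.
    """
    rng = random.Random(seed)
    cookies = [f"{rng.getrandbits(128):032x}" for _ in range(n_cookies)]
    urls = [f"/page/{i}" for i in range(50)]
    # Each cookie appears rows_per_cookie times, scattered across the log.
    visits = cookies * rows_per_cookie
    rng.shuffle(visits)
    return [(1404864000 + t, cookie, rng.choice(urls))
            for t, cookie in enumerate(visits)]


def compressed_size(records):
    """Serialize records as tab-separated lines; return the zlib size in bytes."""
    text = "\n".join(f"{cookie}\t{ts}\t{url}" for ts, cookie, url in records)
    return len(zlib.compress(text.encode("utf-8"), 6))


time_ordered = build_log()
# Sort by cookie, then by timestamp, so repeated cookie IDs sit next to each other.
cookie_sorted = sorted(time_ordered, key=lambda r: (r[1], r[0]))

time_ordered_bytes = compressed_size(time_ordered)
sorted_bytes = compressed_size(cookie_sorted)
print(f"time-ordered: {time_ordered_bytes} bytes, sorted: {sorted_bytes} bytes")

# Ratios quoted on the slide, recomputed from its GB figures.
raw_gb, time_ordered_gb, sorted_gb = 2315, 509, 324
print(f"{100 * time_ordered_gb / raw_gb:.0f}% and "
      f"{100 * sorted_gb / raw_gb:.0f}% of original")
```

Sorting helps here because the compressor only finds repeats within a limited window; grouping rows by cookie puts the long, repeated IDs within reach of each other.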
  • 18. Hadoop @ comScore
  • 19. Why Hadoop?
      • We built our own distributed computing stack in 2002.
      • In 2009 we decided it was better to leverage the efforts of the Hadoop community instead of building our own stack.
      • We recognized the benefit of switching to Hadoop, which would allow for seamless scaling of our infrastructure to meet the needs of the business.
      • Hadoop allows us to add compute, storage and memory linearly and to process data at tremendous scale.
      • Partnered with Syncsort on their Hadoop efforts since October 2010
      • Evaluated the beta of MapR in the fall of 2011
  • 20. 90 Days of Data. Chart: vertical axis from 0 to 6,000 trillion; data labels 1,148; 1,919; 3,049; 4,862; 5,084; year labels 2009, 2010, 2011, 2012, 2013, 2014, 2016.
  • 21. High Level Data Flow. Diagram labels: Panel; Census; Custom Code +; ADW; EDW; Delivery.
  • 22. Our Cluster: Production Hadoop Cluster
      • 400+ nodes: mix of Dell R720xd, R710 and R510 servers
      • Each R720xd has 24 x 1.2 TB drives, 128 GB RAM and 32 cores
      • 13,800+ total CPUs
      • 31.6 TB total memory
      • 8.2 PB total disk space
      • Our distro is MapR M5 2.1.3
  • 23. Leveraging Partitions from MapR
  • 24.
  • 25. Validation Funnel & Target Effectiveness
  • 26. Our growth. As our volume has grown we have the following stats:
      • Over 683 billion events per month
      • Daily aggregate: 1.8 billion
      • 160 billion aggregate records for 92 days
      • 146K campaigns
      • Over 50 countries
      • We see 15 billion distinct cookies in a month
      • We only need to output 26 million rows
  • 27. Solution to reduce the shuffle
      • The problem: most aggregations within comScore cannot take advantage of combiners, leading to large shuffles and job performance issues
      • The idea:
          • Partition and sort the data by cookie on a daily basis
          • Create a custom InputFormat to merge daily partitions for monthly aggregations
  • 28. Custom Input Format with Map Side Aggregation. Diagram labels: Mapper, Map, Combiner, Reduce; keys A, B, C.
  • 29. Risks for Partitioning
      • Data locality
          • A custom InputFormat requires reading blocks of the partitioned data over the network.
          • This was solved using a feature of the MapR file system: we created volumes and set the chunk size to zero, which guarantees that the data written to a volume stays on one node.
      • Map failures might result in long run times
          • The size of the map inputs is no longer set by the block size.
          • This was solved by creating a large number (10K) of volumes to limit the size of data processed by each mapper.
  • 30. Partitioning Summary
      • Benefits:
          • A large portion of the aggregation can be completed in the map phase
          • Applications can now take advantage of combiners
          • Shuffle sizes are minimal
      • Results:
          • Took a job from 35 hours to 3 hours with no hardware changes
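
The idea on slides 27 to 30 (partition and sort by cookie each day, then merge the same partition across days so that aggregation happens before any shuffle) can be illustrated in plain Python. This is only a sketch under stated assumptions: it is not a Hadoop `InputFormat` and does not use any MapR API; `partition_for`, `write_daily_partitions` and `aggregate_partition` are hypothetical names, CRC32 is an assumed stand-in for whatever partitioning function comScore used, and the sample events are invented.

```python
import heapq
import zlib
from itertools import groupby
from operator import itemgetter


def partition_for(cookie, num_partitions):
    """Stable hash so a cookie lands in the same partition number every day."""
    return zlib.crc32(cookie.encode("utf-8")) % num_partitions


def write_daily_partitions(events, num_partitions):
    """Split one day's (cookie, count) events into partitions sorted by cookie."""
    partitions = [[] for _ in range(num_partitions)]
    for cookie, count in events:
        partitions[partition_for(cookie, num_partitions)].append((cookie, count))
    return [sorted(p, key=itemgetter(0)) for p in partitions]


def aggregate_partition(daily_slices):
    """Merge one partition's sorted daily slices and sum counts per cookie."""
    merged = heapq.merge(*daily_slices, key=itemgetter(0))
    for cookie, group in groupby(merged, key=itemgetter(0)):
        yield cookie, sum(count for _, count in group)


# Three invented days of (cookie, count) events.
days = [
    [("c3", 1), ("c1", 2), ("c2", 1)],
    [("c1", 1), ("c3", 4)],
    [("c2", 5), ("c4", 2), ("c1", 1)],
]
num_partitions = 4
daily = [write_daily_partitions(day, num_partitions) for day in days]

monthly = {}
for p in range(num_partitions):
    # One "mapper" per partition number reads every day's slice of that partition,
    # so all rows for a given cookie are aggregated without a shuffle.
    monthly.update(aggregate_partition([d[p] for d in daily]))

print(sorted(monthly.items()))
# prints [('c1', 4), ('c2', 6), ('c3', 5), ('c4', 2)]
```

Because every slice is already sorted by cookie, the merge is a streaming k-way merge that holds one row per day in memory, and raising `num_partitions` bounds how much data each mapper has to process, which mirrors the role of the 10K volumes on slide 29.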
  • 31. DMX-h @ comScore
  • 32. Reasons for comScore selecting DMX-h
      • Performance
          • DMX-h as the pluggable sort in Hadoop allows us to increase throughput on our existing platform; this reduces capital and ongoing operational expenses.
          • The increase in throughput also allows us to deliver our data more quickly to our customers. These things make the data more valuable to our clients.
      • Speed of Development
          • The ability to quickly build out applications in the DMX-h GUI allows us to iterate and respond more quickly to the needs of the business.
          • The ease of development also allows us to democratize access to the Hadoop platform by leveraging a point-and-click GUI.
  • 33. Performance: DMX Pluggable Sort Testing Results
      • First comparison run on our dev cluster: Pig scripts called with the Syncsort plug-in
      • GroupBy / Distinct operations
          • Counting uniques
          • These have large shuffle steps, which leads to more data to sort.
          • Observed up to a 20% decrease in job runtime
      • Filter operations
          • Searching for a specific value
          • Observed a 5% to 10% decrease in job runtime
          • Dependent on type of filter and size of job output
      • 40 GB compressed data: base run is 86 min, test run is 68 min; savings of 20%
      • Results from 7 nodes; 56 cores; 433 GB RAM; 28 TB disk; MapR M5 3.0.2; DMX-h 7.12
  • 34. Speed of Development - POC. We took an existing process that runs in our Hadoop cluster and converted it to DMX-h to validate the new capabilities. The existing process:
      • Written in 75 lines of Pig with 3 Java UDFs
      • Developed in about 25 hours
      • Processes 3.5 billion input rows per day
      • Takes 35 minutes to run on a daily basis
  • 35. DMX-h Process
  • 36. Speed of Development - POC. The new process in DMX-h:
      • Developed a new job with 13 tasks
      • No Java UDF required
      • Runs on the same data and in the same environment
      • Developed in 12 hours
      • Runs in 11 minutes! About 1/3 of the time of the Pig & Java code.
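
The improvements quoted on slides 33 to 36 can be checked with a few lines of arithmetic. The sketch below only recomputes ratios from the numbers on the slides; `percent_reduction` is a helper name introduced here for illustration.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


# (before, after) pairs taken from the slides.
runs = {
    "pluggable sort test, minutes (slide 33)": (86, 68),
    "POC runtime, minutes (slides 34 and 36)": (35, 11),
    "POC development, hours (slides 34 and 36)": (25, 12),
}

for label, (before, after) in runs.items():
    print(f"{label}: {before} -> {after}, "
          f"{percent_reduction(before, after):.1f}% lower, "
          f"{after / before:.2f}x of baseline")
```

The 86 to 68 minute run works out to roughly a 21% reduction, in line with the "savings of 20%" on slide 33, and 11 minutes is about 0.31 of 35 minutes, in line with the "1/3 of the time" claim on slide 36.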
  • 37. Useful Factoids: Visit www.comscoredatamine.com or follow @datagems for the latest gems. Colorful, bite-sized graphical representations of the best discoveries we unearth.
  • 38. Thank You! Michael Brown, CTO, comScore, Inc., mbrown@comscore.com
  • 39. © 2014 MapR Technologies
  • 40. Today’s Presenters
      • Steve Wooledge, VP - Product Marketing, @swooledge
      • Jorge Lopez, Director - Product Marketing, @zanilli
      • Mike Brown, CTO
  • 41. comScore
  • 42. Syncsort & MapR @ comScore: Michael Brown, CTO | July 9th, 2014
  • 43. Leveraging MapR and Syncsort
  • 44. Trend 1: Big Data is Overwhelming Traditional Systems (Enterprise Data Architecture diagram)
      • Diagram labels: Outside Sources; Enterprise Users; Operational Systems; Analytical Systems
      • Production requirements, operational systems: mission-critical reliability; transaction guarantees; deep security; real-time performance; backup and recovery
      • Production requirements, analytical systems: interactive SQL; rich analytics; workload management; data governance; backup and recovery
  • 45. Trend 2: Hadoop: The Disruptive Technology at the Core of Big Data. Chart: job trends from Indeed.com, Jan ’06 to Jan ’14.
  • 46. Reality 1: Hadoop Relieves the Pressure from Enterprise Systems
      • Diagram labels: Operational Systems; Analytical Systems; Enterprise Users
      • Hadoop uses: data staging; archive; data transformation; data exploration; streaming, interactions
      • Keys for Production Success: 1. Reliability and DR; 2. Interoperability; 3. High performance; 4. Supports operations and analytics
  • 47. Reality 2: Architecture Matters for Success
      • Foundation: data protection & security; high performance; multi-tenancy; operational & analytical workloads; open standards for integration
      • Also shown: new applications; SLAs; trusted information; lower TCO
  • 48. © 2014 MapR Technologies 10 The Power of the Open Source Community ManagementManagement MapR Data Platform APACHE HADOOP AND OSS ECOSYSTEM Security YARN Pig Cascading Spark Batch Spark Streaming Storm* Streaming HBase Solr NoSQL & Search Juju Provisioning & coordination Savannah* Mahout MLLib ML, Graph GraphX MapReduce v1 & v2 EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Workflow & Data Governance Tez* Accumulo* Hive Impala Shark Drill* SQL Sentry* Oozie ZooKeeperSqoop Knox* WhirrFalcon*Flume Data Integration & Access HttpFS Hue * Certification/support planned for 2014
  • 49. © 2014 MapR Technologies 11 MapR Distribution for Hadoop ManagementManagement MapR Data Platform APACHE HADOOP AND OSS ECOSYSTEM Security YARN Pig Cascading Spark Batch Spark Streaming Storm* Streaming HBase Solr NoSQL & Search Juju Provisioning & coordination Savannah* Mahout MLLib ML, Graph GraphX MapReduce v1 & v2 EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Workflow & Data Governance Tez* Accumulo* Hive Impala Shark Drill* SQL Sentry* Oozie ZooKeeperSqoop Knox* WhirrFalcon*Flume Data Integration & Access HttpFS Hue * Certification/support planned for 2014 • High availability • Data protection • Disaster recovery • Standard file access • Standard database access • Pluggable services • Broad developer support • Enterprise security authorization • Wire-level authentication • Data governance • Ability to support predictive analytics, real-time database operations, and support high arrival rate data • Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators • 2X to 7X higher performance • Consistent, low latency Enterprise-grade Security OperationalPerformance Multi-tenancyInteroperability
  • 50. MapR: Best Solution for Customer Success. Top Ranked; Exponential Growth; 500+ Customers; Premier Investors. 3X bookings Q1 ’13 – Q1 ’14; 80% of accounts expand 3X; 90% software licenses; <1% lifetime churn; >$1B in incremental revenue generated by 1 customer.
  • 51. MapR and Syncsort Reference Architecture. Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; blogs, tweets, link data. MapR Distribution for Hadoop: MapR Data Platform (MapR-DB, MapR-FS); Batch (MR, Spark, Hive, Pig, …); Interactive (Impala, Drill, …); Streaming (Spark Streaming, Storm, …). Targets: data marts; data warehouse; Business Intelligence / Visualization.
  • 52. Do You Know Syncsort? • Syncsort provides fast, secure, enterprise-grade software spanning “Big Iron to Big Data” • Fastest sort technology in the market • Powering 50% of mainframes’ sort • A history of innovation • 25+ issued & pending patents • Large global customer base • 12,000+ deployments in 80 countries, serving 87 of the Fortune 100 • First-to-market, fully integrated approach to Hadoop ETL • Among the top 7 contributors to Hadoop, based on number of lines of code changed in 2013. Our customers are achieving the impossible, every day! Key Partners
  • 53. The Hadoop Challenge. Most organizations use Hadoop to Extract, Transform, Load: Collect; Process (Sort, Join, Aggregate, Copy, Merge); Distribute.
  • 54. Turning Hadoop into a Feature-rich ETL Solution (DMX-h ETL). Collect: • Broad-based connectivity with automated parallelism • Best-in-class mainframe data access & translation. Process & Distribute: • No manual coding. GUI for developing & maintaining MR jobs • No code generation. Engine runs natively on each node • Develop & test locally in Windows; run natively on Hadoop. Optimize & Secure: • Faster throughput per node • Full support for Kerberos & LDAP • Web-based monitoring console • Sort-work compression for storage savings
  • 55. A Roadmap to Hadoop Success. Solving the Intractable IT Problem: Cheap Storage; Offload Data Warehouse. Enabling the Data-driven Organization: Agile Data Exploration & Visualization; Next-gen Analytics.
  • 56. MapR + Syncsort Solutions. Data Warehouse Optimization: Shift ELT Workloads to Hadoop. Mainframe Offload: Access, Translate & Analyze Mainframe Data with Hadoop. Click-stream Analysis: Collect, Process & Analyze More Data from Your Website.
  • 57. Q&A. Engage with us! 1. Download the MapR Sandbox for Hadoop: www.mapr.com/sandbox 2. Try Syncsort’s Hadoop ETL in the MapR Sandbox: www.syncsort.com/mapr 3. Learn best practices for Hadoop ETL: www.mapr.com/EDH