Welcome to Today’s DBTA Roundtable Web Event
Stephen Faig
Business Development Manager
Unisphere Media
Publishers of DBTA
Hadoop and Your Enterprise Data Warehouse
Nitin Bandugula
Product Marketing Manager
MapR Technologies
Kevin Petrie
Senior Director
Attunity
George Corugedo
Chief Technology Officer & Co-Founder
RedPoint Global Inc.
© 2015 MapR Technologies
Empowering “as it happens” businesses by speeding up the data-to-action cycle
Top-Ranked NoSQL
Top-Ranked Hadoop Distribution
Top-Ranked SQL-on-Hadoop Solution
Topics
• The Need for EDW Optimization
• Different Stages of the Optimization
• MapR Customer Examples
• The MapR Advantage
The Need for EDW Optimization
Technical Best-Practices Driving Change in Data Architecture
1. Scale of analytics
2. Speed of operations
Source: TDWI, April 2014
Meanwhile…The Industry Norm in the DW
[Diagram labels: EDW; ELT; Unused Tables (72%); Unused Data, Related Loads]
• 70% of data is unused
• Almost 60% of CPU capacity is used for ETL/ELT
• 15% of CPU is consumed by ETL that loads unused data
• 30% of CPU is consumed by the 5% of ETL workloads that use the most resources
Force of Adoption: Costs
Hadoop TAM comes from disrupting enterprise data warehouse and storage spending
[Chart, 2013 to 2017: data growing at 40% vs. IT budgets growing at 2.5%]
[Chart: $ per terabyte for enterprise storage, database warehouse, and Hadoop; values shown: $9,000, $40,000, <$1,000]
Sources:
• Gartner, "Forecast Analysis: Enterprise IT Spending by Vertical Industry Market, Worldwide, 2010-2016, 3Q12 Update."
• Wall Street Journal, "Financial Services Companies Firms See Results from Big Data Push", Jan. 27, 2014
SCALE: New Data Sources Unlock New Insights & Apps
Existing structured data
• Well-defined and well-understood schema
– OLTP data
– Data warehouse data
– End user data stores (e.g., Excel, Access)
New multi-structured data
• Typically un-modeled, different in format
– Log data
– Clickstream data
– Sensor data
– Rich media (e.g., audio, video)
– Documents
Both types needed today for deeper insights
Stages of the Optimization
Stage 1: Offload Cold Data – Free up DW space
[Diagram labels: Structured Data; Incoming Data; ETL; Data Warehouse; Hadoop Platform; Cold Data Offload; Restored Disk]
• Unused data moved out
• ETL done the traditional way
• Critical data available for query
Data Access:
– BI through ODBC
– Hive Connectors
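One way to decide which data counts as "cold" in Stage 1 is to look at when each warehouse table was last queried. The sketch below is a hypothetical illustration only, not MapR tooling: the TableStats class, the 180-day idle threshold, and the sample table names and sizes are all assumptions made for the example.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: choose warehouse tables to offload to Hadoop.
// TableStats and the idle threshold are illustrative assumptions, not a MapR API.
class ColdDataSketch {

    static class TableStats {
        final String name;
        final double sizeGb;
        final LocalDate lastQueried;

        TableStats(String name, double sizeGb, LocalDate lastQueried) {
            this.name = name;
            this.sizeGb = sizeGb;
            this.lastQueried = lastQueried;
        }
    }

    // Returns the names of tables that have not been queried for more than maxIdleDays.
    static List<String> coldTables(List<TableStats> tables, LocalDate today, long maxIdleDays) {
        List<String> cold = new ArrayList<>();
        for (TableStats t : tables) {
            long idleDays = ChronoUnit.DAYS.between(t.lastQueried, today);
            if (idleDays > maxIdleDays) {
                cold.add(t.name);
            }
        }
        return cold;
    }

    // Sums the warehouse disk (in GB) that offloading the cold tables would free.
    static double reclaimableGb(List<TableStats> tables, LocalDate today, long maxIdleDays) {
        double total = 0.0;
        for (TableStats t : tables) {
            if (ChronoUnit.DAYS.between(t.lastQueried, today) > maxIdleDays) {
                total += t.sizeGb;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2015, 2, 1);
        List<TableStats> tables = Arrays.asList(
            new TableStats("orders_2009", 800.0, LocalDate.of(2013, 6, 30)),
            new TableStats("orders_current", 120.0, LocalDate.of(2015, 1, 31)),
            new TableStats("clickstream_raw", 2400.0, LocalDate.of(2014, 3, 15)));
        System.out.println("Offload candidates: " + coldTables(tables, today, 180));
        System.out.println("Disk restored (GB): " + reclaimableGb(tables, today, 180));
    }
}
```

In practice the last-queried dates would come from the warehouse's own query logs, and the threshold would be tuned to the reporting calendar so that tables used only at quarter or year end are not offloaded by mistake.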
Stage 2: ETL In Hadoop
[Diagram labels: Incoming Data; ETL; Hadoop Platform; Bulk Data; Low Latency Data; Data Warehouse; Restored CPU and Disk]
• ETL now done on Hadoop
• Analytics through EDW as well as Hadoop
• Restores even more CPU and Disk
• Improves old DW Response and Speed
Stage 3: Hadoop Optimized Data Architecture
[Diagram: Optimized Data Architecture]
Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; sensors; blogs, tweets, link data
Data Movement
Data Transformation, Enrichment and Integration
MapR Distribution for Hadoop: Streaming (Spark Streaming, Storm); NoSQL ODBMS (HBase, Accumulo, …); Batch / Search (MR, Spark, Pig, …); SQL Analytics (Hive, Drill …); Machine Learning
MapR Data Platform: MapR-DB, MapR-FS
Data Warehouse
Data Access: Analytics; Search; Schema-less data exploration; BI, reporting; Ad-hoc integrated analytics
Operational Apps: Recommendations; Fraud Detection; Logistics
MapR Customer Examples
MapR Customer Success for Enterprise Data Hub
• EDH most common use case
• Across industries including
- Financial services
- Telecommunications
- Government
- Healthcare
- Technology
Cisco - 360° Customer View
Deepening customer relationships and increasing sales opportunities
OBJECTIVES
• Improve customer satisfaction and sales opportunities by integrating all customer data into one dashboard, accessible across company divisions
• Provide a consistent and proactively knowledgeable customer experience
CHALLENGES
• Integrate all customer data across silos into a central data repository
• Continually feed real-time customer data into the repository
• Provide a real-time view of each customer across company divisions: marketing, support, finance, point of sale, etc.
SOLUTION
• Central data repository results in lower cost and reduced complexity
• Accelerates analysis cycle time and rapid actions
• Provides high availability and disaster recovery
Business Impact
Cisco’s 360° customer view solution enabled them to analyze service sales opportunities in 1/10 the time, at 1/10 the cost, and generated $40 million in incremental service bookings in the first year.
F100 Telco - Data Warehouse Optimization
Improve data services to customers while reducing enterprise architecture costs
OBJECTIVES
• Provide cloud, security, managed services, data center, & comms
• Report on customer usage, profiles, billing, and sales metrics
• Improve service: Measure service quality and repair metrics
• Reduce customer churn – identify and address IP network hotspots
CHALLENGES
• Cost of ETL & DW storage for growing IP and clickstream data; >3 months
• Reliability & cost of Hadoop alternatives limited ETL & storage offload
SOLUTION
• MapR Data Platform for data staging, ETL, and storage at 1/10th the cost
• MapR provided smallest datacenter footprint with best DR solution
• Enterprise-grade: NFS file management, consistent snapshots & mirroring
Business Impact
• Increased scale to handle network IP and clickstream data
• Reduced workload on DW to maintain reporting SLAs to business
• Unlocked new insights into network usage and customer preferences
MapR Enterprise Data Hub Solution
MapR Enterprise Data Hub
The Hadoop platform of choice for big & fast data-driven apps
• Scale - Reliability Across the Enterprise
– Advanced multi-tenancy
– Business continuity – HA, DR
• Speed
– 2-7x faster than other Hadoop distros
– Ultra-fast data ingest, NFS, & R/W file system
• Self-Service Data Exploration
– On-the-fly SQL without up-front schema
– ANSI SQL: use existing BI/DW investments
[Diagram labels: Integrated Commercial Engines; Tools; Compute Engines (Batch, Interactive, Real-time, Online, Others); Security; Streaming; NoSQL & Search; Provisioning & Coordination; ML, Graph; Workflow & Data Governance; Batch; SQL; Management (Operations, Governance, Audits, Security); MapR Data Platform (MapR-FS, MapR-DB)]
Drill: Agility by Reducing Distance to Data
Short analytic life cycles with no upfront schema creation and management
FROM: Traditional Approach
New Business Questions: Hadoop Data → Schema Design → Transformation → Data Movement → Users (Data Preparation)
Total Time to Value: Weeks to Months
TO: New Approach
New Business Questions: Hadoop Data → Users
Total Time to Value: Minutes
Drill enables the “As-It-Happens” business with instant SQL analytics on complex data
Thank You
@mapr maprtech
nitin@mapr.com
MapRTechnologies
maprtech
mapr-technologies
Free on-demand Hadoop training leading to certification
Start becoming an expert now
mapr.com/training
Data Quality in the Data Hub
February 2015
© RedPoint Global Inc. 2015 Confidential
Overview of RedPoint Global
Launched 2006
Founded and staffed by industry veterans
Headquarters: Wellesley, Massachusetts
Offices in US, UK, Australia, Philippines
Global customer base
Serves most major industries
Magic Quadrant: Data Quality
Magic Quadrant: Multichannel Campaign Management
Magic Quadrant: Integrated Marketing Management
Extensive experience with a diverse customer base
Cloudera Stack
Andrew Brust, GigaOm Research
There is lots of Hype Out There
Don’t believe the Marketing Hype
Data Hub for MDM
[Diagram labels: Data Hub (YARN, nodes 1 … n); Production RDBMS Databases; Data Ingestion; Data Quality Processing; Persistent Entity Resolution, Linkage and Keying; Specialized Analytic Databases & Caches (Any analytics, Any reporting); Analytics (Predictive Analytics, Clustering, Profiling); Interaction Systems (Marketing Automation, Real Time Personalization, Omni-Channel Optimization, Digital and Traditional Channels)]
How About MDM on a Data Lake?
Challenges to Data Lake Approach
• Severe shortage of MapReduce-skilled resources
• Inconsistent skills lead to inconsistent results from code-based solutions
• Nascent technologies require multiple point solutions
• Technologies are not enterprise grade
• Some functionality may not be possible within these frameworks
Benefits of a Hadoop Data Lake
• Data is ingested in its raw state regardless of format, structure or lack of structure
• Raw data can be used and reused for differing purposes across the enterprise
• Beyond inexpensive storage, Hadoop is an extremely powerful, scalable and segmentable computational platform
• Master Data can be fed across the enterprise and deep analytics on clean data is immediately enabled
Key Functions for Master Data Management
ETL & ELT
• Profiling, reads/writes, transformations
• Single project for all jobs
Data Quality
• Cleanse data
• Parsing, correction
• Geo-spatial analysis
Integration & Matching
• Grouping
• Fuzzy match
Master Key Management
• Create keys
• Track changes
• Maintain matches over time
Web Services Integration
• Consume and publish
• HTTP/HTTPS protocols
• XML/JSON/SOAP formats
Process Automation & Operations
• Job scheduling, monitoring, notifications
• Central point of control
• Meta Data Management
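The "Grouping" and "Fuzzy match" functions listed above can be illustrated with a small sketch. This is not RedPoint's matching logic, which the slides do not describe; it is a minimal example that assumes Levenshtein edit distance as the similarity measure, a hypothetical 0.85 threshold, and greedy grouping against the first record of each group. The position of a group in the result stands in for a master key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch of fuzzy matching and grouping for master data.
// The similarity measure and threshold are assumptions, not a vendor algorithm.
class FuzzyMatchSketch {

    // Levenshtein edit distance: minimum insertions, deletions, substitutions.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1] after trimming and lower-casing; 1.0 means identical.
    static double similarity(String a, String b) {
        String x = a.trim().toLowerCase(Locale.ROOT);
        String y = b.trim().toLowerCase(Locale.ROOT);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(x, y) / longest;
    }

    // Greedy grouping: a name joins the first group whose first member is
    // similar enough, otherwise it starts a new group.
    static List<List<String>> group(List<String> names, double threshold) {
        List<List<String>> groups = new ArrayList<>();
        for (String name : names) {
            List<String> home = null;
            for (List<String> g : groups) {
                if (similarity(g.get(0), name) >= threshold) {
                    home = g;
                    break;
                }
            }
            if (home == null) {
                home = new ArrayList<>();
                groups.add(home);
            }
            home.add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Jon Smith", "John Smith", "Jane Doe");
        List<List<String>> groups = group(names, 0.85);
        for (int key = 0; key < groups.size(); key++) {
            System.out.println("key " + key + " -> " + groups.get(key));
        }
    }
}
```

Comparing every record against every group does not scale to large data sets; production matching typically narrows the candidates first (for example by blocking on a postcode or name prefix) and persists the keys so that matches can be maintained over time.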
Overview - What is Hadoop/Hadoop 2.0
Hadoop 1.0
• All operations based on MapReduce
• Intrinsic inconsistency of code-based solutions
• Highly skilled and expensive resources needed
• 3rd party applications constrained by the need to generate code
Hadoop 2.0
• Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
• Mature applications can now operate directly on Hadoop
• Reduced skill requirements and increased consistency
RedPoint Data Management on Hadoop
[Diagram labels: Parallel Section; Partitioning (AM / Tasks); Execution (AM / Tasks); Data I/O; Key / Split Analysis; YARN; MapReduce]
RedPoint DM for Hadoop: Processing Flow
[Diagram labels: Data Management Designer; DM Execution Server; Parallel Section; Resource Manager; Launches DM App Master; DM App Master; Launches Tasks; three Node Managers hosting the DM App Master and DM Tasks; Running DM Task; steps numbered 1 to 3]
Reference Hadoop Architecture
[Diagram labels: RedPoint Functional Footprint; Monitoring and Management Tools; Management; Data Sources (RDBMS, EDW); Source Data (Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory); DBs; JMS Queues; Files; Load (SQOOP, WebHDFS, Flume, NFS); Load (SQOOP/Hive, WebHDFS); Data Refinement (Hive, Pig); Structure: HCatalog (metadata services); Interactive: Hive Server2; Stream; MapReduce; REST; HTTP; Query/Visualization/Reporting/Analytical Tools and Apps; YARN (nodes 1 … n); HDFS]
Benchmarks – Project Gutenberg
Map Reduce: >150 lines of MR code; 6 hours of development; 6 minutes runtime; extensive optimization needed
Pig: ~50 lines of script code; 3 hours of development; 15 minutes runtime; User Defined Functions required prior to running script
RedPoint: 0 lines of code; 15 min. of development; 3 minutes runtime; no tuning or optimization required
Sample MapReduce (small subset of the entire code which totals nearly 150 lines):
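// Imports needed by this excerpt:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// WordOffset is a custom key type defined elsewhere in the full job code.
public static class MapClass
        extends Mapper<WordOffset, Text, Text, IntWritable> {
    // Characters treated as token separators (the inner double quote is escaped).
    private final static String delimiters =
        "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits (token, 1) for every token found on the input line.
    public void map(WordOffset key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line, delimiters);
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}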
Sample Pig script without the UDF:
SET pig.maxCombinedSplitSize 67108864
SET pig.splitCombination true
A = LOAD '/testdata/pg/*/*/*';
B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
C = FOREACH B GENERATE UPPER(word) AS word;
D = GROUP C BY word;
E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
F = ORDER E BY occurrences DESC;
STORE F INTO '/user/cleonardi/pg/pig-count';
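To make clear what both samples compute, here is a single-JVM Java sketch of the same word-count logic as the Pig script: tokenize, upper-case, group, count, and order by occurrences descending. It is not the benchmark code and does not run on Hadoop. The whitespace tokenization, the alphabetical tie-break, and the sample lines are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// In-memory sketch of the word-count pipeline; not the benchmarked job.
class WordCountSketch {

    // Counts upper-cased, whitespace-separated tokens across all lines.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank lines yield one empty token
                }
                counts.merge(token.toUpperCase(Locale.ROOT), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Orders entries by descending count; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> byOccurrencesDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "the quick brown fox",
            "The lazy dog and the fox");
        for (Map.Entry<String, Integer> e : byOccurrencesDesc(countWords(lines))) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

The MapReduce and Pig versions distribute the same grouping and counting steps across a cluster, which is where the additional code, tuning, and runtime differences reported in the benchmark come from.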
Data Lake Architecture for MDM
Recommendations for Data Quality
• There is a gap between current use and the mainstream
• Don’t believe the hype; there’s plenty of it
• Data Quality creates trust in information, which enables confident and nimble decision making
• Look for broad enterprise apps that have solved the parallel scalability problem
• Consider a Data Hub approach for Data Quality for maximum flexibility and scalable performance
George Corugedo
Chief Technology Officer
George.corugedo@redpoint.net
781.725.0252
Download our white paper
From Yawn to Yarn: Why You Should be Excited about Hadoop
Redpoint.net/dbtawebinar
Question and Answer Session
(please submit questions)
Nitin Bandugula
Product Marketing Manager
MapR Technologies
Kevin Petrie
Senior Director
Attunity
George Corugedo
Chief Technology Officer & Co-Founder
RedPoint Global Inc.
To view the archived event, use the same URL you used for today’s live event. We will also send you a follow-up email with that URL once the archive is posted.
Thank you for participating in today’s roundtable web event
Just by attending this event the winner of the $100 AmEx Gift Card is…

 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsjdijcks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresKangaroot
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsDenodo
 
How PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail AnalyticsHow PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail AnalyticsHortonworks
 
Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic IntelAPAC
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Precisely
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 

Ähnlich wie Hadoop and Your Enterprise Data Warehouse (20)

MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
 
Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
 
Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which Hadoop and the Data Warehouse: When to Use Which
Hadoop and the Data Warehouse: When to Use Which
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 
Game Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise ThinkingGame Changed – How Hadoop is Reinventing Enterprise Thinking
Game Changed – How Hadoop is Reinventing Enterprise Thinking
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analytics
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
How PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail AnalyticsHow PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
How PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
 
Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic Gab Genai Cloudera - Going Beyond Traditional Analytic
Gab Genai Cloudera - Going Beyond Traditional Analytic
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 

Mehr von Edgar Alejandro Villegas

What's New in Predictive Analytics IBM SPSS - Apr 2016
What's New in Predictive Analytics IBM SPSS - Apr 2016What's New in Predictive Analytics IBM SPSS - Apr 2016
What's New in Predictive Analytics IBM SPSS - Apr 2016Edgar Alejandro Villegas
 
The Four Pillars of Analytics Technology Whitepaper
The Four Pillars of Analytics Technology WhitepaperThe Four Pillars of Analytics Technology Whitepaper
The Four Pillars of Analytics Technology WhitepaperEdgar Alejandro Villegas
 
SQL in Hadoop To Boldly Go Where no Data Warehouse Has Gone Before
SQL in Hadoop  To Boldly Go Where no Data Warehouse Has Gone BeforeSQL in Hadoop  To Boldly Go Where no Data Warehouse Has Gone Before
SQL in Hadoop To Boldly Go Where no Data Warehouse Has Gone BeforeEdgar Alejandro Villegas
 
SQL – The Natural Language for Analysis - Oracle - Whitepaper - 2431343
SQL – The Natural Language for Analysis - Oracle - Whitepaper - 2431343SQL – The Natural Language for Analysis - Oracle - Whitepaper - 2431343
SQL – The Natural Language for Analysis - Oracle - Whitepaper - 2431343Edgar Alejandro Villegas
 
Best Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle OptimizerBest Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle OptimizerEdgar Alejandro Villegas
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...Edgar Alejandro Villegas
 
Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Edgar Alejandro Villegas
 
Fast and Easy Analytics: - Tableau - Data Base Trends - Dbt06122013slides
Fast and Easy Analytics: - Tableau - Data Base Trends - Dbt06122013slidesFast and Easy Analytics: - Tableau - Data Base Trends - Dbt06122013slides
Fast and Easy Analytics: - Tableau - Data Base Trends - Dbt06122013slidesEdgar Alejandro Villegas
 
BITGLASS - DATA BREACH DISCOVERY DATASHEET
BITGLASS - DATA BREACH DISCOVERY DATASHEETBITGLASS - DATA BREACH DISCOVERY DATASHEET
BITGLASS - DATA BREACH DISCOVERY DATASHEETEdgar Alejandro Villegas
 
Four Pillars of Business Analytics - e-book - Actuate
Four Pillars of Business Analytics - e-book - ActuateFour Pillars of Business Analytics - e-book - Actuate
Four Pillars of Business Analytics - e-book - ActuateEdgar Alejandro Villegas
 

Mehr von Edgar Alejandro Villegas (20)

What's New in Predictive Analytics IBM SPSS - Apr 2016
What's New in Predictive Analytics IBM SPSS - Apr 2016What's New in Predictive Analytics IBM SPSS - Apr 2016
What's New in Predictive Analytics IBM SPSS - Apr 2016
 
Oracle big data discovery 994294
Oracle big data discovery   994294Oracle big data discovery   994294
Oracle big data discovery 994294
 
Actian Ingres10.2 Datasheet
Actian Ingres10.2 DatasheetActian Ingres10.2 Datasheet
Actian Ingres10.2 Datasheet
 
Actian Matrix Datasheet
Actian Matrix DatasheetActian Matrix Datasheet
Actian Matrix Datasheet
 
Actian Matrix Whitepaper
 Actian Matrix Whitepaper Actian Matrix Whitepaper
Actian Matrix Whitepaper
 
Actian Vector Whitepaper
 Actian Vector Whitepaper Actian Vector Whitepaper
Actian Vector Whitepaper
 
Actian DataFlow Whitepaper
Actian DataFlow WhitepaperActian DataFlow Whitepaper
Actian DataFlow Whitepaper
 
The Four Pillars of Analytics Technology Whitepaper
The Four Pillars of Analytics Technology WhitepaperThe Four Pillars of Analytics Technology Whitepaper
The Four Pillars of Analytics Technology Whitepaper
 
SQL in Hadoop To Boldly Go Where no Data Warehouse Has Gone Before
SQL in Hadoop  To Boldly Go Where no Data Warehouse Has Gone BeforeSQL in Hadoop  To Boldly Go Where no Data Warehouse Has Gone Before
SQL in Hadoop To Boldly Go Where no Data Warehouse Has Gone Before
 
Realtime analytics with_hadoop
Realtime analytics with_hadoopRealtime analytics with_hadoop
Realtime analytics with_hadoop
 
SQL – The Natural Language for Analysis - Oracle - Whitepaper - 2431343
SQL – The Natural Language for Analysis - Oracle - Whitepaper - 2431343SQL – The Natural Language for Analysis - Oracle - Whitepaper - 2431343
SQL – The Natural Language for Analysis - Oracle - Whitepaper - 2431343
 
Big Data SurVey - IOUG - 2013 - 594292
Big Data SurVey - IOUG - 2013 - 594292Big Data SurVey - IOUG - 2013 - 594292
Big Data SurVey - IOUG - 2013 - 594292
 
Best Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle OptimizerBest Practices for Oracle Exadata and the Oracle Optimizer
Best Practices for Oracle Exadata and the Oracle Optimizer
 
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...Best Practices –  Extreme Performance with Data Warehousing  on Oracle Databa...
Best Practices – Extreme Performance with Data Warehousing on Oracle Databa...
 
Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869Big Data and Enterprise Data - Oracle -1663869
Big Data and Enterprise Data - Oracle -1663869
 
Fast and Easy Analytics: - Tableau - Data Base Trends - Dbt06122013slides
Fast and Easy Analytics: - Tableau - Data Base Trends - Dbt06122013slidesFast and Easy Analytics: - Tableau - Data Base Trends - Dbt06122013slides
Fast and Easy Analytics: - Tableau - Data Base Trends - Dbt06122013slides
 
BITGLASS - DATA BREACH DISCOVERY DATASHEET
BITGLASS - DATA BREACH DISCOVERY DATASHEETBITGLASS - DATA BREACH DISCOVERY DATASHEET
BITGLASS - DATA BREACH DISCOVERY DATASHEET
 
Four Pillars of Business Analytics - e-book - Actuate
Four Pillars of Business Analytics - e-book - ActuateFour Pillars of Business Analytics - e-book - Actuate
Four Pillars of Business Analytics - e-book - Actuate
 
Sas hpa-va-bda-exadata-2389280
Sas hpa-va-bda-exadata-2389280Sas hpa-va-bda-exadata-2389280
Sas hpa-va-bda-exadata-2389280
 
Splice machine-bloor-webinar-data-lakes
Splice machine-bloor-webinar-data-lakesSplice machine-bloor-webinar-data-lakes
Splice machine-bloor-webinar-data-lakes
 

Kürzlich hochgeladen

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

Kürzlich hochgeladen (20)

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 

Hadoop and Your Enterprise Data Warehouse

  • 1. Welcome to Today’s DBTA Roundtable Web Event
  • 2. Stephen Faig, Business Development Manager, Unisphere Media, Publishers of DBTA
  • 3. Hadoop and Your Enterprise Data Warehouse
  • 4. Nitin Bandugula, Product Marketing Manager, MapR Technologies; Kevin Petrie, Senior Director, Attunity; George Corugedo, Chief Technology Officer & Co-Founder, RedPoint Global Inc.
  • 5. © 2015 MapR Technologies
  • 6. Empowering “as it happens” businesses by speeding up the data-to-action cycle
  • 7. Top-Ranked NoSQL; Top-Ranked Hadoop Distribution; Top-Ranked SQL-on-Hadoop Solution
  • 8. Topics: The Need for EDW Optimization; Different Stages of the Optimization; MapR Customer Examples; The MapR Advantage
  • 9. The Need for EDW Optimization
  • 10. Technical Best-Practices Driving Change in Data Architecture: (1) Scale of analytics; (2) Speed of operations. Source: TDWI, April 2014
  • 11. Meanwhile… The Industry Norm in the DW: 70% of data is unused; almost 60% of CPU capacity is ETL/ELT; 15% of CPU is consumed by ETL to load unused data; 30% of CPU is consumed by 5% of resource-consuming ETL workloads. [Diagram labels: EDW; ELT; Unused Data, Related Loads; Unused Tables (72%)]
  • 12. Force of Adoption: Costs. Hadoop TAM comes from disrupting enterprise data warehouse and storage spending. Data growing at 40%; IT budgets growing at 2.5% (chart axis: 2013, 2014, 2015, 2016, 2017). $ per terabyte: $9,000, $40,000, <$1,000 (chart labels: Enterprise Storage, Database Warehouse, Hadoop). Sources: Gartner, "Forecast Analysis: Enterprise IT Spending by Vertical Industry Market, Worldwide, 2010-2016, 3Q12 Update"; Wall Street Journal, “Financial Services Companies Firms See Results from Big Data Push”, Jan. 27, 2014
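The gap between the two growth rates on slide 12 compounds quickly. A minimal sketch of the arithmetic, assuming both rates are annual and hold constant from the 2013 baseline through 2017 (the slide shows the years on the chart axis but does not state this explicitly); the `compound` helper is illustrative only:

```python
def compound(rate: float, years: int) -> float:
    """Growth multiple after `years` periods at a constant annual `rate`."""
    return (1.0 + rate) ** years


# Assumption: four compounding periods, 2013 -> 2017, per the chart axis.
YEARS = 2017 - 2013

data_multiple = compound(0.40, YEARS)     # data volume relative to 2013
budget_multiple = compound(0.025, YEARS)  # IT budget relative to 2013

# Budget available per unit of data, relative to the 2013 baseline.
budget_per_unit = budget_multiple / data_multiple

print(f"data x{data_multiple:.2f}, budget x{budget_multiple:.2f}, "
      f"budget per unit of data x{budget_per_unit:.2f}")
```

Under these assumptions, data reaches roughly 3.8 times its 2013 volume while the budget grows by about 10%, so the budget available per unit of data falls to under a third of its starting value.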
  • 13. SCALE: New Data Sources Unlock New Insights & Apps. Existing structured data (well-defined and well-understood schema): OLTP data; data warehouse data; end user data stores (e.g., Excel, Access). New multi-structured data (typically un-modeled, different in format): log data; clickstream data; sensor data; rich media (e.g., audio, video); documents. Both types needed today for deeper insights
  • 14. Stages of the Optimization
  • 15. Stage 1: Offload Cold Data – Free up DW space. Unused data moved out; ETL done the traditional way; critical data available for query. Data access: BI through ODBC; Hive connectors. [Diagram labels: Incoming Data; ETL; Structured Data; Data Warehouse; Hadoop Platform; Cold Data Offload; Restored Disk]
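Stage 1 hinges on deciding which warehouse data counts as cold. A minimal sketch of one way to make that split, assuming per-table last-query dates are available from the warehouse's usage logs; the `TableStats` record, the `split_cold` helper, the 90-day idle threshold, and the sample tables are all hypothetical and not taken from the slides:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TableStats:
    """Hypothetical usage record for one warehouse table."""
    name: str
    size_gb: float
    last_queried: date


def split_cold(tables, today, max_idle_days=90):
    """Partition tables into (hot, cold) by how recently they were queried."""
    cutoff = today - timedelta(days=max_idle_days)
    hot = [t for t in tables if t.last_queried >= cutoff]
    cold = [t for t in tables if t.last_queried < cutoff]
    return hot, cold


# Made-up sample inventory, for illustration only.
tables = [
    TableStats("orders_2015", 120.0, date(2015, 1, 30)),
    TableStats("orders_2009", 800.0, date(2012, 6, 1)),
    TableStats("clickstream_raw", 2400.0, date(2014, 3, 15)),
]

hot, cold = split_cold(tables, today=date(2015, 2, 1))

# Disk that would be restored in the warehouse if the cold tables moved out.
freed_gb = sum(t.size_gb for t in cold)

print("keep in DW:", [t.name for t in hot])
print("offload to Hadoop:", [t.name for t in cold], f"({freed_gb:.0f} GB)")
```

In practice the threshold would be tuned against reporting requirements, since the slide stresses that critical data must stay available for query.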
  • 16. Stage 2: ETL in Hadoop. ETL now done on Hadoop; analytics through EDW as well as Hadoop; restores even more CPU and disk; improves old DW response and speed. [Diagram labels: Incoming Data; ETL; Low Latency Data; Bulk Data; Data Warehouse; Hadoop Platform; Restored CPU and Disk]
  • 17. Stage 3: Hadoop Optimized Data Architecture. [Diagram labels, in slide order: Sources: RELATIONAL, SAAS, MAINFRAME; DOCUMENTS, EMAILS; LOG FILES, CLICKSTREAMS; SENSORS; BLOGS, TWEETS, LINK DATA; DATA WAREHOUSE; Data Movement; Data Access; Analytics; Search; Schema-less data exploration; BI, reporting; Ad-hoc integrated analytics; Data Transformation, Enrichment and Integration; MAPR DISTRIBUTION FOR HADOOP; Streaming (Spark Streaming, Storm); NoSQL ODBMS (HBase, Accumulo, …); MapR Data Platform; MapR-DB; Batch / Search (MR, Spark, Pig, …); MapR-FS; Operational Apps; Recommendations; Fraud Detection; Logistics; Optimized Data Architecture; Machine Learning; SQL Analytics (Hive, Drill …)]
  • 18. MapR Customer Examples
  • 19. MapR Customer Success for Enterprise Data Hub. EDH most common use case, across industries including financial services, telecommunications, government, healthcare, and technology
  • 20. Cisco - 360° Customer View: deepening customer relationships and increasing sales opportunities. Objectives, challenges, and solution: improve customer satisfaction and sales opportunities by integrating all customer data into one dashboard, accessible across company divisions; provide a consistent and proactively knowledgeable customer experience; integrate all customer data across silos into a central data repository; continually feed real-time customer data into the repository; provide a real-time view of each customer across company divisions (marketing, support, finance, point of sale, etc.). Cisco’s 360° customer view solution enabled them to analyze service sales opportunities in 1/10 the time, at 1/10 the cost, and generated $40 million in incremental service bookings in the first year. Business impact: central data repository results in lower cost and reduced complexity; accelerates analysis cycle time and rapid actions; provides high availability and disaster recovery
  • 21. F100 Telco (Fortune 100 telco) - Data Warehouse Optimization: improve data services to customers while reducing enterprise architecture costs. Objectives, challenges, and solution: provide cloud, security, managed services, data center, & comms; report on customer usage, profiles, billing, and sales metrics; improve service by measuring service quality and repair metrics; reduce customer churn by identifying and addressing IP network hotspots; cost of ETL & DW storage for growing IP and clickstream data, >3 months; reliability & cost of Hadoop alternatives limited ETL & storage offload; MapR Data Platform for data staging, ETL, and storage at 1/10th the cost; MapR provided smallest datacenter footprint with best DR solution; enterprise-grade NFS file management, consistent snapshots & mirroring. Business impact: increased scale to handle network IP and clickstream data; reduced workload on DW to maintain reporting SLAs to business; unlocked new insights into network usage and customer preferences
  • 22. MapR Enterprise Data Hub Solution
  • 23. MapR Enterprise Data Hub: the Hadoop platform of choice for big & fast data-driven apps. Scale - Reliability Across the Enterprise: advanced multi-tenancy; business continuity (HA, DR). Speed: 2-7x faster than other Hadoop distros; ultra-fast data ingest, NFS, & R/W file system. Self-Service Data Exploration: on-the-fly SQL without up-front schema; ANSI SQL, use existing BI/DW investments. [Diagram labels: Security; Streaming; NoSQL & Search; Provisioning & coordination; ML, Graph; Workflow & Data Governance; Batch; SQL; Integrated Commercial Engines; Tools; Compute Engines; Batch, Interactive, Real-time, Online, Others; Management, Operations, Governance, Audits, Security; MapR-FS; MapR-DB; MapR Data Platform]
  • 24. Drill: Agility by Reducing Distance to Data. Short analytic life cycles with no upfront schema creation and management. FROM the traditional approach (diagram labels: Hadoop Data, Schema Design, Transformation, Data Movement, Users, New Business Questions; total time to value: weeks to months) TO the new approach (diagram labels: Hadoop Data, Users, New Business Questions; total time to value: minutes). Additional diagram label: Data Preparation. Drill enables the “As-It-Happens” business with instant SQL analytics on complex data
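The idea of querying data with no upfront schema can be illustrated in a few lines. The following is a sketch of the schema-on-read concept only, written in plain Python; it is not Drill's API or query syntax, and the `get_path` and `select` helpers and the sample events are hypothetical:

```python
import json

# Made-up newline-delimited JSON events with differing, undeclared shapes.
RAW_EVENTS = """\
{"user": "a1", "action": "click", "page": "/home"}
{"user": "b2", "action": "purchase", "order": {"total": 42.5, "items": 3}}
{"user": "a1", "action": "purchase", "order": {"total": 10.0}}
"""


def get_path(record, dotted_path, default=None):
    """Resolve a dotted path such as 'order.total'; missing fields yield default."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def select(lines, columns, where=lambda record: True):
    """Project `columns` from each JSON line that satisfies `where`."""
    for line in lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)  # structure is discovered at read time
        if where(record):
            yield tuple(get_path(record, column) for column in columns)


rows = list(select(
    RAW_EVENTS,
    columns=["user", "order.total", "order.items"],
    where=lambda record: record.get("action") == "purchase",
))

for row in rows:
    print(row)
```

Records that lack a requested field simply yield `None` for it, so a new question about a new field needs no table redefinition or reload, which is the time saving the slide attributes to the new approach.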
  • 25. Thank You @mapr maprtech nitin@mapr.com MapRTechnologies maprtech mapr-technologies Free on-demand Hadoop training leading to certification. Start becoming an expert now: mapr.com/training
  • 26. Data Quality in the Data Hub February 2015
  • 27. © RedPoint Global Inc. 2015 Confidential
    Overview of RedPoint Global
    • Launched 2006
    • Founded and staffed by industry veterans
    • Headquarters: Wellesley, Massachusetts
    • Offices in US, UK, Australia, Philippines
    • Global customer base
    • Serves most major industries
    MAGIC QUADRANT: Data Quality; Multichannel Campaign Management; Integrated Marketing Management
  • 28. Extensive experience with a diverse customer base
  • 29. Cloudera Stack
  • 30. Andrew Brust, GigaOm Research
  • 31. There is lots of Hype Out There
  • 32. Don’t believe the Marketing Hype
  • 33. Data Hub for MDM (diagram labels) Data Hub 1 ... n YARN Production RDBMS Databases Data Ingestion Specialized Analytic Databases & Caches Any analytics Any reporting Predictive Analytics Clustering Profiling Analytics Marketing Automation Real Time Personalization Omni-Channel Optimization Digital and Traditional Channels Interaction Systems Data Quality Processing Persistent Entity Resolution, Linkage and Keying
  • 34. How About MDM on a Data Lake?
    Challenges to Data Lake Approach
    • Severe shortage of MapReduce-skilled resources
    • Inconsistent skills lead to inconsistent results from code-based solutions
    • Nascent technologies require multiple point solutions
    • Technologies are not enterprise grade
    • Some functionality may not be possible within these frameworks
    Benefits of a Hadoop Data Lake
    • Data is ingested in its raw state regardless of format, structure, or lack of structure
    • Raw data can be used and reused for differing purposes across the enterprise
    • Beyond inexpensive storage, Hadoop is an extremely powerful, scalable, and segmentable computational platform
    • Master Data can be fed across the enterprise, and deep analytics on clean data are immediately enabled
  • 35. Key Functions for Master Data Management
    ETL & ELT
    • Profiling, reads/writes, transformations
    • Single project for all jobs
    Data Quality
    • Cleanse data
    • Parsing, correction
    • Geo-spatial analysis
    Integration & Matching
    • Grouping
    • Fuzzy match
    Master Key Management
    • Create keys
    • Track changes
    • Maintain matches over time
    Web Services Integration
    • Consume and publish
    • HTTP/HTTPS protocols
    • XML/JSON/SOAP formats
    Process Automation & Operations
    • Job scheduling, monitoring, notifications
    • Central point of control
    • Metadata management
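    To make the "Fuzzy match" and "Create keys" functions on this slide concrete, here is a minimal sketch in Java. It is not RedPoint's matching method, which the slides do not describe. It assumes a plain Levenshtein edit distance as the similarity measure, and the class name, method names, and `maxDistance` threshold are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy fuzzy matcher that assigns master keys.
// Assumption: Levenshtein distance stands in for whatever matching a real MDM tool uses.
public class FuzzyKeySketch {

    // Classic dynamic-programming edit distance between two strings.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Light cleansing before matching: trim, lower-case, collapse whitespace.
    public static String normalize(String s) {
        return s.trim().toLowerCase().replaceAll("\\s+", " ");
    }

    // Each record gets the key of the first master record within maxDistance edits;
    // otherwise it becomes a new master record with a new key.
    public static int[] assignKeys(List<String> names, int maxDistance) {
        List<String> masters = new ArrayList<>();
        int[] keys = new int[names.size()];
        for (int i = 0; i < names.size(); i++) {
            String name = normalize(names.get(i));
            int key = -1;
            for (int k = 0; k < masters.size(); k++) {
                if (editDistance(name, masters.get(k)) <= maxDistance) {
                    key = k;
                    break;
                }
            }
            if (key < 0) {
                masters.add(name);
                key = masters.size() - 1;
            }
            keys[i] = key;
        }
        return keys;
    }
}
```

    With a threshold of 1, "Jon Smith" and "John Smith" would share a key while "Mary Jones" would get its own. A production system would also need the "Track changes" and "Maintain matches over time" functions from the slide, which this sketch leaves out.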
  • 36. Overview - What is Hadoop/Hadoop 2.0
    Hadoop 1.0
    • All operations based on MapReduce
    • Intrinsic inconsistency of code-based solutions
    • Highly skilled and expensive resources needed
    • 3rd-party applications constrained by the need to generate code
    Hadoop 2.0
    • Introduction of YARN: “a general-purpose, distributed, application management framework that supersedes the classic Apache Hadoop MapReduce framework for processing data in Hadoop clusters.”
    • Mature applications can now operate directly on Hadoop
    • Reduced skill requirements and increased consistency
  • 37. RedPoint Data Management on Hadoop (diagram labels) Partitioning AM / Tasks; Execution AM / Tasks; Data I/O; Key / Split Analysis; Parallel Section; YARN; MapReduce
  • 38. RedPoint DM for Hadoop: Processing Flow (diagram labels) Data Management Designer; DM Execution Server; Parallel Section; Resource Manager; Launches DM App Master; Launches Tasks; Node Manager: DM App Master, DM Task; Node Manager: DM Task, DM Task; Node Manager: DM Task, DM Task; Running DM Task; steps 1, 2, 3
  • 39. Reference Hadoop Architecture (diagram labels) Monitoring and Management Tools; Management; MAPREDUCE; REST; DATA REFINEMENT; HIVE; PIG; HTTP; STREAM; STRUCTURE; HCATALOG (metadata services); Query/Visualization/Reporting/Analytical Tools and Apps; SOURCE DATA: Sensor Logs, Clickstream, Flat Files, Unstructured, Sentiment, Customer, Inventory; DBs; JMS Queues; Files; Data Sources; RDBMS; EDW; INTERACTIVE; HIVE Server2; LOAD: SQOOP, WebHDFS, Flume, NFS; LOAD: SQOOP/Hive, WebHDFS; YARN; 1 ... n; HDFS; RedPoint Functional Footprint
  • 40. RedPoint Benchmarks – Project Gutenberg
    MapReduce: >150 lines of MR code; 6 hours of development; 6 minutes runtime; extensive optimization needed
    Pig: ~50 lines of script code; 3 hours of development; 15 minutes runtime; User Defined Functions required prior to running script
    RedPoint: 0 lines of code; 15 min. of development; 3 minutes runtime; no tuning or optimization required

    Sample MapReduce (small subset of the entire code, which totals nearly 150 lines):

        public static class MapClass extends Mapper<WordOffset, Text, Text, IntWritable> {
            // Characters treated as token separators (the embedded double quote is escaped).
            private final static String delimiters = "',./<>?;:\"[]{}-=_+()&*%^#$!@`~ |«»¡¢£¤¥¦©¬®¯±¶·¿";
            // Reusable count of 1 emitted for every token.
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            // Split each input line on the delimiters and emit (token, 1).
            public void map(WordOffset key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer itr = new StringTokenizer(line, delimiters);
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

    Sample Pig script without the UDF:

        SET pig.maxCombinedSplitSize 67108864
        SET pig.splitCombination true
        A = LOAD '/testdata/pg/*/*/*';
        B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) AS word;
        C = FOREACH B GENERATE UPPER(word) AS word;
        D = GROUP C BY word;
        E = FOREACH D GENERATE COUNT(C) AS occurrences, group;
        F = ORDER E BY occurrences DESC;
        STORE F INTO '/user/cleonardi/pg/pig-count';
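    For readers who want to see the word-count logic of this benchmark without a cluster, the sketch below reproduces the same steps as the Pig script (tokenize, upper-case, group, count, order by occurrences descending) in plain single-machine Java. It is an illustration only and is not part of the benchmark. The class and method names are hypothetical, tokens are split on whitespace only (a simplification of both the MapReduce delimiter list and Pig's TOKENIZE), and ties are broken alphabetically, which the Pig script does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: in-memory equivalent of the word-count job on the slide.
public class WordCountSketch {

    // Tokenize each line on whitespace, upper-case each token, and count occurrences.
    public static Map<String, Long> count(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // blank lines yield one empty token
                }
                counts.merge(token.toUpperCase(), 1L, Long::sum);
            }
        }
        return counts;
    }

    // Order by occurrences descending; ties broken alphabetically (an assumption).
    public static List<Map.Entry<String, Long>> ordered(Map<String, Long> counts) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat", "The dog  the");
        for (Map.Entry<String, Long> e : ordered(count(lines))) {
            // Same column order as the Pig script: occurrences, then word.
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

    The single-machine version hides everything the benchmark actually measures: splitting input across nodes, shuffling by key, and tuning. That distributed plumbing is where the MapReduce line count and development time on the slide come from.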
  • 41. Data Lake Architecture for MDM
  • 42. Recommendations for Data Quality
    • There is a gap between current use and the mainstream
    • Don’t believe the hype; there’s plenty of it
    • Data Quality creates trust in information, which enables confident and nimble decision making
    • Look for broad enterprise apps that have solved the parallel scalability problem
    • Consider a Data Hub approach for Data Quality for maximum flexibility and scalable performance
  • 43. George Corugedo, Chief Technology Officer; George.corugedo@redpoint.net; 781.725.0252. Download our white paper, From Yawn to Yarn: Why You Should be Excited about Hadoop: Redpoint.net/dbtawebinar
  • 44. Question and Answer Session (please submit questions)
  • 45. Nitin Bandugula Product Marketing Manager MapR Technologies Kevin Petrie Senior Director Attunity George Corugedo Chief Technology Officer & Co-Founder RedPoint Global Inc.
  • 46. Please use the same URL you used to view today’s live event for the archive event, plus we will be sending you a follow-up email with that URL once the archive is posted!
  • 47. Thank you for participating in today’s roundtable web event Just by attending this event the winner of the $100 AmEx Gift Card is…….