Supporting Financial Services 
With a More Flexible Approach 
to Big Data 
October 2014
Hortonworks 
Fall 2014 
Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved 
We Do Hadoop
Our Mission: 
Power your Modern Data Architecture 
with HDP and Enterprise Apache Hadoop 
Who we are 
June 2011: Original 24 architects, developers, operators of Hadoop from Yahoo! 
June 2014: An enterprise software company with 420+ Employees 
Our model 
Innovate and deliver Apache Hadoop as a complete enterprise data platform 
completely in the open, backed by a world class support organization 
Key Partners 
Fastest growing Fortune 1000 customer base 
Customer Momentum 
• 300+ customers in seven quarters, growing at 75+/quarter 
• Two thirds of customers come from F1000 
• 100% renewal rate 
Largest Cluster in North America 
32,000 Nodes 
Largest Cluster in Europe 
1,000 Nodes 
30+ customers migrated from other distributions 
Some notable migrations include many of the early adopters of Hadoop: 
Experience at Scale 
80,000 nodes under contract 
Largest Known Cluster in APAC 
400 Nodes
Hortonworks: A Leader In Hadoop 
The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014 
“Hortonworks loves and lives 
open source innovation” 
Vision & Execution for Enterprise Hadoop. 
Hortonworks leads with a strong strategy and roadmap for open source innovation 
with Hadoop and a strong delivery of that innovation in Hortonworks Data Platform. 
World Class Support and Services. 
Hortonworks' Customer Support received a maximum score 
and was significantly higher than both Cloudera and MapR. 
Key Strategic Partnerships. 
Hortonworks’ unique strategic partnerships with Microsoft, SAP, Teradata and others 
are a key strength as part of its overall strategy of ecosystem partnership to 
accelerate Hadoop adoption in the enterprise.
HDP IS Apache Hadoop 
There is ONE Enterprise Hadoop: everything else is a vendor derivation 
HDP 
• Reliable 
• Consistent 
• Current
Enabling a Modern Data Architecture 
with HDP and Apache Hadoop 
Fall 2014 
Hortonworks. We do Hadoop.
Traditional systems under pressure 
DATA SYSTEM APPLICATIONS 
Business 
Analytics 
Custom 
Applications 
RDBMS EDW MPP 
Packaged 
Applications 
• Silos of Data 
• Costly to Scale 
• Constrained Schemas 
Clickstream 
Geolocation 
Sentiment, Web Data 
Sensor, Machine Data 
Unstructured docs, emails 
Server logs 
SOURCES 
Existing Sources 
(CRM, ERP,…) 
New Data Types 
…and difficult to 
manage new data
Traditional Hadoop, challenges & limitations 
MapReduce 
Largely Batch Processing 
HDFS
(Hadoop Distributed File System)
Nodes 1 to N
SOURCES 
EXISTING 
Systems 
Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
Architectural Limitations 
• Single-purpose clusters, specific data sets 
• Primarily a batch system using MapReduce 
Enterprise Challenges 
• Limited enterprise capabilities: 
Operations, Security & Governance 
• Created additional Silos 
Interoperability Challenges 
• Difficult to natively integrate existing applications 
Commercial add-ons opportunistically emerged 
in the early days to address these shortcomings 
DATA SYSTEM APPLICATIONS 
Business 
Analytics 
Custom 
Applications 
Packaged 
Applications 
RDBMS EDW MPP
2006 2009 
MR-279: YARN 
Hadoop w/ MapReduce 
MapReduce 
Largely Batch Processing 
HDFS
(Hadoop Distributed File System)
Nodes 1 to N
Hadoop2 & YARN based Architecture 
YARN: Data Operating System 
Nodes 1 to N
HDFS
(Hadoop Distributed File System)
Siloed clusters 
Largely batch system 
Difficult to integrate 
Hadoop 2 & YARN 
Batch Interactive Real-Time 
Architected & 
led development 
of YARN to enable 
the Modern Data 
Architecture 
October 23, 2013
HDP2 and YARN enable the Modern Data Architecture 
Batch Interactive Real-Time 
HDFS 
(Hadoop Distributed File System) 
Hortonworks architected and 
led development of YARN 
Common data set, multiple applications 
• Optionally land all data in a single cluster 
• Batch, interactive & real-time use cases 
• Support multi-tenant access, processing 
& segmentation of data 
YARN: Architectural center of Hadoop 
• Consistent security, governance & operations 
• Ecosystem applications certified 
by Hortonworks to run natively in Hadoop 
SOURCES 
EXISTING 
Systems 
Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
DATA SYSTEM APPLICATIONS 
Business 
Analytics 
Custom 
Applications 
Packaged 
Applications 
RDBMS EDW MPP
YARN: Data Operating System
Nodes 1 to N
A Blueprint for Enterprise Hadoop 
Load data 
and manage 
according 
to policy 
PRESENTATION & APPLICATION 
ENTERPRISE MGMT & SECURITY 
DATA ACCESS SECURITY 
Access your data simultaneously in multiple ways (batch, interactive, real-time)
YARN Data Operating System
Deploy and effectively manage the platform
Store and process all of your Corporate Data Assets
Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
DATA MANAGEMENT 
GOVERNANCE 
& INTEGRATION 
OPERATIONS 
Enable both existing and new applications to 
provide value to the organization 
Empower existing operations and 
security tools to manage Hadoop 
Provide deployment choice across physical, virtual, cloud 
DEPLOYMENT OPTIONS
HDP Delivers Enterprise Hadoop 
Hortonworks Data Platform 2.2 
GOVERNANCE BATCH, INTERACTIVE & REAL-TIME DATA ACCESS SECURITY OPERATIONS 
Java 
Scala 
Cascading 
Tez 
Stream 
Storm 
YARN: Data Operating System 
(Cluster Resource Management) 
Script 
Pig 
SQL 
Hive 
Tez Tez 
HDFS 
(Hadoop Distributed File System) 
Search 
Solr 
NoSQL 
HBase 
Accumulo 
Slider Slider 
In-Memory 
Spark 
Provision, 
Manage & 
Monitor 
Ambari 
Zookeeper 
Scheduling 
Oozie 
Data Workflow, 
Lifecycle & 
Governance 
Falcon 
Sqoop 
Flume 
Kafka 
NFS 
WebHDFS 
Authentication 
Authorization 
Accounting 
Data Protection 
Storage: HDFS 
Resources: YARN 
Access: Hive, … 
Pipeline: Falcon 
Cluster: Knox 
Deployment Choice: Linux, Windows, On-Premises, Cloud
YARN is the architectural 
center of HDP 
• Common data set across all 
applications 
• Batch, interactive & real-time 
workloads 
• Multi-tenant access & processing 
Provides comprehensive 
enterprise capabilities 
• Governance 
• Security 
• Operations 
Enables broad 
ecosystem adoption 
• ISVs can plug directly into Hadoop 
The widest range of deployment options 
• Linux & Windows 
• On-premises & cloud 
Others 
ISV 
Engines
The Modern Data Architecture w/ HDP 
Clickstream 
Capture and analyze 
website visitors’ data 
trails and optimize 
your website 
Sensors 
Discover patterns in 
data streaming 
automatically from 
remote sensors and 
machines 
Server Logs 
Research logs to 
diagnose process 
failures and prevent 
security breaches 
Hadoop Value: New Types of Data 
Sentiment 
Understand how 
your customers feel 
about your brand 
and products – 
right now 
Geographic 
Analyze location-based 
data to 
manage operations 
where they occur 
Unstructured 
Understand patterns 
in files across millions 
of web pages, emails, 
and documents
New analytic applications for new types of data 
• Supplier Consolidation 
• Supply Chain and Logistics 
• Assembly Line Quality Assurance 
• Proactive Maintenance 
• Crowdsourced Quality Assurance 
• New Account Risk Screens 
• Fraud Prevention 
• Trading Risk 
• Maximize Deposit Spread 
• Insurance Underwriting 
• Accelerate Loan Processing 
• Call Detail Records (CDRs) 
• Infrastructure Investment 
• Next Product to Buy (NPTB) 
• Real-time Bandwidth 
Allocation 
• New Product Development 
• 360° View of the Customer 
• Analyze Brand Sentiment 
• Localized, Personalized 
Promotions 
• Website Optimization 
• Optimal Store Layout 
Financial 
Services 
Retail Telecom Manufacturing 
Healthcare 
Utilities, 
Oil & Gas 
Public 
Sector 
• Genomic data for medical trials 
• Monitor patient vitals 
• Reduce re-admittance rates 
• Store medical research data 
• Recruit cohorts for 
pharmaceutical trials 
• Smart meter stream 
analysis 
• Slow oil well decline curves 
• Optimize lease bidding 
• Compliance reporting 
• Proactive equipment repair 
• Seismic image processing 
• Analyze public sentiment 
• Protect critical networks 
• Prevent fraud and waste 
• Crowdsource reporting for 
repairs to infrastructure 
• Fulfill open records requests
…to shift from reactive to proactive interactions
A shift in Advertising 
From mass branding …to 1x1 Targeting 
A shift in Financial Services 
From Educated Investing …to Automated Algorithms 
A shift in Healthcare 
From mass treatment …to Designer Medicine 
A shift in Retail 
A shift in Telco 
HDP and Hadoop allow 
organizations to shift 
interactions from… 
Reactive 
Post Transaction 
Proactive 
Pre Decision 
From static branding …to Real-time Personalization 
From break then fix …to repair before break
Data Lake: An architectural shift 
SCALE 
SCOPE 
Unlocking the Data Lake 
RDBMS 
MPP 
EDW 
Data Lake 
Enabled by YARN 
• Single data repository, 
shared infrastructure 
• Multiple biz apps 
accessing all the data 
• Enable a shift from 
reactive to proactive 
interactions 
• Gain new insight across 
the entire enterprise 
New Analytic Apps 
or IT Optimization 
HDP 2.1 
Governance 
& Integration 
Security 
Operations 
Data Access 
YARN 
Data Management
HDP is deeply integrated in the data center 
YARN 
DEV & DATA TOOLS 
OPERATIONAL TOOLS 
INFRASTRUCTURE 
SOURCES 
EXISTING 
Systems 
Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
DATA SYSTEM 
RDBMS EDW MPP 
HANA 
APPLICATIONS 
BusinessObjects BI 
Deep Partnerships 
Hortonworks engages 
in deep engineered relationships 
with the leaders in the data center, 
such as Microsoft, Teradata, Red Hat,
HP, SAS & SAP 
Broad Partnerships 
Over 600 partners work with us to 
certify their applications to work with 
Hadoop so they can extend big data 
to their users 
HDP 2.1 
Governance 
& Integration 
Security 
Operations 
Data Access 
Data Management
HDP Use Cases in Financial Services 
Fall 2014 
Hortonworks. We do Hadoop.
Monetize Anonymous & Aggregate Banking Data 
Problem 
Valuable banking data needed to be anonymous & unified 
• Bank possesses data that indicates larger macro-economic trends, which can be 
monetized in secondary markets 
• Regulations and company policies protect customer privacy 
• Data sets are isolated in legacy silos controlled by LOBs 
• IT challenged by joining data while guaranteeing anonymity 
Solution 
Cross-bank data lake for aggregate data with secure access 
• Multiple data sets abstracted from source platforms 
• Single point of security & privacy for de-identification, masking, encryption, 
authentication and access control 
• Mortgage bankers, consumer bankers, credit card group and treasury bankers have 
access to the same cross-sell data 
• Interoperability with partners SAS, R, Red Hat & Splunk
• Economies of scale for compression & archiving data 
• Significant reduction in storage costs from prior platforms 
Creating Opportunity 
Data: Structured, 
Clickstream, Social & 
Unstructured 
Banking 
One of the largest US banks
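The "single point of security & privacy" idea above can be illustrated with a minimal sketch. It assumes de-identification is done by replacing customer identifiers with a keyed hash before records are aggregated; the slides do not say which technique the bank used, and the key, field names, and sample records below are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret; a real deployment would fetch this from a key-management system.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(customer_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed token (HMAC-SHA256) in place of the raw identifier."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate_spend(records):
    """Sum amounts per (region, category); customer identifiers are not used."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["region"], rec["category"])] += rec["amount"]
    return dict(totals)


# Illustrative records only; not taken from the presentation.
records = [
    {"customer_id": "C-1001", "region": "West", "category": "mortgage", "amount": 1200.0},
    {"customer_id": "C-1002", "region": "West", "category": "mortgage", "amount": 800.0},
    {"customer_id": "C-1001", "region": "East", "category": "card", "amount": 150.0},
]

# Mask identifiers once, at the point of entry into the shared data set.
masked = [{**r, "customer_id": pseudonymize(r["customer_id"])} for r in records]
print(aggregate_spend(masked))
```

Because the token is stable, different lines of business can still join on the masked identifier without ever seeing the original value.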
Insurance Data Lake to Manage Risk 
Creating Opportunity 
Data: Structured, 
Clickstream, Server Log 
Problem 
Challenges merging new & old data hamper analysis 
• Traditional and newer types of data were both growing quickly but were difficult to 
combine in the EDW 
• “Schema on load” requirements of EDW platform limited ingest of some data with 
significant predictive power 
• Company missed data-driven ways to serve customers 
• Process of separating legitimate from fraudulent claims created “needle-in-a-haystack” problem
Solution 
Common platform for all types of data improves up-sell and reduces fraud 
• “Schema on read” Hadoop architecture means that more data sources can be 
easily ingested to enrich predictive analytics 
• Agents use big data insights to determine the best actions for valued customers and
recommend them in real time
• Claims analysts and underwriters process streaming data to quickly flag fraud risks 
and fast-track legitimate claims 
Health Insurance 
Large US medical insurer 
>$30B in revenue 
>20M members 
~35K employees
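The contrast between "schema on load" and "schema on read" can be sketched in a few lines. This is an illustration only, assuming raw records are stored as JSON lines and a schema is applied when the data is queried; the field names and the `read_with_schema` helper are hypothetical and do not come from the presentation.

```python
import json

# Raw records land as-is, even when their fields differ from one source to the next.
RAW_LINES = [
    '{"claim_id": "A1", "amount": "250.00", "channel": "web"}',
    '{"claim_id": "A2", "amount": "1200.50"}',
    '{"claim_id": "A3", "amount": "75", "adjuster_note": "photo attached"}',
]

# The schema is chosen at read time: field name -> (converter, default if missing).
CLAIM_SCHEMA = {
    "claim_id": (str, None),
    "amount": (float, 0.0),
    "channel": (str, "unknown"),
}


def read_with_schema(lines, schema):
    """Project each raw record onto the schema, converting types and filling defaults."""
    for line in lines:
        raw = json.loads(line)
        yield {
            name: convert(raw[name]) if name in raw else default
            for name, (convert, default) in schema.items()
        }


claims = list(read_with_schema(RAW_LINES, CLAIM_SCHEMA))
for claim in claims:
    print(claim)
```

A different analysis could read the same raw lines with a schema that includes `adjuster_note`, without reloading any data.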
Maintaining SLAs for Equity Trading Information 
Problem 
Meeting 12 millisecond SLAs for “ticker plant” 
• Daily ingest: 50GB server log data from 10,000 feeds 
• Four times daily, this data is pushed into DB2 
• Applications query this data 35K times per second 
• 70% of queries are for data <1 year old, 30% for >1 year old 
• Current architecture can only hold 10 years of trading data 
• Growing volume puts performance at risk of missing SLAs 
Solution 
Meeting SLAs with confidence 
• HBase provides super-fast queries within SLA targets 
• ETL offloading to Hadoop allows longer data retention, without jeopardizing fast 
response times 
Improving Efficiency 
Data: Server Log & ETL 
Investment 
Services 
Highly trafficked website 
providing business and 
financial information 
~15K employees
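The figures on this slide imply some simple back-of-the-envelope numbers. The sketch below works them out under two stated assumptions that are not in the slides: ingest happens every calendar day, and volumes are raw (no compression, replication, or indexes).

```python
DAILY_INGEST_GB = 50          # from the slide
QUERIES_PER_SECOND = 35_000   # from the slide
RECENT_SHARE = 0.70           # share of queries for data under one year old (slide)
RETENTION_YEARS = 10          # from the slide
DAYS_PER_YEAR = 365           # assumption: ingest on every calendar day


def retained_volume_tb(daily_gb, years, days_per_year=DAYS_PER_YEAR):
    """Raw volume retained over the given number of years, in decimal terabytes."""
    return daily_gb * days_per_year * years / 1000


def split_query_load(qps, recent_share):
    """Split total queries per second into (recent data, older data)."""
    recent = round(qps * recent_share)
    return recent, qps - recent


print(retained_volume_tb(DAILY_INGEST_GB, RETENTION_YEARS))  # 182.5
print(split_query_load(QUERIES_PER_SECOND, RECENT_SHARE))    # (24500, 10500)
```

Under these assumptions roughly 30% of the query load targets older data, which is the portion most affected when retention grows past what the current architecture holds.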
Hadoop is a Platform Decision 
Open Leadership 
Drive innovation in the open via 
the Apache community-driven 
open source process 
Enterprise Rigor 
Engineer, test and certify 
Apache Hadoop with the 
enterprise in mind 
Ecosystem Endorsement 
Focus on deep integration with 
existing data center technologies 
and skills 
Fastest Growing Customer and Partner Base 
Largest and most experienced Hadoop adopters have standardized on Hortonworks 
The data center leaders have standardized on Hortonworks
WANdisco Background 
• WANdisco: Wide Area Network Distributed Computing 
– Enterprise-ready, high availability software solutions that enable globally distributed 
organizations to meet today’s data challenges of secure storage, scalability and availability 
• Leader in tools for software engineers – Subversion 
– Apache Software Foundation sponsor 
• Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND) 
• US patented active-active replication technology granted, November 2012 
• Global locations 
– San Ramon (CA) 
– Chengdu (China) 
– Tokyo (Japan) 
– Boston (MA) 
– Sheffield (UK) 
– Belfast (UK)
Customers
Non-Stop Hadoop 
Non-Intrusive Plugin 
to Hortonworks HDP 
Provides Continuous Availability 
In the LAN / Across the WAN 
Active/Active
3 Problems For Sharing Data Across Clusters 
LAN / WAN 
Enterprise-Ready Hadoop 
Characteristics of Mission-critical Financial Applications 
• Require Continuous Availability 
– SLAs, regulatory compliance
• Require HDFS to be Deployed Globally 
– Share data between data centers 
– Data is consistent, not eventual 
• Ease Administrative Burden 
– Reduce operational complexity 
– Simplify disaster recovery 
– Lower RTO/RPO 
• Allow Maximum Utilization of 
Resources 
– Within the data center 
– Across data centers
Breaking Away from Active/Passive 
What’s in a NameNode 
Single Standby 
• Inefficient utilization of resources 
– Journal Nodes 
– ZooKeeper Nodes 
– Standby Node 
• Performance bottleneck 
• Still tied to the beeper 
• Limited to LAN scope 
Active / Active 
• All resources utilized 
– Only NameNode configuration 
– Scale as the cluster grows 
– All NameNodes active 
• Load balancing 
• Set resiliency (# of active NN) 
• Global consistency
Breaking Away from Active/Passive 
What’s in a Data Center 
Standby Data Center 
• Idle Resource 
– Single data center ingest 
– Disaster recovery only 
• One-way synchronization 
– DistCp 
• Error-prone 
– Clusters can diverge over time 
• Difficult to scale > 2 Data Centers 
– Complexity of sharing data 
increases 
Active / Active 
• DR resource available 
– Ingest at all data centers 
– Run jobs in both data centers 
• Replication is multi-directional 
– Active/active 
• Absolute consistency 
– Single HDFS spans locations 
• ‘N’ data center support 
– Global HDFS allows appropriate 
data to be shared
Use Cases
Use Case: Disaster Recovery 
• Data is as current as possible (no 
periodic synchs) 
• Doesn’t require monitoring and 
consistency checking 
• Virtually zero downtime to recover 
from regional data center failure 
• Regulatory compliance
• Ingest and analyze anywhere 
• Analyze everywhere 
– Fraud detection 
– Equity trading information 
– New business 
– Etc… 
• Backup data center(s) can be used 
for work 
– No idle resources 
Use Case: Multi-Data Center 
Ingest and multi-tenant workloads
Use Case: Heterogeneous Hardware 
In-memory analytics 
• Mixed Hardware Profiles 
– Memory, disk, CPU 
– Isolate memory-hungry 
processing (Storm/Spark) 
from regular jobs 
• Share data, not processing 
– Isolate lower priority 
(dev/test) work
The difficulty realizing the data lake… 
…is that data spans the entire world 
Data 
Ocean 
Feeder 
Site 
Accounting 
Mart 
Banking 
Mart 
• Data Marts 
– Restrict access to relevant 
data 
– Create quick clusters 
• Feeder Sites (Data 
Tributaries) 
– Ingest only 
Data Reservoir 
Use Cases
• Basel III 
– Consistency of data 
• Data Privacy Directive 
– Data sovereignty 
• Data doesn’t leave country of 
origin 
Compliance 
Regulation 
Guidelines 
Regulatory Compliance
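The data-sovereignty requirement ("data doesn't leave country of origin") amounts to a per-path replication policy. The following is a hypothetical sketch of such a check, not the product's actual configuration or API; the paths, country codes, and the `may_replicate` function are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationRule:
    """Hypothetical rule: data under path_prefix may only live in allowed_countries."""
    path_prefix: str
    allowed_countries: frozenset


RULES = [
    ReplicationRule("/data/eu/customers", frozenset({"DE", "FR"})),
    ReplicationRule("/data/global/market", frozenset({"DE", "FR", "US", "JP"})),
]


def may_replicate(path, target_country, rules=RULES):
    """Longest matching prefix decides; paths with no rule are never replicated."""
    matches = [
        r for r in rules
        if path == r.path_prefix or path.startswith(r.path_prefix.rstrip("/") + "/")
    ]
    if not matches:
        return False
    best = max(matches, key=lambda r: len(r.path_prefix))
    return target_country in best.allowed_countries


print(may_replicate("/data/eu/customers/2014/10.csv", "US"))
print(may_replicate("/data/global/market/ticks", "US"))
```

Denying replication by default for unmatched paths is a deliberate choice in this sketch: a missing rule should not let regulated data cross a border.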
Technical Comparison 
Hadoop Powered by WANdisco
Multi-Data Center Hadoop Today 
What's wrong with the status quo 
Periodic Synchronization 
DistCp 
Parallel Data Ingest 
Load Balancer, Streaming
Multi-Data Center Hadoop Today 
Hacks currently in use 
Periodic Synchronization 
DistCp 
• Runs as MapReduce 
• DR data center is read-only 
• Over time, Hadoop clusters 
become inconsistent 
• Manual and labor-intensive 
process to reconcile differences 
• Inefficient use of the network
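The manual reconciliation described above boils down to comparing what each cluster holds. A minimal sketch of that comparison follows, assuming a listing of path to checksum has already been collected from each cluster; how the listings are gathered is out of scope, and the `diff_listings` helper and sample data are hypothetical.

```python
def diff_listings(primary, standby):
    """Compare two {path: checksum} listings and report how the standby has diverged."""
    missing = sorted(p for p in primary if p not in standby)
    extra = sorted(p for p in standby if p not in primary)
    changed = sorted(
        p for p in primary if p in standby and primary[p] != standby[p]
    )
    return {
        "missing_on_standby": missing,
        "extra_on_standby": extra,
        "checksum_mismatch": changed,
    }


# Illustrative listings only; checksums are placeholders.
primary = {"/logs/a": "c1", "/logs/b": "c2", "/logs/c": "c3"}
standby = {"/logs/a": "c1", "/logs/b": "stale", "/logs/old": "c9"}

report = diff_listings(primary, standby)
print(report)
```

Any non-empty entry in the report represents work an operator has to reconcile by hand between synchronization runs.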
Multi-Data Center Hadoop Today 
Hacks currently in use 
Parallel Data Ingest 
Load Balancer, Flume 
• Hiccups in either of the Hadoop 
clusters causes the two file 
systems to diverge 
• Potential to run out of buffer when 
WAN is down 
• Requires constant attention and 
sys-admin hours to keep running 
• Data created on the cluster is not 
replicated 
• Streaming technologies (like Flume) used for data redirection
only cover streaming data
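The buffer-exhaustion problem can be shown with a small simulation. It is a simplified model, assuming each event is written locally and forwarded to a remote cluster through a bounded in-memory buffer; it does not reproduce Flume's actual channel behavior, and the `dual_write` function is hypothetical.

```python
from collections import deque


def dual_write(events, wan_up, buffer_capacity):
    """Write every event locally; forward to the remote site through a bounded buffer.

    wan_up[i] says whether the WAN link is available when event i arrives.
    Events that arrive while the link is down and the buffer is full are dropped.
    """
    local, remote, dropped = [], [], []
    buffer = deque()
    for event, up in zip(events, wan_up):
        local.append(event)
        if up:
            while buffer:  # drain the backlog first to preserve order
                remote.append(buffer.popleft())
            remote.append(event)
        elif len(buffer) < buffer_capacity:
            buffer.append(event)
        else:
            dropped.append(event)
    return local, remote, dropped


events = list(range(8))
wan_up = [True, True, False, False, False, False, True, True]

local, remote, dropped = dual_write(events, wan_up, buffer_capacity=2)
print(local)    # [0, 1, 2, 3, 4, 5, 6, 7]
print(remote)   # [0, 1, 2, 3, 6, 7]
print(dropped)  # [4, 5]
```

After the outage the two sides no longer hold the same data, which is the divergence the slide warns about.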
Architecture of a Non-Stop Hadoop 
Question and Answer 
Submit your questions in chat 
Q&A
Thank you 

More Related Content

What's hot

Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2tcloudcomputing-tw
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache HadoopAjit Koti
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopRan Ziv
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop IntroductionDzung Nguyen
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview EMC
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production SuccessAllen Day, PhD
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & HadoopEdureka!
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101EMC
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 

What's hot (19)

Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Big Data and Hadoop Introduction
 Big Data and Hadoop Introduction Big Data and Hadoop Introduction
Big Data and Hadoop Introduction
 
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
 
Hadoop
HadoopHadoop
Hadoop
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Introduction to Big Data & Hadoop
Introduction to Big Data & HadoopIntroduction to Big Data & Hadoop
Introduction to Big Data & Hadoop
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Big data concepts
Big data conceptsBig data concepts
Big data concepts
 

Viewers also liked

Gluster.community.day.2013
Gluster.community.day.2013Gluster.community.day.2013
Gluster.community.day.2013Udo Seidel
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Spark Summit
 
Cassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data LocalityCassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data LocalityRussell Spitzer
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksMarian Marinov
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalabilityWANdisco Plc
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 

Viewers also liked (7)

Gluster.community.day.2013
Gluster.community.day.2013Gluster.community.day.2013
Gluster.community.day.2013
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
 
Cassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data LocalityCassandra and Spark: Optimizing for Data Locality
Cassandra and Spark: Optimizing for Data Locality
 
Performance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networksPerformance comparison of Distributed File Systems on 1Gbit networks
Performance comparison of Distributed File Systems on 1Gbit networks
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalability
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 

Similar to Supporting Financial Services with a More Flexible Approach to Big Data

Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Hortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with HadoopHortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with HadoopMats Johansson
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Hortonworks
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Hortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks and HP Vertica Webinar
Hortonworks and HP Vertica WebinarHortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks
 
Hortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User GroupHortonworks Hadoop @ Oslo Hadoop User Group
Hortonworks Hadoop @ Oslo Hadoop User GroupMats Johansson
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Hortonworks
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
IoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJIoT Crash Course Hadoop Summit SJ
IoT Crash Course Hadoop Summit SJDaniel Madrigal
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 

Supporting Financial Services with a More Flexible Approach to Big Data

  • 1. Supporting Financial Services With a More Flexible Approach to Big Data October 2014
  • 2. Hortonworks Fall 2014 Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved We Do Hadoop
  • 3. Our Mission: Power your Modern Data Architecture with HDP and Enterprise Apache Hadoop. Who we are: June 2011: Original 24 architects, developers, operators of Hadoop from Yahoo! June 2014: An enterprise software company with 420+ Employees. Our model: Innovate and deliver Apache Hadoop as a complete enterprise data platform completely in the open, backed by a world class support organization. Key Partners
  • 4. Fastest growing Fortune 1000 customer base. Customer Momentum • 300+ customers in seven quarters, growing at 75+/quarter • Two thirds of customers come from F1000 • 100% renewal rate. Experience at Scale: 80,000 nodes under contract. Largest Cluster in North America: 32,000 Nodes. Largest Cluster in Europe: 1,000 Nodes. Largest Known Cluster in APAC: 400 Nodes. 30+ customers migrated from other distributions. Some notable migrations include many of the early adopters of Hadoop.
  • 5. Hortonworks: A Leader In Hadoop. The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014. “Hortonworks loves and lives open source innovation” Vision & Execution for Enterprise Hadoop. Hortonworks leads with a strong strategy and roadmap for open source innovation with Hadoop and a strong delivery of that innovation in Hortonworks Data Platform. World Class Support and Services. Hortonworks' Customer Support received a maximum score and was significantly higher than both Cloudera and MapR. Key Strategic Partnerships. Hortonworks’ unique strategic partnerships with Microsoft, SAP, Teradata and others are a key strength as part of its overall strategy of ecosystem partnership to accelerate Hadoop adoption in the enterprise.
  • 6. HDP IS Apache Hadoop. There is ONE Enterprise Hadoop: everything else is a vendor derivation. HDP • Reliable • Consistent • Current
  • 7. Enabling a Modern Data Architecture with HDP and Apache Hadoop. Fall 2014. Hortonworks. We do Hadoop.
  • 8. Traditional systems under pressure …and difficult to manage new data • Silos of Data • Costly to Scale • Constrained Schemas. [Diagram: Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: RDBMS, EDW, MPP. Sources: Existing Sources (CRM, ERP,…); New Data Types: Clickstream; Geolocation; Sentiment, Web Data; Sensor, Machine Data; Unstructured docs, emails; Server logs.]
  • 9. Traditional Hadoop, challenges & limitations. Architectural Limitations • Single-purpose clusters, specific data sets • Primarily a batch system using MapReduce. Enterprise Challenges • Limited enterprise capabilities: Operations, Security & Governance • Created additional Silos. Interoperability Challenges • Difficult to natively integrate existing applications. Commercial add-ons opportunistically emerged in the early days to address these shortcomings. [Diagram: MapReduce (Largely Batch Processing) on HDFS (Hadoop Distributed File System), nodes 1 to N. Sources: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured. Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: RDBMS, EDW, MPP.]
  • 10. Hadoop2 & YARN based Architecture. Architected & led development of YARN to enable the Modern Data Architecture. [Timeline diagram: 2006; 2009; October 23, 2013. Labels: Hadoop w/ MapReduce (MapReduce, Largely Batch Processing, on HDFS, nodes 1 to N): Siloed clusters, Largely batch system, Difficult to integrate. MR-279: YARN. Hadoop 2 & YARN (YARN: Data Operating System on HDFS, nodes 1 to N): Batch, Interactive, Real-Time.]
  • 11. HDP2 and YARN enable the Modern Data Architecture. Hortonworks architected and led development of YARN. Common data set, multiple applications • Optionally land all data in a single cluster • Batch, interactive & real-time use cases • Support multi-tenant access, processing & segmentation of data. YARN: Architectural center of Hadoop • Consistent security, governance & operations • Ecosystem applications certified by Hortonworks to run natively in Hadoop. [Diagram: YARN: Data Operating System (Batch, Interactive, Real-Time) on HDFS (Hadoop Distributed File System), nodes 1 to N. Sources: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured. Applications: Business Analytics, Custom Applications, Packaged Applications. Data system: RDBMS, EDW, MPP.]
  • 12. A Blueprint for Enterprise Hadoop. PRESENTATION & APPLICATION: Enable both existing and new applications to provide value to the organization. ENTERPRISE MGMT & SECURITY: Empower existing operations and security tools to manage Hadoop. GOVERNANCE & INTEGRATION: Load data and manage according to policy. DATA ACCESS: Access your data simultaneously in multiple ways (batch, interactive, real-time). DATA MANAGEMENT: Store and process all of your Corporate Data Assets. YARN Data Operating System. SECURITY: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection. OPERATIONS: Deploy and effectively manage the platform. DEPLOYMENT OPTIONS: Provide deployment choice across physical, virtual, cloud.
  • 13. HDP Delivers Enterprise Hadoop. Hortonworks Data Platform 2.2. YARN is the architectural center of HDP • Common data set across all applications • Batch, interactive & real-time workloads • Multi-tenant access & processing. Provides comprehensive enterprise capabilities • Governance • Security • Operations. Enables broad ecosystem adoption • ISVs can plug directly into Hadoop. The widest range of deployment options • Linux & Windows • On-premises & cloud. [Diagram: Governance (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS. Batch, Interactive & Real-Time Data Access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines); frameworks shown: Tez, Slider; all on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System), nodes 1 to N. Security: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox. Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). Deployment Choice: Linux, Windows, On-Premises, Cloud.]
  • 14. The Modern Data Architecture w/ HDP
  • 15. Hadoop Value: New Types of Data. Clickstream: Capture and analyze website visitors’ data trails and optimize your website. Sensors: Discover patterns in data streaming automatically from remote sensors and machines. Server Logs: Research logs to diagnose process failures and prevent security breaches. Sentiment: Understand how your customers feel about your brand and products – right now. Geographic: Analyze location-based data to manage operations where they occur. Unstructured: Understand patterns in files across millions of web pages, emails, and documents.
  • 16. New analytic applications for new types of data. Manufacturing • Supplier Consolidation • Supply Chain and Logistics • Assembly Line Quality Assurance • Proactive Maintenance • Crowdsourced Quality Assurance. Financial Services • New Account Risk Screens • Fraud Prevention • Trading Risk • Maximize Deposit Spread • Insurance Underwriting • Accelerate Loan Processing. Telecom • Call Detail Records (CDRs) • Infrastructure Investment • Next Product to Buy (NPTB) • Real-time Bandwidth Allocation • New Product Development. Retail • 360° View of the Customer • Analyze Brand Sentiment • Localized, Personalized Promotions • Website Optimization • Optimal Store Layout. Healthcare • Genomic data for medical trials • Monitor patient vitals • Reduce re-admittance rates • Store medical research data • Recruit cohorts for pharmaceutical trials. Utilities, Oil & Gas • Smart meter stream analysis • Slow oil well decline curves • Optimize lease bidding • Compliance reporting • Proactive equipment repair • Seismic image processing. Public Sector • Analyze public sentiment • Protect critical networks • Prevent fraud and waste • Crowdsource reporting for repairs to infrastructure • Fulfill open records requests.
  • 17. …to shift from reactive to proactive interactions. HDP and Hadoop allow organizations to shift interactions from… Reactive (Post Transaction) …to Proactive (Pre Decision). A shift in Advertising: From mass branding …to 1x1 Targeting. A shift in Financial Services: From Educated Investing …to Automated Algorithms. A shift in Healthcare: From mass treatment …to Designer Medicine. A shift in Retail: From static branding …to Real-time Personalization. A shift in Telco: From break then fix …to repair before break.
  • 18. Data Lake: An architectural shift. Unlocking the Data Lake. Data Lake Enabled by YARN • Single data repository, shared infrastructure • Multiple biz apps accessing all the data • Enable a shift from reactive to proactive interactions • Gain new insight across the entire enterprise. [Diagram: axes SCALE and SCOPE; RDBMS, MPP, EDW; New Analytic Apps or IT Optimization; HDP 2.1: Governance & Integration, Security, Operations, Data Access, Data Management, YARN.]
  • 19. HDP is deeply integrated in the data center. Deep Partnerships: Hortonworks engages in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Redhat, HP, SAS & SAP. Broad Partnerships: Over 600 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users. [Diagram: Sources: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured. Data system: RDBMS, EDW, MPP, HANA. Applications: BusinessObjects BI. Dev & Data Tools, Operational Tools, Infrastructure. HDP 2.1: Governance & Integration, Security, Operations, Data Access, Data Management, YARN.]
  • 20. HDP Use Cases in Financial Services. Fall 2014. Hortonworks. We do Hadoop.
  • 21. Monetize Anonymous & Aggregate Banking Data. Creating Opportunity. Data: Structured, Clickstream, Social & Unstructured. Banking: One of the largest US banks. Problem: Valuable banking data needed to be anonymous & unified • Bank possesses data that indicates larger macro-economic trends, which can be monetized in secondary markets • Regulations and company policies protect customer privacy • Data sets are isolated in legacy silos controlled by LOBs • IT challenged by joining data while guaranteeing anonymity. Solution: Cross-bank data lake for aggregate data with secure access • Multiple data sets abstracted from source platforms • Single point of security & privacy for de-identification, masking, encryption, authentication and access control • Mortgage bankers, consumer bankers, credit card group and treasury bankers have access to the same cross-sell data • Interoperability with partners SAS, R, RedHat & Splunk • Economies of scale for compression & archiving data • Significant reduction in storage costs from prior platforms.
  • 22. Insurance Data Lake to Manage Risk. Creating Opportunity. Data: Structured, Clickstream, Server Log. Health Insurance: Large US medical insurer, >$30B in revenue, >20M members, ~35K employees. Problem: Challenges merging new & old data hamper analysis • Traditional and newer types of data were both growing quickly but were difficult to combine in the EDW • “Schema on load” requirements of EDW platform limited ingest of some data with significant predictive power • Company missed data-driven ways to serve customers • Process of separating legitimate from fraudulent claims created “needle-in-a-haystack” problem. Solution: Common platform for all types of data improves up-sell and reduces fraud • “Schema on read” Hadoop architecture means that more data sources can be easily ingested to enrich predictive analytics • Agents use big data insights to determine the best action for valued customers and recommend those in real-time • Claims analysts and underwriters process streaming data to quickly flag fraud risks and fast-track legitimate claims.
  • 23. Maintaining SLAs for Equity Trading Information. Improving Efficiency. Data: Server Log & ETL. Investment Services: Highly trafficked website providing business and financial information, ~15K employees. Problem: Meeting 12 millisecond SLAs for “ticker plant” • Daily ingest: 50GB server log data from 10,000 feeds • Four times daily, this data is pushed into DB2 • Applications query this data 35K times per second • 70% of queries are for data <1 year old, 30% for >1 year old • Current architecture can only hold 10 years of trading data • Growing volume puts performance at risk of missing SLAs. Solution: Meeting SLAs with confidence • HBase provides super-fast queries within SLA targets • ETL offloading to Hadoop allows longer data retention, without jeopardizing fast response times.
  • 24. Hadoop is a Platform Decision. Open Leadership: Drive innovation in the open via the Apache community-driven open source process. Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind. Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills. Fastest Growing Customer and Partner Base: Largest and most experienced Hadoop adopters have standardized on Hortonworks. The data center leaders have standardized on Hortonworks.
  • 26. WANdisco Background • WANdisco: Wide Area Network Distributed Computing – Enterprise-ready, high availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability • Leader in tools for software engineers – Subversion – Apache Software Foundation sponsor • Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND) • US patented active-active replication technology granted, November 2012 • Global locations – San Ramon (CA) – Chengdu (China) – Tokyo (Japan) – Boston (MA) – Sheffield (UK) – Belfast (UK)
  • 28. Non-Stop Hadoop: Non-Intrusive Plugin to Hortonworks HDP Provides Continuous Availability In the LAN / Across the WAN Active/Active
  • 29. 3 Problems For Sharing Data Across Clusters LAN / WAN
  • 30. Enterprise-Ready Hadoop Characteristics of Mission-critical Financial Applications • Require Continuous Availability – SLAs, regulatory compliance • Require HDFS to be Deployed Globally – Share data between data centers – Data is consistent, not eventual • Ease Administrative Burden – Reduce operational complexity – Simplify disaster recovery – Lower RTO/RPO • Allow Maximum Utilization of Resources – Within the data center – Across data centers
  • 31. Breaking Away from Active/Passive What’s in a NameNode Single Standby • Inefficient utilization of resources – Journal Nodes – ZooKeeper Nodes – Standby Node • Performance bottleneck • Still tied to the beeper • Limited to LAN scope Active / Active • All resources utilized – Only NameNode configuration – Scale as the cluster grows – All NameNodes active • Load balancing • Set resiliency (# of active NN) • Global consistency
  • 32. Breaking Away from Active/Passive What’s in a Data Center Standby Data Center • Idle Resource – Single data center ingest – Disaster recovery only • One-way synchronization – DistCp • Error-prone – Clusters can diverge over time • Difficult to scale > 2 Data Centers – Complexity of sharing data increases Active / Active • DR resource available – Ingest at all data centers – Run jobs in both data centers • Replication is multi-directional – Active/active • Absolute consistency – Single HDFS spans locations • ‘N’ data center support – Global HDFS allows appropriate data to be shared
  • 34. Use Case: Disaster Recovery • Data is as current as possible (no periodic syncs) • Doesn’t require monitoring and consistency checking • Virtually zero downtime to recover from regional data center failure • Regulatory compliance
  • 35. Use Case: Multi-Data Center Ingest and multi-tenant workloads • Ingest and analyze anywhere • Analyze everywhere – Fraud detection – Equity trading information – New business – Etc… • Backup data center(s) can be used for work – No idle resources
  • 36. Use Case: Heterogeneous Hardware In-memory analytics • Mixed Hardware Profiles – Memory, disk, CPU – Isolate memory-hungry processing (Storm/Spark) from regular jobs • Share data, not processing – Isolate lower priority (dev/test) work
  • 37. The difficulty realizing the data lake…
  • 38. …is that data spans the entire world
  • 39. Data Reservoir Use Cases (diagram labels: Data Ocean, Feeder Site, Accounting Mart, Banking Mart) • Data Marts – Restrict access to relevant data – Create quick clusters • Feeder Sites (Data Tributaries) – Ingest only
  • 40. Regulatory Compliance (diagram labels: Compliance, Regulation, Guidelines) • Basel III – Consistency of data • Data Privacy Directive – Data sovereignty • Data doesn’t leave country of origin
  • 41. Technical Comparison Hadoop Powered by WANdisco
  • 42. Multi-Data Center Hadoop Today What's wrong with the status quo Periodic Synchronization: DistCp Parallel Data Ingest: Load Balancer, Streaming
  • 43. Multi-Data Center Hadoop Today Hacks currently in use Periodic Synchronization: DistCp • Runs as MapReduce • DR data center is read-only • Over time, Hadoop clusters become inconsistent • Manual and labor-intensive process to reconcile differences • Inefficient use of the network
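To make the reconciliation problem on this slide concrete, here is a toy sketch of the check an operator would otherwise do by hand: comparing file listings from two clusters to see how they have diverged between periodic copies. It is plain Python over in-memory dictionaries, not DistCp or any Hadoop API; the paths, checksums and the `diff_listings` helper are hypothetical.

```python
def diff_listings(primary, replica):
    """Compare two {path: checksum} listings and report how the replica diverges."""
    missing = sorted(p for p in primary if p not in replica)
    extra = sorted(p for p in replica if p not in primary)
    changed = sorted(
        p for p in primary if p in replica and primary[p] != replica[p]
    )
    return {
        "missing_on_replica": missing,
        "extra_on_replica": extra,
        "content_differs": changed,
    }


# Hypothetical listings taken some time after the last periodic copy.
primary = {
    "/data/trades/2014-10-01": "a1",
    "/data/trades/2014-10-02": "b2",
    "/data/trades/2014-10-03": "c3",
}
replica = {
    "/data/trades/2014-10-01": "a1",
    "/data/trades/2014-10-02": "zz",  # written differently on the DR side
    "/tmp/scratch": "d4",             # created locally, never copied back
}

report = diff_listings(primary, replica)
for category, paths in report.items():
    print(category, paths)
```

Every non-empty category in the report is work someone has to resolve before the DR cluster can be trusted, which is the manual effort the slide refers to.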
  • 44. Multi-Data Center Hadoop Today Hacks currently in use Parallel Data Ingest: Load Balancer, Flume • Hiccups in either of the Hadoop clusters cause the two file systems to diverge • Potential to run out of buffer when WAN is down • Requires constant attention and sys-admin hours to keep running • Data created on the cluster is not replicated • Streaming technologies (like Flume) used for data redirection only cover streaming data
  • 45. Architecture of a Non-Stop Hadoop
  • 46. Question and Answer Submit your questions in chat Q&A

Editor's Notes

  1. Hortonworks’ approach is quite clear: we are focused on the delivery of enterprise-grade Hadoop as a reliable data platform that will enable your transition to a modern data architecture. To this end, we work solely within the broad open source community, with a focus on innovation at the core of Apache Hadoop with YARN as a foundation, and then within all the related projects that deliver on the key requirements for the enterprise such as governance, security and operations. However, I can’t really talk about Hortonworks without first taking a moment to talk about the history of Hadoop. What we now know as Hadoop really started in 2005, when a team at Yahoo was directed to build out a large-scale data storage and processing technology that would allow them to improve their most critical application, Search. Their challenge was essentially two-fold. First they needed to capture and archive the contents of the internet, and then process the data so that users could search through it effectively and efficiently. Clearly traditional approaches were both technically (due to the size of the data) and commercially (due to the cost) impractical. The result was the Apache Hadoop project, which delivered large-scale storage (HDFS) and processing (MapReduce). Yahoo soon committed to this open source approach, as they understood that rather than locking a few guys in a room to work on it, they could work within the Apache Software Foundation so that others would pick it up, progress it, and contribute it back to the community, and thereby greatly accelerate progress for all. And this is exactly what happened: all of the leading consumer web companies began to use and advance it, to the point that by 2011, Hadoop underpinned every click at Yahoo, and their infrastructure had reached 35,000 nodes.
Soon, mainstream IT started to look closely at Hadoop as a way to address the architectural challenge posed by the explosion of data that every organization was experiencing as mobile, social and machine-generated data began to accelerate. It was at this point, with the objective of facilitating broader market adoption, that the core Hadoop team left to form Hortonworks, with the blessing of Yahoo, and with a singular goal: to progress Hadoop into Enterprise Hadoop, a complete open source data platform that enables a modern data architecture. Since our inception just three years ago, we have grown to more than 450 employees and have partnered closely with the leaders in the datacenter, all of whom share this vision: to enable a modern data architecture with Hadoop in order to allow their customers to address the architectural challenge that they all are facing due to exploding data volumes. [note: if useful as a talk track, Doug Cutting was hired by the team at Yahoo in the early days based on some prototype work he’d done and he left to form Cloudera in 2008 well BEFORE Hadoop was running at scale inside of Yahoo]
  2. In the past few years, we have seen phenomenal momentum behind Hadoop and Hortonworks: we began shipping the Hortonworks Data Platform in Q3 of 2012 (only 8 quarters ago), and since then we have partnered with more than 350 customers as their Hadoop provider of choice, 2/3 of which are in the Fortune 1000. The most interesting aspect for me is the number of early adopters that began their Hadoop journey with an alternative distribution (since Cloudera had been in the market 3 years before Hortonworks was formed) and have since migrated over to partner with Hortonworks for Hadoop. These are the very largest users on the planet, who, having gotten past their initial forays with the tech, now really understand what they want from their Hadoop vendor and have migrated en masse to Hortonworks. Why? I’ll talk about that in a bit more detail, but at a high level: - Open leadership: Hortonworks engineers literally wrote most of the code that needs to be supported and are leading the innovation in the community - Enterprise Rigor: we apply enterprise software rigor to the build, test and release process for the work done in the open source community - Ecosystem endorsement: the Hortonworks Data Platform is deeply integrated with existing datacenter investments, allowing users to reuse existing skills. In fact HDP is uniquely sold by many of the major vendors in the ecosystem.
  3. Not only are we the fastest growing Hadoop company, Hortonworks is also a leader. We contribute more lines of code to Apache Hadoop than any other company. Our engineers are architects who lead innovation in the open community. Our customers turn to us for these reasons, and the analyst community agrees. Our momentum is represented in the latest Forrester Wave, wherein Hortonworks was ranked the #1 overall offering, based on HDP 1.3, a product released in May 2013. As a reference, we are currently on 2.1. Not surprisingly, given our leadership position in the community, we received the very highest rating for Vision and Execution, an acknowledgement that our engineering team is driving the majority of the innovation in Hadoop. And we were also acknowledged for our deep strategic partnerships: in fact HDP was represented in the Wave 3 separate times, as we ARE the Hadoop offering for Microsoft and Teradata, both of whom were ranked in the Wave. But most importantly: we received the maximum possible score for our support services. This is ultimately the most important decision criterion: can your partner support your critical deployment? Not surprisingly, the people that built the tech are in the best position to support it. It is for this reason that others who package the work done in the Apache Software Foundation (MapR, Cloudera, Pivotal, IBM, etc.) are not able to provide the same level of support as Hortonworks.
  4. Finally, there is only ONE Apache Hadoop. Every other package of Hadoop is a vendor derivation of the platform. At Hortonworks, everything we package in HDP is from the very latest components at the Apache Software Foundation. This ensures that our customers have access to the very latest innovation from the community, to which we then apply enterprise software rigor in the build, test and release process to create HDP. HDP “IS” Apache Hadoop: it is not a vendor derivative that has been forked and modified, it IS Apache Hadoop, no additions, no hold-backs. When comparing vendors’ Hadoop offerings it is critical to understand this picture, as it makes clear where vendors are diverging from the community approach and ultimately locking customers out of the community innovation.
  5. Our goal since our inception has been very simple: to enable a Modern Data Architecture with Enterprise Hadoop. Everything we do is with this architectural goal in mind. Customers want a single platform that enables batch, interactive and real-time workloads.
  6. So, where does Hadoop fit in the data center? This picture here is a very simple depiction of the typical data architecture in any organization. - There are sources of data: ERP, CRM, other digital sources - That data is then stored in a data system: a data warehouse, MPP system, etc. - Then an application of some kind accesses that data system: a packaged application such as Excel or Tableau, a custom application written by a developer, or even another business application. This has been the foundation of the data center for years. We have had some challenges with this architecture all along; however, we are seeing increased pressure to modify and improve this basic blueprint because A) this approach created silos of data, and it was difficult to either share the data or get a single view of it, B) these systems are costly to scale, and C) they are also coupled to a very static schema. Changes to a data model are difficult if not impossible. This limits flexibility and insight. Finally, NEW types of data that emerge as we digitize the world around us, such as clickstream and machine sensor data, are growing at exponential rates. We are all becoming data-driven organizations. In fact, the sheer volume of data is set to grow 20X between 2013 and 2020, which puts tremendous pressure on this architecture. The old architecture is neither technologically nor commercially practical.
  7. In response, many organizations have turned to Apache Hadoop. Originally created by the team at Yahoo, it introduced a scale-out approach to the storage and processing of data that could scale linearly in an extremely cost-effective manner. However, traditional Hadoop has had its own limitations: - Architecturally, Hadoop in this sense was a purely batch system: load data into HDFS and then utilize MapReduce to run a batch lookup. While useful, this limited the kinds of applications that could be built - It required a dedicated cluster per use case: the lack of a central resource manager meant that a given application would monopolize all of the resources of the cluster until that particular job was completed - Traditional Hadoop was also not well suited to integration with existing environments: integration tended to be custom for each application - Finally, it lacked the enterprise capabilities that mainstream IT requires. Rather than enabling a modern data architecture, in some cases it created yet more silos. [Some vendors want to return to this, with a single engine]
  8. This all changed with the introduction of Hadoop 2 and YARN in October 2013. YARN was introduced in MR-279 by Arun Murthy in 2009; Arun and the team at Hortonworks architected it and led its development as the core change in Hadoop 2. Our view was that to truly enable Hadoop as a component of a broad data architecture, YARN was the fundamental requirement, as it turns Hadoop from a single-application data system into a multi-application data system. This is foundational to our approach of innovating from the core outwards to build Enterprise Hadoop. With YARN it is now possible to land all data in one cluster and then access it in multiple ways: from batch to interactive to real-time. Today, YARN, at the core of Hadoop, is the center of our focus on innovation in and around Hadoop. It is clearly the enabling technology that has started a transition to a data lake within organizations. Simply stated: Hortonworks architected and led development of YARN in order to enable the Modern Data Architecture.
  9. YARN is the element that enables the modern data architecture, as it turns Hadoop into a truly multi-purpose data platform with batch, interactive and real-time workloads all running in a single cluster. It enables users to create a central cluster in which data can be stored and then accessed using a range of processing engines: batch, interactive, real-time. It is akin to the journey with virtualization: from a single virtual server to a pool of virtual infrastructure. It is the architectural center of Hadoop: - It provides the data operating system around which the core enterprise capabilities of security, governance and operations can be integrated - It is the integration point for all data processing engines, from the open source community but also from the commercial vendor ecosystem
  10. Hadoop has evolved over the years: beyond providing linear-scale compute and storage, it also needed explicit functions to make it a complete data platform. New projects spun up around Hadoop to meet some of the complex requirements of the modern enterprise. A good way to look at the evolution of Hadoop is through this picture. - When Hadoop began it was simply a data management layer (HDFS) and a single data access engine (MapReduce). Over the past several years the range of components in the Hadoop ecosystem has exploded: - Data Access - The emergence of multiple access engines spanning SQL, NoSQL, Scripting, Streaming and more. YARN ensures that they all can be part of Hadoop seamlessly. - Security - To address the key requirements of authorization, access, audit/accounting and data protection - Operations - Tools to manage the platform - Governance and integration - Tools to load and manage data according to policy. These are all the core requirements of any data platform, and over time the Hadoop community has expanded to include all of these capabilities. The reason that there are 5 categories? Because each addresses the requirements of a different persona that engages with a data platform: Developers (Data Access), Administrators (Security, Operations) and Data Architects (Governance).
  11. Capturing new data and providing the ability to process streams of this data is allowing organizations to shift from taking a REACTIVE, post transaction approach to more of a PROACTIVE, pre decision approach to interactions with their customers, suppliers and employees.   Again, no matter the vertical, this transition is happening.   For instance… read.
  12. Ultimately, most organizations that adopt Hadoop create a data lake. A data lake provides a single data repository on shared infrastructure and serves the needs of multiple business applications all running on a single set of data. This visionary architecture was not possible until October of 2013, when Hortonworks and the community pushed YARN GA. With a YARN-based architecture serving as the data operating system for Hadoop 2, HDP takes Hadoop beyond single-use, batch processing to a fully functional, multi-use platform that enables batch, interactive, and real-time data processing. Leading organizations can now use YARN and HDP to process data and derive value from multiple business cases and realize their vision of a data lake. The value in delivering multiple access methods on a single set of data extends beyond data science. It allows a business to set an architecture where it can deliver multiple value points across a single set of data, creating an enterprise capability previously only imagined. For instance, an organization can analyze real-time clickstream data using Apache Storm to pick off events that need attention, run an Apache Pig script to update product catalog recommendations, and then deliver this information via low-latency access through Apache HBase to millions of web visitors, all in real time, and all on a single set of data.
  13. The modern data architecture simply does not work unless it integrates with the systems and tools you already deploy. HDP enables your existing data platforms to expand the data you have under management through integration. The goal of HDP is to augment, not replace, these existing systems, as we very clearly understand that you need to reuse skills. Further, through our work within the Hadoop community to deliver YARN, we have opened up Hadoop and unlocked innovation in the community: data center ISVs can extend their applications so that they run natively IN Hadoop as just another workload operating on the single set of data in the data lake. They can now function as first-class citizens alongside any other workload in Hadoop.
  14. Our goal since our inception has been very simple: to enable a Modern Data Architecture with Enterprise Hadoop. Everything we do is with this architectural goal in mind. Customers want a single platform that enables batch, interactive and real-time workloads.
  15. Hundreds of organizations have turned to Hortonworks because Hadoop is ultimately a platform decision. It is typically the first step towards re-architecting your back end data systems and not to be considered lightly. These organizations that have already been successful with Hadoop have required not just a stable, reliable and complete Hadoop solution, but more importantly a connection with the architects, builders and operators of this open source technology. They saw this in Hortonworks. And as with any platform decision, it is imperative that Hadoop integrates with the tools and systems that are already resident in your data center. We forge deep relationships with our hundreds of partners so that you can not only ensure integration but also effectively reapply existing systems and skillsets toward your big data challenges. At Hortonworks, we hold true to these foundational beliefs and have partnered with hundreds of organizations from some of the largest and earliest big data adopters to the most conservative and data rich companies on the planet. We ensure that your Hadoop journey is successful and more companies are turning to Hortonworks today than any other offering on the marketplace. We invite you to join our community.
  16. Maximize resource utilization: no idle standby. Isolate dev and test clusters: share data, not resources. Carve off hardware for a specific group; this prevents a bad MapReduce job from bringing down the cluster. Guarantee consistency and availability of data: data is instantly available.
  17. Optimized hardware profiles for job-specific tasks: batch, real-time, NoSQL (HBase). Set replication factors per sub-cluster. Use at LAN or WAN scope. Resilient to NameNode failures.