Hortonworks: We Do Hadoop
“State of the Union” Webinar
Shaun Connolly, VP Strategy
@shaunconnolly, @hortonworks

January 22, 2014

© Hortonworks Inc. 2014

Page 1
Today’s Webinar
• Apache Hadoop & Hortonworks Overview
• Hadoop’s Role
• Hadoop Adoption: From Apps to Lake
• Enterprise Hadoop Technology Directions

Page 2
Our Mission:
Enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop

Our Vision:
More than Half the World's Data Will Be Processed by Apache Hadoop

Our Commitment
•  Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process
•  Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind
•  Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills

Headquarters: Palo Alto, CA
Employees: 300+ and growing
Reseller Partners: [partner logos]

Page 3
Apache Community Process

Apache Software Foundation Guiding Principles
•  Release early & often
•  Transparency, respect, meritocracy

Key Roles
•  PMC Members
–  Managing community projects
–  Mentoring new incubator projects
•  Committers
–  Authoring, reviewing & editing code
•  Release Managers
–  Testing & releasing projects

Apache Community Projects
Apache Hadoop, Apache HBase, Apache Hive, Apache Pig, Apache Storm, Apache Falcon, Apache Ambari
[Diagram: each project cycles through Design & Develop, Test & Patch, Release]

Page 4
Hortonworks Process for Enterprise Hadoop

Upstream Community Projects
Apache Hadoop, Apache HBase, Apache Hive, Apache Pig, Apache Storm, Apache Falcon, Apache Ambari
[Diagram: Design & Develop, Test & Patch, Release]

Downstream Enterprise Product: HDP 2.0
[Diagram: Integrate & Test, Package & Certify, Distribute]
Certified at scale using the most advanced Hadoop test bed on the planet
•  1000’s of production nodes at Yahoo!
•  Over 1500 unit & system tests

[Diagram arrows between the two cycles: Design & Develop, Fixed Issues, Stable Project Releases]
Virtuous cycle when development & fixed issues are done upstream & stable project releases flow downstream

Page 5
Hadoop’s Role…
“Hadoop is becoming a more ‘normal’
software market” and the “Hadoop vendor
ecosystem [is] gaining critical mass”
Tony Baer, Ovum

Page 6
A Traditional Approach Under Pressure

APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DATA SYSTEM (REPOSITORIES): RDBMS, EDW, MPP
SOURCES
•  Existing Sources (CRM, ERP, Clickstream, Logs)
•  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Data growth (Source: IDC)
•  2.8 ZB in 2012
•  85% from New Data Types
•  15x Machine Data by 2020
•  40 ZB by 2020

Page 7
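The two IDC volume figures on this slide (2.8 ZB in 2012 and 40 ZB by 2020) imply a steep compound growth rate. The sketch below is only back-of-the-envelope arithmetic on those two numbers. The function name is illustrative, and constant year-over-year growth is an assumption the slide does not make.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide (zettabytes); steady growth is assumed here.
growth = compound_annual_growth_rate(2.8, 40.0, 2020 - 2012)
print(f"Implied annual growth: {growth:.1%}")
```

Under that assumption the implied rate is a little under 40% per year, which is the pressure on the traditional data system that the slide describes.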
Unlock Value in New Types of Data

1.  Social
Understand how people are feeling and interacting – right now

2.  Clickstream
Capture and analyze website visitors’ data trails and optimize your website

3.  Sensor/Machine
Discover patterns in data streaming from remote sensors and machines

4.  Geographic
Analyze location-based data to manage operations where they occur

5.  Server Logs
Diagnose process failures and prevent security breaches

6.  Unstructured (txt, video, pictures, etc.)
Understand patterns in files across millions of web pages, emails, and documents

+ Online archive
Data that was once purged or moved to tape can be stored in Hadoop to discover long term trends and previously hidden value

Page 8
A Modern Data Architecture Enabled

APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DATA SYSTEM (REPOSITORIES): RDBMS, EDW, MPP
•  Complement Data Systems
•  Right Workload, Right Place
SOURCES
•  Existing Sources (CRM, ERP, Clickstream, Logs)
•  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Page 9
A Modern Data Architecture Applied

APPLICATIONS: BusinessObjects BI
DEV & DATA TOOLS
OPERATIONAL TOOLS
DATA SYSTEM: RDBMS, EDW, HANA, MPP
INFRASTRUCTURE
SOURCES
•  Existing Sources (CRM, ERP, Clickstream, Logs)
•  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Page 10
Major Vendors Have Embraced Hadoop

HDInsight & HDP for Windows
•  Only Hadoop Distribution for Windows Azure & Windows Server
•  Native integration with SQL Server, Excel, and System Center
•  Extends Hadoop to .NET community

Teradata Portfolio for Hadoop
•  Seamless data access between Teradata and Hadoop (SQL-H)
•  Simple management & monitoring with Viewpoint integration
•  Flexible deployment options
[Diagram: Complete Portfolio for Hadoop, UDA Diagram, Appliances]

Instant Access + Infinite Scale
•  SAP can assure their customers they are deploying an SAP HANA + Hadoop architecture fully supported by SAP
•  Enables analytics apps (BOBJ) to interact with Hadoop

Page 11
Hadoop Adoption
“Hadoop’s momentum is unstoppable as its open
source roots grow wildly into enterprises. Its refreshingly
unique approach to data management is transforming how
companies store, process, analyze, and share big data”
--Mike Gualtieri, Forrester

Page 12
Drivers of Hadoop Adoption

New Analytic Apps
•  New Types of Data
•  LOB Driven

[Chart axes: SCALE, SCOPE]

Page 13
20 Common Business Applications
Industry, then Use Case (Type of Data)

Financial Services
•  New Account Risk Screens (Text, Server Logs)
•  Trading Risk (Server Logs)
•  Insurance Underwriting (Geographic, Sensor, Text)

Telecom
•  Call Detail Records (CDRs) (Machine, Geographic)
•  Infrastructure Investment (Machine, Server Logs)
•  Real-time Bandwidth Allocation (Server Logs, Text, Social)

Retail
•  360° View of the Customer (Clickstream, Text)
•  Localized, Personalized Promotions (Geographic)
•  Website Optimization (Clickstream)

Manufacturing
•  Supply Chain and Logistics (Sensor)
•  Assembly Line Quality Assurance (Sensor)
•  Crowdsourced Quality Assurance (Social)

Healthcare
•  Use Genomic Data in Medical Trials (Structured)
•  Monitor Patient Vitals in Real-Time (Sensor)

Pharmaceuticals
•  Recruit and Retain Patients for Drug Trials (Social, Clickstream)
•  Improve Prescription Adherence (Social, Unstructured, Geographic)

Oil & Gas
•  Unify Exploration & Production Data (Sensor, Geographic & Unstructured)
•  Monitor Rig Safety in Real-Time (Sensor, Unstructured)

Government
•  ETL Offload in Response to Federal Budgetary Pressures (Structured)
•  Sentiment Analysis for Government Programs (Social)
Page 14
Drivers of Hadoop Adoption

MDA/Data Lake
•  Cost, Insight
•  IT Driven

More data and analytic apps

New Analytic Apps
•  New Types of Data
•  LOB Driven

[Chart axes: SCALE, SCOPE]

Page 15
The Journey Towards a Data Lake

DATA LAKE
An architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale

•  Customer Intimacy, e.g., 360 Degree View of the Customer
•  Operational Excellence, e.g., Network Maintenance
•  Risk Management, e.g., Fraud Reduction
•  New Business, e.g., Data as a Product

[Chart axes: DATA (TB’s to PB’s), VALUE]

Page 16
Drivers of the Data Lake

Data Access + Hadoop = BROAD INSIGHT
Access your data simultaneously in multiple ways, irrespective of the processing engine, analytical application or presentation
•  Allows simultaneous access by and timely insights for all your users across all your data
•  Enabled by schema on read & enterprise-wide pool of data

Data Management + Hadoop = EFFICIENT SCALE
Store and process all of your Corporate Data Assets
•  Acquire all data in original format and store in one place, cost effectively and for an unlimited time
•  Scale horizontally and to petabyte scale

Page 17
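The "schema on read" idea named on this slide can be sketched in a few lines: data is stored once in its original format, and each consumer applies its own schema only when reading. This is a minimal illustration, not Hive or HCatalog code. The sample records, field names and the `read_with_schema` helper are all assumptions made for the example.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120, "lat": 52.37, "lon": 4.89}',
    '{"user": "u2", "page": "/cart", "ms": 340}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select fields and coerce their types.

    Fields missing from a record come back as None instead of failing,
    so the stored data never has to be rewritten for a new consumer.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: (cast(record[name]) if name in record else None)
            for name, cast in schema.items()
        }


# Two consumers read the same stored bytes through different schemas.
clickstream_schema = {"user": str, "page": str, "ms": int}
geo_schema = {"user": str, "lat": float, "lon": float}

clicks = list(read_with_schema(RAW_EVENTS, clickstream_schema))
locations = list(read_with_schema(RAW_EVENTS, geo_schema))
```

The design point is that adding a third consumer means adding a third schema dictionary, with no change to what is stored.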
Data Lake Transforms Your Architecture

APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
DATA LAKE
•  BROAD INSIGHT (Data Access): Access your data simultaneously in multiple ways
•  EFFICIENT SCALE (Data Management): Store and process all of your Corporate Data Assets
SOURCES
•  Existing Sources (CRM, ERP, Clickstream, Logs)
•  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

Page 18
Enterprise Hadoop
Technology Directions
“With Hadoop 2.0 we expect this ecosystem
to grow like bamboo in spring time.”
Robin Bloor, The Bloor Group

Page 19
What’s Needed for Enterprise Hadoop?

1.  Key Services
Platform, Operational and Data services essential for the enterprise
2.  Skills
Leverage your existing skills: development, analytics, operations
3.  Integration
Interoperable with existing data center investments

[Stack diagram]
OPERATIONAL SERVICES: AMBARI (Cluster Mgmt), FALCON* (Dataset Mgmt), OOZIE (Schedule)
DATA SERVICES: HBASE, PIG, HIVE & HCATALOG (Data Access); FLUME, SQOOP (Data Movement)
CORE SERVICES: MAP REDUCE, TEZ (Process); YARN (Resource Management); HDFS (Storage); KNOX* (Security); NFS; WebHDFS
Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
Deployment: OS/VM, Cloud, Appliance

Page 20
What’s Needed for Enterprise Hadoop?

1.  Key Services
Platform, Operational and Data services essential for the enterprise
2.  Skills
Leverage your existing skills: development, analytics, operations
3.  Integration
Interoperable with existing data center investments

[Stack diagram: HORTONWORKS DATA PLATFORM (HDP)]
OPERATIONAL SERVICES: AMBARI (Cluster Mgmt), FALCON* (Dataset Mgmt), OOZIE (Schedule)
DATA SERVICES: HBASE, PIG, HIVE & HCATALOG (Data Access); FLUME, SQOOP (Data Movement, LOAD & EXTRACT)
CORE SERVICES: MAP REDUCE, TEZ (Process); YARN (Resource Management); HDFS (Storage); KNOX*; NFS; WebHDFS
Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
Deployment: OS/VM, Cloud, Appliance
Page 21
Hadoop 2 & Beyond

details: hortonworks.com/labs

Page 22
Hadoop 2: The Introduction of YARN
Store all data in one place, interact in multiple ways

1st Gen of Hadoop: Single Use System (Batch Apps)
•  MapReduce (cluster resource management & data processing)
•  HDFS (redundant, reliable storage)

2nd Gen of Hadoop: Multi-Use Data Platform (Batch, Interactive, Online, Streaming, …)
•  Classic Hadoop Apps: Batch (MapReduce)
•  Hive, Pig, others…: Batch & Interactive (Tez)
•  HBase, Accumulo: Online Data Processing
•  Storm: Stream Processing
•  others …
•  Flexible Data Processing across these engines
•  Efficient Cluster Resource Management & Shared Services (YARN)
•  Redundant, Reliable Storage (HDFS)

Page 23
Apache Hadoop YARN
The Data Operating System for Hadoop 2

Flexible
Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming

Efficient
Double processing IN Hadoop on the same hardware while providing predictable performance & quality of service

Shared
Provides a stable, reliable, secure foundation and shared operational services across multiple workloads

Data Processing Engines Run Natively IN Hadoop
•  BATCH: MapReduce
•  INTERACTIVE: Tez
•  ONLINE: HBase, Accumulo
•  STREAMING: Storm
•  IN-MEMORY: Spark
•  OTHER: Open Source / Commercial
YARN: Cluster Resource Management
HDFS: Redundant, Reliable Storage

Page 24
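The idea on this slide, several processing engines drawing on one shared pool of cluster resources, can be illustrated with a toy model. The sketch below is not the YARN API. `ToyResourceManager`, its methods and the application names are hypothetical, and real YARN scheduling (queues, capacity and fairness policies, node locality) is far richer.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster resource manager (not YARN itself)."""

    total_containers: int
    allocations: dict = field(default_factory=dict)

    def free(self):
        """Containers not currently granted to any application."""
        return self.total_containers - sum(self.allocations.values())

    def request(self, app, wanted):
        """Grant up to `wanted` containers, limited by what is still free."""
        granted = min(wanted, self.free())
        self.allocations[app] = self.allocations.get(app, 0) + granted
        return granted

    def release(self, app):
        """Return all of an application's containers to the shared pool."""
        return self.allocations.pop(app, 0)


rm = ToyResourceManager(total_containers=10)
batch = rm.request("mapreduce-batch", 6)         # batch job takes 6
interactive = rm.request("tez-interactive", 3)   # interactive query takes 3
streaming = rm.request("storm-streaming", 4)     # only 1 container is left
rm.release("mapreduce-batch")                    # batch job finishes
streaming_retry = rm.request("storm-streaming", 3)
```

Because every engine asks the same manager, capacity freed by the finished batch job becomes available to the streaming workload straight away, which is the sharing the slide describes.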
Apache Tez: Modern Execution Engine
Apache Tez is a modern & more efficient
alternative to MapReduce built on YARN
Supports BOTH Batch & Interactive workloads
–  Used for Stinger initiative to enable interactive SQL for Apache Hive
–  Hive and Pig will work on Tez
–  Other solutions are considering Tez

[Stack diagram, top to bottom]
Hive (SQL), Pig (data flow), MR (batch), OTHER (Open Source / Commercial)
Tez (execution engine)
YARN (cluster resource management)
HDFS (redundant, reliable storage)

Page 25
Batch AND Interactive SQL-IN-Hadoop

Apache Hive
•  The de facto standard for Hadoop SQL access
•  Used by your current data center partners
•  Built for batch AND interactive query

Stinger Initiative
Broad, community based effort to deliver the next generation of Apache Hive
•  Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds)
•  Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB
•  SQL: Support broadest range of SQL semantics for analytic applications against Hadoop

Value Delivered
•  Enables rapid insight over big data
•  Single engine for batch & interactive
•  Preserves and transparently enhances existing investments in use of Hive
–  Ex. Hive-based solutions get 100x faster
•  SQL compliance improves integration with other data systems & tools
•  New ORCFile reduces storage up to 70% while improving resource use, scale, and throughput

Page 26
Speed: Delivering Interactive Query

Query 27: Pricing Analytics using Star Schema Join
Query 82: Inventory Analytics Joining 2 Large Fact Tables

Query             Hive 0.10   Hive 0.11 (Phase 1)   Trunk (Phase 3)   Improvement
TPC-DS Query 27   1400s       39s                   7.2s              190x
TPC-DS Query 82   3200s       65s                   14.9s             200x

All Results at Scale Factor 200 (Approximately 200GB Data)

Page 27
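The improvement factors on this slide can be checked from the quoted timings. The sketch below assumes the largest and smallest time reported for each query are the Hive 0.10 and trunk results respectively. The `speedup` helper is illustrative only.

```python
def speedup(before_seconds, after_seconds):
    """How many times faster the newer run is than the older one."""
    return before_seconds / after_seconds


# Timings as quoted on the slide (seconds).
query_27 = speedup(1400, 7.2)    # TPC-DS Query 27
query_82 = speedup(3200, 14.9)   # TPC-DS Query 82

print(f"Query 27: {query_27:.0f}x, Query 82: {query_82:.0f}x")
```

Both ratios land slightly above the rounded 190x and 200x figures shown on the slide, so the headline numbers are, if anything, conservative roundings of these timings.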
SCALE: Interactive Query at Petabyte Scale

Sustained Query Times
Apache Hive 0.12 provides sustained acceptable query times even at petabyte scale

Smaller Footprint
Better encoding with ORCFile in Apache Hive 0.12 reduces resource requirements for your cluster
•  Larger Block Sizes
•  Columnar format arranges columns adjacent within the file for compression & fast access

File Size Comparison Across Encoding Methods
Dataset: TPC-DS Scale 500 Dataset

Encoded with          Size     Relative to original
Text                  585 GB   (Original Size)
RCFile                505 GB   (14% Smaller)
Parquet (Impala)      221 GB   (62% Smaller)
ORCFile (Hive 12)     131 GB   (78% Smaller)

Page 28
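Why placing a column's values next to each other helps compression can be shown with a toy run-length encoder. This is only an illustration of the general columnar idea and not ORCFile's actual encoding, which the slide does not detail. The sample rows and the helper name are assumptions.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Eight hypothetical rows with two columns: (region, status).
rows = [("EU", "ok")] * 4 + [("US", "ok")] * 4

# Row-oriented layout: the fields of each row are stored together.
row_major = [field for row in rows for field in row]

# Column-oriented layout: all values of one column are stored together.
column_major = [row[i] for i in range(2) for row in rows]

row_runs = run_length_encode(row_major)
column_runs = run_length_encode(column_major)

print(len(row_runs), "runs row-wise vs", len(column_runs), "runs column-wise")
```

In the row layout the two columns alternate, so no run is longer than one value. In the column layout the same data collapses to three runs, which is the kind of redundancy a columnar file format can exploit.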
SQL: Enhancing SQL Semantics

SQL Compliance
Hive 12 provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop

Hive SQL Datatypes
•  INT
•  TINYINT/SMALLINT/BIGINT
•  BOOLEAN
•  FLOAT
•  DOUBLE
•  STRING
•  TIMESTAMP
•  BINARY
•  DECIMAL
•  ARRAY, MAP, STRUCT, UNION
•  DATE
•  VARCHAR
•  CHAR

Hive SQL Semantics
•  SELECT, INSERT
•  GROUP BY, ORDER BY, SORT BY
•  JOIN on explicit join key
•  Inner, outer, cross and semi joins
•  Sub-queries in FROM clause
•  ROLLUP and CUBE
•  UNION
•  Windowing Functions (OVER, RANK, etc)
•  Custom Java UDFs
•  Standard Aggregation (SUM, AVG, etc.)
•  Advanced UDFs (ngram, Xpath, URL)
•  Sub-queries for IN/NOT IN, HAVING
•  Expanded JOIN Syntax
•  INTERSECT / EXCEPT

[Legend, color coding not recoverable: Available; Hive 0.12 (HDP 2.0); Hive 13]

Page 29
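Of the semantics listed on this slide, windowing functions are probably the least familiar. The sketch below emulates what a query such as `RANK() OVER (PARTITION BY region ORDER BY amount DESC)` computes, in plain Python. It is an illustration of the SQL semantics and not Hive code. The `rank_over` helper and the sample rows are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def rank_over(rows, partition_key, order_key):
    """Emulate RANK() OVER (PARTITION BY partition_key ORDER BY order_key DESC).

    Ties share a rank, and the rank after a tie skips ahead, as SQL RANK does.
    """
    ranked = []
    ordered = sorted(rows, key=itemgetter(partition_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        members = sorted(group, key=itemgetter(order_key), reverse=True)
        rank = 0
        previous = object()  # sentinel that never equals a real value
        for position, row in enumerate(members, start=1):
            if row[order_key] != previous:
                rank = position
                previous = row[order_key]
            ranked.append({**row, "rank": rank})
    return ranked


sales = [
    {"region": "east", "rep": "a", "amount": 300},
    {"region": "east", "rep": "b", "amount": 300},
    {"region": "east", "rep": "c", "amount": 100},
    {"region": "west", "rep": "d", "amount": 50},
]
ranked_sales = rank_over(sales, "region", "amount")
```

Unlike GROUP BY, the window keeps every input row and attaches the rank to it, which is why such functions matter for analytic tools that expect standard SQL.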
Real-Time Streaming-IN-Hadoop

Apache Storm
A community-based effort to bring real-time processing to Hadoop

Goals:

HADOOP INTEGRATION
Making streaming a first-class component of a modern data architecture

ENTERPRISE CONNECTIVITY
Connecting Storm to the important streaming sources within the enterprise

IMPROVED MULTI-TENANCY
Increasing operations usability and enabling simple programming of new flows

Project Phases

Storm: Streaming in Hadoop (Coming Soon)
•  Storm-on-YARN
•  Installation with Ambari
•  Ganglia & Nagios based monitoring
•  Kafka, HBase, HDFS & Cassandra connectors

Storm: Enterprise Connectivity
•  Notification and data persistence bolts: EDWs, RDBMS, JMS etc
•  Data Ingest Spouts
•  AD/LDAP plugin for authentication
•  High Availability management w/ Ambari

Storm: Improved Multi-Tenancy
•  Declarative “wiring”
•  Hive update support
•  Advanced scheduler

Page 30
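The slide mentions spouts (data sources) and bolts (processing steps) without showing how they fit together. The sketch below mimics that shape with Python generators. It is a toy model and not the Storm API. The function names and the word-count flow are assumptions chosen for illustration.

```python
from collections import Counter


def sentence_spout(sentences):
    """Toy 'spout': emits one tuple per incoming sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream):
    """Toy 'bolt': splits each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.lower().split():
            yield (word,)


def count_bolt(stream):
    """Toy 'bolt': keeps a running count and emits it after every word."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


# "Wiring" the flow: spout -> split bolt -> count bolt.
events = ["storm on yarn", "storm on hadoop"]
running_counts = list(count_bolt(split_bolt(sentence_spout(events))))
```

Each stage consumes tuples as they arrive rather than waiting for a complete batch, which is the property that distinguishes stream processing from the MapReduce model.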
Simplified Data Processing for Hadoop

Apache Falcon
Create and implement reusable workflows for datasets to orchestrate movement and track lineage

Goals:

Acquisition & Processing Data
•  Direct data to processing engines or formats
•  Obfuscate or transform data

Replication & Retention Policy
•  Replicate datasets
•  Establish retention policies for datasets

Redirection & Extensions of Hadoop
•  Redirect data to encrypt or decrypt
•  Extract segments of data and redirect to other tools

Hortonworks Investment in Apache Falcon

Phase 1: (Q4 2013)
•  Incubate Apache Falcon
•  Dataset Replication
•  Dataset Retention
•  Falcon Tech Preview

Phase 2: (Coming Soon)
•  Hive / HCatalog integration
•  Basic Dashboard for Entity Viewing
•  Kerberos security support
•  Ambari integration for management

Phase 3 (Coming Soon)
•  Advanced Dashboard for pipeline building
•  Dataset lineage

Page 31
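A retention policy of the kind named on this slide boils down to a rule for which dataset instances are old enough to evict. The sketch below shows that rule in isolation. It is not Falcon's configuration format or API. The `expired_instances` helper, the instance names and the 90-day period are hypothetical.

```python
from datetime import datetime, timedelta


def expired_instances(instances, retention, now):
    """Return names of dataset instances older than the retention period.

    `instances` maps an instance name to its creation time.
    """
    cutoff = now - retention
    return sorted(name for name, created in instances.items() if created < cutoff)


# Hypothetical daily partitions of one dataset.
instances = {
    "clicks/2013-09-01": datetime(2013, 9, 1),
    "clicks/2013-12-15": datetime(2013, 12, 15),
    "clicks/2014-01-20": datetime(2014, 1, 20),
}
to_evict = expired_instances(
    instances,
    retention=timedelta(days=90),
    now=datetime(2014, 1, 22),
)
```

Declaring the period once per dataset, rather than scripting deletions by hand for every pipeline, is the reuse that the slide attributes to dataset-level policies.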
Enterprise Hadoop Security Today

Authentication
Who am I/prove it? Control access to cluster.
•  Kerberos in native Apache Hadoop
•  Perimeter Security with Apache Knox Gateway

Authorization
Restrict access to explicit data

Audit
Understand who did what

Authorization and Audit capabilities
•  Native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions, Process Execution audit trail
•  Cell level access control in Apache Accumulo

Data Protection
Encrypt data at rest & motion
•  Wire encryption in native Apache Hadoop
•  Orchestrated encryption with 3rd party tools

Page 32
Hadoop Security – What’s Next?

Security in Enterprise Hadoop
Driving the next generation of Hadoop security

Goals:

Flexible Authentication & Authorization
Improve authentication choices and provide more granular access controls for the Hadoop platform, services and data.

Improve Data Protection
Enhance Hadoop’s audit and data protection capabilities to support broader enterprise governance and compliance needs.

Work with Existing Systems
Integrate with existing enterprise security and identity management systems in a consistent way.

Security Investments

Security Phase 1: (Delivered in HDP 2.0)
•  Strong AuthN with Kerberos
•  HBase, Hive, HDFS basic AuthZ
•  Encryption with SSL for NN, JT, etc.
•  Wire encryption with Shuffle, HDFS, JDBC

Security Phase 2: (Coming Soon)
•  Knox: Hadoop Perimeter Security
•  SQL-style Hive AuthZ (GRANT, REVOKE)
•  ACLs for HDFS
•  SSL support for Hive Server 2
•  PAM support for Hive

Security Phase 3:
•  Audit event correlation and Audit viewer
•  NotOnlyKerberos – Support other Token-Based Authentication
•  Data Encryption in HDFS, Hive & HBase

Page 33
Operating Enterprise Hadoop at Scale

Apache Ambari is the only 100% open source framework for provisioning, managing and monitoring Apache Hadoop clusters

[Architecture diagram]
•  AMBARI WEB
•  Integration With Existing Operations Tools (Viewpoint, Others)
•  REST APIs
•  AMBARI SERVER: PROVISION | MANAGE | MONITOR
•  Cluster nodes (compute & storage)

COMING SOON!
Ambari Stacks: AMBARI-2714
Ambari Views: AMBARI-4234

Page 34
Recap
• Hadoop's role is becoming clear
• Major vendors have recognized Hadoop’s role and are
actively integrating it into their solutions
• Adoption path is consistent: from apps to lake
• Open source innovation continues unabated
– YARN opens up the platform, and as adoption deepens, the
community of committers is working to mature it even further

Page 35
Try Hadoop Today… Get Involved

Download the Hortonworks Sandbox
•  Learn Hadoop
•  Build Your Analytic App
•  Try Hadoop 2

Amsterdam, April 2 - 3, 2014: REGISTER NOW
San Jose, CA, June 3 - 5, 2014: CALL FOR PAPERS OPEN

Page 36

Weitere ähnliche Inhalte

Was ist angesagt?

Yahoo! Hack Europe
Yahoo! Hack EuropeYahoo! Hack Europe
Yahoo! Hack EuropeHortonworks
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Hortonworks
 
Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks
 
Hadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHortonworks
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Hortonworks
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageHortonworks
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks
 
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Hortonworks
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudHortonworks
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Hortonworks
 
Bigger Data For Your Budget
Bigger Data For Your BudgetBigger Data For Your Budget
Bigger Data For Your BudgetHortonworks
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopHortonworks
 
Actian forrester- hortonworks
Actian   forrester- hortonworksActian   forrester- hortonworks
Actian forrester- hortonworksHortonworks
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoptionHortonworks
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationHortonworks
 

Was ist angesagt? (20)

Yahoo! Hack Europe
Yahoo! Hack EuropeYahoo! Hack Europe
Yahoo! Hack Europe
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
 
Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group Hortonworks and Clarity Solution Group
Hortonworks and Clarity Solution Group
 
Hadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data ProcessingHadoop 2.0: YARN to Further Optimize Data Processing
Hadoop 2.0: YARN to Further Optimize Data Processing
 
Hortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptxHortonworks sqrrl webinar v5.pptx
Hortonworks sqrrl webinar v5.pptx
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinar
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
 
Apache Hadoop on the Open Cloud
Apache Hadoop on the Open CloudApache Hadoop on the Open Cloud
Apache Hadoop on the Open Cloud
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
Accelerate Big Data Application Development with Cascading and HDP, Hortonwor...
 
Bigger Data For Your Budget
Bigger Data For Your BudgetBigger Data For Your Budget
Bigger Data For Your Budget
 
Eliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside HadoopEliminating the Challenges of Big Data Management Inside Hadoop
Eliminating the Challenges of Big Data Management Inside Hadoop
 
Actian forrester- hortonworks
Actian   forrester- hortonworksActian   forrester- hortonworks
Actian forrester- hortonworks
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Data Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop ImplementationData Lake for the Cloud: Extending your Hadoop Implementation
Data Lake for the Cloud: Extending your Hadoop Implementation
 

Enterprise Apache Hadoop: State of the Union

  • 1. Hortonworks: We Do Hadoop “State of the Union” Webinar Shaun Connolly, VP Strategy @shaunconnolly, @hortonworks January 22, 2014 © Hortonworks Inc. 2014 Page 1
  • 2. Today’s Webinar • Apache Hadoop & Hortonworks Overview • Hadoop’s Role • Hadoop Adoption: From Apps to Lake • Enterprise Hadoop Technology Directions
  • 3. Our Mission: Enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop. Our Vision: More than Half the World's Data Will Be Processed by Apache Hadoop. Headquarters: Palo Alto, CA. Employees: 300+ and growing. Our Commitment • Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process • Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind • Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills. Reseller Partners (logos).
  • 4. Apache Community Process. Apache Community Projects: Apache Hadoop, Apache HBase, Apache Hive, Apache Pig, Apache Storm, Apache Falcon, Apache Ambari. Cycle: Design & Develop, Test & Patch, Release. Apache Software Foundation Guiding Principles • Release early & often • Transparency, respect, meritocracy. Key Roles • PMC Members – Managing community projects – Mentoring new incubator projects • Committers – Authoring, reviewing & editing code • Release Managers – Testing & releasing projects
  • 5. Hortonworks Process for Enterprise Hadoop. Upstream Community Projects (Apache Hadoop, Apache HBase, Apache Hive, Apache Pig, Apache Storm, Apache Falcon, Apache Ambari): Design & Develop, Test & Patch, Release. Downstream Enterprise Product (HDP 2.0): Integrate & Test, Package & Certify, Distribute. Certified at scale using the most advanced Hadoop test bed on the planet • 1000’s of production nodes at Yahoo! • Over 1500 unit & system tests. Flows between the two: Fixed Issues, Stable Project Releases. Virtuous cycle when development & fixed issues are done upstream & stable project releases flow downstream
  • 6. Hadoop’s Role… “Hadoop is becoming a more ‘normal’ software market” and the “Hadoop vendor ecosystem [is] gaining critical mass” --Tony Baer, Ovum
  • 7. A Traditional Approach Under Pressure. APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications. DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP). SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured). Pressure points: 2.8 ZB in 2012; 85% from New Data Types; 15x Machine Data by 2020; 40 ZB by 2020 (Source: IDC)
  • 8. Unlock Value in New Types of Data. 1. Social: Understand how people are feeling and interacting, right now. 2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website. 3. Sensor/Machine: Discover patterns in data streaming from remote sensors and machines. 4. Geographic: Analyze location-based data to manage operations where they occur. 5. Server Logs: Diagnose process failures and prevent security breaches. 6. Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents. + Online archive: Data that was once purged or moved to tape can be stored in Hadoop to discover long term trends and previously hidden value
  • 9. A Modern Data Architecture Enabled. APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications. DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP) • Complement Data Systems • Right Workload Right Place. SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  • 10. A Modern Data Architecture Applied. APPLICATIONS: BusinessObjects BI. DEV & DATA TOOLS. OPERATIONAL TOOLS. DATA SYSTEM: RDBMS, EDW, HANA, MPP. INFRASTRUCTURE. SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  • 11. Major Vendors Have Embraced Hadoop. HDInsight & HDP for Windows • Only Hadoop Distribution for Windows Azure & Windows Server • Native integration with SQL Server, Excel, and System Center • Extends Hadoop to .NET community. Teradata Portfolio for Hadoop • Seamless data access between Teradata and Hadoop (SQL-H) • Simple management & monitoring with Viewpoint integration • Flexible deployment options. Complete Portfolio for Hadoop (UDA Diagram, Appliances). Instant Access + Infinite Scale • SAP can assure their customers they are deploying an SAP HANA + Hadoop architecture fully supported by SAP • Enables analytics apps (BOBJ) to interact with Hadoop
  • 12. Hadoop Adoption “Hadoop’s momentum is unstoppable as its open source roots grow wildly into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data” --Mike Gualtieri, Forrester
  • 13. Drivers of Hadoop Adoption (chart axes: SCALE vs. SCOPE). LOB Driven: New Analytic Apps, New Types of Data
  • 14. 20 Common Business Applications (Industry: Use Case [Type of Data])
      – Financial Services: New Account Risk Screens [Text, Server Logs]; Trading Risk [Server Logs]; Insurance Underwriting [Geographic, Sensor, Text]
      – Telecom: Call Detail Records (CDRs) [Machine, Geographic]; Infrastructure Investment [Machine, Server Logs]; Real-time Bandwidth Allocation [Server Logs, Text, Social]
      – Retail: 360° View of the Customer [Clickstream, Text]; Localized, Personalized Promotions [Geographic]; Website Optimization [Clickstream]
      – Manufacturing: Supply Chain and Logistics [Sensor]; Assembly Line Quality Assurance [Sensor]; Crowdsourced Quality Assurance [Social]
      – Healthcare: Use Genomic Data in Medical Trials [Structured]; Monitor Patient Vitals in Real-Time [Sensor]
      – Pharmaceuticals: Recruit and Retain Patients for Drug Trials [Social, Clickstream]; Improve Prescription Adherence [Social, Unstructured, Geographic]
      – Oil & Gas: Unify Exploration & Production Data [Sensor, Geographic & Unstructured]; Monitor Rig Safety in Real-Time [Sensor, Unstructured]
      – Government: ETL Offload in Response to Federal Budgetary Pressures [Structured]; Sentiment Analysis for Government Programs [Social]
  • 15. Drivers of Hadoop Adoption (chart axes: SCALE vs. SCOPE). LOB Driven (SCOPE): New Analytic Apps, New Types of Data. IT Driven (SCALE): MDA/Data Lake; Cost, Insight; More data and analytic apps
  • 16. The Journey Towards a Data Lake (chart axes: DATA, from TB’s to PB’s, vs. VALUE). Value drivers: Risk Management (e.g., Fraud Reduction); New Business (e.g., Data as a Product); Customer Intimacy (e.g., 360 Degree View of the Customer); Operational Excellence (e.g., Network Maintenance). DATA LAKE: An architectural shift in the data center that uses Hadoop to deliver deep insight across a large, broad, diverse set of data at efficient scale
  • 17. Drivers of the Data Lake. DATA LAKE + Hadoop = BROAD INSIGHT and EFFICIENT SCALE. BROAD INSIGHT (Data Access: Access your data simultaneously in multiple ways) • Allows simultaneous access by and timely insights for all your users across all your data • Irrespective of the processing engine, analytical application or presentation • Enabled by schema on read & enterprise-wide pool of data. EFFICIENT SCALE (Data Management: Store and process all of your Corporate Data Assets) • Acquire all data in original format and store in one place, cost effectively and for an unlimited time • Scale horizontally and to petabyte scale
  • 18. Data Lake Transforms Your Architecture. APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications. DATA LAKE: BROAD INSIGHT (Data Access: Access your data simultaneously in multiple ways); EFFICIENT SCALE (Data Management: Store and process all of your Corporate Data Assets). SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  • 19. Enterprise Hadoop Technology Directions “With Hadoop 2.0 we expect this ecosystem to grow like bamboo in spring time.” --Robin Bloor, The Bloor Group
  • 20. What’s Needed for Enterprise Hadoop? 1. Key Services: Platform, Operational and Data services essential for the enterprise. 2. Skills: Leverage your existing skills: development, analytics, operations. 3. Integration: Interoperable with existing data center investments. OPERATIONAL SERVICES: AMBARI (Cluster Mgmt), FALCON* (Dataset Mgmt), OOZIE (Schedule). DATA SERVICES: Data Movement (FLUME, SQOOP, NFS, WebHDFS); Data Access (PIG, HIVE & HCATALOG, HBASE). CORE SERVICES: MAP REDUCE and TEZ (Process), YARN (Resource Management), HDFS (Storage), KNOX* (Security). Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots. Deployment: OS/VM, Cloud, Appliance
  • 21. What’s Needed for Enterprise Hadoop? 1. Key Services: Platform, Operational and Data services essential for the enterprise. 2. Skills: Leverage your existing skills: development, analytics, operations. 3. Integration: Interoperable with existing data center investments. HORTONWORKS DATA PLATFORM (HDP): OPERATIONAL SERVICES: AMBARI (Cluster Mgmt), FALCON* (Dataset Mgmt), OOZIE (Schedule). DATA SERVICES: LOAD & EXTRACT / Data Movement (FLUME, SQOOP, NFS, WebHDFS); Data Access (PIG, HIVE & HCATALOG, HBASE). CORE SERVICES: MAP REDUCE and TEZ (Process), YARN (Resource Management), HDFS (Storage), KNOX*. Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots. Deployment: OS/VM, Cloud, Appliance
  • 22. Hadoop 2 & Beyond (details: hortonworks.com/labs)
  • 23. Hadoop 2: The Introduction of YARN. Store all data in one place, interact in multiple ways. 1st Gen of Hadoop: Single Use System (Batch Apps); stack: MapReduce (cluster resource management & data processing) on HDFS (redundant, reliable storage). 2nd Gen of Hadoop: Multi-Use Data Platform (Batch, Interactive, Online, Streaming, …); stack: Flexible Data Processing [Classic Hadoop Apps: Batch (MapReduce); Hive, Pig, others…: Batch & Interactive (Tez); Online Data Processing (HBase, Accumulo); Stream Processing (Storm); others …] on Efficient Cluster Resource Management & Shared Services (YARN) on Redundant, Reliable Storage (HDFS)
  • 24. Apache Hadoop YARN: The Data Operating System for Hadoop 2. Flexible: Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming. Efficient: Double processing IN Hadoop on the same hardware while providing predictable performance & quality of service. Shared: Provides a stable, reliable, secure foundation and shared operational services across multiple workloads. Data Processing Engines Run Natively IN Hadoop: BATCH (MapReduce); INTERACTIVE (Tez); ONLINE (HBase, Accum); STREAMING (Storm); IN-MEMORY (Spark); OTHER (Open Source / Commercial). All run on YARN: Cluster Resource Management, over HDFS: Redundant, Reliable Storage
  • 25. Apache Tez: Modern Execution Engine. Apache Tez is a modern & more efficient alternative to MapReduce built on YARN. Supports BOTH Batch & Interactive workloads – Used for Stinger initiative to enable interactive SQL for Apache Hive – Hive and Pig will work on Tez – Other solutions are considering Tez. Stack, top to bottom: MR (batch), Hive (SQL), Pig (data flow), OTHER (Open Source / Commercial); Tez (execution engine); YARN (cluster resource management); HDFS (redundant, reliable storage)
  • 26. Batch AND Interactive SQL-IN-Hadoop. Apache Hive • The de facto standard for Hadoop SQL access • Used by your current data center partners • Built for batch AND interactive query • Enables rapid insight over big data. Stinger Initiative: Broad, community based effort to deliver the next generation of Apache Hive. Speed: Improve Hive query performance by 100X to allow for interactive query times (seconds). Scale: The only SQL interface to Hadoop designed for queries that scale from TB to PB. SQL: Support broadest range of SQL semantics for analytic applications against Hadoop. Value Delivered • Single engine for batch & interactive • Preserves and transparently enhances existing investments in use of Hive – Ex. Hive-based solutions get 100x faster • SQL compliance improves integration with other data systems & tools • New ORCFile reduces storage up to 70% while improving resource use, scale, and throughput
  • 27. Speed: Delivering Interactive Query. TPC-DS Query 27 (Pricing Analytics using Star Schema Join): 1400s, 39s, 7.2s; 190x Improvement. TPC-DS Query 82 (Inventory Analytics Joining 2 Large Fact Tables): 3200s, 65s, 14.9s; 200x Improvement. Series: Hive 10, Hive 0.11 (Phase 1), Trunk (Phase 3). All Results at Scale Factor 200 (Approximately 200GB Data)
  • 28. SCALE: Interactive Query at Petabyte Scale. Sustained Query Times: Apache Hive 0.12 provides sustained acceptable query times even at petabyte scale. Smaller Footprint: Better encoding with ORCFile in Apache Hive 0.12 reduces resource requirements for your cluster. File Size Comparison Across Encoding Methods (Dataset: TPC-DS Scale 500 Dataset): Encoded with Text: 585 GB (Original Size); Encoded with RCFile: 505 GB (14% Smaller); Encoded with Parquet (Impala): 221 GB (62% Smaller); Encoded with ORCFile (Hive 12): 131 GB (78% Smaller). • Larger Block Sizes • Columnar format arranges columns adjacent within the file for compression & fast access
  • 29. SQL: Enhancing SQL Semantics. SQL Compliance: Hive 12 provides a wide array of SQL datatypes and semantics so your existing tools integrate more seamlessly with Hadoop
      – Hive SQL Datatypes: INT; TINYINT/SMALLINT/BIGINT; BOOLEAN; FLOAT; DOUBLE; STRING; TIMESTAMP; BINARY; DECIMAL; ARRAY, MAP, STRUCT, UNION; DATE; VARCHAR; CHAR
      – Hive SQL Semantics: SELECT, INSERT; GROUP BY, ORDER BY, SORT BY; JOIN on explicit join key; Inner, outer, cross and semi joins; Sub-queries in FROM clause; ROLLUP and CUBE; UNION; Windowing Functions (OVER, RANK, etc.); Custom Java UDFs; Standard Aggregation (SUM, AVG, etc.); Advanced UDFs (ngram, Xpath, URL); Sub-queries for IN/NOT IN, HAVING; Expanded JOIN Syntax; INTERSECT / EXCEPT
      – Availability legend: Available; Hive 0.12 (HDP 2.0); Hive 13
  • 30. Real-Time Streaming-IN-Hadoop. Apache Storm: A community-based effort to bring real-time processing to Hadoop. Goals: HADOOP INTEGRATION: Making streaming a first-class component of a modern data architecture. ENTERPRISE CONNECTIVITY: Connecting Storm to the important streaming sources within the enterprise. IMPROVED MULTI-TENANCY: Increasing operations usability and enabling simple programming of new flows. Project Phases (Coming Soon): Storm: Streaming in Hadoop • Storm-on-YARN • Installation with Ambari • Ganglia & Nagios based monitoring • Kafka, HBase, HDFS & Cassandra connectors. Storm: Enterprise Connectivity • Notification and data persistence bolts: EDWs, RDBMS, JMS etc. • Data Ingest Spouts • AD/LDAP plugin for authentication • High Availability management w/ Ambari. Storm: Improved Multi-Tenancy • Declarative “wiring” • Hive update support • Advanced scheduler
  • 31. Simplified Data Processing for Hadoop. Apache Falcon: Create and implement reusable workflows for datasets to orchestrate movement and track lineage. Goals: Acquisition & Processing Data • Direct data to processing engines or formats • Obfuscate or transform data. Replication & Retention Policy • Replicate datasets • Establish retention policies for datasets. Redirection & Extensions of Hadoop • Redirect data to encrypt or decrypt • Extract segments of data and redirect to other tools. Hortonworks Investment in Apache Falcon: Phase 1 (Q4 2013) • Incubate Apache Falcon • Dataset Replication • Dataset Retention • Falcon Tech Preview. Phase 2 (Coming Soon) • Hive / HCatalog integration • Basic Dashboard for Entity Viewing • Kerberos security support • Ambari integration for management. Phase 3 (Coming Soon) • Advanced Dashboard for pipeline building • Dataset lineage
  • 32. Enterprise Hadoop Security Today. Authentication: Who am I/prove it? Control access to cluster. Authorization: Restrict access to explicit data. Audit: Understand who did what. Data Protection: Encrypt data at rest & motion. Capabilities listed: Kerberos in native Apache Hadoop; Perimeter Security with Apache Knox Gateway; Native in Apache Hadoop • MapReduce Access Control Lists • HDFS Permissions • Process Execution audit trail; Cell level access control in Apache Accumulo; Wire encryption in native Apache Hadoop; Orchestrated encryption with 3rd party tools
  • 33. Hadoop Security: What’s Next? Security in Enterprise Hadoop: Driving the next generation of Hadoop security. Goals: Flexible Authentication & Authorization: Improve authentication choices and provide more granular access controls for the Hadoop platform, services and data. Improve Data Protection: Enhance Hadoop’s audit and data protection capabilities to support broader enterprise governance and compliance needs. Work with Existing Systems: Integrate with existing enterprise security and identity management systems in a consistent way. Security Investments: Security Phase 1 (Delivered in HDP 2.0) • Strong AuthN with Kerberos • HBase, Hive, HDFS basic AuthZ • Encryption with SSL for NN, JT, etc. • Wire encryption with Shuffle, HDFS, JDBC. Security Phase 2 (Coming Soon) • Knox: Hadoop Perimeter Security • SQL-style Hive AuthZ (GRANT, REVOKE) • ACLs for HDFS • SSL support for Hive Server 2 • PAM support for Hive. Security Phase 3 • Audit event correlation and Audit viewer • NotOnlyKerberos: Support other Token-Based Authentication • Data Encryption in HDFS, Hive & HBase
  • 34. Operating Enterprise Hadoop at Scale. Apache Ambari is the only 100% open source framework for provisioning, managing and monitoring Apache Hadoop clusters. Components: AMBARI WEB; REST APIs; Integration With Existing Operations Tools (Viewpoint, Others); AMBARI SERVER (PROVISION | MANAGE | MONITOR) over compute & storage nodes. COMING SOON! Ambari Stacks: AMBARI-2714; Ambari Views: AMBARI-4234
  • 35. Recap • Hadoop's role is becoming clear • Major vendors have recognized Hadoop’s role and are actively integrating it into their solutions • Adoption path is consistent: from apps to lake • Open source innovation continues unabated – YARN opens up the platform, and as adoption deepens, the community of committers is working to mature it even further
  • 36. Try Hadoop Today… Download the Hortonworks Sandbox: Learn Hadoop; Build Your Analytic App; Try Hadoop 2. Get Involved: Amsterdam, April 2 - 3, 2014 (REGISTER NOW); San Jose, CA, June 3 - 5, 2014 (CALL FOR PAPERS OPEN)
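
The benchmark figures on slides 27 and 28 can be sanity-checked with a little arithmetic. The Python sketch below is illustrative only: it assumes the timings pair up as 1400s/39s/7.2s for Query 27 and 3200s/65s/14.9s for Query 82 (the flattened slide text does not state the pairing explicitly), and the names `speedup` and `percent_smaller` are hypothetical helpers, not anything from the webinar.

```python
# Figures transcribed from slides 27 and 28. The assignment of timings to
# queries is an assumption inferred from the stated 190x / 200x improvements.
QUERY_TIMES_S = {
    "TPC-DS Query 27": (1400.0, 39.0, 7.2),   # Hive 10, Hive 0.11, trunk
    "TPC-DS Query 82": (3200.0, 65.0, 14.9),
}

# Encoded sizes of the TPC-DS scale 500 dataset, in GB (slide 28).
ENCODED_SIZE_GB = {
    "Text": 585.0,
    "RCFile": 505.0,
    "Parquet": 221.0,
    "ORCFile": 131.0,
}


def speedup(baseline_s: float, improved_s: float) -> float:
    """Return how many times faster the improved run is than the baseline."""
    return baseline_s / improved_s


def percent_smaller(original_gb: float, encoded_gb: float) -> float:
    """Return the size reduction relative to the original, as a percentage."""
    return 100.0 * (original_gb - encoded_gb) / original_gb


if __name__ == "__main__":
    for query, (hive_10, _hive_011, trunk) in QUERY_TIMES_S.items():
        print(f"{query}: {speedup(hive_10, trunk):.0f}x faster on trunk")

    text_size = ENCODED_SIZE_GB["Text"]
    for encoding, size in ENCODED_SIZE_GB.items():
        print(f"{encoding}: {percent_smaller(text_size, size):.0f}% smaller than text")
```

Under that pairing the end-to-end speedups come out near 194x and 215x, in line with the rounded 190x and 200x on slide 27, and the size reductions round to 14%, 62%, and 78%, matching slide 28.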