© Hortonworks Inc. 2011 – 2014. All Rights Reserved
Boost Performance with Scala
Learn From Those Who’ve Done It!
We do Hadoop.
Your speakers…
Dhruv Kumar
Partner Solutions Engineer
Hortonworks
Cyrille Chépélov
R&D Director
Transparency Rights Management
Hadoop for the Enterprise:
Implement a Modern Data Architecture with HDP
Customer Momentum
• 437+ customers (as of March 31, 2015)
Hortonworks Data Platform
• Completely open multi-tenant platform for any app & any data.
• A centralized architecture of consistent enterprise services for
resource management, security, operations, and governance.
Partner for Customer Success
• Open source community leadership focused on enterprise needs
• Unrivaled world class support
• Founded in 2011
• Original 24 architects, developers,
operators of Hadoop from Yahoo!
• 600+ Employees
• 1,000+ Ecosystem Partners
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to Scale
New data sources: clickstream, geolocation, web data, Internet of Things, docs and emails, server logs
Data growth: 2.8 zettabytes (2012) to 40 zettabytes (2020)
[Figure: business value of traditional data (ERP, CRM, SCM) versus new data, for laggards and industry leaders]
Hadoop emerged as foundation of new data architecture
Apache Hadoop is an open source data platform for
managing large volumes of high-velocity, high-variety data
• Built by Yahoo! to be the heartbeat of its ad & search business
• Donated to Apache Software Foundation in 2005 with rapid adoption by
large web properties & early adopter enterprises
• Incredibly disruptive to current platform economics
Traditional Hadoop Advantages
• Manages new data paradigm
• Handles data at scale
• Cost effective
• Open source
Traditional Hadoop Had Limitations
• Batch-only architecture
• Single-purpose clusters, specific data sets
• Difficult to integrate with existing investments
• Not enterprise-grade
[Figure: a single application stack, with batch processing (MapReduce) on top of storage (HDFS)]
Modern Data Architecture emerges to unify data & processing
Modern Data Architecture
• Enable applications to have access to
all your enterprise data through an
efficient centralized platform
• Supported by a centralized approach to
governance, security and operations
• Versatile enough to handle any application
and dataset, whatever its size or type
[Figure: Modern Data Architecture. Sources (existing systems such as ERP, CRM and SCM; clickstream; web & social; geolocation; sensor & machine; server logs; unstructured data) feed HDFS (Hadoop Distributed File System) and YARN, the Data Operating System, which runs batch, interactive, real-time and partner ISV workloads alongside the EDW and MPP systems. Analytics (data marts, business analytics, visualization & dashboards) and applications sit on top.]
Hortonworks & Concurrent
Hortonworks and Concurrent Advance Enterprise
Data Application Development on Hadoop
HDP Integrates and delivers Cascading SDK
• Collection of tools, documentation, libraries,
tutorials and example projects
• Simplifies SQL integration and enables Scala
development for Hadoop
Hortonworks provides level 1 & 2 support for
Cascading SDK
Cascading is the proven application development
platform for building data applications on Hadoop
Hortonworks & Concurrent: Partnership Benefits
• SDK empowers developers to quickly build rich data-centric
enterprise applications on Hadoop
• Leverage existing Java or Scala based skill sets to develop
complex applications
• Combines the robustness and simplicity of Cascading with
the reliability and stability of HDP
• Apps built on Cascading, including those written with Scalding,
can easily take advantage of YARN and Tez
Cascading SDK: Overview
• The most widely used application
development framework for building Big
Data applications
• Enables improved Developer Productivity
for enterprises using HDP
HDP Integration of Cascading SDK
• SDKs that enable the rapid
development of batch and
interactive data-driven applications
• Integration with data processing
layer allows Cascading to take
advantage of advances in
interactive applications
[Figure: HDP stack. A presentation & application layer sits on Cascading (Java), Scalding (Scala), Lingual (SQL) and Pattern (ML), available on both batch data processing (MapReduce) and interactive data processing (Tez), all on top of efficient cluster resource management & shared services (YARN).]
Enable both existing and new applications to
provide value to the organization
Your Trusted Third Party in the Digital Age™
Scalding on Tez
Copyright © 2015 Transparency Rights Management. All rights reserved
HOW DID WE CHOOSE SCALDING?
Why Scalding?
Transparency Rights Management:
• A Trusted Third Party
– Data escrow, controlled execution
– Independent re-computation
– Privacy & Personal Data compliance assessment
• Big Data Services for Entertainment
– Metadata enrichment
– IP use certification
– Dataset analysis as a service
Why Scalding?
« Big Data Services for Entertainment » - a Use Case
Digital Service
Provider
Report
Copyright Owners /
Collective Management
Organizations
Data Improvement
Automatic Data Feed (« in your format »)
Independent Report
Conformance Report
Why Scalding?
Dataset analysis (from YouTube monthly reports)
• September 2013: SQL Server overheats
• October 2013: using Lingual (12 SQL steps + bash scripts)
• September 2014: Cascading + Java
• September 28th: tried out Scalding
• November 2014: delivered first results on Scalding
• April 2015: first success on Scalding + Tez
Anatomy of a Scalding app
• Your app (in Scala): you
• Scalding: @TwitterOSS
• Cascading: Concurrent, Inc.
• Hadoop + Tez platform libraries: Apache
SCALDING ON TEZ,
THE MINI-HOWTO
Scalding on Tez, the mini-HOWTO
• Step 0: Prerequisites:
– A YARN cluster
– Cascading 3.0
– Tez runtime lib in HDFS (0.6.2-SNAPSHOT)
– A version of Scalding with fabric selection (0.13.1 + PR1220)
Scalding on Tez, the mini-HOWTO
• Step 1: build.sbt
https://github.com/cchepelov/wcplus/blob/master/build.sbt
Scalding on Tez, the mini-HOWTO
• Step 1: build.sbt (redux)
1. Regain control over which libraries are included
2. Exclude some « long transitive » dependencies
that pull in junk
3. Put in the desired fabric, in a configurable way
sbt -DCASCADING_FABRIC=hadoop clean assembly
Scalding on Tez, the mini-HOWTO
• Step 1bis: assembly.sbt
We’re using fatjars to simplify deployment.
Because of jar hell, we « need » a complicated assembly.sbt
https://github.com/cchepelov/wcplus/blob/master/assembly.sbt
Scalding on Tez, the mini-HOWTO
• Step 2: a few job flags
https://github.com/cchepelov/wcplus/blob/master/src/main/scala/com/transparencyrights/demo/wcplus/CommonJob.scala
• Step 2: a few job flags (continued)
• tez.task.resource.memory.mb
– As large as you can afford to give, per CPU per node
– The more memory, the less Tez needs to spill intermediates to disk
• tez.container.max.java.heap.fraction
– Defaults (1024 MiB * 0.8) assume the JVM's native memory requirements don't exceed about 205 MiB
– Scalding + the Scala runtime + Cascading on top of Tez seem to require more. YARN kills offenders swiftly!
– We leave about 460 MiB of native headroom: (1024 + 512) * (1 - 0.7)
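The arithmetic behind these two flags can be checked with a few lines of Python, used here only as a calculator. This is a sketch that assumes the JVM heap is sized as the container size multiplied by tez.container.max.java.heap.fraction; the function name `native_headroom_mib` is illustrative and not part of Tez.

```python
def native_headroom_mib(container_mib, heap_fraction):
    """Memory left in the container for non-heap (native) JVM use, in MiB.

    Assumption: heap size = container size * heap fraction.
    """
    heap_mib = container_mib * heap_fraction
    return container_mib - heap_mib


# Default settings: a 1024 MiB container with a 0.8 heap fraction.
default_headroom = native_headroom_mib(1024, 0.8)

# Settings from the slide: a (1024 + 512) MiB container with a 0.7 heap fraction.
tuned_headroom = native_headroom_mib(1024 + 512, 0.7)

print(f"default headroom: {default_headroom:.1f} MiB")
print(f"tuned headroom:   {tuned_headroom:.1f} MiB")
```

Raising the container size and lowering the heap fraction together more than doubles the room available outside the Java heap, which is what keeps YARN from killing the container.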
THAT’S IT.
(ALMOST)
IN PRACTICE…
« A VERSION OF SCALDING WITH FABRIC
SELECTION »
WAIT, WHAT?
In practice
« A version of Scalding with fabric selection »: wait, what?
Scalding's traditional --local and --hdfs flags:
– Use either LocalFlowConnector or HadoopFlowConnector
– Types are hard-coded
Cascading 2.5 introduced a new fabric concept. You can run with either cascading-hadoop or cascading-hadoop2-mr1. But:
– Incompatible jars (can't load both)
– Main types visible to Scalding are different
In practice
« A version of Scalding with fabric selection »: wait, what?
PR1220:
• No longer hardcodes « either Local or Hadoop 1.X »
• Enables supplying any flow connector implementation, as long as the jar's around.
• --hdfs to be deprecated as an alias to --hadoop1
• Still built against Cascading 2.6
« STILL BUILT ON CASCADING 2.6 »
WHY?
In practice
« Still built on Cascading 2.6 »
Cascading 3.0 has carefully updated some argument types to prepare for the future.
In Java, this is source- and binary-compatible:
[Figure: a consumer built against the library keeps working, unchanged, against LibraryV2]
Scala enforces generic type safety, and the Cascading 3.0 upgrades are not legal with scalac.
But they still are with the JVM…
In practice
… Going to native Cascading 3.0?
Scalding will require some adjustment to become compatible with the Java-level source upgrades.
Can this happen without breaking Scalding application source code?
GUAVA
In practice…
Guava
• Guava is a nice library… of little use in Scala (?)
• In a Scalding/Cascading/Tez JVM, multiple versions of Guava are required. Each layer depends on its own version: about every single version from 11.0 to 16.0.2.
• There have been breaking changes (method renames & removals) in Guava 13
• These happen on really mundane objects
In practice…
Guava
• Discussions and actions are in progress to remove the pain
• In the meantime, we use a patched version, « frankenguava », that provides both the older and newer interfaces, to keep all consumers happy across the stack.
CASCADING’S TEZ*REGISTRY
In practice…
Cascading's Tez*Registry
• Cascading 3.0 uses a set of mapping registries to convert Cascading patterns into the back-end API.
The Tez registries are new, and distinct from the MR registries
• The Tez registries are hardened against Concurrent's extensive test library, which is built on years of MR experience.
Tez has its own trouble spots.
Beware of hash joins.
In practice…
Cascading's Tez*Registry
• It works mostly fine now, but getting the Scalding test library onboard will go a long way.
Last-minute update:
.filterWithValue / .mapWithValue currently crash the Cascading planner (as of 3.0.1); their implementation uses a HashJoin.
AN EXAMPLE
A small test: « wc plus »
Input: 70 books, 1.1M lines, 10M words, 56M bytes
Compute frequencies, ignoring things that are more frequent than 80% of the max word frequency
Outputs:
• Word, relative frequency, deviation from median relative freq
• Two Words, relative frequency, deviation from median relative freq
• …
• Ten Words, relative frequency, deviation from median relative freq
• All Expressions (1-W to 10-W), relative frequency, deviation from median relative freq
No .filterWithValue /
.mapWithValue for now
A small test: « wc plus »
https://github.com/cchepelov/wcplus
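To make the « wc plus » computation concrete, here is a small single-machine sketch in plain Python. It is not the Scalding code from the repository. The slides do not define the terms precisely, so the sketch assumes that relative frequency means an n-gram's count divided by the total count of the n-grams kept, that deviation means the difference from the median relative frequency, and that the 80% cutoff compares raw counts. The names `ngrams` and `wc_plus` are illustrative.

```python
from collections import Counter
from statistics import median


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def wc_plus(words, n, cutoff=0.8):
    """Map each kept n-gram to (relative frequency, deviation from median)."""
    counts = Counter(ngrams(words, n))
    if not counts:
        return {}
    # Assumption: drop n-grams whose count exceeds 80% of the largest count.
    limit = cutoff * max(counts.values())
    kept = {gram: c for gram, c in counts.items() if c <= limit}
    if not kept:
        return {}
    total = sum(kept.values())
    rel = {gram: c / total for gram, c in kept.items()}
    mid = median(rel.values())
    return {gram: (freq, freq - mid) for gram, freq in rel.items()}


words = "the cat and the dog and the bird saw the cat".split()
result = wc_plus(words, 1)
for gram, (freq, dev) in sorted(result.items()):
    print(" ".join(gram), round(freq, 3), round(dev, 3))
```

Calling `wc_plus(words, n)` for n from 1 to 10 would give the ten per-length outputs listed on the slide; on a cluster, each of those counts becomes a grouping step in the flow.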
TIPS & TRICKS
Tips & Tricks
0000-step-node-sub-graph.dot
Run your job with
-Dcascading.planner.plan.path=/tmp/path/to/plan.lst
The planner will output a lot of useful files. One of them is
…/$(Job)/4-final-flow-steps/0000-step-node-sub-graph.dot
Run that file through Graphviz:
dot -O -Tpdf 0000-step-node-sub-graph.dot
or, if the PDF is illegible, produce an SVG file, which Firefox is great at zooming into:
dot -O -Tsvg 0000-step-node-sub-graph.dot
Tips & Tricks
0000-step-node-sub-graph.dot
This is how TEZ names our stuff !
Tips & Tricks
Major differences between how a Cascading job gets mapped to MR and to TEZ:
MR
– One flow, many (MANY) independent steps
– One or more operators per step
– Step-to-step communications involve disk (HDFS)
– Each step is independent as far as MR is concerned
– Step scheduling managed from outside the cluster, by Cascading
TEZ
– One flow, one DAG. A DAG includes several nodes.
– One or more operators per node
– Node-to-node communications managed by TEZ: memory, direct network or disk as necessary
– YARN sees one « Application » per flow
– Node scheduling managed by the TEZ DAG AppMaster
Tips & Tricks
yarn-swimlanes.sh
• A tool included in the Tez source distribution, in tez-tools/swimlanes (bash + python)
• Requires the YARN ATS (Application Timeline Server) to work
« yarn logs -applicationId application_1345431315_1511 » must work
• Reports the per-container occupation as a Gantt chart
Tips & Tricks
yarn-swimlanes.sh (2)
application_1435150225179_0474.svg
Tips & Tricks
yarn-swimlanes.sh (3)
[Figure: swimlane chart of container occupation over time]
Tips & Tricks
Consider using .forceToDisk to ensure work is
balanced within the DAG
890 seconds
160 seconds
Tips & Tricks
• Consider using .forceToDisk to ensure work is balanced within the DAG
• .forceToDisk really means « don't merge those two TEZ nodes », which implies « manage appropriate data transmission between these two nodes »
• TextFile & other FixedPathSource friends don't seem to automatically spread out work as well as they used to (huh?)
• YMMV, WIP.
PERFORMANCE
Performance
MR vs TEZ
Performance
MR vs TEZ; to scale
Performance
MR vs TEZ; TO SCALE!!!
MR run time:
14:22 (wall)
12:49 (cluster time)
5:43:26 (total CPU)
TEZ run time:
4:03 (wall)
2:50 (cluster time)
1:25:35 (total CPU)
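The run times above translate into speedup ratios as follows. This is a minimal sketch that assumes the figures are written as mm:ss or h:mm:ss and that they describe a single run of each job; the helper name `to_seconds` is illustrative.

```python
def to_seconds(clock):
    """Convert 'mm:ss' or 'h:mm:ss' into a number of seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# (MR, TEZ) run times as quoted on the slide.
runs = {
    "wall": ("14:22", "4:03"),
    "cluster time": ("12:49", "2:50"),
    "total CPU": ("5:43:26", "1:25:35"),
}

speedups = {
    name: to_seconds(mr) / to_seconds(tez)
    for name, (mr, tez) in runs.items()
}

for name, ratio in speedups.items():
    print(f"{name}: MR takes {ratio:.1f}x as long as TEZ")
```

On these numbers TEZ is roughly 3.5 times faster in wall-clock time and uses about a quarter of the CPU time.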
CONCLUSION
Conclusion
Apache Tez enables very significant performance gains compared to traditional MapReduce applications, on the same cluster and alongside the legacy.
The new Tez back-end built by Concurrent enables these exciting performance gains for existing Cascading and Scalding applications.
Taking advantage of these performance gains should become as easy as upgrading and…
Next Steps…
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about Concurrent & Hortonworks
http://hortonworks.com/partner/concurrent
More about Transparency Rights Management
http://www.transparencyrights.com/
Contact us: events@hortonworks.com
Q&A

Hortonworks & Bilot Data Driven Transformations with HadoopMats Johansson
 
Hadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataHadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataSenturus
 

Ähnlich wie Boost Performance with Scala on Hadoop (20)

A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016Azure Cafe Marketplace with Hortonworks March 31 2016
Azure Cafe Marketplace with Hortonworks March 31 2016
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
 
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu BariApache Hadoop and its role in Big Data architecture - Himanshu Bari
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
OOP 2014
OOP 2014OOP 2014
OOP 2014
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
 
A new platform for a new era emc
A new platform for a new era   emcA new platform for a new era   emc
A new platform for a new era emc
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven World
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 
Hortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with HadoopHortonworks & Bilot Data Driven Transformations with Hadoop
Hortonworks & Bilot Data Driven Transformations with Hadoop
 
Hadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big DataHadoop and the Future of SQL: Using BI Tools with Big Data
Hadoop and the Future of SQL: Using BI Tools with Big Data
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 

Kürzlich hochgeladen

Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfCionsystems
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 

Kürzlich hochgeladen (20)

Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Active Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdfActive Directory Penetration Testing, cionsystems.com.pdf
Active Directory Penetration Testing, cionsystems.com.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

Boost Performance with Scala on Hadoop

Editor's Notes

  1. Hortonworks has a singular focus: enabling Apache Hadoop as an enterprise data platform for any app and any data type. We were founded in 2011 by 24 developers from Yahoo, where Hadoop was conceived to address data challenges at internet scale. What we now know as Hadoop really started in 2005, when a team at Yahoo was directed to build out a large-scale data storage and processing technology that would allow them to improve their most critical application, Search. Their challenge was essentially two-fold: first they needed to capture and archive the contents of the internet, and then process the data so that users could search through it effectively and efficiently. Clearly, traditional approaches were impractical both technically (due to the size of the data) and commercially (due to the cost). The result was the Apache Hadoop project, which delivered large-scale storage (HDFS) and processing (MapReduce). Today we are over 600 employees and have partnered with over 900 companies who are the leaders in the data center. We have also been very fortunate to achieve very significant customer adoption, with over 230 customers as of Q3 2014, spanning nearly every vertical. Hortonworks was founded with the sole intent to make Hadoop an enterprise data platform. With YARN as its foundation, HDP delivers a centralized architecture with true multi-tenancy for data processing and shared services for Security, Governance and Operations to satisfy enterprise requirements, all deeply integrated and certified with leading datacenter technologies. We are uniquely focused on this transformation of Hadoop and do our work completely in open source. This is all predicated on our leadership in the community, which enables us not only to best support users of Hadoop but also uniquely positions us to present customer requirements within this open, thriving community.
  2. Before we dive into Hadoop and its role within the modern data architecture, let’s set the context for why Hadoop has become important. Existing approaches to data management have become both technically and commercially impractical. Technically, these systems were never designed to store or process vast quantities of data. Commercially, the licensing structures of the traditional approach are no longer feasible. These two challenges, combined with the rate at which data is being produced, created the need for a new approach to data systems. If we fast-forward another 3 to 5 years, more than half of the data under management within the enterprise will come from these new data sources.
  3. Enter Hadoop. Faced with this challenge, the team at Yahoo conceived and created Apache Hadoop to address it. They were then convinced that contributing this platform to an open community would speed innovation. They open sourced the technology and did so within the governance of the Apache Software Foundation (ASF). This introduced two distinct, significant advantages: not only could they manage new data types at scale, but they now had a commercially feasible approach. However, there were still significant challenges. The first generation of Hadoop was designed and optimized for batch-only workloads, it required dedicated clusters for each application, and it didn’t integrate easily with many of the existing technologies present in the data center. Also, like any emerging technology, Hadoop had to reach the level of readiness required by the enterprise. After running Hadoop at scale at Yahoo, the team spun out to form Hortonworks with the intent to address these challenges and make Hadoop enterprise ready.
  4. In 2011, Hortonworks was founded with the 24 original Hadoop architects and engineers from Yahoo! This original team had been working on a technology called YARN (Yet Another Resource Negotiator) that enables multiple applications to access all your enterprise data through an efficient centralized platform. It is the data operating system for Hadoop, providing the versatility to handle any application and dataset no matter the size or type. Moreover, YARN provided the centralized architecture around which the critical enterprise services of Security, Operations, and Governance could be centrally addressed and integrated with existing enterprise policies. This work allowed a new approach to data to emerge: the modern data architecture. At the heart of this approach is the capability for Hadoop to unify data and processing in an efficient data platform.
  5. Meet Jane. Jane loves music. And Jane’s favourite music video platform has all the music Jane loves. So Jane listens to music from the Platform.
  6. After October 2013: moved on to other things; the topic was left in storage for a while. September 2014: new model, same concept, built on plain Cascading to simplify some of the hairiest SQL logic (Optiq lacks(ed) analytic functions, so the pretty much single SQL statement from the SQL Server days had to be exploded into the 12 stages). Met guys from Lausanne at the end of September. Was already curious about Scala / Scalding then, so decided to spend two days giving it a spin. Never looked back!
  7. Tez 0.6.2-SNAPSHOT is required. Warning: the Tez 0.7 runtime is not API-compatible with 0.6 (although the source-level API is quite close). Cascading might change the Tez dependency from time to time…
  8. The typical Hadoop+Tez stack pulls in a Jetty, a Tomcat, a Jersey, multiple Guavas, and the kitchen sink.
  9. We believe our workload requires 270-ish MiB of native memory. When we have time, we’ll either power down for extra sticks of RAM, or attempt to shave 20 MiB of heap per TezChild.
  10. (reportedly)
  11. “Hash joins” means hash joins proper, but also .filter/mapWithValue, joinWithTiny, etc.
  12. “Hash joins” means hash joins proper, but also .filter/mapWithValue, joinWithTiny, etc.
  13. Who wants to see another « Word Count » ?
  14. Who wants to see another « Word Count » ?
  15. Who wants to see another « Word Count » ?
  16. I’m not going to look into that, fairly standard code except where I’ve been naïve. You get the idea.
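
Notes 11 and 12 group several operations under "hash joins". As a rough illustration of what they have in common, here is a minimal plain-Scala sketch of the idea: the small side is loaded into an in-memory hash table and the large side is streamed past it, so the large side never needs to be sorted or shuffled. This is not Scalding's or Cascading's implementation; the `Play` and `Track` types and the `hashJoin` helper are hypothetical names invented for this example.

```scala
// Hypothetical record types, for illustration only (not from the deck).
final case class Play(trackId: Int, userId: String)
final case class Track(trackId: Int, title: String)

object HashJoinSketch {
  // Inner hash join: build a hash table from the small side once,
  // then stream the large side and look each record up by key.
  // Assumes the small side fits in memory and has unique keys
  // (with duplicate keys, the last Track for a given id wins).
  def hashJoin(plays: Iterator[Play], tracks: Seq[Track]): Iterator[(Play, Track)] = {
    val byId: Map[Int, Track] = tracks.map(t => t.trackId -> t).toMap
    // Plays whose trackId has no matching Track are dropped.
    plays.flatMap(p => byId.get(p.trackId).map(t => (p, t)))
  }
}
```

The trade-off is memory: the whole small side has to fit in each task's heap, which may be one reason the memory budget in note 9 matters for this kind of workload.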