Why was it so important to us to open the MapReduce framework?
12/11/2013

Syncsort Confidential and Proprietary - do not copy or distribute
Agenda
Who are we?
What did we do?
Why did we do that?
With whom did we do it?
For which results?

Who are we?

Syncsort
For 40 years we have been helping companies solve their big data issues… even before they knew the name Big Data!

Integrating Big Data… Smarter!

Our customers are achieving the impossible, every day!

• 50% of all mainframes run Syncsort
• 1,500 mainframe customers: most used & trusted 3rd-party mainframe software
• Speed leader for ETL & Sort
• A history of innovation
• 25+ issued & pending patents
• Large global customer base
• 15,000+ deployments in 68 countries
• First-to-market, fully integrated approach to Hadoop ETL

[Figure: Key Partners]

What did we do?

Smart Contributions to Improve Hadoop
Augmenting Critical Batch Processing Capabilities

JIRA    Description
4807    Allow MapOutputBuffer to be pluggable
4808    Allow Reduce-side merge to be pluggable
4809    Make classes required for 2454 public
4812    Create reduce input merger plug-in
4842    Shuffle race can hang reducer
2461    HDFS file name globbing in libhdfs
4482    Backport of 2454 to MapReduce 1 & 1.2

Plugin Shipping on CDH 4.2 and later

Opening the MapReduce Framework

Mapper → Output Sorter → Shuffle → Input Sorter → Reducer

• Output Sorter and Input Sorter: here and here to replace MapReduce native sort
• Mapper and Reducer: here to perform functional logic on our engine

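
The plug-in points on this slide can be pictured with a toy, single-process model of the pipeline. This is only a sketch: the names `run_job`, `output_sorter`, and `input_merger` are made up for illustration and are not Hadoop APIs. The default functions stand in for the native sort and merge, and the "reverse" variants stand in for a replacement engine.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def native_sort(records):
    """Stand-in for the framework's built-in map-output sort."""
    return sorted(records, key=itemgetter(0))


def native_merge(sorted_runs):
    """Stand-in for the built-in reduce-side merge of sorted runs."""
    return list(heapq.merge(*sorted_runs, key=itemgetter(0)))


def run_job(splits, mapper, reducer,
            output_sorter=native_sort, input_merger=native_merge):
    """Toy MapReduce in which both sort stages are passed in as plug-ins."""
    # Map side: map every record of a split, then sort with the pluggable sorter.
    runs = [output_sorter([kv for rec in split for kv in mapper(rec)])
            for split in splits]
    # Shuffle is trivial here (a single reducer); the pluggable merger combines runs.
    merged = input_merger(runs)
    # Reduce side: group consecutive equal keys and apply the reducer.
    return [reducer(key, [value for _, value in group])
            for key, group in groupby(merged, key=itemgetter(0))]


def word_mapper(line):
    return [(word, 1) for word in line.split()]


def count_reducer(key, values):
    return (key, sum(values))


def reverse_sort(records):
    """Hypothetical replacement sorter: descending key order."""
    return sorted(records, key=itemgetter(0), reverse=True)


def reverse_merge(sorted_runs):
    """Hypothetical replacement merger matching reverse_sort's order."""
    return list(heapq.merge(*sorted_runs, key=itemgetter(0), reverse=True))


splits = [["b a", "a c"], ["c b a"]]
default_result = run_job(splits, word_mapper, count_reducer)
plugged_result = run_job(splits, word_mapper, count_reducer,
                         output_sorter=reverse_sort, input_merger=reverse_merge)
```

Swapping `output_sorter` and `input_merger` changes how records are ordered without touching the mapper or the reducer, which is the kind of separation the pluggable-sort contributions are meant to allow.
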
Why did we do that?

Syncsort: Just integrating data… faster

• A simple DI engine, easy to deploy, operate, and administer
• ETL-like development GUI
• Auto-tuning
• Best patented algorithms
• Fast, fast, faster than any other
• The more data the better

Sort, Join, Aggregate, Copy, Merge

From Data to Big Data

[Figure: Volume, Velocity (quarterly, monthly, weekly, daily, intra-day, right/real-time), Variety, and cost ($$$) across the Mainframe, PC, Internet Revolution, and Mobile & Social Media Revolution eras, from the 60s through the 2010s to "Next?"]

Smart Architecture
Hadoop Integration… for Real
(No Code Generation. No Compiling. No Bolts. No Nuts!)

• Runs natively within MapReduce
• Small footprint installs on every node
• Open source contributions extend capabilities of MapReduce

Hadoop Cluster
No need to worry about this…

Unleash Hadoop’s Potential
• Pluggable sort
• Expanded use cases (i.e. “No sort” option)
• Vertical scalability
• Design flexibility (MapMapReduceReduce)

With whom did we do it?

Cloudera + Syncsort: Smarter Connectivity… Also for Mainframe
Because Mainframe Is Big Data Too!

Connect
• Read files directly from mainframe
• No software required on mainframe
• Already installed on 50% of mainframes

Translate
• Parse & transform: packed decimal, EBCDIC/ASCII, multi-format
• No coding required

Load & Process
• Load directly to HDFS
• Offload batch data processing
• Find more insights

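
To make the "Translate" step more concrete, here is a minimal sketch of decoding one mainframe-style record. The record layout (a 5-byte EBCDIC name followed by a 3-byte packed-decimal amount with two implied decimals) and the choice of code page `cp037` are assumptions made for illustration; the sketch does not show how DMX-h itself performs the conversion.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()  # 0xD means negative; 0xC and 0xF mean positive
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid digit nibble in packed-decimal field")
    value = int("".join(str(n) for n in nibbles))
    if sign_nibble == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


def parse_record(record: bytes) -> dict:
    """Split a fixed-width record (hypothetical layout) into decoded fields."""
    name = record[:5].decode("cp037").rstrip()   # EBCDIC text to str
    amount = unpack_comp3(record[5:8], scale=2)  # 5 digits, 2 implied decimals
    return {"name": name, "amount": amount}


# "SMITH" encoded as EBCDIC, followed by -123.45 packed as 0x12 0x34 0x5D.
sample = "SMITH".encode("cp037") + bytes([0x12, 0x34, 0x5D])
parsed = parse_record(sample)
```
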
Syncsort DMX-h + Cloudera Manager

[Figure: Cloudera Manager (Installation, Management, Monitoring, Support Integration) linked through an API to Syncsort DMX-h; CDH Cluster + ISV software; DMX-h on every CDH node]

For which results?

Test cases

Sort Acceleration
– Terasort
  • Run terasort with DMX-h and without DMX-h in various configurations to compare performance.

ETL
– Use DMX-h to perform several different ETL jobs and compare against equivalent jobs in Pig (Apache Pig version 0.9.2-gphd-1.2.0.0).
  • File Change Data Capture (CDC)
  • Web Log Aggregation

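
The slides do not spell out what the File CDC job does. As a rough illustration only, file change data capture is commonly understood as comparing a previous and a current snapshot of a keyed file and emitting the inserted, deleted, and updated records. The sketch below assumes that definition and an invented `key,value` line layout; it is not the benchmarked job.

```python
def file_cdc(previous_lines, current_lines):
    """Compare two snapshots of 'key,value' lines and classify the changes."""
    def to_map(lines):
        return dict(line.split(",", 1) for line in lines)

    old, new = to_map(previous_lines), to_map(current_lines)
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("INSERT", key, new[key]))
        elif key not in new:
            changes.append(("DELETE", key, old[key]))
        elif old[key] != new[key]:
            changes.append(("UPDATE", key, new[key]))
    return changes


previous = ["1,alice", "2,bob", "3,carol"]
current = ["1,alice", "2,robert", "4,dave"]
changes = file_cdc(previous, current)
```
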
File CDC
DMX-h, Pig, Java
149 Lines of Code
70 Lines of Code

Web Log Aggregation
DMX-h, Pig, Java
94 Lines of Code
48 Lines of Code

Cluster Configuration – DMX-h Ran on 763 Nodes!

Cluster Specs:
– 763 node cluster
  • 1 node – job tracker
  • 1 node – name node
  • 1 node – secondary name node
  • 760 data and task nodes

Hadoop cluster configuration changes (from defaults):
– 128 MB HDFS block size (file.blocksize)
– 1.5 GB map / 4 GB reduce task JVM memory (mapred.child.java.opts)
– Maximum 22 map tasks and 4 reduce tasks per node (mapred.tasktracker.map.tasks.maximum & mapred.tasktracker.reduce.tasks.maximum)

Cluster Node Specs:
– 12 cores: dual Intel Westmere (hex-core) CPUs, 2.93 GHz, 12 MB cache
– 48 GB DDR3 RDIMM memory
– 12 x 2 TB 3.5” Seagate 7200 rpm drives
– Disk 0 + Disk 1 are RAID1 (mirrored) for OS
  • 100 MB/sec write
  • 115 MB/sec read
– 10 single-disk JBOD
– Mellanox ConnectX®-3 VPI NIC (supported data rates 40GbE; 10GbE)
– RHEL 6.1 64-bit
– Java 1.6 (jdk.x86_64-2000:1.6.0_29fcs)

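
The slot settings above imply a cluster-wide task capacity. The following back-of-the-envelope sketch only does arithmetic on the numbers from the slide; the variable names are illustrative, and reading the JVM sizes as a per-node memory budget with every slot busy at once is an assumption about how the figures are meant to be combined.

```python
WORKER_NODES = 760            # data and task nodes, from the slide
MAP_SLOTS_PER_NODE = 22
REDUCE_SLOTS_PER_NODE = 4
MAP_JVM_GB = 1.5
REDUCE_JVM_GB = 4.0
NODE_RAM_GB = 48

total_map_slots = WORKER_NODES * MAP_SLOTS_PER_NODE
total_reduce_slots = WORKER_NODES * REDUCE_SLOTS_PER_NODE
# Upper bound on task JVM memory per node if every slot were in use (assumption).
max_task_jvm_gb = (MAP_SLOTS_PER_NODE * MAP_JVM_GB
                   + REDUCE_SLOTS_PER_NODE * REDUCE_JVM_GB)

print(f"map slots: {total_map_slots}, reduce slots: {total_reduce_slots}")
print(f"max task JVM memory per node: {max_task_jvm_gb} GB, node RAM: {NODE_RAM_GB} GB")
```
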
Sort Acceleration - Terasort

Use case: TERASORT. ETL or Sort Acceleration: Sort Acceleration. Native/Alternative: Native.
"Native" columns are the Native/Alternative run; the DMX-h memory column is DMX-h physical memory.

Data Size  Elapsed Time               Memory (GB)                  CPU Time                         MB/Sec/Node
(GB)       Native   DMX-h    Improv.  Native    DMX-h     Improv.  Native      DMX-h       Improv.  Native  DMX-h
512        0:01:47  0:01:45  2%       12,863    12,873    0%       114,297     62,491      45%      6.5     6.6
1,024      0:02:29  0:01:11  52%      14,512    14,522    0%       194,896     98,972      49%      9.3     19.4
1,536      0:04:02  0:01:23  66%      14,684    14,694    0%       287,055     143,759     50%      8.6     25.0
4,096      0:03:31  0:02:29  29%      31,520    31,549    0%       927,379     380,442     59%      26.2    37.0
10,242     0:08:51  0:05:14  41%      47,935    47,951    0%       2,835,927   1,460,101   49%      26.4    44.6
20,484     0:14:55  0:12:28  16%      106,153   105,239   1%       6,112,296   3,696,727   40%      31.0    37.4
102,400    1:12:12  0:51:59  28%      387,262   387,211   0%       30,436,624  16,589,332  45%      32.3    44.9

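
The derived columns in the Terasort table can be recomputed from the raw ones. The sketch below is a reading of the table, not a formula stated in the slides: it assumes improvement means (native − DMX-h) / native, that throughput is spread over the 760 data and task nodes, and that 1 GB = 1024 MB. Under those assumptions it reproduces the 1,024 GB row; other rows agree only to within a few percent, presumably because the elapsed times are rounded to whole seconds.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' elapsed time to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement(baseline: float, dmxh: float) -> float:
    """Relative reduction versus the baseline, as a fraction."""
    return (baseline - dmxh) / baseline


def mb_per_sec_per_node(data_gb: float, elapsed_s: int, nodes: int = 760) -> float:
    """Per-node throughput, assuming 1 GB = 1024 MB and 760 worker nodes."""
    return data_gb * 1024 / elapsed_s / nodes


# 1,024 GB Terasort row from the table.
native_s, dmxh_s = to_seconds("0:02:29"), to_seconds("0:01:11")
elapsed_gain = improvement(native_s, dmxh_s)
cpu_gain = improvement(194_896, 98_972)
native_rate = mb_per_sec_per_node(1024, native_s)
dmxh_rate = mb_per_sec_per_node(1024, dmxh_s)
```
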
File CDC

Use case: FileCDC. ETL or Sort Acceleration: ETL. Native/Alternative: Pig.
"Native" columns are the Native/Alternative run; the DMX-h memory column is DMX-h physical memory.

Data Size  Elapsed Time               Memory (GB)                  CPU Time                         MB/Sec/Node
(GB)       Native   DMX-h    Improv.  Native    DMX-h     Improv.  Native      DMX-h       Improv.  Native  DMX-h
148        0:05:31  0:01:33  72%      79,876    79,559    0%       79,876      79,559      0%       0.6     2.2
450        0:05:11  0:01:58  62%      243,834   182,869   25%      243,834     182,869     25%      1.9     5.3
1,515      0:07:49  0:03:44  52%      845,263   557,226   34%      845,263     557,226     34%      4.4     9.4

Web Log Aggregation

Use case: WebLogAggregation Split Size & fixes. Native/Alternative: Pig.
"Native" columns are the Native/Alternative run; the DMX-h memory column is DMX-h physical memory.

Data Size  Elapsed Time               Memory (GB)                  CPU Time                         MB/Sec/Node
(GB)       Native   DMX-h    Improv.  Native    DMX-h     Improv.  Native      DMX-h       Improv.  Native  DMX-h
2,067      0:01:12  0:00:58  19%      13,499    7,813     42%      145,972     56,496      61%      40.1    49.8
4,135      0:01:42  0:01:23  19%      18,003    15,579    13%      300,627     152,390     49%      56.1    69.6
10,240     0:05:16  0:02:04  61%      40,773    39,091    4%       807,473     335,537     58%      45.3    115.4
20,480     0:07:54  0:06:58  12%      78,654    78,128    1%       1,339,453   568,107     58%      60.4    68.4

Test Drive DMX-h:
Bridge the Gap Between Big Iron & Big Data!
• Self-contained image
• Use case accelerators for mainframe, Hadoop and more!

Running on CDH
A Smarter Approach…
…and Quite Possibly The Only Approach!

www.syncsort.com/try

Weitere ähnliche Inhalte

Was ist angesagt?

IMS01 IMS Keynote
IMS01   IMS KeynoteIMS01   IMS Keynote
IMS01 IMS KeynoteRobert Hain
 
IBM World of Watson 2016 - DB2 Analytics Accelerator on Cloud
IBM World of Watson 2016 - DB2 Analytics Accelerator on CloudIBM World of Watson 2016 - DB2 Analytics Accelerator on Cloud
IBM World of Watson 2016 - DB2 Analytics Accelerator on CloudDaniel Martin
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Precisely
 
DB2 Real-Time Analytics Meeting Wayne, PA 2015 - IDAA & DB2 Tools Update
DB2 Real-Time Analytics Meeting Wayne, PA 2015 - IDAA & DB2 Tools UpdateDB2 Real-Time Analytics Meeting Wayne, PA 2015 - IDAA & DB2 Tools Update
DB2 Real-Time Analytics Meeting Wayne, PA 2015 - IDAA & DB2 Tools UpdateBaha Majid
 
Open Innovation with Power Systems
Open Innovation with Power Systems Open Innovation with Power Systems
Open Innovation with Power Systems IBM Power Systems
 
Ibm integrated analytics system
Ibm integrated analytics systemIbm integrated analytics system
Ibm integrated analytics systemModusOptimum
 
Understanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageUnderstanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageIBM Power Systems
 
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics AcceleratorEDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics AcceleratorDaniel Martin
 
Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsCloudera, Inc.
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project ExperienceEDB
 
IBM Power Systems Announcement Update
IBM Power Systems Announcement UpdateIBM Power Systems Announcement Update
IBM Power Systems Announcement UpdateDavid Spurway
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Precisely
 
Cost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationCost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationDataWorks Summit
 
Integrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseIntegrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseGwen (Chen) Shapira
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureUtkarsh Pandey
 
A scalable server environment for your applications
A scalable server environment for your applicationsA scalable server environment for your applications
A scalable server environment for your applicationsGigaSpaces
 
The Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseThe Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseAltibase
 
Greenplum Database Overview
Greenplum Database Overview Greenplum Database Overview
Greenplum Database Overview EMC
 

Was ist angesagt? (20)

IMS01 IMS Keynote
IMS01   IMS KeynoteIMS01   IMS Keynote
IMS01 IMS Keynote
 
IBM Power8 announce
IBM Power8 announceIBM Power8 announce
IBM Power8 announce
 
IBM World of Watson 2016 - DB2 Analytics Accelerator on Cloud
IBM World of Watson 2016 - DB2 Analytics Accelerator on CloudIBM World of Watson 2016 - DB2 Analytics Accelerator on Cloud
IBM World of Watson 2016 - DB2 Analytics Accelerator on Cloud
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
DB2 Real-Time Analytics Meeting Wayne, PA 2015 - IDAA & DB2 Tools Update
DB2 Real-Time Analytics Meeting Wayne, PA 2015 - IDAA & DB2 Tools UpdateDB2 Real-Time Analytics Meeting Wayne, PA 2015 - IDAA & DB2 Tools Update
DB2 Real-Time Analytics Meeting Wayne, PA 2015 - IDAA & DB2 Tools Update
 
Open Innovation with Power Systems
Open Innovation with Power Systems Open Innovation with Power Systems
Open Innovation with Power Systems
 
Ibm integrated analytics system
Ibm integrated analytics systemIbm integrated analytics system
Ibm integrated analytics system
 
Understanding the IBM Power Systems Advantage
Understanding the IBM Power Systems AdvantageUnderstanding the IBM Power Systems Advantage
Understanding the IBM Power Systems Advantage
 
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics AcceleratorEDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
EDBT 2013 - Near Realtime Analytics with IBM DB2 Analytics Accelerator
 
Datacenter 2014: HP - Brian Andersen
Datacenter 2014: HP - Brian AndersenDatacenter 2014: HP - Brian Andersen
Datacenter 2014: HP - Brian Andersen
 
Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data Applications
 
Migration DB2 to EDB - Project Experience
 Migration DB2 to EDB - Project Experience Migration DB2 to EDB - Project Experience
Migration DB2 to EDB - Project Experience
 
IBM Power Systems Announcement Update
IBM Power Systems Announcement UpdateIBM Power Systems Announcement Update
IBM Power Systems Announcement Update
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Cost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop ImplementationCost of Ownership for Hadoop Implementation
Cost of Ownership for Hadoop Implementation
 
Integrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseIntegrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle Database
 
Achieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azureAchieving cloud scale with microservices based applications on azure
Achieving cloud scale with microservices based applications on azure
 
A scalable server environment for your applications
A scalable server environment for your applicationsA scalable server environment for your applications
A scalable server environment for your applications
 
The Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseThe Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- Altibase
 
Greenplum Database Overview
Greenplum Database Overview Greenplum Database Overview
Greenplum Database Overview
 

Ähnlich wie Why Hadoop is important to Syncsort

Steve Totman Syncsort Big Data Warehousing hug 23 sept Final
Steve Totman Syncsort Big Data Warehousing hug 23 sept FinalSteve Totman Syncsort Big Data Warehousing hug 23 sept Final
Steve Totman Syncsort Big Data Warehousing hug 23 sept FinalSteven Totman
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015Daniela Zuppini
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Etu Solution
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceData Works MD
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreModern Data Stack France
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Sparktsliwowicz
 
Keeping Data in Sync with Syncsort
Keeping Data in Sync with SyncsortKeeping Data in Sync with Syncsort
Keeping Data in Sync with SyncsortPrecisely
 
Keeping your application’s latency SLAs no matter what
Keeping your application’s latency SLAs no matter whatKeeping your application’s latency SLAs no matter what
Keeping your application’s latency SLAs no matter whatScyllaDB
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)tsliwowicz
 
Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort ...
Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort ...Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort ...
Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort ...Steven Totman
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics PlatformSantanu Dey
 
OpenDrives_-_Product_Sheet_v13D (2) (1)
OpenDrives_-_Product_Sheet_v13D (2) (1)OpenDrives_-_Product_Sheet_v13D (2) (1)
OpenDrives_-_Product_Sheet_v13D (2) (1)Scott Eiser
 
HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deckKeithETD_CTO
 
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...Precisely
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMichael Hiskey
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and DeploymentCisco Canada
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keownCisco Canada
 

Ähnlich wie Why Hadoop is important to Syncsort (20)

Steve Totman Syncsort Big Data Warehousing hug 23 sept Final
Steve Totman Syncsort Big Data Warehousing hug 23 sept FinalSteve Totman Syncsort Big Data Warehousing hug 23 sept Final
Steve Totman Syncsort Big Data Warehousing hug 23 sept Final
 
IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015IBMHadoopofferingTechline-Systems2015
IBMHadoopofferingTechline-Systems2015
 
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
Track B-3 解構大數據架構 - 大數據系統的伺服器與網路資源規劃
 
RAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data ScienceRAPIDS – Open GPU-accelerated Data Science
RAPIDS – Open GPU-accelerated Data Science
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
Syncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScoreSyncsort et le retour d'expérience ComScore
Syncsort et le retour d'expérience ComScore
 
Taboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache SparkTaboola Road To Scale With Apache Spark
Taboola Road To Scale With Apache Spark
 
Keeping Data in Sync with Syncsort
Keeping Data in Sync with SyncsortKeeping Data in Sync with Syncsort
Keeping Data in Sync with Syncsort
 
Keeping your application’s latency SLAs no matter what
Keeping your application’s latency SLAs no matter whatKeeping your application’s latency SLAs no matter what
Keeping your application’s latency SLAs no matter what
 
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)Taboola's experience with Apache Spark (presentation @ Reversim 2014)
Taboola's experience with Apache Spark (presentation @ Reversim 2014)
 
Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort ...
Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort ...Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort ...
Hadoop Summit Amsterdam 2013 - Making Hadoop Ready for Prime Time - Syncsort ...
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
 
OpenDrives_-_Product_Sheet_v13D (2) (1)
OpenDrives_-_Product_Sheet_v13D (2) (1)OpenDrives_-_Product_Sheet_v13D (2) (1)
OpenDrives_-_Product_Sheet_v13D (2) (1)
 
HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big Data
 
EMC Isilon Database Converged deck
EMC Isilon Database Converged deckEMC Isilon Database Converged deck
EMC Isilon Database Converged deck
 
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
Big Data Q2 Customer Education Webcast: New DMX Change Data Capture for Hadoo...
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
Big Data Architecture and Deployment
Big Data Architecture and DeploymentBig Data Architecture and Deployment
Big Data Architecture and Deployment
 
Cisco connect toronto 2015 big data sean mc keown
Cisco connect toronto 2015 big data  sean mc keownCisco connect toronto 2015 big data  sean mc keown
Cisco connect toronto 2015 big data sean mc keown
 

Mehr von huguk

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifactahuguk
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introhuguk
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...huguk
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...huguk
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watsonhuguk
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink huguk
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLhuguk
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...huguk
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitchinghuguk
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoringhuguk
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startuphuguk
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapulthuguk
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysishuguk
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analyticshuguk
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Socialhuguk
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligencehuguk
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive huguk
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...huguk
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 




Why Hadoop is important to Syncsort

  • 1. Why was it so important to us to open the MapReduce framework? 12/11/2013
  • 2. Agenda: Who are we? What did we do? Why did we do that? With whom did we do it? For which results?
  • 3. Agenda: Who are we? What did we do? Why did we do that? With whom did we do it? For which results?
  • 4. Syncsort: For 40 years we have been helping companies solve their big data issues… even before they knew the name Big Data! Integrating Big Data… Smarter! Our customers are achieving the impossible, every day!
      • 50% of all mainframes run Syncsort
      • 1,500 mainframe customers: most used & trusted 3rd-party mainframe software
      • Speed leader for ETL & Sort
      • A history of innovation
      • 25+ issued & pending patents
      • Large global customer base
      • 15,000+ deployments in 68 countries
      • First-to-market, fully integrated approach to Hadoop ETL
      Key Partners [partner logos]
  • 5. Agenda: Who are we? What did we do? Why did we do that? With whom did we do it? For which results?
  • 6. Smart Contributions to Improve Hadoop: Augmenting Critical Batch Processing Capabilities
      JIRA | Description
      4807 | Allow MapOutputBuffer to be pluggable
      4808 | Allow Reduce-side merge to be pluggable
      4809 | Make classes required for 2454 public
      4812 | Create reduce input merger plug-in
      4842 | Shuffle race can hang reducer
      2461 | HDFS file name globbing in libhdfs
      4482 | Backport of 2454 to MapReduce 1 & 1.2
      Plugin shipping on CDH 4.2 and later
  • 7. Opening the MapReduce Framework [Diagram: Mapper → Output Sorter → Shuffle → Input Sorter → Reducer. Callouts: "Here and here to replace MapReduce native sort"; "Here to perform functional logic on our engine" (shown twice)]
  • 8. Agenda: Who are we? What did we do? Why did we do that? With whom did we do it? For which results?
  • 9. Syncsort: Just integrating data … faster
      • A simple DI engine, easy to deploy, operate, and administer
      • ETL-like development GUI
      • Auto-tuning
      • Best patented algorithms: Sort, Join, Aggregate, Copy, Merge
      • Fast, fast, faster than any other
      • The more data the better
  • 10. From Data to Big Data [Figure: cost ($$$) of Variety, Velocity (quarterly, monthly, weekly, daily, intra-day, right/real-time) and Volume across eras: Mainframe, PC, Internet Revolution, Mobile & Social Media Revolution; timeline 60s, 70s, 80s, 90s, 2000s, 2010s, Next?]
  • 11. Smart Architecture: Hadoop Integration… for Real (No Code Generation. No Compiling. No Bolts. No Nuts!)
      • Runs natively within MapReduce
      • Small footprint, installs on every node
      • Open source contributions extend capabilities of MapReduce
      Unleash Hadoop’s Potential:
      • Pluggable sort
      • Expanded use cases (e.g. “No sort” option)
      • Vertical scalability
      • Design flexibility (MapMapReduceReduce)
      [Diagram: Hadoop Cluster, captioned "No need to worry about this…"]
  • 12. Agenda: Who are we? What did we do? Why did we do that? With whom did we do it? For which results?
  • 13. Cloudera + Syncsort: Smarter Connectivity… Also for Mainframe, Because Mainframe Is Big Data Too!
      Connect
        • Read files directly from mainframe
        • No software required on mainframe
        • Already installed on 50% of mainframes
      Translate
        • Parse & transform: packed decimal, EBCDIC/ASCII, multi-format
        • No coding required
      Load & Process
        • Load directly to HDFS
        • Offload batch data processing
        • Find more insights
  • 14. Syncsort DMX-h + Cloudera Manager [Diagram labels: Cloudera Manager; CDH Cluster + ISV software; Support, Integration, Monitoring, Management, Installation; API; Syncsort DMX-h; CDH Nodes; DMX-h on every CDH node]
  • 15. Agenda: Who are we? What did we do? Why did we do that? With whom did we do it? For which results?
  • 16. Test cases
      Sort Acceleration: Terasort
        • Run terasort with DMX-h and without DMX-h in various configurations to compare performance.
      ETL: Use DMX-h to perform several different ETL jobs and compare against equivalent jobs in Pig (Apache Pig version 0.9.2-gphd-1.2.0.0).
        • File Change Data Capture (CDC)
        • Web Log Aggregation
  • 17. File CDC DMX-h Pig Java 149 Lines of Code 70 Lines of Code
  • 18. Web Log Aggregation DMX-h Pig Java 94 Lines of Code 48 Lines of Code
  • 19. Cluster Configuration: DMX-h Ran on 763 Nodes!
      Cluster specs:
        • 763-node cluster
          • 1 node: job tracker
          • 1 node: name node
          • 1 node: secondary name node
          • 760 data and task nodes
      Hadoop cluster configuration changes (from defaults):
        • 128 MB HDFS block size (file.blocksize)
        • 1.5 GB map / 4 GB reduce task JVM memory (mapred.child.java.opts)
        • Maximum 22 map tasks and 4 reduce tasks per node (mapred.tasktracker.map.tasks.maximum & mapred.tasktracker.reduce.tasks.maximum)
      Cluster node specs:
        • 12 cores: dual Intel Westmere (hex-core) CPUs, 2.93 GHz, 12 MB cache
        • 48 GB DDR3 RDIMM memory
        • 12 x 2 TB 3.5” Seagate drives, 7200 rpm
        • Disk 0 + Disk 1 are RAID1 (mirrored) for OS
          • 100 MB/sec write
          • 115 MB/sec read
        • 10 single-disk JBOD
        • Mellanox ConnectX®-3 VPI NIC (supported data rates 40GbE; 10GbE)
        • RHEL 6.1 64-bit
        • Java 1.6 (jdk.x86_64-2000:1.6.0_29fcs)
  • 20. Sort Acceleration - Terasort (Use Case: TERASORT; ETL or Sort Acceleration: Sort Acceleration; Native/Alternative: Native, for every row)
      Data Size (GB) | Native/Alternative Elapsed Time | DMX-h Elapsed Time | Elapsed Time Improvement | Native/Alternative Memory (GB) | DMX-h Physical Memory (GB) | Memory Improvement | Native/Alternative CPU Time | DMX-h CPU Time | CPU Improvement | Native/Alternative MB/Sec/Node | DMX-h MB/Sec/Node
      512 | 0:01:47 | 0:01:45 | 2% | 12,863 | 12,873 | 0% | 114,297 | 62,491 | 45% | 6.5 | 6.6
      1,024 | 0:02:29 | 0:01:11 | 52% | 14,512 | 14,522 | 0% | 194,896 | 98,972 | 49% | 9.3 | 19.4
      1,536 | 0:04:02 | 0:01:23 | 66% | 14,684 | 14,694 | 0% | 287,055 | 143,759 | 50% | 8.6 | 25.0
      4,096 | 0:03:31 | 0:02:29 | 29% | 31,520 | 31,549 | 0% | 927,379 | 380,442 | 59% | 26.2 | 37.0
      10,242 | 0:08:51 | 0:05:14 | 41% | 47,935 | 47,951 | 0% | 2,835,927 | 1,460,101 | 49% | 26.4 | 44.6
      20,484 | 0:14:55 | 0:12:28 | 16% | 106,153 | 105,239 | 1% | 6,112,296 | 3,696,727 | 40% | 31.0 | 37.4
      102,400 | 1:12:12 | 0:51:59 | 28% | 387,262 | 387,211 | 0% | 30,436,624 | 16,589,332 | 45% | 32.3 | 44.9
  • 21. File CDC (Use Case: FileCDC; ETL or Sort Acceleration: ETL; Alternative: Pig, for every row)
      Data Size (GB) | Native/Alternative Elapsed Time | DMX-h Elapsed Time | Elapsed Time Improvement | Native/Alternative Memory (GB) | DMX-h Physical Memory (GB) | Memory Improvement | Native/Alternative CPU Time | DMX-h CPU Time | CPU Improvement | Native/Alternative MB/Sec/Node | DMX-h MB/Sec/Node
      148 | 0:05:31 | 0:01:33 | 72% | 79,876 | 79,559 | 0% | 79,876 | 79,559 | 0% | 0.6 | 2.2
      450 | 0:05:11 | 0:01:58 | 62% | 243,834 | 182,869 | 25% | 243,834 | 182,869 | 25% | 1.9 | 5.3
      1,515 | 0:07:49 | 0:03:44 | 52% | 845,263 | 557,226 | 34% | 845,263 | 557,226 | 34% | 4.4 | 9.4
  • 22. Web Log Aggregation (Use Case: WebLogAggregation Split Size & fixes; Alternative: Pig, for every row)
      Data Size (GB) | Native/Alternative Elapsed Time | DMX-h Elapsed Time | Elapsed Time Improvement | Native/Alternative Memory (GB) | DMX-h Physical Memory (GB) | Memory Improvement | Native/Alternative CPU Time | DMX-h CPU Time | CPU Improvement | Native/Alternative MB/Sec/Node | DMX-h MB/Sec/Node
      2,067 | 0:01:12 | 0:00:58 | 19% | 13,499 | 7,813 | 42% | 145,972 | 56,496 | 61% | 40.1 | 49.8
      4,135 | 0:01:42 | 0:01:23 | 19% | 18,003 | 15,579 | 13% | 300,627 | 152,390 | 49% | 56.1 | 69.6
      10,240 | 0:05:16 | 0:02:04 | 61% | 40,773 | 39,091 | 4% | 807,473 | 335,537 | 58% | 45.3 | 115.4
      20,480 | 0:07:54 | 0:06:58 | 12% | 78,654 | 78,128 | 1% | 1,339,453 | 568,107 | 58% | 60.4 | 68.4
  • 23. Test Drive DMX-h: Bridge the Gap Between Big Iron & Big Data!
      • Self-contained image
      • Use case accelerators for mainframe, Hadoop and more!
      • Running on CDH
      A Smarter Approach… www.syncsort.com/try …and Quite Possibly The Only Approach!
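
The "Elapsed Time Improvement" percentages on the benchmark slides (20 to 22) are consistent with a simple relative reduction, (native elapsed time - DMX-h elapsed time) / native elapsed time, rounded to a whole percent. The slides do not state the formula, so the Python sketch below is an assumption that reproduces the published figures; the function names are illustrative only.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' elapsed time, as printed on the slides, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def improvement_pct(native: str, dmx_h: str) -> int:
    """Relative reduction in elapsed time as a whole percent (assumed formula)."""
    base = to_seconds(native)
    return round(100 * (base - to_seconds(dmx_h)) / base)


# Elapsed times copied from the slides: (label, native/alternative, DMX-h).
samples = [
    ("Terasort, 1,024 GB", "0:02:29", "0:01:11"),
    ("File CDC, 148 GB", "0:05:31", "0:01:33"),
    ("Web log aggregation, 10,240 GB", "0:05:16", "0:02:04"),
]

for label, native, dmx_h in samples:
    print(f"{label}: {improvement_pct(native, dmx_h)}% less elapsed time")
```

The same relative-reduction calculation applied to the CPU time columns also matches the "CPU Improvement" values, for example 114,297 versus 62,491 for the 512 GB Terasort run gives about 45%.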

Editor's Notes

  1. The ability to process and analyze mainframe data with Hadoop can open up a wealth of opportunities by delivering deeper analytics, at lower cost. Unfortunately, there are no native Hadoop ETL capabilities for mainframe. Simply ingesting mainframe data involves lots of manual effort and coding, plus a combination of mainframe and Hadoop skills that are nearly impossible to find. The Use Case Accelerators for Mainframe Connectivity and Translation combine decades of mainframe expertise with state-of-the-art Hadoop capabilities to provide a painless and seamless approach to leverage mainframe data. Read files directly from the mainframe, parse and transform the data – packed decimal, COMP, EBCDIC/ASCII, multi-format records, and more - without installing any software on the mainframe and without writing any code. SAMPLE EBCDIC data!
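
To make the translation step in the note above concrete, here is a minimal Python sketch of what decoding EBCDIC text and a packed-decimal (COMP-3) field involves when done by hand. It is not how DMX-h works internally; the record layout (a 5-byte name followed by a 3-byte amount with two decimal places), the choice of code page `cp037`, and the helper name `unpack_comp3` are all assumptions for illustration. Real mainframe files may use a different EBCDIC code page and a layout defined by a COBOL copybook.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)      # high nibble is one decimal digit
        digits.append(byte & 0x0F)    # low nibble is the next digit
    digits.append(raw[-1] >> 4)       # last byte: high nibble is the final digit
    sign_nibble = raw[-1] & 0x0F      # last byte: low nibble carries the sign
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):   # 0xD (and 0xB) mark a negative value
        value = -value
    return Decimal(value).scaleb(-scale)


# Hypothetical 8-byte record: 5 bytes of EBCDIC text, then a 3-byte COMP-3 amount.
record = bytes.fromhex("C8C5D3D3D6") + bytes.fromhex("12345C")

name = record[:5].decode("cp037")            # EBCDIC (code page 037) to str
amount = unpack_comp3(record[5:], scale=2)   # 0x12345C with 2 implied decimals

print(name, amount)
```

Even this small case needs knowledge of the code page, the field boundaries, and the sign convention, which is the manual effort the note refers to.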