Is Spark Replacing Hadoop
MapR Technologies
Presented by Keys Botzum at a meetup, March 2016
1.
© 2016 MapR Technologies
Is Spark Replacing MapReduce? Hadoop?
Keys Botzum, Senior Principal Technologist
March 2016 (last update: March 29, 2016)
2.
Companies With Spark on MapR in Production
• Fortune 500 Global Telecom
• Fortune 500 Health Care
• Global Financial Services
3.
Cisco: Security Intelligence Operations
• Sensor data lands in Hadoop
• Streaming for real-time detection and threat alerts
• Data is next processed on GraphX and Mahout to build threat detection models and accelerate reporting
• Additional SQL querying for end-customer reporting and threat detection
4.
Circa 2014 …
5.
Next-Gen Genomics
• Existing process takes several weeks to align chemical compounds with genes
• ADAM on Spark allows realignment in a few hours
• Geneticists can minimize engineering dependency
6.
Is Spark replacing Hadoop?
7.
How about a product manager's favorite tool: the checkbox list!
• DAG
• Persistent Store
• Machine Learning
• Graph
• Streaming
• Batch SQL
• Interactive SQL
• Security
• Resource Management
• Multitenancy
• Others
8.
Wait. What's Hadoop?
A pluggable data-parallel framework on top of an HDFS- and HBase-API-based persistent store
• Proven MapReduce, Hive, Pig
• YARN introduces pluggability
• Allows for multiple frameworks
• Standard for scale-out big data store
• Stores data as files and tables
• Secure
• Includes resource management
9.
Spark and MapReduce are …
• Scalable frameworks for executing custom code on a cluster
• Nodes in the cluster work independently to process fragments of data and combine those fragments when appropriate to yield a final result
• Able to tolerate the loss of a node during a computation
• Dependent on a distributed storage layer for a common view of the data
10.
What's MapReduce?
• Map – loads the data and defines a set of keys
• Reduce – collects the organized, key-based data, processes it, and writes the output
• Performance can be tuned based on known details of your source files (size, total number) and cluster shape
11.
MapReduce Processing Model
• Define mappers
• Shuffling is automatic
• Define reducers
• For complex work, chain jobs together
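The processing model above can be sketched in a few lines of plain Java. This is only an in-memory illustration under stated assumptions, not Hadoop code: the class `MiniMapReduce` and its methods `map`, `shuffle`, `reduce`, and `wordCount` are hypothetical names invented for this sketch, and a real framework would distribute each phase across the cluster and perform the shuffle itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of map -> shuffle -> reduce.
public class MiniMapReduce {
    // Map phase: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key.
    // In a real framework this step is automatic; it is spelled out here for clarity.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse all values of one key into a single result.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: run the three phases in order over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(emitted).forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or not to be", "be quick")));
    }
}
```

Chaining jobs for complex work would amount to feeding the output of one `wordCount`-style driver into the `map` step of the next.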
12.
MapReduce: The Good
• Built-in fault tolerance
• Optimized IO path
• Scalable
• Developer focuses on Map/Reduce, not infrastructure
• Simple? API
13.
MapReduce: The Bad
• Batch oriented
• Optimized for disk IO
  – Doesn't leverage memory well
  – Iterative algorithms go through the disk IO path again and again
• Primitive API
  – Developers have to build on a very simple abstraction
  – Key/value in, key/value out
  – Even basic things like a join require extensive code
• The result is often many files that need to be combined appropriately
14.
What's Spark?
Batch, interactive, and streaming framework
• Powerful API
• Leverages memory aggressively
• Batch and streaming
Pluggable persistent store
• MapR-FS, HDFS
• MapR-DB, HBase, Cassandra
• MapR-Streams, Kafka
• S3
15.
Apache Spark
• spark.apache.org
• Originally developed in 2009 at UC Berkeley's AMPLab
• Fully open sourced in 2010; now at the Apache Software Foundation
16.
Spark: Ease of Use and Performance
• Easy to develop
  – Rich APIs in Java, Scala, Python, R
  – Interactive shell
• Fast to run
  – General execution graphs
  – In-memory storage
Less code, simpler code
17.
Resilient Distributed Datasets (RDD)
• Spark revolves around RDDs
• An RDD is a fault-tolerant, read-only collection of elements that can be operated on in parallel
• Cached in memory or on disk
http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
A newer API is based around DataFrames, but for this presentation the difference isn't important.
18.
RDD Operations - Expressive
• Transformations
  – Create a new RDD from an existing one
  – map, filter, distinct, union, sample, groupByKey, join, reduceByKey, etc.
• Actions
  – Return a value after running a computation
  – collect, count, first, takeSample, foreach, etc.
Check the documentation for a complete list
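One detail the split between transformations and actions implies is that Spark transformations are lazy: they only describe the computation, and nothing runs until an action asks for a result. The sketch below is an analogy using the JDK's `java.util.stream`, not the Spark API; intermediate stream operations play the role of transformations and the terminal operation plays the role of an action. The class `LazyPipeline`, the counter `mapCalls`, and the method `squaresOfEvens` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: JDK streams standing in for RDD transformations and actions.
public class LazyPipeline {
    // Counts how many elements the map step has actually processed.
    static final AtomicInteger mapCalls = new AtomicInteger();

    // "Transformations": build the pipeline description; no element is processed yet.
    static Stream<Integer> squaresOfEvens(List<Integer> input) {
        return input.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> {
                    mapCalls.incrementAndGet();
                    return n * n;
                });
    }

    public static void main(String[] args) {
        Stream<Integer> pipeline = squaresOfEvens(List.of(1, 2, 3, 4, 5, 6));
        System.out.println("map calls before the action: " + mapCalls.get());

        // "Action": the terminal operation triggers the whole computation.
        List<Integer> result = pipeline.collect(Collectors.toList());
        System.out.println("result: " + result);
        System.out.println("map calls after the action: " + mapCalls.get());
    }
}
```

Deferring the work this way is what lets an engine see the whole chain of operations before deciding how to execute it.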
19.
Easy: Example – Word Count

Hadoop MapReduce (Java):

    public static class WordCountMapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      // Emit (word, 1) for every token in the input line.
      public void map(LongWritable key, Text value,
                      OutputCollector<Text, IntWritable> output,
                      Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          output.collect(word, one);
        }
      }
    }

    public static class WordCountReduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      // Sum the counts collected for one word.
      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output,
                         Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }

Spark (Java):

    // Spark 1.x API: FlatMapFunction.call returns an Iterable (an Iterator from Spark 2.0 on).
    JavaRDD<String> textFile = sc.textFile("hdfs://...");
    JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
      public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
    });
    JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
      public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
    });
    JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
      public Integer call(Integer a, Integer b) { return a + b; }
    });
    counts.saveAsTextFile("hdfs://...");

Spark (Scala):

    val textFile = sc.textFile("hdfs://...")
    val counts = textFile.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://...")

Source: http://spark.apache.org/examples.html
20.
Faster for Iterative: PageRank Performance
[Chart: iteration time (s) by number of machines]
• 30 machines: Hadoop 171 s, Spark 23 s
• 60 machines: Hadoop 80 s, Spark 14 s
21.
Spark vs. MapReduce
• Spark is faster than MR for iterative algorithms whose data fits in memory
• Spark code is easier to write and easier to understand than MR code
  – Your programming is closer to the correct abstraction
• Spark supports both batch and streaming models
• Advantage: Spark
  – Caution: not all applications run faster on Spark, and Spark may have limitations for some scenarios
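Why in-memory data helps iterative algorithms can be shown with a toy example in plain Java. This is a sketch under stated assumptions, not a benchmark and not Spark or Hadoop code: `loadFromStorage` is a hypothetical stand-in for an expensive read from distributed storage, and the class `IterativeCaching` with its `step` and `iterate` methods exists only for this illustration. The first run re-reads the input on every pass, as a chain of MapReduce jobs would; the second reads it once and reuses the in-memory copy.

```java
import java.util.List;
import java.util.function.Supplier;

// Toy illustration: counting storage reads with and without keeping data in memory.
public class IterativeCaching {
    // Number of times the (hypothetical) storage layer has been read.
    static int loads = 0;

    // Hypothetical stand-in for an expensive read from distributed storage.
    static List<Double> loadFromStorage() {
        loads++;
        return List.of(1.0, 2.0, 3.0, 4.0);
    }

    // One step of a toy iterative algorithm: move the estimate halfway toward the data mean.
    static double step(double estimate, List<Double> data) {
        double mean = data.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
        return estimate + 0.5 * (mean - estimate);
    }

    // Runs the iterations, asking the supplier for the data on every pass.
    static double iterate(Supplier<List<Double>> source, int iterations) {
        double estimate = 0.0;
        for (int i = 0; i < iterations; i++) {
            estimate = step(estimate, source.get());
        }
        return estimate;
    }

    public static void main(String[] args) {
        loads = 0;
        iterate(IterativeCaching::loadFromStorage, 10); // re-reads the input every iteration
        System.out.println("loads without caching: " + loads);

        loads = 0;
        List<Double> cached = loadFromStorage();        // read once, keep in memory
        iterate(() -> cached, 10);
        System.out.println("loads with caching: " + loads);
    }
}
```

Both runs compute the same estimate; only the number of storage reads differs, which is the cost the slide's caution about data needing to fit in memory refers to.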
22.
Is Spark replacing Hadoop?
Is Spark replacing MapReduce? Quite possibly ... with time ... with caveats
23.
Hadoop is more than MapReduce
Spark
• Unified, easy batch, interactive, and streaming framework
• Pluggable persistent store
• Needs a resource manager
Hadoop
• Pluggable data-parallel framework
• HDFS and HBase persistent store
• Includes a resource manager (YARN)
24.
Hadoop Supports So Much
• Alternative batch models: Pig, Cascading, Spark
• Machine learning: Mahout, SparkML
• SQL: Hive, Drill, Hive on Tez, Impala, SparkSQL
• Stream processing: Storm, Flink, Spark, DataTorrent
• ETL: Sqoop, Flume
• Storage: file (HDFS/MapR-FS), table (HBase/MapR-DB/Accumulo), messaging (Kafka/MapR-Streams)
• Data exploration: Hue
• And too many excellent commercial tools to list
• Hypothesis:
  – Infrastructure and data tend to be sticky, while execution frameworks evolve rapidly
  – Hadoop's infrastructure and storage support a vigorous and growing ecosystem of "competing" execution engines
25.
Perspective
• Unified, easy batch, interactive, and streaming framework
• Pluggable data-parallel framework
• HDFS and HBase persistent store
• Interactive SQL (Drill, Impala, Hive.next)
• Streaming (Flink, Storm, DataTorrent)
• RDBMS (e.g., SpliceMachine)
• Ecosystem
• SLA (YARN resource reservation, distro management tools, Pepperdata, …)
• Security (Drill Views, Ranger, Sentry, BlueTalon, …)
• Data wrangling, discovery, and governance (Trifacta, Paxata, Waterline, …)
26.
Perspective
• Unified, easy batch, interactive, and streaming framework
• Pluggable persistent store
• Ecosystem/environments
  – Resource management: YARN, Mesos, Kubernetes
  – Deployment: private OpenStack, public cloud, hybrid
• NoSQL/search (Cassandra, ES)
• In-memory (SAP HANA, MemSQL)
• RDBMS (MySQL, Oracle, etc.)
• Hadoop (HBase, HDFS)
27.
Which is More Realistic?
• What about classic applications and data sharing?
• Spark becomes the primary execution framework
• Hadoop remains the primary storage and execution framework
28.
Is Spark replacing Hadoop? Seems improbable. Hadoop grows to embrace new execution frameworks.
Is Spark replacing MapReduce? Quite possibly ... with time ... with caveats
30.
MapR Platform Services: Open API Architecture Assures Interoperability, Avoids Lock-in
• HDFS API
• POSIX NFS
• SQL, HBase API
• JSON API
• Kafka API
31.
Q&A
Engage with us!
kbotzum@mapr.com
maprtech, MapR, mapr-technologies
32.
References
• Spark vs. MapReduce:
  – https://www.mapr.com/blog/apache-spark-vs-mapreduce-whiteboard-walkthrough
  – http://www.vldb.org/pvldb/vol8/p2110-shi.pdf
  – http://aptuz.com/blog/is-apache-spark-going-to-replace-hadoop/
• Spark: http://spark.apache.org/
• Spark on MapR: http://maprdocs.mapr.com/51/index.html#Spark/Spark_26984599.html