Suche senden
Hochladen
Apache Spark in Scientific Applications
•
Als PPTX, PDF herunterladen
•
0 gefällt mir
•
1,081 views
Dr. Mirko Kämpf
Folgen
For your Quick Start into Apache Spark!
Weniger lesen
Mehr lesen
Wissenschaft
Melden
Teilen
Melden
Teilen
1 von 39
Jetzt herunterladen
Empfohlen
Enterprise Metadata Integration
Enterprise Metadata Integration
Dr. Mirko Kämpf
Disrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the Cloud
Jen Aman
Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
Dr. Mirko Kämpf
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark Summit
ASPgems - kappa architecture
ASPgems - kappa architecture
Juantomás García Molina
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
Jozo Kovac
PCAP Graphs for Cybersecurity and System Tuning
PCAP Graphs for Cybersecurity and System Tuning
Dr. Mirko Kämpf
From Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on Scale
Dr. Mirko Kämpf
Empfohlen
Enterprise Metadata Integration
Enterprise Metadata Integration
Dr. Mirko Kämpf
Disrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the Cloud
Jen Aman
Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
Dr. Mirko Kämpf
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark in the Enterprise - 2 Years Later by Alan Saldich
Spark Summit
ASPgems - kappa architecture
ASPgems - kappa architecture
Juantomás García Molina
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
Jozo Kovac
PCAP Graphs for Cybersecurity and System Tuning
PCAP Graphs for Cybersecurity and System Tuning
Dr. Mirko Kämpf
From Events to Networks: Time Series Analysis on Scale
From Events to Networks: Time Series Analysis on Scale
Dr. Mirko Kämpf
Big Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
Lambda architecture with Spark
Lambda architecture with Spark
Vincent GALOPIN
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
Pat Patterson
Spark - Migration Story
Spark - Migration Story
Roman Chukh
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Databricks
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Databricks
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
Databricks
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
Databricks
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Spark Summit
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
Databricks
Fast data for fitness 10 nov 2020
Fast data for fitness 10 nov 2020
Timothy Spann
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Spark Summit
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
Spark Summit
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
Databricks
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Spark Summit
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Spark Summit
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
Spark Summit
Self-Service Analytics on Hadoop: Lessons Learned
Self-Service Analytics on Hadoop: Lessons Learned
DataWorks Summit/Hadoop Summit
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Databricks
Cloud Connect 2012, Big Data @ Netflix
Cloud Connect 2012, Big Data @ Netflix
Jerome Boulon
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Databricks
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
Weitere ähnliche Inhalte
Was ist angesagt?
Big Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
Lambda architecture with Spark
Lambda architecture with Spark
Vincent GALOPIN
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
Pat Patterson
Spark - Migration Story
Spark - Migration Story
Roman Chukh
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Databricks
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Databricks
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
Databricks
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
Databricks
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Spark Summit
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
Databricks
Fast data for fitness 10 nov 2020
Fast data for fitness 10 nov 2020
Timothy Spann
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Spark Summit
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
Spark Summit
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
Databricks
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Spark Summit
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Spark Summit
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
Spark Summit
Self-Service Analytics on Hadoop: Lessons Learned
Self-Service Analytics on Hadoop: Lessons Learned
DataWorks Summit/Hadoop Summit
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Databricks
Cloud Connect 2012, Big Data @ Netflix
Cloud Connect 2012, Big Data @ Netflix
Jerome Boulon
Was ist angesagt?
(20)
Big Telco - Yousun Jeong
Big Telco - Yousun Jeong
Lambda architecture with Spark
Lambda architecture with Spark
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
Spark - Migration Story
Spark - Migration Story
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Tangram: Distributed Scheduling Framework for Apache Spark at Facebook
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Headaches and Breakthroughs in Building Continuous Applications
Headaches and Breakthroughs in Building Continuous Applications
Fast data for fitness 10 nov 2020
Fast data for fitness 10 nov 2020
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
Self-Service Analytics on Hadoop: Lessons Learned
Self-Service Analytics on Hadoop: Lessons Learned
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Near Real-Time Analytics with Apache Spark: Ingestion, ETL, and Interactive Q...
Cloud Connect 2012, Big Data @ Netflix
Cloud Connect 2012, Big Data @ Netflix
Andere mochten auch
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Databricks
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
DataWorks Summit/Hadoop Summit
Apache Spark Briefing
Apache Spark Briefing
Thomas W. Dinsmore
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
tcloudcomputing-tw
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster Computing
Gerger
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
Amazon Web Services
Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.
Konstantinos Kloudas
What the Spark!? Intro and Use Cases
What the Spark!? Intro and Use Cases
Aerospike, Inc.
Andere mochten auch
(8)
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Sparkcamp @ Strata CA: Intro to Apache Spark with Hands-on Tutorials
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
Apache Spark Briefing
Apache Spark Briefing
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster Computing
Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.
What the Spark!? Intro and Use Cases
What the Spark!? Intro and Use Cases
Ähnlich wie Apache Spark in Scientific Applications
Introduction to spark
Introduction to spark
Home
APACHE SPARK.pptx
APACHE SPARK.pptx
DeepaThirumurugan
Started with-apache-spark
Started with-apache-spark
Happiest Minds Technologies
Spark_Part 1
Spark_Part 1
Shashi Prakash
Data Science and CDSW
Data Science and CDSW
Jason Hubbard
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
MLconf
Atlanta MLConf
Atlanta MLConf
Qubole
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
Rahul Borate
Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr
Cloudera, Inc.
Apache Spark Fundamentals
Apache Spark Fundamentals
Zahra Eskandari
Spark 101
Spark 101
Shahaf Azriely {TopLinked} ☁
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Amanda Casari
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Venkata Naga Ravi
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Agile Testing Alliance
Apache Spark in Industry
Apache Spark in Industry
Dorian Beganovic
Spark One Platform Webinar
Spark One Platform Webinar
Cloudera, Inc.
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Cloudera, Inc.
Real Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
Apache spark
Apache spark
Dona Mary Philip
Ähnlich wie Apache Spark in Scientific Applications
(20)
Introduction to spark
Introduction to spark
APACHE SPARK.pptx
APACHE SPARK.pptx
Started with-apache-spark
Started with-apache-spark
Spark_Part 1
Spark_Part 1
Data Science and CDSW
Data Science and CDSW
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Jason Huang, Solutions Engineer, Qubole at MLconf ATL - 9/18/15
Atlanta MLConf
Atlanta MLConf
Unit II Real Time Data Processing tools.pptx
Unit II Real Time Data Processing tools.pptx
Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr
Apache Spark Fundamentals
Apache Spark Fundamentals
Spark 101
Spark 101
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Apache Spark in Industry
Apache Spark in Industry
Spark One Platform Webinar
Spark One Platform Webinar
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Real Time Analytics with Dse
Real Time Analytics with Dse
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Apache spark
Apache spark
Mehr von Dr. Mirko Kämpf
IoT meets AI in the Clouds
IoT meets AI in the Clouds
Dr. Mirko Kämpf
Improving computer vision models at scale (Strata Data NYC)
Improving computer vision models at scale (Strata Data NYC)
Dr. Mirko Kämpf
Improving computer vision models at scale presentation
Improving computer vision models at scale presentation
Dr. Mirko Kämpf
Etosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road map
Dr. Mirko Kämpf
Apache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
Dr. Mirko Kämpf
DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4
Dr. Mirko Kämpf
Information Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation Optimization
Dr. Mirko Kämpf
Hadoop & Complex Systems Research
Hadoop & Complex Systems Research
Dr. Mirko Kämpf
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
Dr. Mirko Kämpf
Mehr von Dr. Mirko Kämpf
(9)
IoT meets AI in the Clouds
IoT meets AI in the Clouds
Improving computer vision models at scale (Strata Data NYC)
Improving computer vision models at scale (Strata Data NYC)
Improving computer vision models at scale presentation
Improving computer vision models at scale presentation
Etosha - Data Asset Manager : Status and road map
Etosha - Data Asset Manager : Status and road map
Apache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
DPG Berlin - SOE 18 - talk v1.2.4
DPG Berlin - SOE 18 - talk v1.2.4
Information Spread in the Context of Evacuation Optimization
Information Spread in the Context of Evacuation Optimization
Hadoop & Complex Systems Research
Hadoop & Complex Systems Research
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
DPG 2014: "Context Sensitive and Time Dependent Relevance of Wikipedia Articles"
Kürzlich hochgeladen
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
jana861314
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
Areesha Ahmad
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
Sapana Sha
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
PraveenaKalaiselvan1
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
pradhanghanshyam7136
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
Areesha Ahmad
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Diwakar Mishra
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
sankalpkumarsahoo174
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
Sumit Kumar yadav
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
UmerFayaz5
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
Sumit Kumar yadav
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
Sumit Kumar yadav
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
Sérgio Sacani
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
LeenakshiTyagi
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Sarthak Sekhar Mondal
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
Sérgio Sacani
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
kessiyaTpeter
Kürzlich hochgeladen
(20)
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
Apache Spark in Scientific Applications
1.
‹#›© Cloudera, Inc.
All rights reserved. Mirko Kämpf | 2015 Apache Spark: Next Generation Data Processing for Hadoop
2.
‹#›© Cloudera, Inc.
All rights reserved. Agenda • The Data Science Process (DSP) - Why or when to use Spark • The role of: Apache Hadoop and Apache Spark - History & Hadoop Ecosystem • Apache Spark: Overview and Concepts • Practical Tips
3.
‹#›© Cloudera, Inc.
All rights reserved. The Data Science Process Application of Big-Data-Technology Images from: http://semanticommunity.info/Data_Science/Doing_Data_Science
4.
‹#›© Cloudera, Inc.
All rights reserved. Huge Data Sets in Science Application of Big-Data-Technology Images from: http://semanticommunity.info/Data_Science/Doing_Data_Science
5.
‹#›© Cloudera, Inc.
All rights reserved. “Spark offers tools for Data Science and components for Data Products.” —How can Apache Spark fit into my world?
6.
‹#›© Cloudera, Inc.
All rights reserved. Should I use Apache Spark? • If all my data fits into Excel-Spreadsheets? • If I have a special purpose application to work with? • If my current system is just a bit to slow?
7.
‹#›© Cloudera, Inc.
All rights reserved. Should I use Apache Spark? • If all my data fits into Excel-Spreadsheets? • If I have a special purpose application to work with? • If my current system is just a bit to slow? • Just export as CSV / JSON and use a DataFrame to join with other DS. Why not?
8.
‹#›© Cloudera, Inc.
All rights reserved. Should I use Apache Spark? • If all my data fits into Excel-Spreadsheets? • If I have a special purpose application to work with? • If my current system is just a bit to slow? • Just export as CSV / JSON and use a DataFrame to join with other DS. • Think about additional analysis methods! Maybe it is already built into Apache Spark! Why not?
9.
‹#›© Cloudera, Inc.
All rights reserved. Should I use Apache Spark? • If all my data fits into Excel-Spreadsheets? • If I have a special purpose application to work with? • If my current system is just a bit to slow? • Just export as CSV / JSON and use a DataFrame to join with other DS. • Think about additional analysis methods! Maybe it is build into Spark. • OK, Spark will probably not help to speed up your system, but maybe you can offload data to Hadoop, which releases some resources. Why not?
10.
‹#›© Cloudera, Inc.
All rights reserved. “Spark offers fast in memory processing on huge distributed and even on heterogeneous datasets.” —What type of data fits into Spark?
11.
‹#›© Cloudera, Inc.
All rights reserved. History of Spark Spark is really young, but has a very active community!
12.
‹#›© Cloudera, Inc.
All rights reserved. Timeline: Spark Adoption
13.
‹#›© Cloudera, Inc.
All rights reserved. Apache Spark: Overview & Concepts
14.
‹#›© Cloudera, Inc.
All rights reserved. Hadoop Ecosystem incl. Apache Spark Spark can be an entry point to your Big Data world …
15.
‹#›© Cloudera, Inc.
All rights reserved. “Apache Spark is distributed on top of Hadoop and brings parallel processing to powerful workstations.” —Do I need a Hadoop cluster to work with Apache Spark?
16.
‹#›© Cloudera, Inc.
All rights reserved. Spark vs. MapReduce
17.
‹#›© Cloudera, Inc.
All rights reserved. How to interact with Spark?
18.
‹#›© Cloudera, Inc.
All rights reserved. Spark Components
19.
‹#›© Cloudera, Inc.
All rights reserved.
20.
‹#›© Cloudera, Inc.
All rights reserved. MLLib: GraphX: Basic statistics summary statistics, correlations, stratified sampling, hypothesis testing, random data generation Classification and regression linear models (SVMs, logistic / linear regression) naive Bayes, decision trees ensembles of trees (Random Forests / Gradient-Boosted Trees) isotonic regression Collaborative filtering alternating least squares (ALS) Clustering k-means, Gaussian mixture, power iteration clustering (PIC) latent Dirichlet allocation (LDA), streaming k-means Dimensionality reduction singular value decomposition (SVD) principal component analysis (PCA) … PageRank Connected Components Triangle Counting Pregel API
21.
‹#›© Cloudera, Inc.
All rights reserved. How to use your code in Spark? A. Interactively, by loading it into the spark-shell. B. Contribute to existing Spark projects. C. Create your module and use it in a spark-shell session. D. Build a data-product which uses Apache Spark. For simple and reliable usage of Java classes and complete third-party libraries, we define a Spark Module as a self-contained artifact created by Maven. This module can easily be shared by multiple users via repositories. http://blog.cloudera.com/blog/2015/03/how-to-build-re-usable-spark-programs-using-spark-shell-and-maven/
22.
‹#›© Cloudera, Inc.
All rights reserved. Apache Spark: Overview & Concepts
23.
‹#›© Cloudera, Inc.
All rights reserved. Spark Context
24.
‹#›© Cloudera, Inc.
All rights reserved. RDDs and DataFrames
25.
‹#›© Cloudera, Inc.
All rights reserved. Creation of RDDs
26.
‹#›© Cloudera, Inc.
All rights reserved. Datatypes in RDDs
27.
‹#›© Cloudera, Inc.
All rights reserved.
28.
‹#›© Cloudera, Inc.
All rights reserved.
29.
‹#›© Cloudera, Inc.
All rights reserved. Spark in a Cluster
30.
‹#›© Cloudera, Inc.
All rights reserved. Spark in a Cluster
31.
‹#›© Cloudera, Inc.
All rights reserved.
32.
‹#›© Cloudera, Inc.
All rights reserved.
33.
‹#›© Cloudera, Inc.
All rights reserved. DStream: The heart of Spark Streaming
34.
‹#›© Cloudera, Inc.
All rights reserved. “Efficient hardware utilization, caching, simple APIs, and access to a variety of data in Hadoop is key to success.” —What makes Spark so different, compared to core MapReduce?
35.
‹#›© Cloudera, Inc.
All rights reserved. Practical Tips
36.
‹#›© Cloudera, Inc.
All rights reserved. Development Techniques • Build your tools and analysis procedures in small cycles. • Test all phases of your work and document carefully. • Document what you expect! => Requirements management … • Collect what you get! => Operational logs … • Reuse well tested components and modularize your analysis scripts. • Learn „state of the art“ tools and share your work!
37.
‹#›© Cloudera, Inc.
All rights reserved. Data Management • Think about typical access patterns: • random access to each record or field? • access to entire groups of records? • variable size or fixed size sets? • „full table scan“ • OPTIMIZE FOR YOUR DOMINANT ACCESS PATTERN! • Select efficient storage formats: Avro, Parquet • Index your data in SOLR for random access and data exploration • Indexing can be done by just a few clicks in HUE …
38.
‹#›© Cloudera, Inc.
All rights reserved. Collecting Sensor Data with Spark Streaming … • Spark Streaming works on fixed time slices only (in current version, 1.5) • Use the original time stamp? • Requires additional storage and bandwidth • Original system clock defines resolution • Use „Spark-Time“ or a local time reference: • You may lose information! • You have a limited resolution, defined by batch size.
39.
‹#›© Cloudera, Inc.
All rights reserved. Thank you ! Enjoy Apache Spark and all your data …
Jetzt herunterladen