Apache Spark in Scientific Applications
Dr. Mirko Kämpf
Quick start into Apache Spark for Scientists on their way to "Data Science"
1.
© Cloudera, Inc. All rights reserved.
Mirko Kämpf | 2015
Apache Spark: Next Generation Data Processing for Hadoop
2.
Agenda
• The Data Science Process (DSP)
  - Why or when to use Spark
• The role of Apache Hadoop and Apache Spark
  - History & Hadoop Ecosystem
• Apache Spark: Overview and Concepts
• Practical Tips
3.
The Data Science Process
Application of Big-Data-Technology
Images from: http://semanticommunity.info/Data_Science/Doing_Data_Science
4.
Huge Data Sets in Science
Application of Big-Data-Technology
Images from: http://semanticommunity.info/Data_Science/Doing_Data_Science
5.
How can Apache Spark fit into my world?
“Spark offers tools for Data Science and components for Data Products.”
6.–9.
Should I use Apache Spark?
• If all my data fits into Excel spreadsheets? Why not? Just export it as CSV or JSON and use a DataFrame to join it with other data sets.
• If I have a special-purpose application to work with? Why not? Think about additional analysis methods; they may already be built into Apache Spark.
• If my current system is just a bit too slow? Spark will probably not help to speed up your system, but maybe you can offload data to Hadoop, which frees some resources.
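The advice to export spreadsheet data as CSV and join it with other data sets can be tried out without a cluster. The sketch below is plain Python using only the standard library, not Spark's DataFrame API; it shows the key-based inner join that a DataFrame join would express. The column names (`sample_id`, `temperature`, `site`) and the inline CSV contents are hypothetical.

```python
import csv
import io

# Hypothetical CSV export from a spreadsheet (assumed columns).
MEASUREMENTS_CSV = """sample_id,temperature
s1,20.5
s2,21.0
s3,19.8
"""

# Hypothetical second data set to join with (assumed columns).
SITES_CSV = """sample_id,site
s1,Berlin
s3,Halle
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Join two lists of dict rows on `key`, keeping only matching rows."""
    # Build a lookup table from the right side (a simple hash join).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            merged = dict(row)
            merged.update(match)
            joined.append(merged)
    return joined


joined_rows = inner_join(read_rows(MEASUREMENTS_CSV), read_rows(SITES_CSV), "sample_id")
for row in joined_rows:
    print(row["sample_id"], row["temperature"], row["site"])
```

In Spark the same step would be a join of two DataFrames on `sample_id`; the sketch only shows the shape of the operation on a small scale.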
10.
What type of data fits into Spark?
“Spark offers fast in-memory processing on huge distributed and even on heterogeneous datasets.”
11.
History of Spark
Spark is really young, but has a very active community!
12.
Timeline: Spark Adoption
13.
Apache Spark: Overview & Concepts
14.
Hadoop Ecosystem incl. Apache Spark
Spark can be an entry point to your Big Data world …
15.
Do I need a Hadoop cluster to work with Apache Spark?
“Apache Spark is distributed on top of Hadoop and brings parallel processing to powerful workstations.”
16.
Spark vs. MapReduce
17.
How to interact with Spark?
18.
Spark Components
20.
MLlib:
• Basic statistics: summary statistics, correlations, stratified sampling, hypothesis testing, random data generation
• Classification and regression: linear models (SVMs, logistic / linear regression), naive Bayes, decision trees, ensembles of trees (Random Forests / Gradient-Boosted Trees), isotonic regression
• Collaborative filtering: alternating least squares (ALS)
• Clustering: k-means, Gaussian mixture, power iteration clustering (PIC), latent Dirichlet allocation (LDA), streaming k-means
• Dimensionality reduction: singular value decomposition (SVD), principal component analysis (PCA)
• …
GraphX:
• PageRank
• Connected Components
• Triangle Counting
• Pregel API
21.
How to use your code in Spark?
A. Interactively, by loading it into the spark-shell.
B. Contribute to existing Spark projects.
C. Create your module and use it in a spark-shell session.
D. Build a data product which uses Apache Spark.
For simple and reliable usage of Java classes and complete third-party libraries, we define a Spark Module as a self-contained artifact created by Maven. This module can easily be shared by multiple users via repositories.
http://blog.cloudera.com/blog/2015/03/how-to-build-re-usable-spark-programs-using-spark-shell-and-maven/
22.
Apache Spark: Overview & Concepts
23.
Spark Context
24.
RDDs and DataFrames
25.
Creation of RDDs
26.
Datatypes in RDDs
29.
Spark in a Cluster
30.
Spark in a Cluster
33.
DStream: The heart of Spark Streaming
34.
What makes Spark so different, compared to core MapReduce?
“Efficient hardware utilization, caching, simple APIs, and access to a variety of data in Hadoop are key to success.”
35.
Practical Tips
36.
Development Techniques
• Build your tools and analysis procedures in small cycles.
• Test all phases of your work and document carefully.
• Document what you expect! => Requirements management …
• Collect what you get! => Operational logs …
• Reuse well-tested components and modularize your analysis scripts.
• Learn "state-of-the-art" tools and share your work!
37.
Data Management
• Think about typical access patterns:
  • random access to each record or field?
  • access to entire groups of records?
  • variable size or fixed size sets?
  • "full table scan"
• Optimize for your dominant access pattern!
• Select efficient storage formats: Avro, Parquet
• Index your data in Solr for random access and data exploration
• Indexing can be done with just a few clicks in Hue …
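Why the dominant access pattern matters for the choice of storage format can be sketched in a few lines. Avro stores data row by row, while Parquet stores it column by column; the plain-Python sketch below mimics both layouts with ordinary lists and dicts and does not read or write either format. The record fields (`id`, `station`, `value`) are hypothetical.

```python
# Hypothetical sensor records (field names are assumptions).
records = [
    {"id": 1, "station": "A", "value": 0.5},
    {"id": 2, "station": "B", "value": 1.5},
    {"id": 3, "station": "A", "value": 2.5},
]

# Row-oriented layout: all fields of one record are stored together.
row_store = [tuple(r.values()) for r in records]

# Column-oriented layout: all values of one field are stored together.
column_store = {name: [r[name] for r in records] for name in records[0]}


def get_record(row_index):
    """Whole-record access: a single lookup in the row layout."""
    return row_store[row_index]


def mean_value_from_rows():
    """Scan of one field in the row layout: every full record is touched."""
    return sum(row[2] for row in row_store) / len(row_store)


def mean_value_from_columns():
    """Scan of one field in the column layout: only that column is touched."""
    values = column_store["value"]
    return sum(values) / len(values)


print(get_record(1))
print(mean_value_from_rows(), mean_value_from_columns())
```

Both scans return the same mean, but the column layout reads only the one field it needs, which is why a columnar format tends to suit "full table scan" analytics, while a row layout suits access to whole records.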
38.
Collecting Sensor Data with Spark Streaming …
• Spark Streaming works on fixed time slices only (in the current version, 1.5)
• Use the original time stamp?
  • Requires additional storage and bandwidth
  • Original system clock defines resolution
• Use "Spark-Time" or a local time reference:
  • You may lose information!
  • You have a limited resolution, defined by batch size.
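The trade-off between the original time stamp and "Spark-Time" can be made concrete with a small simulation. This is plain Python, not the Spark Streaming API: it assigns events to fixed time slices and compares each original time stamp with the slice start that would replace it. The batch length and the event times are hypothetical, and the sketch assumes that events arrive without delay.

```python
BATCH_SECONDS = 2.0  # hypothetical batch interval (the fixed time slice)

# Hypothetical sensor events: (original time stamp in seconds, reading).
events = [(0.3, 10), (1.9, 11), (2.1, 12), (5.4, 13)]


def batch_start(timestamp, batch_seconds=BATCH_SECONDS):
    """Return the start time of the fixed slice that contains `timestamp`."""
    return (timestamp // batch_seconds) * batch_seconds


def group_into_batches(stream, batch_seconds=BATCH_SECONDS):
    """Group readings by time slice, as a micro-batch system would see them."""
    batches = {}
    for timestamp, reading in stream:
        batches.setdefault(batch_start(timestamp, batch_seconds), []).append(reading)
    return batches


batches = group_into_batches(events)
for start in sorted(batches):
    print(start, batches[start])

# With only the batch time, the events at 0.3 s and 1.9 s share one time value.
# The timing error per event is always smaller than one batch interval.
max_error = max(t - batch_start(t) for t, _ in events)
print(max_error)
```

Keeping the original time stamp in each record avoids this loss of resolution, at the cost of the extra storage and bandwidth the slide mentions.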
39.
Thank you!
Enjoy Apache Spark and all your data …