Combining Big Data and HPC in a GRIDScaler environment
Dr. Stephan Schenk
Dr. Frank Heilmann
BASF’s segments
- Chemicals: Petrochemicals, Intermediates
- Materials: Performance Materials, Monomers
- Industrial Solutions: Dispersions & Pigments, Performance Chemicals
- Surface Technologies: Catalysts, Coatings, Construction Chemicals*
- Nutrition & Care: Nutrition & Health, Care Chemicals
- Agricultural Solutions
* We are considering the possibility of merging our construction chemicals business with a strong partner, as well as the option of divesting this business. The outcome of this review is open. The Construction Chemicals division will be reported under the Surface Technologies segment until signing of a transaction agreement.
Integrating digital technologies into BASF’s R&D operations will boost innovative power
Digital capabilities:
- Data and knowledge management
- Algorithms and statistical applications
- Scientific modeling and simulation
- Machine learning
Research & Development cycle: Hypothesis, Experiments, Analysis, Validation of models
Supercomputing at BASF
Quriosity specifications:
- Quriosity debuted at #65 on the TOP500 list in November 2017 with Rmax = 1.75 PFLOPS
- HPE Apollo 6000 Gen10, 888 nodes
- 2x Intel® Xeon Gold 6148 ("Skylake") per node
- 192/384/768/3072 GB RAM per node, depending on node type
- Intel® Omni-Path interconnect
- DDN GRIDScaler, 5 PByte (GPFS)
- Red Hat Enterprise Linux 7
- Altair PBSPro scheduler
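The specifications allow a rough cross-check of the quoted Rmax. The sketch below computes the theoretical double-precision peak (Rpeak), assuming that all 888 nodes carry two 20-core Xeon Gold 6148 CPUs at the nominal 2.4 GHz base clock, each core with two AVX-512 FMA units (32 DP FLOPs per cycle). Real AVX-512 clocks are lower than the base clock, so this is an upper bound rather than a measured figure.

```python
# Assumptions (not from the slide): identical CPUs in every node, nominal base clock,
# two AVX-512 FMA units per core.
NODES = 888
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 20        # Xeon Gold 6148
BASE_CLOCK_HZ = 2.4e9        # nominal base clock of the Xeon Gold 6148
DP_FLOPS_PER_CYCLE = 32      # 2 FMA units x 8 doubles per 512-bit vector x 2 FLOPs per FMA
RMAX_FLOPS = 1.75e15         # measured HPL result quoted on the slide


def rpeak(nodes: int, sockets: int, cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Theoretical peak = total cores x clock x FLOPs per cycle."""
    return nodes * sockets * cores * clock_hz * flops_per_cycle


if __name__ == "__main__":
    peak = rpeak(NODES, SOCKETS_PER_NODE, CORES_PER_SOCKET, BASE_CLOCK_HZ, DP_FLOPS_PER_CYCLE)
    print(f"Rpeak ~ {peak / 1e15:.2f} PFLOPS; Rmax/Rpeak ~ {RMAX_FLOPS / peak:.0%}")
```

Under these assumptions Rpeak comes out at about 2.73 PFLOPS, so the quoted Rmax corresponds to roughly 64% HPL efficiency.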
Significant opportunity for BASF to establish leadership in R&D supercomputing
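[Figure: BASF HPC history, peak performance in GFLOPS on a logarithmic axis (10^0 to 10^9) from 1996 to 2019, with series labelled "#1 among TOP500 computers", "largest computer system in BASF" and "Quriosity"]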
Apache Spark on Quriosity and Spectrum Scale: Big Data workflows to complement HPC
Example I: Image classification
- Train a classifier (HPC/AI)
- Use the classifier in a Spark job on a huge number of images
- The Apache Spark job can use the complete API
- The Spark job is scheduled and runs like any other job
- The job uses the existing global filesystem
Example II: Full-text indexing and text mining
- Full-text indexing
- Machine learning, e.g. document clustering
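Full-text indexing boils down to building an inverted index (term to documents), which maps naturally onto Spark's flatMap and groupByKey steps. The sketch below shows those two steps in plain Python on a tiny, hypothetical corpus; the tokenizer is deliberately simple (no stemming or stop words), and `build_index` and `search` are illustrative names, not part of any library.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens; real indexers add stemming, stop words, etc."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, list[str]]:
    """Inverted index: term -> sorted list of ids of the documents containing it."""
    # Map step (flatMap in Spark terms): one (term, doc_id) pair per distinct term and document.
    pairs = {(term, doc_id) for doc_id, text in docs.items() for term in tokenize(text)}
    # Reduce step (groupByKey in Spark terms): collect the document ids per term.
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index: dict[str, list[str]], query: str) -> list[str]:
    """AND query: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], []))
    for term in terms[1:]:
        hits &= set(index.get(term, []))
    return sorted(hits)


if __name__ == "__main__":
    docs = {  # hypothetical mini-corpus
        "doc1": "Polymer dispersion for coatings",
        "doc2": "Catalyst for polymer synthesis",
        "doc3": "Coatings and pigments",
    }
    index = build_index(docs)
    print(index["polymer"], search(index, "polymer coatings"))
```

The resulting term lists can also serve as sparse feature vectors for the document clustering step.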
Deploying Apache Spark on an HPC system
- Deploy Spark in standalone mode (just untar the distribution)
- Spin up a Spark cluster at the beginning of the HPC job
- Integrate with PBS by setting the appropriate environment variables
- The Spark job has the complete API available (Python, Scala, libraries)
- Files can be accessed directly:
  rdd = sc.textFile("/gpfs/big_data")
  rdd.saveAsTextFile("/gpfs/results")
- Multi-node jobs require a global filesystem of your choice
#!/bin/bash
#PBS -l select=2:ncpus=40:mem=160GB
#PBS -l place=scatter:excl
#PBS -N spark-on-hpc
module load spark
# PBS starts batch jobs in $HOME; change to the submission directory so script.py is found
cd "${PBS_O_WORKDIR}"
# Spawn the Spark cluster: master on this node, workers on the hosts listed in the PBS node file
export SPARK_MASTER_HOST="$(hostname -f)"
export SPARK_MASTER_PORT="7077"
export SPARK_SLAVES="${PBS_NODEFILE}"
"${SPARK_HOME}/sbin/start-all.sh"
sparkmaster="spark://${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT}"
# Run the Spark script
"${SPARK_HOME}/bin/spark-submit" --master "${sparkmaster}" script.py
# Tear down the Spark cluster and wait until the workers have stopped
"${SPARK_HOME}/sbin/stop-all.sh" --wait
- Inspired by https://github.com/glennklockwood/hpchadoop
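With PBS Pro, `$PBS_NODEFILE` lists a host once per MPI process of each chunk, so it can contain repeated entries when `mpiprocs` is greater than 1, while Spark's standalone launch scripts treat every line of the `SPARK_SLAVES` file as a worker host. A helper that writes each host only once may therefore be a safer input. The sketch below assumes a plain one-hostname-per-line node file; the function names and the output location under `$TMPDIR` are hypothetical.

```python
from __future__ import annotations

import os


def unique_hosts(nodefile: str) -> list[str]:
    """Return each host from a PBS node file once, in first-seen order."""
    with open(nodefile) as fh:
        hosts = [line.strip() for line in fh if line.strip()]
    return list(dict.fromkeys(hosts))  # dict keys keep insertion order and drop duplicates


def write_workers_file(nodefile: str, out_path: str) -> list[str]:
    """Write the deduplicated host list to out_path, one host per line."""
    hosts = unique_hosts(nodefile)
    with open(out_path, "w") as fh:
        fh.write("\n".join(hosts) + "\n")
    return hosts


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        out = os.path.join(os.environ.get("TMPDIR", "/tmp"), "spark_workers")  # hypothetical path
        print(write_workers_file(nodefile, out))
    else:
        print("PBS_NODEFILE is not set; run this inside a PBS job")
```

In the job script, `SPARK_SLAVES` would then point at the written file instead of at `${PBS_NODEFILE}` directly.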
Experimenting with HDFS Transparency in Spectrum Scale
- HDFS Transparency integrated with Hortonworks HDP
[Figure: Hadoop applications (Spark, MapReduce, Hive, HBase, …) access two namespaces through ViewFS: hdfs://quriosity-hdfs:8020, with block management by the Spectrum Scale HDFS NameNode and Spectrum Scale DataNode1 and DataNode2; and hdfs://native-hdfs:8020, with block management by a native HDFS NameNode and native HDFS DataNode1 to DataNode3]
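Combining the two namespaces behind ViewFS is driven by a client-side mount table in core-site.xml: `fs.defaultFS` points at `viewfs://<table>` and each `fs.viewfs.mounttable.<table>.link.<mount point>` property maps a path to a target URI. The sketch below renders such a mount table with Python's standard `xml.etree.ElementTree`. Only the two namespace URIs come from the slide; the mount-table name `quriosity` and the mount points and target paths `/gpfs` and `/native` are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def viewfs_core_site(table: str, links: dict[str, str]) -> str:
    """Render core-site.xml properties for a ViewFS mount table named `table`.

    `links` maps a mount point in the client-visible namespace to a target URI.
    """
    conf = ET.Element("configuration")

    def add(name: str, value: str) -> None:
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    # Make viewfs:// the default filesystem so plain paths resolve through the mount table.
    add("fs.defaultFS", f"viewfs://{table}")
    for mount_point, target in links.items():
        add(f"fs.viewfs.mounttable.{table}.link.{mount_point}", target)
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    print(viewfs_core_site(
        "quriosity",  # hypothetical mount-table name
        {
            "/gpfs": "hdfs://quriosity-hdfs:8020/gpfs",    # hypothetical target path
            "/native": "hdfs://native-hdfs:8020/native",   # hypothetical target path
        },
    ))
```

With such a table, a job could read `/gpfs/...` from Spectrum Scale via HDFS Transparency and `/native/...` from the native HDFS cluster without changing its paths.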
Benchmarking HDFS Transparency on Quriosity
- TestDFSIO benchmark executed on a single compute node
- Consistent performance across all test data sizes
- I/O rate essentially limited by the 10 Gbit/s network used for communication
Avg I/O rate, TestDFSIO (MB/s)
Size of test files | 10 GB | 20 GB | 30 GB | 40 GB | 50 GB
--- | --- | --- | --- | --- | ---
Avg I/O rate, write | 854.38 | 861.69 | 860.52 | 862.00 | 866.59
Avg I/O rate, read | 906.70 | 904.39 | 890.99 | 876.82 | 892.98
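The claim that the I/O rate is essentially network-bound can be checked with some arithmetic on the table. The sketch below computes, per direction, the mean rate and the relative spread across file sizes, and compares the mean with the nominal 10 Gbit/s line rate. Whether TestDFSIO's "MB" means 10^6 or 2^20 bytes is left as a parameter (an assumption), and protocol overhead is ignored.

```python
# Average I/O rates in MB/s from the TestDFSIO table, keyed by test file size in GB.
WRITE = {10: 854.38, 20: 861.69, 30: 860.52, 40: 862.00, 50: 866.59}
READ = {10: 906.70, 20: 904.39, 30: 890.99, 40: 876.82, 50: 892.98}


def line_rate_mb_s(gbit_per_s: float = 10.0, bytes_per_mb: float = 1e6) -> float:
    """Nominal link bandwidth in MB/s; the MB definition is an assumption."""
    return gbit_per_s * 1e9 / 8 / bytes_per_mb


def summarize(rates: dict, link_mb_s: float) -> tuple:
    """Return (mean rate, relative spread (max - min) / mean, mean as a fraction of the link)."""
    values = list(rates.values())
    mean = sum(values) / len(values)
    spread = (max(values) - min(values)) / mean
    return mean, spread, mean / link_mb_s


if __name__ == "__main__":
    link = line_rate_mb_s()
    for name, rates in (("write", WRITE), ("read", READ)):
        mean, spread, frac = summarize(rates, link)
        print(f"{name}: mean {mean:.1f} MB/s, spread {spread:.1%}, {frac:.0%} of 10 Gbit/s")
```

With 10^6-byte MB the means are roughly 69% (write) and 72% (read) of the nominal link rate, and roughly 72% and 75% with 2^20-byte MB. The spread across file sizes stays below about 3.5%, which matches the "consistent performance" observation.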
Further information: https://www.basf.com/supercomputer
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 

Combining Big Data and HPC in a GRIDScaler Environment

  • 1. Combining Big Data and HPC in a GRIDScaler environment (Dr. Stephan Schenk, Dr. Frank Heilmann)
  • 2. BASF’s segments and their divisions
    - Chemicals: Petrochemicals, Intermediates
    - Materials: Performance Materials, Monomers
    - Industrial Solutions: Dispersions & Pigments, Performance Chemicals
    - Surface Technologies: Catalysts, Coatings, Construction Chemicals*
    - Nutrition & Care: Nutrition & Health, Care Chemicals
    - Agricultural Solutions: Agricultural Solutions
    * We are considering the possibility of merging our construction chemicals business with a strong partner, as well as the option of divesting this business. The outcome of this review is open. The Construction Chemicals division will be reported under the Surface Technologies segment until signing of a transaction agreement.
  • 3. Integrating digital technologies into BASF’s R&D operations will boost innovative power
    - Digital capabilities: data and knowledge management; algorithms and statistical applications; scientific modeling and simulation; machine learning
    - Research & Development: hypothesis, experiments, analysis, validation of models
    (Photo by unknown author, licensed under CC BY-SA)
  • 4. Supercomputing at BASF
    Figure "BASF HPC history": peak performance in GFLOPS (logarithmic axis, 10^0 to 10^9) from 1996 to 2019, comparing the #1 among TOP500 computers with the largest computer system in BASF (Quriosity highlighted).
    Quriosity specifications:
    - Debuted at #65 on the June 2017 TOP500 list with Rmax = 1.75 PFLOPS
    - HPE Apollo 6000 Gen10, 888 nodes
    - 2x Intel® Xeon Gold 6148 ("Skylake") per node
    - 192/384/768/3072 GB RAM per node
    - Intel® Omni-Path interconnect
    - DDN GRIDScaler, 5 PByte (GPFS)
    - Red Hat Enterprise Linux 7
    - Altair PBS Pro scheduler
    Significant opportunity for BASF to establish leadership in R&D supercomputing
  • 5. Apache Spark on Quriosity and Spectrum Scale: Big Data workflows to complement HPC
    - Example I, image classification: train a classifier (HPC/AI), then use the classifier in a Spark job on a huge number of images
    - The Apache Spark job can use the complete API
    - The Spark job is scheduled and runs like any other job
    - The job uses the existing global filesystem
    - Example II, full-text indexing and text mining: full-text indexing; machine learning, e.g. document clustering
    (Photos by unknown authors, licensed under CC BY-ND and CC BY-SA)
  • 6. Deploying Apache Spark on an HPC system
    - Deploy Spark in standalone mode (installation is just untarring the distribution)
    - Spin up a Spark cluster at the beginning of the HPC job
    - Integrate with PBS by setting the appropriate environment variables
    - The Spark job has the complete API available (Python, Scala, libraries)
    - Files can be accessed directly, e.g. sc.textFile("/gpfs/big_data") to read and rdd.saveAsTextFile("/gpfs/results") to write
    - Multi-node jobs require a global filesystem of your choice
    Example PBS job script:

        #!/bin/bash
        #PBS -l select=2:ncpus=40:mem=160GB
        #PBS -l place=scatter:excl
        #PBS -N spark-on-hpc

        module load spark
        # PBS starts the job in $HOME; run from the submission directory so script.py is found
        cd "${PBS_O_WORKDIR}"

        # Spawn the Spark cluster: the master runs on this (first) node,
        # workers are started on the hosts listed in the PBS node file
        export SPARK_MASTER_HOST="$(hostname -f)"
        export SPARK_MASTER_PORT="7077"
        export SPARK_SLAVES="${PBS_NODEFILE}"
        ${SPARK_HOME}/sbin/start-all.sh
        sparkmaster="spark://${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT}"

        # Run the Spark script
        ${SPARK_HOME}/bin/spark-submit --master ${sparkmaster} script.py

        # Tear down the Spark cluster
        ${SPARK_HOME}/sbin/stop-all.sh --wait

    Inspired by https://github.com/glennklockwood/hpchadoop
  • 7. Experimenting with HDFS Transparency in Spectrum Scale
    - HDFS Transparency integrated with Hortonworks HDP
    Diagram: Hadoop applications (Spark, MapReduce, Hive, HBase, ...) access two namespaces combined via ViewFS:
    - Namespace hdfs://quriosity-hdfs:8020, block management using Spectrum Scale: HDFS NameNode, Spectrum Scale DataNode1 and DataNode2
    - Namespace hdfs://native-hdfs:8020, block management using native HDFS: NameNode, native HDFS DataNode1, DataNode2 and DataNode3
  • 8. Benchmarking HDFS Transparency on Quriosity
    - Benchmark TestDFSIO executed on a single compute node
    - Consistent performance across all test data sizes
    - I/O rate essentially limited by the 10G network used for communication
    Chart "Avg I/O Rate TestDFSIO", I/O rate in MB/s by size of test files:

      Size of test files      10 GB    20 GB    30 GB    40 GB    50 GB
      Write                  854.38   861.69   860.52   862.00   866.59
      Read                   906.70   904.39   890.99   876.82   892.98
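
Slide 4 gives Quriosity's measured Linpack result (Rmax = 1.75 PFLOPS) but not its theoretical peak. The short Python calculation below is a rough estimate of that peak from the listed hardware, under assumptions that are not on the slide: each of the 888 nodes has two 20-core Xeon Gold 6148 CPUs running at the 2.4 GHz nominal clock, and each core completes 32 double-precision FLOPs per cycle (two AVX-512 FMA units, 8 doubles per vector, 2 operations per FMA). Sustained AVX-512 clocks are usually lower, so the result is an upper bound rather than a vendor figure.

```python
# Rough theoretical-peak estimate for Quriosity (assumptions marked inline).
NODES = 888                   # from slide 4
SOCKETS_PER_NODE = 2          # "2x Intel Xeon Gold 6148"
CORES_PER_SOCKET = 20         # assumption: core count of the Xeon Gold 6148
CLOCK_HZ = 2.4e9              # assumption: nominal base clock, not the AVX-512 clock
FLOPS_PER_CYCLE = 2 * 8 * 2   # assumption: 2 AVX-512 FMA units x 8 doubles x 2 ops per FMA

RMAX = 1.75e15                # measured Linpack performance from slide 4, in FLOPS


def theoretical_peak() -> float:
    """Return the estimated double-precision peak (Rpeak) in FLOPS."""
    cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    return cores * CLOCK_HZ * FLOPS_PER_CYCLE


if __name__ == "__main__":
    rpeak = theoretical_peak()
    print(f"Rpeak ~ {rpeak / 1e15:.2f} PFLOPS")
    print(f"Linpack efficiency Rmax/Rpeak ~ {RMAX / rpeak:.0%}")
```

Under these assumptions the estimate comes to about 2.73 PFLOPS, so the slide's Rmax would correspond to roughly 64% of theoretical peak.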
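
Example I on slide 5 applies a classifier trained on the HPC/AI side to a huge number of images inside a Spark job. One common way to structure such a job (an assumption here, not necessarily what BASF does) is to load the model once per partition instead of once per image, which in PySpark means handing a function over an iterator to `RDD.mapPartitions`. The sketch keeps that function independent of Spark so it can be run and tested on its own; `load_toy_model` and its size-based "classifier" are hypothetical stand-ins for a real trained model, and the `/gpfs/img/` paths are made up.

```python
from typing import Callable, Iterable, Iterator, Tuple

# A "model" here is any callable that maps raw image bytes to a label.
Model = Callable[[bytes], str]


def make_partition_classifier(
    load_model: Callable[[], Model],
) -> Callable[[Iterable[Tuple[str, bytes]]], Iterator[Tuple[str, str]]]:
    """Build a function suitable for RDD.mapPartitions.

    The returned function loads the model once per partition, then labels
    every (path, image_bytes) record in that partition.
    """

    def classify_partition(records: Iterable[Tuple[str, bytes]]) -> Iterator[Tuple[str, str]]:
        model = load_model()  # expensive step, done once per partition, not per image
        for path, image_bytes in records:
            yield path, model(image_bytes)

    return classify_partition


def load_toy_model() -> Model:
    """Hypothetical stand-in for a trained classifier: labels by payload size."""
    return lambda data: "large" if len(data) > 4 else "small"


if __name__ == "__main__":
    classify = make_partition_classifier(load_toy_model)
    partition = [("/gpfs/img/a.png", b"\x89PNG"), ("/gpfs/img/b.png", b"\x89PNG\r\n")]
    print(list(classify(partition)))
```

In the Spark job itself, the same function could be applied to images read from the global filesystem, for example `sc.binaryFiles(image_dir).mapPartitions(make_partition_classifier(load_real_model))`, where `image_dir` and `load_real_model` are placeholders for your image directory and model loader. `SparkContext.binaryFiles` yields `(path, bytes)` pairs, which is the record shape the sketch expects.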
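
Example II on slide 5 lists full-text indexing. At its core, a full-text index is an inverted index mapping each term to the documents that contain it. The sketch below builds one in plain Python; the regex tokenizer, the boolean-AND query and the sample documents are simplifying assumptions, not the setup used on Quriosity. The map step (`emit_postings`) and the grouping step in `build_index` correspond to a `flatMap` followed by a grouping operation such as `groupByKey` when the same logic runs as a Spark job over files on the global filesystem.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

TOKEN_RE = re.compile(r"[a-z0-9]+")  # simplistic tokenizer: lowercase alphanumeric runs


def emit_postings(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map step: yield one (term, doc_id) pair per distinct term in the document."""
    for term in set(TOKEN_RE.findall(text.lower())):
        yield term, doc_id


def build_index(docs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group step: collect the sorted list of document IDs for each term."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs:
        for term, d in emit_postings(doc_id, text):
            index[term].add(d)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Return IDs of documents containing all query terms (boolean AND)."""
    terms = TOKEN_RE.findall(query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    docs = [  # hypothetical sample documents
        ("doc1", "Catalysts for polymer synthesis"),
        ("doc2", "Polymer dispersions and pigments"),
        ("doc3", "Battery materials and catalysts"),
    ]
    idx = build_index(docs)
    print(search(idx, "polymer"))
    print(search(idx, "catalysts polymer"))
```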
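
The job script on slide 6 sets `SPARK_SLAVES` directly to `$PBS_NODEFILE`. Depending on the resource request (for example `mpiprocs` greater than 1), PBS can list the same host several times in that file, and Spark's start scripts then attempt to start a worker on that host once per line. A defensive variant, sketched below as an assumption rather than part of the original setup, writes a de-duplicated copy of the node file and points `SPARK_SLAVES` at it. The helper `write_worker_file` and the script name `make_workers.py` are hypothetical; in pure shell, `sort -u "$PBS_NODEFILE"` would serve the same purpose.

```python
import os
import sys
from typing import Iterable, List


def unique_hosts(lines: Iterable[str]) -> List[str]:
    """Return host names in first-seen order, skipping blank lines and duplicates."""
    seen = set()
    hosts = []
    for line in lines:
        host = line.strip()
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


def write_worker_file(nodefile: str, out_path: str) -> List[str]:
    """Hypothetical helper: copy a PBS node file to out_path with one line per host."""
    with open(nodefile) as src:
        hosts = unique_hosts(src)
    with open(out_path, "w") as dst:
        dst.write("\n".join(hosts) + "\n")
    return hosts


if __name__ == "__main__" and "PBS_NODEFILE" in os.environ:
    # Inside a PBS job: python3 make_workers.py "$TMPDIR/spark_workers"
    out = sys.argv[1] if len(sys.argv) > 1 else "spark_workers"
    print(write_worker_file(os.environ["PBS_NODEFILE"], out))
```

In the job script, the generated file would then be exported as `SPARK_SLAVES` in place of `${PBS_NODEFILE}` before calling `start-all.sh`.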
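
Slide 7 combines the Spectrum Scale namespace (hdfs://quriosity-hdfs:8020) and a native HDFS namespace (hdfs://native-hdfs:8020) via ViewFS, a client-side mount table that maps path prefixes to file systems. The sketch below is a simplified model of only that resolution step, so the idea can be run without a Hadoop cluster; the mount points `/gpfs` and `/native` and all paths are hypothetical. Real Hadoop clients read such links from `fs.viewfs.mounttable.<table>.link.<path>` properties in `core-site.xml`, and their resolution logic is more involved than this.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mount table: client-visible path prefix -> target file system URI.
MOUNT_TABLE: Dict[str, str] = {
    "/gpfs": "hdfs://quriosity-hdfs:8020",  # Spectrum Scale via HDFS Transparency
    "/native": "hdfs://native-hdfs:8020",   # native HDFS cluster
}


def resolve(path: str, table: Dict[str, str] = MOUNT_TABLE) -> Tuple[str, str]:
    """Map a client path to (target file system, path on that file system).

    Only whole path components match, so "/gpfsx" is not under "/gpfs".
    If several mount points match, the longest one wins.
    """
    best: Optional[str] = None
    for mount in table:
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise FileNotFoundError(f"no mount point covers {path}")
    remainder = path[len(best.rstrip("/")):] or "/"
    return table[best], remainder


if __name__ == "__main__":
    for client_path in ("/gpfs/big_data/part-00000", "/native/user/logs"):
        target_fs, target_path = resolve(client_path)
        print(f"{client_path} -> {target_fs}{target_path}")
```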

Editor's notes

  1. Inspired by a talk by Prof. Joel Zysman, Director of HPC, University of Miami, at the DDN User Group Meeting in 2017
  2. As of January 1, 2019, we have grouped our twelve divisions into six segments: The Chemicals segment will remain the cornerstone of our Verbund structure. It supplies the other segments with basic chemicals and intermediates, contributing to the organic growth of our key value chains. Alongside internal accounts, our customers include the chemical and plastics industries. We aim to increase our competitiveness through technological leadership and operational excellence. The Materials segment’s portfolio comprises advanced materials and their precursors for new applications and systems. These include isocyanates and polyamides as well as inorganic basic products and specialties for the plastics and plastics processing industries. We aim to grow organically through differentiation via specific technological expertise, industry know-how and customer proximity to maximize value in the isocyanate and polyamide value chains. The Industrial Solutions segment develops and markets ingredients and additives for industrial applications such as polymer dispersions, pigments, resins, electronic materials, antioxidants and admixtures. We aim to drive organic growth in key industries such as automotive, plastics or electronics and expand our position in value-enhancing ingredients and solutions by leveraging our comprehensive industry expertise and application know-how. The Surface Technologies segment comprises our businesses that offer chemical solutions on and for surfaces. Its portfolio includes coatings, rust protection products, catalysts and battery materials for the automotive and chemical industries. The aim is to drive organic growth by leveraging our portfolio of technologies and know-how, and to establish BASF as a leading and innovative provider of battery materials as well. 
In the Nutrition & Care segment, we strive to expand our position as a leading provider of nutrition and care ingredients for consumer products in the area of nutrition, home and personal care. Customers include food and feed producers as well as the pharmaceutical, cosmetics, detergent and cleaner industries. We aim to enhance and broaden our product and technology portfolio. Our goal is to drive organic growth by focusing on emerging markets, new business models and sustainability trends in consumer markets, supported by targeted acquisitions. The Agricultural Solutions segment aims to further strengthen our market position as an integrated provider of crop protection products and seeds. Its portfolio comprises fungicides, herbicides, insecticides and biological crop protection products, as well as seeds and seed treatment products. We also offer farmers digital solutions combined with practical advice. Our main focus is on innovation-driven organic growth, targeted portfolio expansion as well as leveraging synergies from the acquired businesses. Source: BASF Report 2018, page 19
  3. Benchmark run on one compute node only; the I/O bandwidth is limited by the 10G network.
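
Slide 8 and note 3 attribute the measured TestDFSIO rates to the 10G network. The Python calculation below checks that reading against the slide 8 table: it averages the write and read rates over the five file sizes, reports their relative spread (max minus min, divided by the mean), and compares the means with the raw 10 Gbit/s line rate. Whether TestDFSIO's "MB" means 10^6 or 2^20 bytes changes the percentage, so both conversions are computed, and protocol overhead is ignored; both are simplifying assumptions.

```python
from statistics import mean

# Avg I/O rates from slide 8 (MB/s) for test file sizes of 10, 20, 30, 40 and 50 GB.
WRITE = [854.38, 861.69, 860.52, 862.00, 866.59]
READ = [906.70, 904.39, 890.99, 876.82, 892.98]

LINE_RATE_BYTES = 10e9 / 8  # assumption: raw 10 Gbit/s link, no protocol overhead


def summarize(rates):
    """Return (mean, relative spread, line-rate fraction for MB=10**6, for MB=2**20)."""
    avg = mean(rates)
    spread = (max(rates) - min(rates)) / avg
    frac_decimal = avg * 1e6 / LINE_RATE_BYTES
    frac_binary = avg * 2**20 / LINE_RATE_BYTES
    return avg, spread, frac_decimal, frac_binary


if __name__ == "__main__":
    for name, rates in (("write", WRITE), ("read", READ)):
        avg, spread, dec, binary = summarize(rates)
        print(f"{name}: mean {avg:.1f} MB/s, spread {spread:.1%}, "
              f"{dec:.0%} (MB=10^6) / {binary:.0%} (MB=2^20) of 10 Gbit/s")
```

With these values the means are about 861 MB/s for writes and 894 MB/s for reads, varying by roughly 1.4% and 3.3% across file sizes, which is roughly 69 to 75% of the raw line rate depending on the MB convention.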