SlideShare ist ein Scribd-Unternehmen logo
Next Generation Grid: Integrating Parallel and
Distributed Computing Runtimes for an HPC
Enhanced Cloud and Fog Spanning IoT Big Data
and Big Simulations
`
Geoffrey Fox, Supun Kamburugamuve, Judy Qiu, Shantenu Jha
June 28, 2017
IEEE Cloud 2017 Honolulu Hawaii
gcf@indiana.edu
http://www.dsc.soic.indiana.edu/, http://spidal.org/
Department of Intelligent Systems Engineering
School of Informatics and Computing, Digital Science Center
Indiana University Bloomington
1
“Next Generation Grid – HPC Cloud” Problem Statement
• Design a dataflow event-driven FaaS (microservice) framework running across
application and geographic domains.
• Build on Cloud best practice but use HPC wherever possible and useful to get high
performance
• Smoothly support current paradigms Hadoop, Spark, Flink, Heron, MPI, DARMA …
• Use interoperable common abstractions but multiple polymorphic
implementations.
• i.e. do not require a single runtime
• Focus on Runtime but this implicitly suggests programming and execution model
• This next generation Grid based on data and edge devices – not computing as in
old Grid
2
• Data gaining in importance compared to simulations
• Data analysis techniques changing with old and new applications
• All forms of IT increasing in importance; both data and simulations increasing
• Internet of Things and Edge Computing growing in importance
• Exascale initiative driving large supercomputers
• Use of public clouds increasing rapidly
• Clouds becoming diverse with subsystems containing GPU’s, FPGA’s, high
performance networks, storage, memory …
• They have economies of scale; hard to compete with
• Serverless computing attractive to user:
“No server is easier to manage than no server”
Important Trends I
3
• Rich software stacks:
• HPC for Parallel Computing
• Apache for Big Data including some edge computing (streaming data)
• On general principles parallel and distributed computing has different requirements even if
sometimes similar functionalities
• Apache stack typically uses distributed computing concepts
• For example, Reduce operation is different in MPI (Harp) and Spark
• Important to put grain size into analysis
• Its easier to make dataflow efficient if grain size large
• Streaming Data ubiquitous including data from edge
• Edge computing has some time-sensitive applications
• Choosing a good restaurant can wait seconds
• Avoiding collisions must be finished in milliseconds
Important Trends II
4
• Classic Supercomputers will continue for large simulations and may run other
applications but these codes will be developed on
• Next-Generation Commodity Systems which are dominant force
• Merge Cloud HPC and Edge computing
• Clouds running in multiple giant datacenters offering all types of computing
• Distributed data sources associated with device and Fog processing resources
• Server-hidden computing for user pleasure
• Support a distributed event driven dataflow computing model covering batch
and streaming data
• Needing parallel and distributed (Grid) computing ideas
Predictions/Assumptions
5
Motivation Summary
• Explosion of Internet of Things and Cloud Computing
• Clouds will continue to grow and will include more use cases
• Edge Computing is adding an additional dimension to Cloud Computing
• Device --- Fog ---Cloud
• Event driven computing is becoming dominant
• Signal generated by a Sensor is an edge event
• Accessing a HPC linear algebra function could be event driven and replace traditional libraries
by FaaS (as NetSolve GridSolve Neos did in old Grid)
• Services will be packaged as a powerful Function as a Service FaaS
• Serverless must be important: users not interested in low level details of IaaS or
even PaaS?
• Applications will span from Edge to Multiple Clouds
6
Implementing these ideas
at a high level
7
• Unit of Processing is an Event driven Function
• Can have state that may need to be preserved in place (Iterative MapReduce)
• Can be hierarchical as in invoking a parallel job
• Functions can be single or 1 of 100,000 maps in large parallel code
• Processing units run in clouds, fogs or devices but these all have similar architecture
• Fog (e.g. car) looks like a cloud to a device (radar sensor) while public cloud looks
like a cloud to the fog (car)
• Use polymorphic runtime that uses different implementations depending on
environment e.g. on fault-tolerance – latency (performance) tradeoffs
• Data locality (minimize explicit dataflow) properly supported as in HPF alignment
commands (specify which data and computing needs to be kept together)
Proposed Approach I
8
• Analyze the runtime of existing systems
• Hadoop, Spark, Flink, Naiad Big Data Processing
• Storm, Heron Streaming Dataflow
• Kepler, Pegasus, NiFi workflow
• Harp Map-Collective, MPI and HPC AMT runtime like DARMA
• And approaches such as GridFTP and CORBA/HLA (!) for wide area data links
• Propose polymorphic unification (given function can have different
implementations)
• Choose powerful scheduler (Mesos?)
• Support processing locality/alignment including MPI’s never move model with
grain size consideration
• One should integrate HPC and Clouds
Proposed Approach II
9
Implementing these ideas
in detail
10
• Google likes to show a timeline; we can build on (Apache version of) this
• 2002 Google File System GFS ~HDFS
• 2004 MapReduce Apache Hadoop
• 2006 Big Table Apache Hbase
• 2008 Dremel Apache Drill
• 2009 Pregel Apache Giraph
• 2010 FlumeJava Apache Crunch
• 2010 Colossus better GFS
• 2012 Spanner horizontally scalable NewSQL database ~CockroachDB
• 2013 F1 horizontally scalable SQL database
• 2013 MillWheel ~Apache Storm, Twitter Heron (Google not first!)
• 2015 Cloud Dataflow Apache Beam with Spark or Flink (dataflow) engine
• Functionalities not identified: Security, Data Transfer, Scheduling, DevOps, serverless
computing (assume OpenWhisk will improve to handle robustly lots of large functions)
Components of Big Data Stack
11
HPC-ABDS
Integrated
wide range of
HPC and Big
Data
technologies.
I gave up
updating!
Kaleidoscope of (Apache) Big Data Stack (ABDS) and HPC Technologies
Cross-
Cutting
Functions
1) Message
and Data
Protocols:
Avro, Thrift,
Protobuf
2) Distributed
Coordination
: Google
Chubby,
Zookeeper,
Giraffe,
JGroups
3) Security &
Privacy:
InCommon,
Eduroam
OpenStack
Keystone,
LDAP, Sentry,
Sqrrl, OpenID,
SAML OAuth
4)
Monitoring:
Ambari,
Ganglia,
Nagios, Inca
17) Workflow-Orchestration: ODE, ActiveBPEL, Airavata, Pegasus, Kepler, Swift, Taverna, Triana, Trident, BioKepler, Galaxy, IPython, Dryad,
Naiad, Oozie, Tez, Google FlumeJava, Crunch, Cascading, Scalding, e-Science Central, Azure Data Factory, Google Cloud Dataflow, NiFi (NSA),
Jitterbit, Talend, Pentaho, Apatar, Docker Compose, KeystoneML
16) Application and Analytics: Mahout , MLlib , MLbase, DataFu, R, pbdR, Bioconductor, ImageJ, OpenCV, Scalapack, PetSc, PLASMA MAGMA,
Azure Machine Learning, Google Prediction API & Translation API, mlpy, scikit-learn, PyBrain, CompLearn, DAAL(Intel), Caffe, Torch, Theano, DL4j,
H2O, IBM Watson, Oracle PGX, GraphLab, GraphX, IBM System G, GraphBuilder(Intel), TinkerPop, Parasol, Dream:Lab, Google Fusion Tables,
CINET, NWB, Elasticsearch, Kibana, Logstash, Graylog, Splunk, Tableau, D3.js, three.js, Potree, DC.js, TensorFlow, CNTK
15B) Application Hosting Frameworks: Google App Engine, AppScale, Red Hat OpenShift, Heroku, Aerobatic, AWS Elastic Beanstalk, Azure, Cloud
Foundry, Pivotal, IBM BlueMix, Ninefold, Jelastic, Stackato, appfog, CloudBees, Engine Yard, CloudControl, dotCloud, Dokku, OSGi, HUBzero, OODT,
Agave, Atmosphere
15A) High level Programming: Kite, Hive, HCatalog, Tajo, Shark, Phoenix, Impala, MRQL, SAP HANA, HadoopDB, PolyBase, Pivotal HD/Hawq,
Presto, Google Dremel, Google BigQuery, Amazon Redshift, Drill, Kyoto Cabinet, Pig, Sawzall, Google Cloud DataFlow, Summingbird
14B) Streams: Storm, S4, Samza, Granules, Neptune, Google MillWheel, Amazon Kinesis, LinkedIn, Twitter Heron, Databus, Facebook
Puma/Ptail/Scribe/ODS, Azure Stream Analytics, Floe, Spark Streaming, Flink Streaming, DataTurbine
14A) Basic Programming model and runtime, SPMD, MapReduce: Hadoop, Spark, Twister, MR-MPI, Stratosphere (Apache Flink), Reef, Disco,
Hama, Giraph, Pregel, Pegasus, Ligra, GraphChi, Galois, Medusa-GPU, MapGraph, Totem
13) Inter process communication Collectives, point-to-point, publish-subscribe: MPI, HPX-5, Argo BEAST HPX-5 BEAST PULSAR, Harp, Netty,
ZeroMQ, ActiveMQ, RabbitMQ, NaradaBrokering, QPid, Kafka, Kestrel, JMS, AMQP, Stomp, MQTT, Marionette Collective, Public Cloud: Amazon
SNS, Lambda, Google Pub Sub, Azure Queues, Event Hubs
12) In-memory databases/caches: Gora (general object from NoSQL), Memcached, Redis, LMDB (key value), Hazelcast, Ehcache, Infinispan, VoltDB,
H-Store
12) Object-relational mapping: Hibernate, OpenJPA, EclipseLink, DataNucleus, ODBC/JDBC
12) Extraction Tools: UIMA, Tika
11C) SQL(NewSQL): Oracle, DB2, SQL Server, SQLite, MySQL, PostgreSQL, CUBRID, Galera Cluster, SciDB, Rasdaman, Apache Derby, Pivotal
Greenplum, Google Cloud SQL, Azure SQL, Amazon RDS, Google F1, IBM dashDB, N1QL, BlinkDB, Spark SQL
11B) NoSQL: Lucene, Solr, Solandra, Voldemort, Riak, ZHT, Berkeley DB, Kyoto/Tokyo Cabinet, Tycoon, Tyrant, MongoDB, Espresso, CouchDB,
Couchbase, IBM Cloudant, Pivotal Gemfire, HBase, Google Bigtable, LevelDB, Megastore and Spanner, Accumulo, Cassandra, RYA, Sqrrl, Neo4J,
graphdb, Yarcdata, AllegroGraph, Blazegraph, Facebook Tao, Titan:db, Jena, Sesame
Public Cloud: Azure Table, Amazon Dynamo, Google DataStore
11A) File management: iRODS, NetCDF, CDF, HDF, OPeNDAP, FITS, RCFile, ORC, Parquet
10) Data Transport: BitTorrent, HTTP, FTP, SSH, Globus Online (GridFTP), Flume, Sqoop, Pivotal GPLOAD/GPFDIST
9) Cluster Resource Management: Mesos, Yarn, Helix, Llama, Google Omega, Facebook Corona, Celery, HTCondor, SGE, OpenPBS, Moab, Slurm,
Torque, Globus Tools, Pilot Jobs
8) File systems: HDFS, Swift, Haystack, f4, Cinder, Ceph, FUSE, Gluster, Lustre, GPFS, GFFS
Public Cloud: Amazon S3, Azure Blob, Google Cloud Storage
7) Interoperability: Libvirt, Libcloud, JClouds, TOSCA, OCCI, CDMI, Whirr, Saga, Genesis
6) DevOps: Docker (Machine, Swarm), Puppet, Chef, Ansible, SaltStack, Boto, Cobbler, Xcat, Razor, CloudMesh, Juju, Foreman, OpenStack Heat,
Sahara, Rocks, Cisco Intelligent Automation for Cloud, Ubuntu MaaS, Facebook Tupperware, AWS OpsWorks, OpenStack Ironic, Google Kubernetes,
Buildstep, Gitreceive, OpenTOSCA, Winery, CloudML, Blueprints, Terraform, DevOpSlang, Any2Api
5) IaaS Management from HPC to hypervisors: Xen, KVM, QEMU, Hyper-V, VirtualBox, OpenVZ, LXC, Linux-Vserver, OpenStack, OpenNebula,
Eucalyptus, Nimbus, CloudStack, CoreOS, rkt, VMware ESXi, vSphere and vCloud, Amazon, Azure, Google and other public Clouds
Networking: Google Cloud DNS, Amazon Route 53
21 layers
Over 350
Software
Packages
January
29
2016
12
What do we need in runtime for distributed HPC FaaS
• Finish examination of all the current tools
• Handle Events
• Handle State
• Handle Scheduling and Invocation of Function
• Define data-flow graph that needs to be analyzed
• Handle data flow execution graph with internal event-driven model
• Handle geographic distribution of Functions and Events
• Design dataflow collective and P2P communication model
• Decide which streaming approach to adopt and integrate
• Design in-memory dataset model for backup and exchange of data in data flow (fault
tolerance)
• Support DevOps and server-hidden cloud models
• Support elasticity for FaaS (connected to server-hidden)
13
Communication Primitives
• Big data systems do not
implement optimized
communications
• It is interesting to see no
AllReduce
implementations
• AllReduce has to be done
with Reduce + Broadcast
• No consideration of
RDMA except as add-on
14
Optimized Dataflow Communications
• Novel feature of our approach
• Optimize the dataflow graph to
facilitate different algorithms
• Example - Reduce
• Add subtasks and arrange them
according to an optimized
algorithm
• Trees, Pipelines
• Preserves the asynchronous
nature of dataflow
computation
Reduce communication as a
dataflow graph modification
15
Dataflow Graph State and Scheduling
• State is a key issue and handled differently in systems
• CORBA, AMT, MPI and Storm/Heron have long running tasks that preserve
state
• Spark and Flink preserve datasets across dataflow node
• All systems agree on coarse grain dataflow; only keep state in exchanged
data.
• Scheduling is one key area where dataflow systems differ
• Dynamic Scheduling
• Fine grain control of dataflow graph
• Graph cannot be optimized
• Static Scheduling
• Less control of the dataflow graph
• Graph can be optimized
16
Dataflow Graph Task Scheduling
17
Fault Tolerance
• Similar form of check-pointing mechanism is used in HPC and Big Data
• MPI, Flink, Spark
• Flink and Spark do better than MPI due to use of database technologies; MPI is a bit
harder due to richer state
• Checkpoint after each stage of the dataflow graph
• Natural synchronization point
• Generally allows user to choose when to checkpoint (not every stage)
• Executors (processes) don’t have external state, so can be considered as
coarse grained operations
18
Spark Kmeans Flink Streaming Dataflow
• P = loadPoints()
• C = loadInitCenters()
• for (int i = 0; i < 10; i++) {
• T = P.map().withBroadcast(C)
• C = T.reduce() }
19
Flink MDS Dataflow Graph
20
Heron Streaming Architecture
Inter node
Intranode
Typical Dataflow Processing Topology
Parallelism 2; 4 stages
Add HPC
Infiniband
Omnipath
System Management
• User Specified Dataflow
• All Tasks Long running
• No context shared apart from
dataflow
21
Naiad Timely Dataflow HLA Distributed Simulation
22
NiFi Workflow
23
Dataflow for a linear algebra kernel
Typical target of HPC AMT System
Danalis 2016 24
Dataflow Frameworks
• Every major big data framework is
designed according to dataflow
model
• Batch Systems
• Hadoop, Spark, Flink, Apex
• Streaming Systems
• Storm, Heron, Samza, Flink, Apex
• HPC AMT Systems
• Legion, Charm++, HPX-5, Dague, COMPs
• Design choices in dataflow
• Efficient in different application areas
25
HPC Runtime versus ABDS distributed Computing Model
on Data Analytics
Hadoop writes to disk and is slowest; Spark
and Flink spawn many processes and do
not support AllReduce directly;
MPI does in-place combined
reduce/broadcast and is fastest
Need Polymorphic Reduction capability
choosing best implementation
Use HPC architecture with
Mutable model
Immutable data
26
Illustration of In-Place AllReduce in MPI
Multidimensional Scaling
MDS execution time on 16 nodes
with 20 processes in each node with
varying number of points
MDS execution time with 32000
points on varying number of nodes.
Each node runs 20 parallel tasks
28
K-Means Clustering in Spark, Flink, MPI
Map (nearest
centroid
calculation)
Reduce (update
centroids)
Data Set
<Points>
Data Set <Initial
Centroids>
Data Set
<Updated
Centroids>
Broadcast
Dataflow for K-means
K-Means execution time on 16 nodes
with 20 parallel tasks in each node with
10 million points and varying number of
centroids. Each point has 100 attributes.
K-Means execution time on varying number
of nodes with 20 processes in each node
with 10 million points and 16000 centroids.
Each point has 100 attributes.
Heron High Performance Interconnects
• Infiniband & Intel Omni-Path
integrations
• Using Libfabric as a library
• Natively integrated to Heron through
Stream Manager without needing to
go through JNI
30
Summary of HPC Cloud – Next Generation Grid
• We suggest an event driven computing model built around Cloud and HPC
and spanning batch, streaming, batch and edge applications
• Expand current technology of FaaS (Function as a Service) and server-
hidden computing
• We have integrated HPC into many Apache systems with HPC-ABDS
• We have analyzed the different runtimes of Hadoop, Spark, Flink, Storm,
Heron, Naiad, DARMA (HPC Asynchronous Many Task)
• There are different technologies for different circumstances but can be unified by
high level abstractions such as communication collectives
• Need to be careful about treatment of state – more research needed
31

Weitere ähnliche Inhalte

Was ist angesagt?

HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
Geoffrey Fox
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Geoffrey Fox
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
Saliya Ekanayake
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
Geoffrey Fox
 
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
Geoffrey Fox
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
Geoffrey Fox
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
Geoffrey Fox
 
51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack
Geoffrey Fox
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
Geoffrey Fox
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
Marco Quartulli
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
Robert Grossman
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
nitesh saxena
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
Marco Quartulli
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
csandit
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
Robert Grossman
 
Scientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitScientific Application Development and Early results on Summit
Scientific Application Development and Early results on Summit
Ganesan Narayanasamy
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Robert Grossman
 
Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013
Vijay Srinivas Agneeswaran, Ph.D
 

Was ist angesagt? (18)

HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
HPC-ABDS High Performance Computing Enhanced Apache Big Data Stack (with a ...
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
 
High Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC ClustersHigh Performance Data Analytics with Java on Large Multicore HPC Clusters
High Performance Data Analytics with Java on Large Multicore HPC Clusters
 
Big Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other thingsBig Data HPC Convergence and a bunch of other things
Big Data HPC Convergence and a bunch of other things
 
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
 
Classification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different FacetsClassification of Big Data Use Cases by different Facets
Classification of Big Data Use Cases by different Facets
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 
51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
 
Scientific Application Development and Early results on Summit
Scientific Application Development and Early results on SummitScientific Application Development and Early results on Summit
Scientific Application Development and Early results on Summit
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
 
Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013Big data analytics_7_giants_public_24_sep_2013
Big data analytics_7_giants_public_24_sep_2013
 

Ähnlich wie Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes for an HPC Enhanced Cloud and Fog Spanning IoT Big Data and Big Simulations

Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
Geoffrey Fox
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
inside-BigData.com
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
Martin Zapletal
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
Aamir Ameen
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
Sathish24111
 
9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School
Adam Doyle
 
Dibbs spidal april6-2016
Dibbs spidal april6-2016Dibbs spidal april6-2016
Dibbs spidal april6-2016
Gregor von Laszewski
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARC
Himanshu Bedi
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
DataWorks Summit/Hadoop Summit
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
DataWorks Summit/Hadoop Summit
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
inside-BigData.com
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
Paco Nathan
 
Hadoop and OpenStack
Hadoop and OpenStackHadoop and OpenStack
Hadoop and OpenStack
DataWorks Summit
 
Hadoop and OpenStack - Hadoop Summit San Jose 2014
Hadoop and OpenStack - Hadoop Summit San Jose 2014Hadoop and OpenStack - Hadoop Summit San Jose 2014
Hadoop and OpenStack - Hadoop Summit San Jose 2014
spinningmatt
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systems
inside-BigData.com
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
Geoffrey Fox
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
Mark Kromer
 
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data ScienceDesigning High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
Object Automation
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
Cloudera, Inc.
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
OpenStack
 

Ähnlich wie Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes for an HPC Enhanced Cloud and Fog Spanning IoT Big Data and Big Simulations (20)

Cloud Services for Big Data Analytics
Cloud Services for Big Data AnalyticsCloud Services for Big Data Analytics
Cloud Services for Big Data Analytics
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School
 
Dibbs spidal april6-2016
Dibbs spidal april6-2016Dibbs spidal april6-2016
Dibbs spidal april6-2016
 
PEARC 17: Spark On the ARC
PEARC 17: Spark On the ARCPEARC 17: Spark On the ARC
PEARC 17: Spark On the ARC
 
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
Accelerating Apache Hadoop through High-Performance Networking and I/O Techno...
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
Hadoop and OpenStack
Hadoop and OpenStackHadoop and OpenStack
Hadoop and OpenStack
 
Hadoop and OpenStack - Hadoop Summit San Jose 2014
Hadoop and OpenStack - Hadoop Summit San Jose 2014Hadoop and OpenStack - Hadoop Summit San Jose 2014
Hadoop and OpenStack - Hadoop Summit San Jose 2014
 
Designing HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale SystemsDesigning HPC & Deep Learning Middleware for Exascale Systems
Designing HPC & Deep Learning Middleware for Exascale Systems
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data ScienceDesigning High-Performance and Scalable Middleware for HPC, AI and Data Science
Designing High-Performance and Scalable Middleware for HPC, AI and Data Science
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 

Mehr von Geoffrey Fox

AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
Geoffrey Fox
 
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Geoffrey Fox
 
Data Science and Online Education
Data Science and Online EducationData Science and Online Education
Data Science and Online Education
Geoffrey Fox
 
Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...
Geoffrey Fox
 
Data Science Curriculum at Indiana University
Data Science Curriculum at Indiana UniversityData Science Curriculum at Indiana University
Data Science Curriculum at Indiana University
Geoffrey Fox
 
Experience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC TechnologyExperience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC Technology
Geoffrey Fox
 
Big Data and Clouds: Research and Education
Big Data and Clouds: Research and EducationBig Data and Clouds: Research and Education
Big Data and Clouds: Research and Education
Geoffrey Fox
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Geoffrey Fox
 
Remarks on MOOC's
Remarks on MOOC'sRemarks on MOOC's
Remarks on MOOC's
Geoffrey Fox
 
FutureGrid Computing Testbed as a Service
 FutureGrid Computing Testbed as a Service FutureGrid Computing Testbed as a Service
FutureGrid Computing Testbed as a Service
Geoffrey Fox
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Geoffrey Fox
 
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWG
Geoffrey Fox
 
Linking Programming models between Grids, Web 2.0 and Multicore
Linking Programming models between Grids, Web 2.0 and Multicore Linking Programming models between Grids, Web 2.0 and Multicore
Linking Programming models between Grids, Web 2.0 and Multicore
Geoffrey Fox
 
CTS Conference Web 2.0 Tutorial Part 2
CTS Conference Web 2.0 Tutorial Part 2CTS Conference Web 2.0 Tutorial Part 2
CTS Conference Web 2.0 Tutorial Part 2
Geoffrey Fox
 
CTS Conference Web 2.0 Tutorial Part 1
CTS Conference Web 2.0 Tutorial Part 1CTS Conference Web 2.0 Tutorial Part 1
CTS Conference Web 2.0 Tutorial Part 1
Geoffrey Fox
 

Mehr von Geoffrey Fox (15)

AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
AI-Driven Science and Engineering with the Global AI and Modeling Supercomput...
 
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
Spidal Java: High Performance Data Analytics with Java on Large Multicore HPC...
 
Data Science and Online Education
Data Science and Online EducationData Science and Online Education
Data Science and Online Education
 
Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...Lessons from Data Science Program at Indiana University: Curriculum, Students...
Lessons from Data Science Program at Indiana University: Curriculum, Students...
 
Data Science Curriculum at Indiana University
Data Science Curriculum at Indiana UniversityData Science Curriculum at Indiana University
Data Science Curriculum at Indiana University
 
Experience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC TechnologyExperience with Online Teaching with Open Source MOOC Technology
Experience with Online Teaching with Open Source MOOC Technology
 
Big Data and Clouds: Research and Education
Big Data and Clouds: Research and EducationBig Data and Clouds: Research and Education
Big Data and Clouds: Research and Education
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
Remarks on MOOC's
Remarks on MOOC'sRemarks on MOOC's
Remarks on MOOC's
 
FutureGrid Computing Testbed as a Service
 FutureGrid Computing Testbed as a Service FutureGrid Computing Testbed as a Service
FutureGrid Computing Testbed as a Service
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
 
NIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWGNIST Big Data Public Working Group NBD-PWG
NIST Big Data Public Working Group NBD-PWG
 
Linking Programming models between Grids, Web 2.0 and Multicore
Linking Programming models between Grids, Web 2.0 and Multicore Linking Programming models between Grids, Web 2.0 and Multicore
Linking Programming models between Grids, Web 2.0 and Multicore
 
CTS Conference Web 2.0 Tutorial Part 2
CTS Conference Web 2.0 Tutorial Part 2CTS Conference Web 2.0 Tutorial Part 2
CTS Conference Web 2.0 Tutorial Part 2
 
CTS Conference Web 2.0 Tutorial Part 1
CTS Conference Web 2.0 Tutorial Part 1CTS Conference Web 2.0 Tutorial Part 1
CTS Conference Web 2.0 Tutorial Part 1
 

Kürzlich hochgeladen

LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
Anant Corporation
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
People as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimalaPeople as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimala
riddhimaagrawal986
 
Data Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptxData Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptx
ramrag33
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
LAXMAREDDY22
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
RamonNovais6
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
integral complex analysis chapter 06 .pdf
integral complex analysis chapter 06 .pdfintegral complex analysis chapter 06 .pdf
integral complex analysis chapter 06 .pdf
gaafergoudaay7aga
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
SakkaravarthiShanmug
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
Gino153088
 
AI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptxAI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptx
architagupta876
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
gowrishankartb2005
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
ecqow
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
TaghreedAltamimi
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
artificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptxartificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptx
GauravCar
 

Kürzlich hochgeladen (20)

LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by AnantLLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
LLM Fine Tuning with QLoRA Cassandra Lunch 4, presented by Anant
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
People as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimalaPeople as resource Grade IX.pdf minimala
People as resource Grade IX.pdf minimala
 
Data Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptxData Control Language.pptx Data Control Language.pptx
Data Control Language.pptx Data Control Language.pptx
 
BRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdfBRAIN TUMOR DETECTION for seminar ppt.pdf
BRAIN TUMOR DETECTION for seminar ppt.pdf
 
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURSCompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
CompEx~Manual~1210 (2).pdf COMPEX GAS AND VAPOURS
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
integral complex analysis chapter 06 .pdf
integral complex analysis chapter 06 .pdfintegral complex analysis chapter 06 .pdf
integral complex analysis chapter 06 .pdf
 
cnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classicationcnn.pptx Convolutional neural network used for image classication
cnn.pptx Convolutional neural network used for image classication
 
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
4. Mosca vol I -Fisica-Tipler-5ta-Edicion-Vol-1.pdf
 
AI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptxAI assisted telemedicine KIOSK for Rural India.pptx
AI assisted telemedicine KIOSK for Rural India.pptx
 
Material for memory and display system h
Material for memory and display system hMaterial for memory and display system h
Material for memory and display system h
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
一比一原版(CalArts毕业证)加利福尼亚艺术学院毕业证如何办理
 
Software Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.pptSoftware Quality Assurance-se412-v11.ppt
Software Quality Assurance-se412-v11.ppt
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
artificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptxartificial intelligence and data science contents.pptx
artificial intelligence and data science contents.pptx
 

Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes for an HPC Enhanced Cloud and Fog Spanning IoT Big Data and Big Simulations

  • 1. Next Generation Grid: Integrating Parallel and Distributed Computing Runtimes for an HPC Enhanced Cloud and Fog Spanning IoT Big Data and Big Simulations ` Geoffrey Fox, Supun Kamburugamuve, Judy Qiu, Shantenu Jha June 28, 2017 IEEE Cloud 2017 Honolulu Hawaii gcf@indiana.edu http://www.dsc.soic.indiana.edu/, http://spidal.org/ Department of Intelligent Systems Engineering School of Informatics and Computing, Digital Science Center Indiana University Bloomington 1
  • 2. “Next Generation Grid – HPC Cloud” Problem Statement • Design a dataflow event-driven FaaS (microservice) framework running across application and geographic domains. • Build on Cloud best practice but use HPC wherever possible and useful to get high performance • Smoothly support current paradigms Hadoop, Spark, Flink, Heron, MPI, DARMA … • Use interoperable common abstractions but multiple polymorphic implementations. • i.e. do not require a single runtime • Focus on Runtime but this implicitly suggests programming and execution model • This next generation Grid based on data and edge devices – not computing as in old Grid 2
  • 3. • Data gaining in importance compared to simulations • Data analysis techniques changing with old and new applications • All forms of IT increasing in importance; both data and simulations increasing • Internet of Things and Edge Computing growing in importance • Exascale initiative driving large supercomputers • Use of public clouds increasing rapidly • Clouds becoming diverse with subsystems containing GPU’s, FPGA’s, high performance networks, storage, memory … • They have economies of scale; hard to compete with • Serverless computing attractive to user: “No server is easier to manage than no server” Important Trends I 3
  • 4. • Rich software stacks: • HPC for Parallel Computing • Apache for Big Data including some edge computing (streaming data) • On general principles parallel and distributed computing has different requirements even if sometimes similar functionalities • Apache stack typically uses distributed computing concepts • For example, Reduce operation is different in MPI (Harp) and Spark • Important to put grain size into analysis • Its easier to make dataflow efficient if grain size large • Streaming Data ubiquitous including data from edge • Edge computing has some time-sensitive applications • Choosing a good restaurant can wait seconds • Avoiding collisions must be finished in milliseconds Important Trends II 4
  • 5. • Classic Supercomputers will continue for large simulations and may run other applications but these codes will be developed on • Next-Generation Commodity Systems which are dominant force • Merge Cloud HPC and Edge computing • Clouds running in multiple giant datacenters offering all types of computing • Distributed data sources associated with device and Fog processing resources • Server-hidden computing for user pleasure • Support a distributed event driven dataflow computing model covering batch and streaming data • Needing parallel and distributed (Grid) computing ideas Predictions/Assumptions 5
  • 6. Motivation Summary • Explosion of Internet of Things and Cloud Computing • Clouds will continue to grow and will include more use cases • Edge Computing is adding an additional dimension to Cloud Computing • Device --- Fog ---Cloud • Event driven computing is becoming dominant • Signal generated by a Sensor is an edge event • Accessing a HPC linear algebra function could be event driven and replace traditional libraries by FaaS (as NetSolve GridSolve Neos did in old Grid) • Services will be packaged as a powerful Function as a Service FaaS • Serverless must be important: users not interested in low level details of IaaS or even PaaS? • Applications will span from Edge to Multiple Clouds 6
  • 8. • Unit of Processing is an Event driven Function • Can have state that may need to be preserved in place (Iterative MapReduce) • Can be hierarchical as in invoking a parallel job • Functions can be single or 1 of 100,000 maps in large parallel code • Processing units run in clouds, fogs or devices but these all have similar architecture • Fog (e.g. car) looks like a cloud to a device (radar sensor) while public cloud looks like a cloud to the fog (car) • Use polymorphic runtime that uses different implementations depending on environment e.g. on fault-tolerance – latency (performance) tradeoffs • Data locality (minimize explicit dataflow) properly supported as in HPF alignment commands (specify which data and computing needs to be kept together) Proposed Approach I 8
  • 9. • Analyze the runtime of existing systems • Hadoop, Spark, Flink, Naiad Big Data Processing • Storm, Heron Streaming Dataflow • Kepler, Pegasus, NiFi workflow • Harp Map-Collective, MPI and HPC AMT runtime like DARMA • And approaches such as GridFTP and CORBA/HLA (!) for wide area data links • Propose polymorphic unification (given function can have different implementations) • Choose powerful scheduler (Mesos?) • Support processing locality/alignment including MPI’s never move model with grain size consideration • One should integrate HPC and Clouds Proposed Approach II 9
  • 11. • Google likes to show a timeline; we can build on (Apache version of) this • 2002 Google File System GFS ~HDFS • 2004 MapReduce Apache Hadoop • 2006 Big Table Apache Hbase • 2008 Dremel Apache Drill • 2009 Pregel Apache Giraph • 2010 FlumeJava Apache Crunch • 2010 Colossus better GFS • 2012 Spanner horizontally scalable NewSQL database ~CockroachDB • 2013 F1 horizontally scalable SQL database • 2013 MillWheel ~Apache Storm, Twitter Heron (Google not first!) • 2015 Cloud Dataflow Apache Beam with Spark or Flink (dataflow) engine • Functionalities not identified: Security, Data Transfer, Scheduling, DevOps, serverless computing (assume OpenWhisk will improve to handle robustly lots of large functions) Components of Big Data Stack 11
  • 12. HPC-ABDS Integrated wide range of HPC and Big Data technologies. I gave up updating! Kaleidoscope of (Apache) Big Data Stack (ABDS) and HPC Technologies Cross- Cutting Functions 1) Message and Data Protocols: Avro, Thrift, Protobuf 2) Distributed Coordination : Google Chubby, Zookeeper, Giraffe, JGroups 3) Security & Privacy: InCommon, Eduroam OpenStack Keystone, LDAP, Sentry, Sqrrl, OpenID, SAML OAuth 4) Monitoring: Ambari, Ganglia, Nagios, Inca 17) Workflow-Orchestration: ODE, ActiveBPEL, Airavata, Pegasus, Kepler, Swift, Taverna, Triana, Trident, BioKepler, Galaxy, IPython, Dryad, Naiad, Oozie, Tez, Google FlumeJava, Crunch, Cascading, Scalding, e-Science Central, Azure Data Factory, Google Cloud Dataflow, NiFi (NSA), Jitterbit, Talend, Pentaho, Apatar, Docker Compose, KeystoneML 16) Application and Analytics: Mahout , MLlib , MLbase, DataFu, R, pbdR, Bioconductor, ImageJ, OpenCV, Scalapack, PetSc, PLASMA MAGMA, Azure Machine Learning, Google Prediction API & Translation API, mlpy, scikit-learn, PyBrain, CompLearn, DAAL(Intel), Caffe, Torch, Theano, DL4j, H2O, IBM Watson, Oracle PGX, GraphLab, GraphX, IBM System G, GraphBuilder(Intel), TinkerPop, Parasol, Dream:Lab, Google Fusion Tables, CINET, NWB, Elasticsearch, Kibana, Logstash, Graylog, Splunk, Tableau, D3.js, three.js, Potree, DC.js, TensorFlow, CNTK 15B) Application Hosting Frameworks: Google App Engine, AppScale, Red Hat OpenShift, Heroku, Aerobatic, AWS Elastic Beanstalk, Azure, Cloud Foundry, Pivotal, IBM BlueMix, Ninefold, Jelastic, Stackato, appfog, CloudBees, Engine Yard, CloudControl, dotCloud, Dokku, OSGi, HUBzero, OODT, Agave, Atmosphere 15A) High level Programming: Kite, Hive, HCatalog, Tajo, Shark, Phoenix, Impala, MRQL, SAP HANA, HadoopDB, PolyBase, Pivotal HD/Hawq, Presto, Google Dremel, Google BigQuery, Amazon Redshift, Drill, Kyoto Cabinet, Pig, Sawzall, Google Cloud DataFlow, Summingbird 14B) Streams: Storm, S4, Samza, Granules, Neptune, Google MillWheel, Amazon Kinesis, LinkedIn, Twitter Heron, Databus, Facebook Puma/Ptail/Scribe/ODS, Azure Stream Analytics, Floe, Spark Streaming, Flink Streaming, DataTurbine 14A) Basic Programming model and runtime, SPMD, MapReduce: Hadoop, Spark, Twister, MR-MPI, Stratosphere (Apache Flink), Reef, Disco, Hama, Giraph, Pregel, Pegasus, Ligra, GraphChi, Galois, Medusa-GPU, MapGraph, Totem 13) Inter process communication Collectives, point-to-point, publish-subscribe: MPI, HPX-5, Argo BEAST HPX-5 BEAST PULSAR, Harp, Netty, ZeroMQ, ActiveMQ, RabbitMQ, NaradaBrokering, QPid, Kafka, Kestrel, JMS, AMQP, Stomp, MQTT, Marionette Collective, Public Cloud: Amazon SNS, Lambda, Google Pub Sub, Azure Queues, Event Hubs 12) In-memory databases/caches: Gora (general object from NoSQL), Memcached, Redis, LMDB (key value), Hazelcast, Ehcache, Infinispan, VoltDB, H-Store 12) Object-relational mapping: Hibernate, OpenJPA, EclipseLink, DataNucleus, ODBC/JDBC 12) Extraction Tools: UIMA, Tika 11C) SQL(NewSQL): Oracle, DB2, SQL Server, SQLite, MySQL, PostgreSQL, CUBRID, Galera Cluster, SciDB, Rasdaman, Apache Derby, Pivotal Greenplum, Google Cloud SQL, Azure SQL, Amazon RDS, Google F1, IBM dashDB, N1QL, BlinkDB, Spark SQL 11B) NoSQL: Lucene, Solr, Solandra, Voldemort, Riak, ZHT, Berkeley DB, Kyoto/Tokyo Cabinet, Tycoon, Tyrant, MongoDB, Espresso, CouchDB, Couchbase, IBM Cloudant, Pivotal Gemfire, HBase, Google Bigtable, LevelDB, Megastore and Spanner, Accumulo, Cassandra, RYA, Sqrrl, Neo4J, graphdb, Yarcdata, AllegroGraph, Blazegraph, Facebook Tao, Titan:db, Jena, Sesame Public Cloud: Azure Table, Amazon Dynamo, Google DataStore 11A) File management: iRODS, NetCDF, CDF, HDF, OPeNDAP, FITS, RCFile, ORC, Parquet 10) Data Transport: BitTorrent, HTTP, FTP, SSH, Globus Online (GridFTP), Flume, Sqoop, Pivotal GPLOAD/GPFDIST 9) Cluster Resource Management: Mesos, Yarn, Helix, Llama, Google Omega, Facebook Corona, Celery, HTCondor, SGE, OpenPBS, Moab, Slurm, Torque, Globus Tools, Pilot Jobs 8) File systems: HDFS, Swift, Haystack, f4, Cinder, Ceph, FUSE, Gluster, Lustre, GPFS, GFFS Public Cloud: Amazon S3, Azure Blob, Google Cloud Storage 7) Interoperability: Libvirt, Libcloud, JClouds, TOSCA, OCCI, CDMI, Whirr, Saga, Genesis 6) DevOps: Docker (Machine, Swarm), Puppet, Chef, Ansible, SaltStack, Boto, Cobbler, Xcat, Razor, CloudMesh, Juju, Foreman, OpenStack Heat, Sahara, Rocks, Cisco Intelligent Automation for Cloud, Ubuntu MaaS, Facebook Tupperware, AWS OpsWorks, OpenStack Ironic, Google Kubernetes, Buildstep, Gitreceive, OpenTOSCA, Winery, CloudML, Blueprints, Terraform, DevOpSlang, Any2Api 5) IaaS Management from HPC to hypervisors: Xen, KVM, QEMU, Hyper-V, VirtualBox, OpenVZ, LXC, Linux-Vserver, OpenStack, OpenNebula, Eucalyptus, Nimbus, CloudStack, CoreOS, rkt, VMware ESXi, vSphere and vCloud, Amazon, Azure, Google and other public Clouds Networking: Google Cloud DNS, Amazon Route 53 21 layers Over 350 Software Packages January 29 2016 12
  • 13. What do we need in runtime for distributed HPC FaaS • Finish examination of all the current tools • Handle Events • Handle State • Handle Scheduling and Invocation of Function • Define data-flow graph that needs to be analyzed • Handle data flow execution graph with internal event-driven model • Handle geographic distribution of Functions and Events • Design dataflow collective and P2P communication model • Decide which streaming approach to adopt and integrate • Design in-memory dataset model for backup and exchange of data in data flow (fault tolerance) • Support DevOps and server-hidden cloud models • Support elasticity for FaaS (connected to server-hidden) 13
  • 14. Communication Primitives • Big data systems do not implement optimized communications • It is interesting to see no AllReduce implementations • AllReduce has to be done with Reduce + Broadcast • No consideration of RDMA except as add-on 14
  • 15. Optimized Dataflow Communications • Novel feature of our approach • Optimize the dataflow graph to facilitate different algorithms • Example - Reduce • Add subtasks and arrange them according to an optimized algorithm • Trees, Pipelines • Preserves the asynchronous nature of dataflow computation Reduce communication as a dataflow graph modification 15
  • 16. Dataflow Graph State and Scheduling • State is a key issue and handled differently in systems • CORBA, AMT, MPI and Storm/Heron have long running tasks that preserve state • Spark and Flink preserve datasets across dataflow node • All systems agree on coarse grain dataflow; only keep state in exchanged data. • Scheduling is one key area where dataflow systems differ • Dynamic Scheduling • Fine grain control of dataflow graph • Graph cannot be optimized • Static Scheduling • Less control of the dataflow graph • Graph can be optimized 16
  • 17. Dataflow Graph Task Scheduling 17
  • 18. Fault Tolerance • Similar form of check-pointing mechanism is used in HPC and Big Data • MPI, Flink, Spark • Flink and Spark do better than MPI due to use of database technologies; MPI is a bit harder due to richer state • Checkpoint after each stage of the dataflow graph • Natural synchronization point • Generally allows user to choose when to checkpoint (not every stage) • Executors (processes) don’t have external state, so can be considered as coarse grained operations 18
  • 19. Spark Kmeans Flink Streaming Dataflow • P = loadPoints() • C = loadInitCenters() • for (int i = 0; i < 10; i++) { • T = P.map().withBroadcast(C) • C = T.reduce() } 19
  • 20. Flink MDS Dataflow Graph 20
  • 21. Heron Streaming Architecture Inter node Intranode Typical Dataflow Processing Topology Parallelism 2; 4 stages Add HPC Infiniband Omnipath System Management • User Specified Dataflow • All Tasks Long running • No context shared apart from dataflow 21
  • 22. Naiad Timely Dataflow HLA Distributed Simulation 22
  • 24. Dataflow for a linear algebra kernel Typical target of HPC AMT System Danalis 2016 24
  • 25. Dataflow Frameworks • Every major big data framework is designed according to dataflow model • Batch Systems • Hadoop, Spark, Flink, Apex • Streaming Systems • Storm, Heron, Samza, Flink, Apex • HPC AMT Systems • Legion, Charm++, HPX-5, Dague, COMPs • Design choices in dataflow • Efficient in different application areas 25
  • 26. HPC Runtime versus ABDS distributed Computing Model on Data Analytics Hadoop writes to disk and is slowest; Spark and Flink spawn many processes and do not support AllReduce directly; MPI does in-place combined reduce/broadcast and is fastest Need Polymorphic Reduction capability choosing best implementation Use HPC architecture with Mutable model Immutable data 26
  • 27. Illustration of In-Place AllReduce in MPI
  • 28. Multidimensional Scaling MDS execution time on 16 nodes with 20 processes in each node with varying number of points MDS execution time with 32000 points on varying number of nodes. Each node runs 20 parallel tasks 28
  • 29. K-Means Clustering in Spark, Flink, MPI Map (nearest centroid calculation) Reduce (update centroids) Data Set <Points> Data Set <Initial Centroids> Data Set <Updated Centroids> Broadcast Dataflow for K-means K-Means execution time on 16 nodes with 20 parallel tasks in each node with 10 million points and varying number of centroids. Each point has 100 attributes. K-Means execution time on varying number of nodes with 20 processes in each node with 10 million points and 16000 centroids. Each point has 100 attributes.
  • 30. Heron High Performance Interconnects • Infiniband & Intel Omni-Path integrations • Using Libfabric as a library • Natively integrated to Heron through Stream Manager without needing to go through JNI 30
  • 31. Summary of HPC Cloud – Next Generation Grid • We suggest an event driven computing model built around Cloud and HPC and spanning batch, streaming, batch and edge applications • Expand current technology of FaaS (Function as a Service) and server- hidden computing • We have integrated HPC into many Apache systems with HPC-ABDS • We have analyzed the different runtimes of Hadoop, Spark, Flink, Storm, Heron, Naiad, DARMA (HPC Asynchronous Many Task) • There are different technologies for different circumstances but can be unified by high level abstractions such as communication collectives • Need to be careful about treatment of state – more research needed 31

Hinweis der Redaktion

  1. Note the differences in communication architectures Note times are in log scale Bars indicate compute only times, which is similar across these frameworks Overhead is dominated by communications in Flink and Spark