Introduction
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 1 / 48
Course Information
Course Objective
Introduction to main concepts and principles of cloud computing
and data intensive computing.
How to read, review and present a scientific paper.
Topics of Study
Topics we will cover include:
• Cloud platforms
• Cloud storage, NoSQL and NewSQL databases
• Cloud resource management
• Batch processing frameworks
• Stream processing frameworks
• Graph processing frameworks
Course Material
Mainly based on research papers.
You will find all the material on the course web page:
http://www.sics.se/~amir/cloud14.htm
Course Examination
Midterm exam: 20%
Final exam: 20%
Reading assignments: 27%
Final presentation: 23%
Final project: 10%
Lab assignments: 0% :)
Reading Assignments
Nine reading assignments.
You should write a review for each paper (at most two pages).
For each paper you should:
• Identify and motivate the problem.
• Pinpoint the main contributions.
• Identify positive/negative aspects of the solution/paper.
For each paper, you might be given some questions to answer.
Students will work in groups of two.
Presentation
Each group gives a 30-minute talk on a scientific paper.
The list of papers will be available on the course web page.
You are also free to choose any other paper, but your choice should be
confirmed first.
Lab Assignments and the Project
Implement simple applications on different frameworks throughout the
course.
The solution to each lab assignment will be uploaded to the course
page one week after the assignment's start date.
The final project will be built on top of Spark.
Discussion Forum
Use the course discussion forum if you have any questions:
http://www.sics.se/~amir/cloud14.htm
Course Overview
Data is not information, information is not knowledge, knowledge is
not understanding, understanding is not wisdom.
- Clifford Stoll
Big Data
[Figure: small data vs. big data]
Big Data refers to datasets and data flows so large
that they have outpaced our capability to
store, process, analyze, and understand them.
The Four Dimensions of Big Data
Volume: data size
Velocity: data generation rate
Variety: data heterogeneity
This 4th V is for Vacillation:
Veracity/Variability/Value
Where Does Big Data Come From?
Big Data Market Driving Factors
The number of web pages indexed by Google, which was around
one million in 1998, exceeded one trillion in 2008, and its
growth is being accelerated by the appearance of social networks.∗
[∗Wei Fan et al., Mining big data: current status, and forecast to the future, 2013]
Big Data Market Driving Factors
The amount of mobile data traffic is expected to grow to 10.8
exabytes per month by 2016.∗
[∗Dan Vesset et al., Worldwide Big Data Technology and Services 2012-2015 Forecast, 2013]
Big Data Market Driving Factors
More than 65 billion devices were connected to the Internet by
2010, and this number will go up to 230 billion by 2020.∗
[∗John Mahoney et al., The Internet of Things Is Coming, 2013]
Big Data Market Driving Factors
Open source communities
Big Data Market Driving Factors
Many companies are moving towards using Cloud services to
access Big Data analytical tools.
History of Data
4000 B.C.
Manual recording
From tablets to papyrus, to parchment, and then to paper
1450
Gutenberg’s printing press
1800’s - 1940’s
Punched cards (no fault-tolerance)
Binary data
1890: US census
1911: IBM founded (as the Computing-Tabulating-Recording Company)
1940’s - 1950’s
Magnetic tapes
1950’s - 1960’s
Large-scale mainframe computers
Batch transaction processing
File-oriented record processing model (e.g., COBOL)
1960’s - 1970’s
Hierarchical DBMS (one-to-many)
Network DBMS (many-to-many)
VM OS by IBM → multiple VMs on a single physical node.
1970’s - 1980’s
Relational DBMS (tables) and SQL
ACID
Client-server computing
Parallel processing
1990’s - 2000’s
Virtual Private Network (VPN) connections
The Internet...
2000’s - Now
Cloud computing
NoSQL: BASE instead of ACID
Big Data
Cloud and Big Data
Big Data Analytics Stack
Hadoop Big Data Analytics Stack
Spark Big Data Analytics Stack
Big Data - File systems
Traditional file systems are not well designed for large-scale data
processing systems.
Efficiency has a higher priority than other features, e.g., directory
service.
Because of its massive size, the data tends to be stored across
multiple machines in a distributed way.
HDFS/GFS, Amazon S3, ...
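As a rough illustration of storing data across multiple machines, the sketch below cuts a file into fixed-size blocks and assigns each block to several nodes. The function names, the round-robin placement rule, and the node names are assumptions made for this example; they are not the placement policy of HDFS, GFS, or S3.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a byte string into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (hypothetical policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        # Consecutive offsets modulo the node count give distinct nodes per block.
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement


# Toy example: a 10-byte "file", 4-byte blocks, 3 nodes, 2 replicas per block.
blocks = split_into_blocks(b"abcdefghij", block_size=4)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c"], replication=2)
```

With two replicas per block, losing any single node still leaves one copy of every block available.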
Big Data - Database
Relational Database Management Systems (RDBMS) were not designed
to be distributed.
NoSQL databases relax one or more of the ACID properties in favor of
BASE (Basically Available, Soft state, Eventual consistency).
Different data models: key/value, column-family, graph, document.
HBase/BigTable, Dynamo, Scalaris, Cassandra, MongoDB, Voldemort,
Riak, Neo4J, ...
Big Data - Resource Management
Different frameworks require different computing resources.
Large organizations need the ability to share data and resources
between multiple frameworks.
Resource managers share the resources of a cluster among multiple
frameworks while providing resource isolation.
Mesos, YARN, Quincy, ...
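One simple way to picture sharing a cluster among frameworks is max-min fairness over a single resource. The sketch below is only an illustration under that assumption: the function name, the framework names, and the single-resource model are hypothetical, it does not reproduce the scheduler of any system listed above, and it does not model isolation.

```python
from typing import Dict


def max_min_fair_share(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Split `capacity` so that no framework gets more than it asked for and
    leftover capacity is repeatedly divided equally among unsatisfied frameworks."""
    allocation = {name: 0.0 for name in demands}
    remaining = float(capacity)
    unsatisfied = {name for name, demand in demands.items() if demand > 0}
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        for name in sorted(unsatisfied):
            # Grant the equal share, capped by what the framework still needs.
            grant = min(share, demands[name] - allocation[name])
            allocation[name] += grant
            remaining -= grant
        unsatisfied = {n for n in unsatisfied if demands[n] - allocation[n] > 1e-9}
    return allocation


# Toy example: 10 CPU cores shared by three frameworks with different demands.
allocation = max_min_fair_share(10, {"batch": 2, "stream": 4, "graph": 10})
```

Here the small demand is met in full, and the capacity it leaves unused is split between the two larger demands.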
Big Data - Execution Engine
Scalable and fault-tolerant parallel data processing on clusters of
unreliable machines.
Data-parallel programming model for clusters of commodity machines.
MapReduce, Spark, Stratosphere, Dryad, Hyracks, ...
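The data-parallel model can be sketched with the classic word count, written here as three plain Python phases. This is a single-process illustration, not the API of MapReduce or Spark; all function names are assumptions for the example, and a real engine would run the map and reduce phases in parallel across machines and handle failures.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_phase(records: Iterable[str],
              mapper: Callable[[str], List[Tuple[str, int]]]) -> List[Tuple[str, int]]:
    """Apply the mapper to every record independently and collect (key, value) pairs."""
    return [pair for record in records for pair in mapper(record)]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Apply the reducer to each key group independently."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line: str) -> List[Tuple[str, int]]:
    return [(word.lower(), 1) for word in line.split()]


def sum_reducer(key: str, values: List[int]) -> int:
    return sum(values)


lines = ["big data big clusters", "data flows"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
```

Because each record is mapped independently and each key is reduced independently, both phases can be partitioned across machines.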
Big Data - Query/Scripting Language
Low-level programming of execution engines, e.g., MapReduce, is
not easy for end users.
A high-level language is needed to improve the query capabilities of
execution engines.
It translates user-defined functions to the low-level API of the
execution engines.
Pig, Hive, Shark, Meteor, DryadLINQ, SCOPE, ...
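The translation idea can be sketched with a toy query compiler: a declarative description of a "count per group" query is turned into low-level filter, map, and reduce-by-key steps. The query format and the name `compile_count_query` are hypothetical and far simpler than what Pig or Hive actually do.

```python
import operator
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple

# Comparison operators supported by this toy query language (an assumption).
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}


def compile_count_query(group_by: str,
                        where: Optional[Tuple[str, str, object]] = None
                        ) -> Callable[[Iterable[dict]], Dict[object, int]]:
    """Compile roughly 'SELECT group_by, COUNT(*) WHERE col op value GROUP BY group_by'
    into a plain function over rows (dicts)."""
    if where is not None:
        column, op, value = where

        def predicate(row: dict) -> bool:
            return OPS[op](row[column], value)
    else:
        def predicate(row: dict) -> bool:
            return True

    def run(rows: Iterable[dict]) -> Dict[object, int]:
        filtered = filter(predicate, rows)                 # low-level filter step
        keys = map(lambda row: row[group_by], filtered)    # low-level map step
        return dict(Counter(keys))                         # low-level reduce-by-key step

    return run


rows = [
    {"user": "a", "bytes": 120},
    {"user": "b", "bytes": 30},
    {"user": "a", "bytes": 500},
]
query = compile_count_query(group_by="user", where=("bytes", ">", 100))
result = query(rows)
```

The user states what to compute; the compiled function decides how to chain the low-level steps.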
Big Data - Stream Processing
Providing users with fresh, low-latency results.
Database Management Systems (DBMS) vs. Data Stream Management
Systems (DSMS)
Storm, S4, SEEP, D-Stream, Naiad, ...
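To illustrate the difference from querying stored data, the sketch below processes events one at a time and emits a count as soon as each time window closes. It assumes events arrive in timestamp order and uses tumbling (non-overlapping) windows; the function name and the event format are assumptions, not the API of any system listed above.

```python
from typing import Iterable, Iterator, Tuple


def tumbling_window_counts(events: Iterable[Tuple[int, str]],
                           window_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (window_start, event_count) each time a window closes.

    `events` are (timestamp, value) pairs in non-decreasing timestamp order.
    Windows that contain no events are skipped.
    """
    current_start = None
    count = 0
    for timestamp, _value in events:
        start = (timestamp // window_size) * window_size
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The first event of a new window closes the previous one.
            yield (current_start, count)
            current_start, count = start, 0
        count += 1
    if current_start is not None:
        # End of stream: flush the last open window.
        yield (current_start, count)


events = [(0, "a"), (3, "b"), (9, "c"), (10, "d"), (25, "e")]
results = list(tumbling_window_counts(events, window_size=10))
```

Only one counter is kept in memory at a time, so the input never has to be stored in full before results are produced.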
Big Data - Graph Processing
Many problems are expressed using graphs: sparse computational
dependencies, and multiple iterations to converge.
Data-parallel frameworks, such as MapReduce, are not ideal for
these problems: they are slow.
Graph processing frameworks are optimized for graph-based problems.
Pregel, Giraph, GraphX, GraphLab, PowerGraph, GraphChi, ...
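The "sparse dependencies, multiple iterations" pattern can be sketched with a vertex-centric connected-components computation: in each superstep, only the vertices whose label changed send messages to their neighbors. This is a single-process illustration in the spirit of vertex-centric frameworks, not the API of Pregel or any other system listed above; the function name is an assumption.

```python
from typing import Dict, Iterable, List, Tuple


def connected_components(vertices: List[int],
                         edges: Iterable[Tuple[int, int]]) -> Tuple[Dict[int, int], int]:
    """Label every vertex with the smallest vertex id in its component.

    Returns (labels, number_of_supersteps).
    """
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    label = {v: v for v in vertices}
    active = set(vertices)      # vertices that will send messages this superstep
    supersteps = 0
    while active:
        supersteps += 1
        # Message phase: each active vertex sends its label to its neighbors;
        # every receiver keeps only the smallest label it was sent.
        inbox: Dict[int, int] = {}
        for v in active:
            for n in neighbors[v]:
                inbox[n] = min(inbox.get(n, label[v]), label[v])
        # Update phase: a vertex stays active only if its label improved.
        active = set()
        for v, smallest in inbox.items():
            if smallest < label[v]:
                label[v] = smallest
                active.add(v)
    return label, supersteps


labels, steps = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

The set of active vertices shrinks from superstep to superstep, which is the kind of work a general data-parallel job would repeat over the whole dataset in every iteration.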
Big Data - Machine Learning
Implementing and consuming machine learning techniques at scale
are difficult tasks for developers and end users.
Platforms exist that address this by providing scalable machine-learning
and data-mining libraries.
Mahout, MLBase, SystemML, Ricardo, Presto, ...
Big Data - Configuration and Synchronization Service
A means to synchronize distributed applications' access to shared
resources.
Allows distributed processes to coordinate with each other.
ZooKeeper, Chubby, ...
Summary
Questions?
 
Printing without printers
Printing without printersPrinting without printers
Printing without printers
 
Building a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP ProjectBuilding a Standard for Standards: The ChAMP Project
Building a Standard for Standards: The ChAMP Project
 
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
CLIM Program: Remote Sensing Workshop, High Performance Computing and Spatial...
 
Principles of Distributed Database Systems.pdf
Principles of Distributed Database Systems.pdfPrinciples of Distributed Database Systems.pdf
Principles of Distributed Database Systems.pdf
 
Enterprise data science at scale
Enterprise data science at scaleEnterprise data science at scale
Enterprise data science at scale
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 


Introduction to cloud computing and big data - part1

  • 1. Introduction Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Introduction 1393/6/22 1 / 48
  • 2. Course Information
  • 4. Course Objective: Introduction to the main concepts and principles of cloud computing and data-intensive computing. How to read, review, and present a scientific paper.
  • 5. Topics of Study: Topics we will cover include cloud platforms; cloud storage, NoSQL and NewSQL databases; cloud resource management; batch processing frameworks; stream processing frameworks; graph processing frameworks.
  • 6. Course Material: Mainly based on research papers. You will find all the material on the course web page: http://www.sics.se/~amir/cloud14.htm
  • 7. Course Examination: Mid-term exam: 20%; Final exam: 20%; Reading assignments: 27%; Final presentation: 23%; Final project: 10%; Lab assignments: 0% :)
  • 11. Reading Assignments: Nine reading assignments. You should write a review for each paper (at most two pages). For each paper you should: identify and motivate the problem; pinpoint the main contributions; identify positive and negative aspects of the solution or paper. For each paper, you might be given some questions to answer. Students will work in groups of two.
  • 12. Presentation: Each group gives a 30-minute talk on a scientific paper. The list of papers will be available on the course web page. You are also free to choose any other paper, but the choice must be confirmed.
  • 13. Lab Assignments and the Project: Implement simple applications on different frameworks throughout the course. The solution to each lab assignment will be uploaded to the course page one week after its start date. The final project is built on top of Spark.
  • 14. Discussion Forum: Use the course discussion forum if you have any questions: http://www.sics.se/~amir/cloud14.htm
  • 15. Course Overview
  • 16. "Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom." - Clifford Stoll
  • 17. Big Data (figure labels: small data, big data)
  • 18. Big Data refers to datasets and flows large enough that they have outpaced our capability to store, process, analyze, and understand them.
  • 19. The Four Dimensions of Big Data: Volume: data size. Velocity: data generation rate. Variety: data heterogeneity. The 4th V is for Vacillation: Veracity/Variability/Value.
  • 20. Where Does Big Data Come From?
  • 21. Big Data Market Driving Factors: The number of web pages indexed by Google, which was around one million in 1998, exceeded one trillion in 2008, and its expansion is accelerated by the appearance of social networks.* [*Wei Fan et al., Mining big data: current status, and forecast to the future, 2013]
  • 22. Big Data Market Driving Factors: The amount of mobile data traffic is expected to grow to 10.8 exabytes per month by 2016.* [*Dan Vesset et al., Worldwide Big Data Technology and Services 2012-2015 Forecast, 2013]
  • 23. Big Data Market Driving Factors: More than 65 billion devices were connected to the Internet by 2010, and this number will go up to 230 billion by 2020.* [*John Mahoney et al., The Internet of Things Is Coming, 2013]
  • 24. Big Data Market Driving Factors: Open source communities.
  • 25. Big Data Market Driving Factors: Many companies are moving towards using cloud services to access Big Data analytical tools.
  • 26. History of Data
  • 27. 4000 B.C.: Manual recording. From tablets to papyrus, to parchment, and then to paper.
  • 28. 1450: Gutenberg's printing press.
  • 29. 1800s - 1940s: Punched cards (no fault-tolerance). Binary data. 1890: US census. 1911: IBM appeared.
  • 30. 1940s - 1950s: Magnetic tapes.
  • 31. 1950s - 1960s: Large-scale mainframe computers. Batch transaction processing. File-oriented record processing model (e.g., COBOL).
  • 32. 1960s - 1970s: Hierarchical DBMS (one-to-many). Network DBMS (many-to-many). VM OS by IBM → multiple VMs on a single physical node.
  • 33. 1970s - 1980s: Relational DBMS (tables) and SQL. ACID. Client-server computing. Parallel processing.
  • 34. 1990s - 2000s: Virtual Private Network (VPN) connections. The Internet...
  • 35. 2000s - Now: Cloud computing. NoSQL: BASE instead of ACID. Big Data.
  • 36. Cloud and Big Data
  • 38. Big Data Analytics Stack
  • 39. Hadoop Big Data Analytics Stack
  • 40. Spark Big Data Analytics Stack
  • 44. Big Data - File systems: Traditional file systems are not well designed for large-scale data processing systems. Efficiency has a higher priority than other features, e.g., directory service. Because of its massive size, the data tends to be stored across multiple machines in a distributed way. Examples: HDFS/GFS, Amazon S3, ...
  • 48. Big Data - Database: Relational Database Management Systems (RDBMS) were not designed to be distributed. NoSQL databases relax one or more of the ACID properties: BASE. Different data models: key/value, column-family, graph, document. Examples: HBase/BigTable, Dynamo, Scalaris, Cassandra, MongoDB, Voldemort, Riak, Neo4j, ...
  • 52. Big Data - Resource Management: Different frameworks require different computing resources. Large organizations need the ability to share data and resources between multiple frameworks. Resource management shares the resources of a cluster between multiple frameworks while providing resource isolation. Examples: Mesos, YARN, Quincy, ...
  • 55. Big Data - Execution Engine: Scalable and fault-tolerant parallel data processing on clusters of unreliable machines. Data-parallel programming model for clusters of commodity machines. Examples: MapReduce, Spark, Stratosphere, Dryad, Hyracks, ...
  • 59. Big Data - Query/Scripting Language: Low-level programming of execution engines, e.g., MapReduce, is not easy for end users. A high-level language is needed to improve the query capabilities of execution engines. It translates user-defined functions to the low-level API of the execution engines. Examples: Pig, Hive, Shark, Meteor, DryadLINQ, SCOPE, ...
  • 62. Big Data - Stream Processing: Providing users with fresh, low-latency results. Database Management Systems (DBMS) vs. Data Stream Management Systems (DSMS). Examples: Storm, S4, SEEP, D-Stream, Naiad, ...
  • 66. Big Data - Graph Processing: Many problems are expressed using graphs: sparse computational dependencies, and multiple iterations to converge. Data-parallel frameworks, such as MapReduce, are not ideal for these problems: they are slow. Graph processing frameworks are optimized for graph-based problems. Examples: Pregel, Giraph, GraphX, GraphLab, PowerGraph, GraphChi, ...
  • 69. Big Data - Machine Learning: Implementing and consuming machine learning techniques at scale are difficult tasks for developers and end users. There exist platforms that address this by providing scalable machine learning and data mining libraries. Examples: Mahout, MLBase, SystemML, Ricardo, Presto, ...
  • 72. Big Data - Configuration and Synchronization Service: A means to synchronize distributed applications' accesses to shared resources. Allows distributed processes to coordinate with each other. Examples: ZooKeeper, Chubby, ...
  • 73. Summary
  • 74. Summary
  • 75. Questions?
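
The Execution Engine slides describe a data-parallel programming model and name MapReduce as one example. As a rough illustration only, the sketch below imitates the map, shuffle, and reduce phases of a word count in a single Python process. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and are not the API of Hadoop or any other engine; a real engine would run the map and reduce calls on many machines and re-execute failed tasks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the engine would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data big"]))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can in principle be split across machines, which is the property the slides refer to as data parallelism.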