Introduction to Data Science
Frank Kienle
High-level introduction to databases
Big Data Landscape
06.09.17 Frank Kienle p. 2
Getting data is not the problem: a very large variety of data sources
Overview of data sources
•  http://www.kdnuggets.com/datasets/index.html
Machine learning data
•  UCI Machine Learning Repository: archive.ics.uci.edu
DataShop: the world's largest repository of learning interaction data
•  https://pslcdatashop.web.cmu.edu
Database
•  Formally, a "database" refers to a set of related data and the way it is organized.
•  A database manages data efficiently and allows users to perform many tasks with ease. Efficient access to the data is usually provided by a "database management system" (DBMS).
•  A DBMS stores, organizes and manages a large amount of information within a single software application.
•  Using such a system typically makes business operations more efficient and reduces overall costs.
•  Different database systems exist, designed with respect to:
   •  the data to be stored in the database
   •  the relationships between the different data elements, i.e. dependencies within the data that can be modeled by mathematical relations
   •  the logical structure imposed on the data on the basis of these relationships; the goal is to arrange the data into a logical structure that can then be mapped onto the storage objects
Databases overview
Scalability in big data
Scale up: using more and more main memory
Scale out: using more and more computers
Definition (complexity order m): for N data items, an algorithm scales with N^m, e.g. polynomial complexity.
Parallelize it (k nodes): the algorithm then scales with N^m/k.
Goal: find algorithms with complexity N log(N), which relates, e.g., to trees (one touch).
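To make these orders of magnitude concrete, the sketch below counts idealized operations for N^m, N^m/k and N log(N). It assumes perfect parallel speed-up with no communication overhead, which real clusters never reach; the function names are illustrative only.

```python
import math


def cost_polynomial(n: int, m: int) -> float:
    """Idealized operation count for an algorithm of complexity order m: N^m."""
    return float(n) ** m


def cost_parallel(n: int, m: int, k: int) -> float:
    """Same algorithm spread over k nodes, assuming perfect speed-up: N^m / k."""
    return cost_polynomial(n, m) / k


def cost_n_log_n(n: int) -> float:
    """N log2(N), e.g. sorting or tree-based processing."""
    return n * math.log2(n)


if __name__ == "__main__":
    n = 1_000_000
    print(f"N^2        on 1 node    : {cost_polynomial(n, 2):.3e}")
    print(f"N^2 / k    on 1000 nodes: {cost_parallel(n, 2, 1000):.3e}")
    print(f"N log2(N)  on 1 node    : {cost_n_log_n(n):.3e}")
```

For N = 10^6 this gives about 1e12, 1e9 and 2e7 operations: even spread over 1000 nodes, the quadratic algorithm still does roughly 50 times more work than the N log(N) one on a single machine, which is why the algorithm matters more than the node count.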
CAP theorem
C: consistency
(do all applications see the same data?)
Any data written to the database must be valid according to all defined rules.
A: availability
(can I interact with the system in the presence of failures?)
P: partition tolerance
If two sections of your system cannot talk to each other, can they make forward progress on their own?
-  If not, you sacrifice availability.
-  If so, you might have to sacrifice consistency.
Example systems: Dynamo, Riak, Voldemort, Cassandra, CouchDB, Bigtable, HBase, Hypertable, Megastore, Spanner, Accumulo, RDBMS
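The partition case can be illustrated with a toy Python sketch: a store with two replicas that, while partitioned, either rejects writes (CP behavior, consistency kept, availability lost) or accepts them locally so the replicas diverge (AP behavior). The class, its methods and the "price" key are hypothetical; the systems listed above use far more elaborate replication and conflict-resolution schemes, and this toy only rejects writes, not reads.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a CP-style store refuses a request during a partition."""


@dataclass
class ToyStore:
    """Two replicas of a key-value store (hypothetical, for illustration only)."""
    mode: str                   # "CP" (favor consistency) or "AP" (favor availability)
    partitioned: bool = False   # True while the two replicas cannot talk to each other
    replicas: list = field(default_factory=lambda: [{}, {}])

    def write(self, replica_id: int, key: str, value: str) -> None:
        if not self.partitioned:
            # Normal operation: replicate the write synchronously to both replicas.
            for replica in self.replicas:
                replica[key] = value
        elif self.mode == "CP":
            # Keep the replicas identical by giving up availability.
            raise Unavailable("partition: write rejected to keep replicas consistent")
        else:
            # Stay available: accept the write locally; the replicas now diverge.
            self.replicas[replica_id][key] = value

    def read(self, replica_id: int, key: str):
        return self.replicas[replica_id].get(key)


if __name__ == "__main__":
    ap = ToyStore(mode="AP")
    ap.write(0, "price", "10")
    ap.partitioned = True
    ap.write(0, "price", "12")
    print(ap.read(0, "price"), ap.read(1, "price"))  # 12 10 (available, but inconsistent)

    cp = ToyStore(mode="CP")
    cp.write(0, "price", "10")
    cp.partitioned = True
    try:
        cp.write(0, "price", "12")
    except Unavailable as exc:
        print("rejected:", exc)  # consistent, but unavailable for writes
```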
Relational Databases
Key idea of relational databases:
§  storage and retrieval of large quantities of related data
§  when creating a database, think about which tables are needed and what relationships exist between the data in your tables
§  relational algebra
§  physical/logical data independence
Think about the design in advance.
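Designing the tables and their relationships in advance can be sketched with Python's built-in sqlite3 module. The two tables (sensor and reading) and their columns are a hypothetical example schema; the foreign key makes the relationship explicit, so the database itself rejects a reading that points to a non-existent sensor.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create two related tables (hypothetical example schema)."""
    # SQLite only enforces foreign keys when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE sensor (
            sensor_id INTEGER PRIMARY KEY,
            location  TEXT NOT NULL
        );
        CREATE TABLE reading (
            ts        INTEGER NOT NULL,  -- time stamp [s]
            sensor_id INTEGER NOT NULL REFERENCES sensor(sensor_id),
            value     REAL,
            PRIMARY KEY (ts, sensor_id)
        );
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO sensor VALUES (1, 'room')")
    conn.execute("INSERT INTO reading VALUES (1, 1, 30.0)")  # valid: sensor 1 exists
    try:
        conn.execute("INSERT INTO reading VALUES (2, 99, 5.0)")  # there is no sensor 99
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```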
Structured Query Language (SQL)
A database is created for the storage and retrieval of data:
we want to be able to INSERT data into the database and to SELECT data from the database.
A database query language was invented for these tasks: the Structured Query Language (SQL).
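A minimal INSERT/SELECT round trip, sketched with Python's built-in sqlite3 module and an in-memory database. The table name and columns are assumptions; the values are the room readings used on the join slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (ts INTEGER PRIMARY KEY, value REAL)")

# INSERT: store data (placeholders avoid building SQL strings by hand)
conn.executemany(
    "INSERT INTO reading (ts, value) VALUES (?, ?)",
    [(1, 30.0), (2, 25.0), (5, 12.0)],
)
conn.commit()

# SELECT: retrieve only the rows that match a condition
rows = conn.execute(
    "SELECT ts, value FROM reading WHERE value > ? ORDER BY ts", (20,)
).fetchall()
print(rows)  # [(1, 30.0), (2, 25.0)]
```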
Fundamentals of data exploration (joins)
When a database supports JOINs, it is well suited for analytics.
When a database does not provide joins, all of this work is left to the user (i.e. it is done on the client side).
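Without database-side joins, the client has to fetch both data sets and combine them itself. A minimal sketch of such a client-side hash join in Python; the row layout (lists of dicts) and the function name are assumptions, and the data are the room/home values from the join slides.

```python
def hash_join(left: list, right: list, key: str) -> list:
    """Inner equi-join of two lists of dicts on `key`, computed on the client."""
    # Build phase: index the right-hand rows by their join key.
    index: dict = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Probe phase: look up each left-hand row's key in the index.
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


if __name__ == "__main__":
    room = [{"ts": 1, "room": 30}, {"ts": 2, "room": 25}, {"ts": 5, "room": 12}]
    home = [{"ts": 1, "home": 100}, {"ts": 2, "home": 78},
            {"ts": 3, "home": 99}, {"ts": 4, "home": 70}]
    for row in hash_join(room, home, "ts"):
        print(row)
```

Building the index costs one pass over the right-hand rows and probing costs one pass over the left-hand rows, so the join is roughly linear in the input size, but all of that data first has to be shipped to the client.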
Outer Relational Join (on time stamp)
Room:
Time stamp [s]   Value room [Wa2]
1                30
2                25
5                12
Home:
Time stamp [s]   Value home [Wa2]
1                100
2                78
3                99
4                70
Outer join result:
Time stamp [s]   Value room [Wa2]   Value home [Wa2]
1                30                 100
2                25                 78
3                NaN                99
4                NaN                70
5                12                 NaN
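The full outer join on the time stamp can be sketched in plain Python, assuming each series is held as a {time_stamp: value} dict (a hypothetical representation). Every time stamp from either side appears once, and a missing value becomes NaN, as in the slide's result table.

```python
NAN = float("nan")


def full_outer_join(left: dict, right: dict) -> list:
    """Full outer join of two {time_stamp: value} series on the time stamp."""
    stamps = sorted(set(left) | set(right))  # union of all time stamps
    # Missing values on either side are filled with NaN.
    return [(ts, left.get(ts, NAN), right.get(ts, NAN)) for ts in stamps]


if __name__ == "__main__":
    room = {1: 30, 2: 25, 5: 12}
    home = {1: 100, 2: 78, 3: 99, 4: 70}
    for ts, room_value, home_value in full_outer_join(room, home):
        print(ts, room_value, home_value)
```

This prints the five rows of the result table, with Python's `nan` in place of NaN. In SQL the same operation is a FULL OUTER JOIN, which not every engine supports (MySQL, for example, does not).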
Left Join (on time stamp)
Room:
Time stamp [s]   Value room [Wa2]
1                30
2                25
5                12
Home:
Time stamp [s]   Value home [Wa2]
1                100
2                78
3                99
4                70
Left join result:
Time stamp [s]   Value room [Wa2]   Value home [Wa2]
1                30                 100
2                25                 78
5                12                 NaN
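The same left join expressed in SQL, sketched with Python's built-in sqlite3 module and an in-memory database. The table and column names are assumptions; note that SQL reports the missing value as NULL (None in Python) rather than NaN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE room (ts INTEGER PRIMARY KEY, value REAL)")
conn.execute("CREATE TABLE home (ts INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO room VALUES (?, ?)", [(1, 30.0), (2, 25.0), (5, 12.0)])
conn.executemany(
    "INSERT INTO home VALUES (?, ?)",
    [(1, 100.0), (2, 78.0), (3, 99.0), (4, 70.0)],
)

# LEFT JOIN keeps every row of the left table (room);
# home values without a matching time stamp become NULL.
rows = conn.execute(
    """
    SELECT r.ts, r.value AS room_value, h.value AS home_value
    FROM room AS r
    LEFT JOIN home AS h ON h.ts = r.ts
    ORDER BY r.ts
    """
).fetchall()
print(rows)  # [(1, 30.0, 100.0), (2, 25.0, 78.0), (5, 12.0, None)]
```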
Storing data efficiently is all about the application
•  schema-less vs. schema
•  write-centric vs. read-centric
•  transactional vs. analytical
•  batch vs. stream
Different data structures
Key-value object
•  A set of key-value pairs
Extensible record (XML or JSON)
•  Families of attributes have a schema
•  New attributes may be added
•  Many predictive analytics tasks will require some kind of record
•  Many REST APIs deliver JSON (YAML, XML) structures
•  Example: Twitter feeds
Key-value stores (document stores might be a subset)
•  No schema, no exposed nesting
•  Often raw data (scalable to petabytes)
•  Simple analytics tasks on top
key      value
45777    Frank Kienle, Germany
Ux_78    Please learn
321-87   Random data
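A key-value store can be sketched as a thin wrapper around a Python dict: values are opaque, and the only access path is the key. The class and method names are hypothetical, and the key/value pairs are taken from the slide's example (their pairing is assumed from the order in which they appear).

```python
from typing import Optional


class KeyValueStore:
    """Minimal in-memory key-value store (hypothetical): opaque values, lookup by key only."""

    def __init__(self) -> None:
        self._data: dict = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)  # deleting a missing key is a no-op


if __name__ == "__main__":
    store = KeyValueStore()
    store.put("45777", "Frank Kienle, Germany")
    store.put("Ux_78", "Please learn")
    store.put("321-87", "Random data")
    print(store.get("Ux_78"))   # Please learn
    store.delete("321-87")
    print(store.get("321-87"))  # None
```

There is no query on values: finding, say, all entries that mention "Germany" would require the client to scan every key, which is why such stores usually carry only simple analytics on top.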
JSON Example
Example JSON Twitter feed
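An extensible JSON record can be handled with Python's standard json module. The record below is a hypothetical, heavily simplified tweet-like structure, not the real Twitter API schema; it shows nested attribute access and adding a new attribute without any schema change.

```python
import json

# Hypothetical, heavily simplified tweet-like record (not the real Twitter API schema).
raw = '{"id": 1, "user": {"name": "example_user"}, "text": "Hello, world"}'

record = json.loads(raw)        # JSON text -> nested Python dict
print(record["user"]["name"])   # nested attribute access: example_user

# Extensible record: add a new attribute without any schema migration.
record["lang"] = "en"
print(json.dumps(record, sort_keys=True))
```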
(Typical) NoSQL database features
The ability to replicate and partition data over many servers
•  Sharding: horizontal partitioning of the data set
No query language: a simple API is defined
Ability to scale operations over many servers
•  Throughput increase
•  Since there is no query-language layer, each operation has to be designed against the API
Operations often have restrictions regarding data locality
New features can be added dynamically to data records (no fixed schema)
Consistency model often weak (no modeling of transactions)
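Sharding can be sketched as hash-based horizontal partitioning: each record's key is hashed, and the hash picks one of the servers. The helper names are hypothetical, and this simple modulo scheme is an illustration, not how any particular NoSQL product assigns keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    # Python's built-in hash() of str is randomized per process, so use hashlib instead.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records: dict, num_shards: int) -> list:
    """Horizontally partition {key: record} pairs into num_shards dicts (one per server)."""
    shards = [{} for _ in range(num_shards)]
    for key, record in records.items():
        shards[shard_for(key, num_shards)][key] = record
    return shards


if __name__ == "__main__":
    users = {f"user{i}": {"id": i} for i in range(10)}
    for i, shard in enumerate(partition(users, 3)):
        print(i, sorted(shard))
```

A drawback of plain modulo hashing is that changing the number of shards moves most keys to a different server; Dynamo-style systems use consistent hashing to limit that reshuffling.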
Main memory database system (MMDB)
In-memory database
•  primarily relies on main memory for computer data storage
•  main purpose is faster analytics on data
•  relational or unstructured data structures
•  memory-optimized data structures
Row vs. column data stores
Advantages of column-oriented storage:
•  Reading efficiency: more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns of data, e.g.
select col_1,col_2 from table where col_2>5 and col_2<45;
•  Writing efficiency: more efficient when new values of a column are supplied for all rows at once
Advantages of row-oriented storage:
•  Reading efficiency: more efficient when many columns of a single row are required at the same time, and when the row size is relatively small
•  Writing efficiency: more efficient when writing a new row if all of the row data is supplied at the same time, as the entire row can be written with a single disk seek
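The two layouts can be contrasted with a small Python sketch over a hypothetical three-column table. It only illustrates the access pattern (which data each layout has to touch for the col_2 filter and for inserting a row); Python lists say nothing about real disk or cache behavior.

```python
from statistics import mean

# Hypothetical table with columns (col_1, col_2, col_3).
# Row-oriented layout: one record per row, all columns stored together.
rows = [(1, 10, "a"), (2, 40, "b"), (3, 50, "c"), (4, 3, "d")]

# Column-oriented layout: one array per column, same data.
columns = {
    "col_1": [r[0] for r in rows],
    "col_2": [r[1] for r in rows],
    "col_3": [r[2] for r in rows],
}


def avg_col2_row_store(rows: list) -> float:
    # Has to visit every full row, although only col_2 is needed.
    return mean(r[1] for r in rows if 5 < r[1] < 45)


def avg_col2_column_store(columns: dict) -> float:
    # Touches only the col_2 array; the other columns are never read.
    return mean(v for v in columns["col_2"] if 5 < v < 45)


def insert_row(rows: list, columns: dict, row: tuple) -> None:
    rows.append(row)  # row store: one write per new row
    for name, value in zip(("col_1", "col_2", "col_3"), row):
        columns[name].append(value)  # column store: one write per column


if __name__ == "__main__":
    print(avg_col2_row_store(rows), avg_col2_column_store(columns))  # 25 25
```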
Processing types
OLTP: Online Transaction Processing
e.g. business transactions (insert, update, delete)
OLAP: Online Analytical Processing
e.g. complex analytics (aggregation of historical data)
For data analytics, a column-oriented in-memory database is a must-have.
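The difference between the two processing types shows up in the shape of the queries. A minimal sketch with Python's built-in sqlite3 module and a hypothetical orders table: OLTP work is a series of short transactions on individual rows, OLAP work is an aggregate over many historical rows. SQLite itself is a row-oriented engine, so this contrasts the workloads only, not the column-oriented in-memory system recommended above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
)

# OLTP: short transactions touching individual rows (insert, update, delete).
# `with conn:` commits on success and rolls back if an exception escapes the block.
with conn:
    conn.executemany(
        "INSERT INTO orders (customer, year, amount) VALUES (?, ?, ?)",
        [("alice", 2016, 120.0), ("bob", 2016, 80.0), ("alice", 2017, 200.0)],
    )
with conn:
    conn.execute("UPDATE orders SET amount = amount - 20 WHERE id = ?", (1,))

try:
    with conn:
        conn.execute("DELETE FROM orders WHERE customer = ?", ("bob",))
        raise RuntimeError("simulated failure")  # the DELETE is rolled back
except RuntimeError:
    pass

# OLAP: aggregate over many (historical) rows.
totals = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(totals)  # [(2016, 180.0), (2017, 200.0)]
```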
Many trends in databases are going back to data consistency
Spanner idea: planet-scale database system
"…we believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions…"
Loose consistency is horrible for predictive analytics.
Loose consistency is a no-go for prescriptive analytics (e.g. dynamic pricing).
Systems should always be designed for usability.

Weitere ähnliche Inhalte

Was ist angesagt?

Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Casesboorad
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataJoey Li
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAmpoolIO
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataVipin Batra
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataKristof Jozsa
 
Big data storage
Big data storageBig data storage
Big data storageVikram Nandini
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTAmrit Chhetri
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data EcosystemLucian Neghina
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introductionFrans van Noort
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data AnalyticsTyrone Systems
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionSteve Loughran
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsTokyo University of Science
 
Big data landscape
Big data landscapeBig data landscape
Big data landscapeNatalino Busa
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview pptVIKAS KATARE
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsNguyen Cao
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big DataLewis Crawford
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!Khalid Salama
 

Was ist angesagt? (20)

Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big data storage
Big data storageBig data storage
Big data storage
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
Big Data Ecosystem
Big Data EcosystemBig Data Ecosystem
Big Data Ecosystem
 
Big Data - A brief introduction
Big Data - A brief introductionBig Data - A brief introduction
Big Data - A brief introduction
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 edition
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
 
Big data landscape
Big data landscapeBig data landscape
Big data landscape
 
big data overview ppt
big data overview pptbig data overview ppt
big data overview ppt
 
Bigdata " new level"
Bigdata " new level"Bigdata " new level"
Bigdata " new level"
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
 
Big Tools for Big Data
Big Tools for Big DataBig Tools for Big Data
Big Tools for Big Data
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 

Ähnlich wie Data Bases - Introduction to data science

Report 2.0.docx
Report 2.0.docxReport 2.0.docx
Report 2.0.docxpinstechwork
 
Analysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho benchAnalysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho benchStevenChike
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018Dave Stokes
 
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoMySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoDave Stokes
 
Report 1.0.docx
Report 1.0.docxReport 1.0.docx
Report 1.0.docxpinstechwork
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...IJCERT JOURNAL
 
No SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupNo SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupHyderabad Scalability Meetup
 
Assignment_4
Assignment_4Assignment_4
Assignment_4Kirti J
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptxRithikRaj25
 
Cassandra Essentials Day Cambridge
Cassandra Essentials Day CambridgeCassandra Essentials Day Cambridge
Cassandra Essentials Day CambridgeMarc Fielding
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology LandscapeShivanandaVSeeri
 
6269441.ppt
6269441.ppt6269441.ppt
6269441.pptSwapna Jk
 
Dataware house multidimensionalmodelling
Dataware house multidimensionalmodellingDataware house multidimensionalmodelling
Dataware house multidimensionalmodellingmeghu123
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
Database Technologies
Database TechnologiesDatabase Technologies
Database TechnologiesMichel de Goede
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0Tuan Luong
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesEditor Jacotech
 

Ähnlich wie Data Bases - Introduction to data science (20)

NOSQL
NOSQLNOSQL
NOSQL
 
Report 2.0.docx
Report 2.0.docxReport 2.0.docx
Report 2.0.docx
 
Analysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho benchAnalysis and evaluation of riak kv cluster environment using basho bench
Analysis and evaluation of riak kv cluster environment using basho bench
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018
 
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoMySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
 
Report 1.0.docx
Report 1.0.docxReport 1.0.docx
Report 1.0.docx
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
 
No SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability MeetupNo SQL and MongoDB - Hyderabad Scalability Meetup
No SQL and MongoDB - Hyderabad Scalability Meetup
 
Assignment_4
Assignment_4Assignment_4
Assignment_4
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 
Cassandra Essentials Day Cambridge
Cassandra Essentials Day CambridgeCassandra Essentials Day Cambridge
Cassandra Essentials Day Cambridge
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology Landscape
 
6269441.ppt
6269441.ppt6269441.ppt
6269441.ppt
 
Database Management & Models
Database Management & ModelsDatabase Management & Models
Database Management & Models
 
Dataware house multidimensionalmodelling
Dataware house multidimensionalmodellingDataware house multidimensionalmodelling
Dataware house multidimensionalmodelling
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
Database Technologies
Database TechnologiesDatabase Technologies
Database Technologies
 
No sq lv1_0
No sq lv1_0No sq lv1_0
No sq lv1_0
 
Lecture3.ppt
Lecture3.pptLecture3.ppt
Lecture3.ppt
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 

Mehr von Frank Kienle

AI for good summary
AI for good summaryAI for good summary
AI for good summaryFrank Kienle
 
Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Frank Kienle
 
Machine Learning part 2 - Introduction to Data Science
Machine Learning part 2 -  Introduction to Data Science Machine Learning part 2 -  Introduction to Data Science
Machine Learning part 2 - Introduction to Data Science Frank Kienle
 
Machine Learning part1 - Introduction to Data Science
Machine Learning part1 - Introduction to Data Science Machine Learning part1 - Introduction to Data Science
Machine Learning part1 - Introduction to Data Science Frank Kienle
 
Business Models - Introduction to Data Science
Business Models -  Introduction to Data ScienceBusiness Models -  Introduction to Data Science
Business Models - Introduction to Data ScienceFrank Kienle
 
Data Science Lecture: Overview and Information Collateral
Data Science Lecture: Overview and Information CollateralData Science Lecture: Overview and Information Collateral
Data Science Lecture: Overview and Information CollateralFrank Kienle
 
Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...Frank Kienle
 
Lecture: Monte Carlo Methods
Lecture: Monte Carlo MethodsLecture: Monte Carlo Methods
Lecture: Monte Carlo MethodsFrank Kienle
 
data scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st centurydata scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st centuryFrank Kienle
 

Mehr von Frank Kienle (9)

AI for good summary
AI for good summaryAI for good summary
AI for good summary
 
Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science Machine Learning part 3 - Introduction to data science
Machine Learning part 3 - Introduction to data science
 
Machine Learning part 2 - Introduction to Data Science
Machine Learning part 2 -  Introduction to Data Science Machine Learning part 2 -  Introduction to Data Science
Machine Learning part 2 - Introduction to Data Science
 
Machine Learning part1 - Introduction to Data Science
Machine Learning part1 - Introduction to Data Science Machine Learning part1 - Introduction to Data Science
Machine Learning part1 - Introduction to Data Science
 
Business Models - Introduction to Data Science
Business Models -  Introduction to Data ScienceBusiness Models -  Introduction to Data Science
Business Models - Introduction to Data Science
 
Data Science Lecture: Overview and Information Collateral
Data Science Lecture: Overview and Information CollateralData Science Lecture: Overview and Information Collateral
Data Science Lecture: Overview and Information Collateral
 
Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...
 
Lecture: Monte Carlo Methods
Lecture: Monte Carlo MethodsLecture: Monte Carlo Methods
Lecture: Monte Carlo Methods
 
data scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st centurydata scientist the sexiest job of the 21st century
data scientist the sexiest job of the 21st century
 

KĂźrzlich hochgeladen

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptDr. Soumendra Kumar Patra
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 

KĂźrzlich hochgeladen (20)

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

Data Bases - Introduction to data science

  • 1. Introduction to Data Science Frank Kienle High level introduction to Data Bases
  • 2. Big Data Landscape 06.09.17 Frank Kienle p. 2
  • 3. Overview of data sources •  http://www.knuggets.com/datasets/index.html Machine learning data •  UCI Machine Learning Repository: archive.ics.uci.edu Data Shop: the world’s largest repository of learning interaction data •  https://pslcdatashop.web.cmu.edu Getting Data is not the problem - Very large flavor of Data Sources 06.09.17 Frank Kienle 3
  • 4. •  Formally, a "database" refers to a set of related data and the way it is organized. •  A database manages data efciently and allows users to perform multiple tasks with ease. The efcient access to the data is usually provided by a "database management system" (DBMS) •  A database management system stores, organizes and manages a large amount of information within a single software application. •  Use of this system increases efciency of business operations and reduces overall costs. •  Different database systems exist which are designed with respect to: •  the data to be stored in the database •  the relationships between the different data elements. Dependencies within the data which can be modeled by mathematical relations •  the logical structure upon the data on the basis of these relationships. The goal is to arrange the data into a logical structure which can then be mapped into the storage objects Database 06.09.17 Frank Kienle p. 4
  • 6. Scalability in big data
      Scale up: using more and more main memory
      Scale out: using more and more computers
      Definition (complexity order m): for N data items, an algorithm scales with $N^m$, e.g. polynomial complexity.
      Parallelized over k nodes, the algorithm (ideally) scales with $N^m/k$.
      Goal: find algorithms with complexity $N \log N$, which relates e.g. to trees ("one touch").
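As a rough illustration of the definition above, this sketch counts abstract work units (not run times) for an $N^m$ algorithm, optionally split over k nodes, and for an $N \log N$ algorithm. It assumes ideal parallelization with no communication overhead and uses log base 2; the function names are hypothetical.

```python
import math


def parallel_cost(n: int, m: int, k: int = 1) -> float:
    """Work per node for an algorithm of complexity order m (N^m), ideally split over k nodes."""
    return n ** m / k


def n_log_n_cost(n: int) -> float:
    """Work for an N log N algorithm (log base 2) on a single node."""
    return n * math.log2(n)


n = 1_000_000
print(f"N^2, 1 node:      {parallel_cost(n, 2):.1e}")
print(f"N^2, 100 nodes:   {parallel_cost(n, 2, k=100):.1e}")
print(f"N log N, 1 node:  {n_log_n_cost(n):.1e}")
```

For N = 10^6, spreading an N^2 algorithm over 100 nodes still leaves about 10^10 work units per node, while N log N needs roughly 2·10^7 on one node, which is why a better algorithm usually beats more hardware.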
  • 7. CAP theorem
      C: consistency (do all applications see the same data?) Any data written to the database must be valid according to all defined rules.
      A: availability (can I interact with the system in the presence of failures?)
      P: partition tolerance (if two sections of your system cannot talk to each other, can they make forward progress on their own?)
          -  If not, you sacrifice availability.
          -  If so, you might have to sacrifice consistency.
      Figure: example systems placed on the CAP triangle: Dynamo, Riak, Voldemort, Cassandra, CouchDB, Bigtable, HBase, Hypertable, Megastore, Spanner, Accumulo, RDBMS.
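A deliberately simplified toy model may help with the partition trade-off: one key-value store replicated on two nodes, where during a partition a "CP" store refuses writes (giving up availability) and an "AP" store accepts them locally (giving up consistency). The class and mode names are hypothetical, and this is not how any of the systems in the figure actually works.

```python
class ReplicatedStore:
    """Toy model: one key-value store replicated on two nodes (illustration only)."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]    # data held by node 0 and node 1
        self.partitioned = False    # True: the two nodes cannot talk to each other

    def write(self, node: int, key: str, value: str) -> bool:
        """Return True if the write was accepted."""
        if not self.partitioned:
            for replica in self.replicas:   # normal case: update both copies
                replica[key] = value
            return True
        if self.mode == "CP":
            return False                    # refuse the write: sacrifice availability
        self.replicas[node][key] = value    # AP: accept locally, copies may now diverge
        return True

    def read(self, node: int, key: str):
        return self.replicas[node].get(key)


for mode in ("CP", "AP"):
    store = ReplicatedStore(mode)
    store.write(0, "price", "10")
    store.partitioned = True                # simulate a network partition
    accepted = store.write(0, "price", "12")
    print(mode, "accepted:", accepted,
          "node0:", store.read(0, "price"), "node1:", store.read(1, "price"))
```

The CP store stays consistent (both nodes answer "10") but rejects the write; the AP store accepts it, and the two nodes now disagree until the partition heals and the copies are reconciled.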
  • 9. Relational Databases
      Key idea:
      •  storage and retrieval of large quantities of related data
      •  when creating a database, think about which tables are needed and what relationships exist between the data in your tables
      •  relational algebra
      •  physical/logical data independence
      Think about the design in advance.
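A minimal sketch of designing tables and their relationship up front, using Python's built-in sqlite3 module with a throwaway in-memory database. The sensor/reading schema is a hypothetical example; the foreign key lets the database itself reject rows that refer to a sensor that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # throwaway database in RAM
conn.execute("PRAGMA foreign_keys = ON")    # SQLite enforces foreign keys only when enabled

# Two related tables: every reading must belong to a known sensor
conn.execute("""
    CREATE TABLE sensor (
        sensor_id INTEGER PRIMARY KEY,
        location  TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE reading (
        sensor_id INTEGER NOT NULL REFERENCES sensor(sensor_id),
        ts        INTEGER NOT NULL,
        value     REAL,
        PRIMARY KEY (sensor_id, ts)
    )""")

conn.execute("INSERT INTO sensor VALUES (1, 'room')")
conn.execute("INSERT INTO reading VALUES (1, 1, 30.0)")        # accepted: sensor 1 exists

try:
    conn.execute("INSERT INTO reading VALUES (99, 1, 5.0)")   # rejected: no sensor 99
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```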
  • 10. Structured Query Language (SQL)
      A database is created for the storage and retrieval of data: we want to be able to INSERT data into the database and to SELECT data from it. The query language invented for these tasks is the Structured Query Language (SQL).
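A minimal INSERT/SELECT round trip, again with Python's standard sqlite3 module and an in-memory database. The table name room and its columns are assumptions, modeled on the time-stamp example used for the join slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")                 # throwaway database in RAM
conn.execute("CREATE TABLE room (ts INTEGER PRIMARY KEY, value REAL)")

# INSERT: store rows; "?" placeholders let the driver pass values safely
conn.executemany("INSERT INTO room (ts, value) VALUES (?, ?)",
                 [(1, 30.0), (2, 25.0), (5, 12.0)])
conn.commit()

# SELECT: ask for specific columns and rows; the database does the filtering
rows = conn.execute(
    "SELECT ts, value FROM room WHERE value > ? ORDER BY ts", (20,)
).fetchall()
print(rows)  # [(1, 30.0), (2, 25.0)]
```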
  • 11. Fundamentals of data exploration (joins)
      Being able to do JOINs is good for analytics.
      When a database does not provide joins, all the work is left to the users (i.e. done on the client side).
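To illustrate the difference, this sketch computes the same inner join twice on hypothetical time-stamp data: once inside the database (SQLite via Python's sqlite3) and once on the client, as a hand-written hash join over the fetched rows. Table and column names are assumptions.

```python
import sqlite3

room = [(1, 30.0), (2, 25.0), (5, 12.0)]                 # (time stamp, value)
home = [(1, 100.0), (2, 78.0), (3, 99.0), (4, 70.0)]

# 1) The database does the join
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE room (ts INTEGER PRIMARY KEY, value REAL)")
conn.execute("CREATE TABLE home (ts INTEGER PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO room VALUES (?, ?)", room)
conn.executemany("INSERT INTO home VALUES (?, ?)", home)
db_join = conn.execute(
    "SELECT r.ts, r.value, h.value FROM room r JOIN home h ON r.ts = h.ts ORDER BY r.ts"
).fetchall()

# 2) No join support: the client fetches both tables and joins them itself
home_by_ts = dict(home)                    # build phase: index one side by the join key
client_join = [(ts, v, home_by_ts[ts])     # probe phase: look up each row of the other side
               for ts, v in room if ts in home_by_ts]

print(db_join)
print(client_join)
```

Both produce the same rows, but in the second case all data crosses the network and the join logic lives in every client application.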
  • 12. Outer relational join (on time stamp)
      Table "room":
        Time stamp [s]   Value room [Wa2]
        1                30
        2                25
        5                12
      Table "home":
        Time stamp [s]   Value home [Wa2]
        1                100
        2                78
        3                99
        4                70
      Outer join result:
        Time stamp [s]   Value room [Wa2]   Value home [Wa2]
        1                30                 100
        2                25                 78
        3                NaN                99
        4                NaN                70
        5                12                 NaN
  • 13. Left join (on time stamp)
      Table "room":
        Time stamp [s]   Value room [Wa2]
        1                30
        2                25
        5                12
      Table "home":
        Time stamp [s]   Value home [Wa2]
        1                100
        2                78
        3                99
        4                70
      Left join result:
        Time stamp [s]   Value room [Wa2]   Value home [Wa2]
        1                30                 100
        2                25                 78
        5                12                 NaN
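The outer and left join results on these slides can be reproduced with a few lines of plain Python. This sketch assumes each table fits in a {time stamp: value} dictionary and marks missing values with NaN, as the slides do; the function name join_on_timestamp is hypothetical.

```python
import math

room = {1: 30, 2: 25, 5: 12}              # time stamp [s] -> value room
home = {1: 100, 2: 78, 3: 99, 4: 70}      # time stamp [s] -> value home


def join_on_timestamp(left, right, how="left"):
    """Join two {timestamp: value} tables; missing values become NaN."""
    if how == "left":
        keys = sorted(left)                       # keep every time stamp of the left table
    elif how == "outer":
        keys = sorted(set(left) | set(right))     # keep time stamps from either table
    else:
        raise ValueError(f"unsupported join type: {how}")
    return [(ts, left.get(ts, math.nan), right.get(ts, math.nan)) for ts in keys]


for row in join_on_timestamp(room, home, how="outer"):
    print(row)
for row in join_on_timestamp(room, home, how="left"):
    print(row)
```

In pandas, the same results would come from `DataFrame.merge` on the time-stamp column with `how="outer"` or `how="left"`.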
  • 14. Storing data efficiently is all about the application:
      •  schema-less vs. schema
      •  write-centric vs. read-centric
      •  transactional vs. analytics
      •  batch vs. stream
  • 15. Different data structures
      Key-value object
      •  a set of key-value pairs
      Extensible record (XML or JSON)
      •  families of attributes have a schema
      •  new attributes may be added
      •  many predictive analytics tasks will require some kind of record
      •  many REST APIs deliver JSON (YAML, XML) structures
      •  example: Twitter feeds
      Key-value stores (document stores might be a subset)
      •  no schema, no exposed nesting
      •  often raw data (scalable to petabytes)
      •  simple analytics tasks on top
      Figure: example key/value pairs with keys 45777, Ux_78, 321-87 and values "Frank Kienle, Germany", "Please learn", "Random data".
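A small sketch of the contrast between a key-value store and an extensible record, using a plain Python dict as a stand-in for the store and the standard json module for the record. The keys 45777 and Ux_78 are borrowed from the figure; all values and field names are hypothetical, and in particular this is not the actual Twitter JSON format.

```python
import json

# Key-value store: opaque values looked up by key; the store does not interpret them
kv_store = {}
kv_store["45777"] = b"raw sensor dump"
kv_store["Ux_78"] = b"any bytes at all"
print(kv_store["45777"])

# Extensible record: a JSON document whose attributes can be queried individually
raw = '{"id": 1, "author": {"name": "alice", "country": "DE"}, "text": "hello"}'
record = json.loads(raw)
record["lang"] = "en"            # new attribute added without any schema change
print(record["author"]["country"], record["lang"])
print(json.dumps(record, sort_keys=True))
```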
  • 17. Example JSON Twitter feed
  • 18. (Typical) NoSQL database features
      •  The ability to replicate and partition data over many servers
          •  Sharding: horizontal partitioning of the data set
      •  No query language: a simple API is defined
      •  Ability to scale operations over many servers
          •  Throughput increase
          •  Due to the missing query (language) layer, each operation has to be designed against the API
      •  Operations often have restrictions regarding data locality
      •  New features can be added dynamically to data records (no fixed schema)
      •  Consistency model often weak (no modeling of transactions)
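Sharding by key can be sketched as hash partitioning: each key is hashed, and the hash modulo the number of shards picks the server. This is one common scheme among several (range partitioning is another); the function name and the four-shard setup are assumptions.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash.

    Python's built-in hash() of a str is randomized per process,
    so a cryptographic digest is used to get the same answer everywhere.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shards = defaultdict(dict)           # shard number -> the key-value pairs that server holds
for user_id in range(1000):
    key = f"user:{user_id}"
    shards[shard_for(key, 4)][key] = {"id": user_id}

print({shard: len(kv) for shard, kv in sorted(shards.items())})
```

Simple modulo placement moves most keys when the number of shards changes; schemes such as consistent hashing are used to limit that reshuffling.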
  • 19. Main memory database system (MMDB)
      In-memory database:
      •  primarily relies on main memory for computer data storage
      •  main purpose is faster analytics on data
      •  relational or unstructured data structures
      •  memory-optimized data structures
  • 20. Row vs. column data stores
      Advantages of column-oriented stores:
      •  Reading efficiency: more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns, e.g.
           select col_1, col_2 from table where col_2 > 5 and col_2 < 45;
      •  Writing efficiency: more efficient when new values of a column are supplied for all rows at once
      Advantages of row-oriented stores:
      •  Reading efficiency: more efficient when many columns of a single row are required at the same time, and when the row size is relatively small
      •  Writing efficiency: more efficient when writing a new row if all of the row data is supplied at the same time, as the entire row can be written with a single disk seek
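The layout difference can be mimicked with Python lists: the same hypothetical table stored row-wise (one tuple per record) and column-wise (one list per column). This only illustrates which data a query touches; the real performance gains come from how the layouts map onto disk pages and memory, which this sketch does not model.

```python
# Hypothetical table with three columns, stored in two layouts
rows = [                      # row-oriented: one tuple per record
    (1, 10, "a"),
    (2, 40, "b"),
    (3, 50, "c"),
    (4, 7, "d"),
]
columns = {                   # column-oriented: one list per column
    "col_1": [1, 2, 3, 4],
    "col_2": [10, 40, 50, 7],
    "col_3": ["a", "b", "c", "d"],
}

# select col_1, col_2 from table where col_2 > 5 and col_2 < 45;
# Row layout: every full record is visited, including the unused col_3
row_result = [(r[0], r[1]) for r in rows if 5 < r[1] < 45]
# Column layout: only the col_1 and col_2 lists are scanned
col_result = [(c1, c2) for c1, c2 in zip(columns["col_1"], columns["col_2"]) if 5 < c2 < 45]

# Aggregate over one column: the column layout reads a single list
total_col_2 = sum(columns["col_2"])

# Insert a complete new record: one append in the row layout,
# one append per column in the column layout
new_record = (5, 20, "e")
rows.append(new_record)
for name, value in zip(("col_1", "col_2", "col_3"), new_record):
    columns[name].append(value)

print(row_result, col_result, total_col_2)
```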
  • 21. Processing types
      OLTP: On-line Transaction Processing, e.g. business transactions (insert, update, delete)
      OLAP: On-line Analytical Processing, e.g. complex analytics (aggregation of historical data)
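A small sketch of both processing styles on one SQLite database (Python's sqlite3, in memory): short transactions that insert, update and delete individual rows, then one aggregating query. The sales table is hypothetical; in practice OLTP and OLAP workloads are often served by different systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP style: short transactions touching individual rows.
# "with conn:" commits when the block succeeds and rolls back if it raises.
with conn:
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                     [("north", 120.0), ("south", 80.0), ("north", 30.0)])
with conn:
    conn.execute("UPDATE sales SET amount = amount - 10 WHERE id = ?", (1,))
    conn.execute("DELETE FROM sales WHERE id = ?", (2,))

# A failed transaction leaves no trace
try:
    with conn:
        conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("south", 999.0))
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# OLAP style: one read-only query aggregating over all stored rows
report = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(report)  # [('north', 2, 140.0)]
```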
  • 22. For data analytics, a column-oriented in-memory database is a must-have.
  • 23. Spanner
      Idea: a planet-scale database system
      "….we believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions …"
      •  Loose consistency for predictive analytics is horrible.
      •  Loose consistency is a no-go for prescriptive analytics (dynamic pricing).
      •  Systems should always be designed for usability.
      •  Many trends in databases are going back to data consistency.