FBW
8-05-2018
Biological Databases
Wim Van Criekinge
SPARQL 1/2
SPARQL 2/2
noSQL
Data Warehousing and Decision Support
RMySQL Package
No need to parse the data – the Fetch function puts the
queried data directly into an R data.frame format!
What does NoSQL mean?
● NoSQL stands for:
  – No Relational
  – No RDBMS
  – NonRel
  – Not Only SQL
● NoSQL is
  – An umbrella term for all databases and data stores that don’t follow the RDBMS principles
    ● A class of products
    ● A collection of several (related) concepts about data storage and manipulation
  – Often related to large data sets
Where does NoSQL come from?
● Non-relational DBMSs are not new
● But NoSQL represents a new incarnation
  – Due to massively scalable Internet applications
  – Based on distributed and parallel computing
● Development
  – Starts with Google
    ● First research paper published in 2003
  – Continues also thanks to Lucene's developers/Apache (Hadoop) and Amazon (Dynamo)
  – Then a lot of products and interests came from Facebook, Netflix, Yahoo, eBay, Hulu, IBM, and many more
NoSQL and big data
● NoSQL comes from the Internet, thus it is often related to the “big data” concept
● How big is “big data”?
  – Over a few terabytes (> 10^12 ≈ 2^40 bytes)
  – Enough to start spanning multiple storage units
● Challenges
  – Efficiently storing and accessing large amounts of data is difficult, even more so when considering fault tolerance and backups
  – Manipulating large data sets involves running immensely parallel processes
  – Managing continuously evolving schema and metadata for semi-structured and unstructured data is difficult
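The size estimate on this slide can be checked with a few lines of arithmetic. The sketch below only compares the decimal terabyte (10^12 bytes) with the binary 2^40 bytes; the variable names are illustrative.

```python
# Decimal terabyte versus its binary near-equivalent.
TERABYTE = 10**12   # SI definition, in bytes
TWO_TO_40 = 2**40   # binary counterpart (a tebibyte), in bytes

# The two differ by roughly 10%, which is why the slide writes "≈".
ratio = TWO_TO_40 / TERABYTE
print(f"2^40 = {TWO_TO_40:,} bytes, ratio to 10^12 = {ratio:.4f}")
```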
Why are RDBMSs not suitable for big data?
● The context is the Internet
● RDBMSs assume that data are
  – Dense
  – Largely uniform (structured data)
● Data coming from the Internet are
  – Massive and sparse
  – Semi-structured or unstructured
● With massive sparse data sets, the typical storage mechanisms and access methods get stretched
NoSQL products' categories
● NoSQL products can be categorized in
  – Key/value stores
  – Sorted ordered column-oriented stores
  – Document databases (JSON/XML)
  – Graph databases
The Benefits of NoSQL
[https://www.mongodb.com/nosql-explained]
When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:
– Geographically distributed architecture instead of expensive, monolithic architecture
– Large volumes of rapidly changing structured, semi-structured, and unstructured data
– Agile sprints, quick schema iteration, and frequent code pushes
– Object-oriented programming that is easy to use and flexible
NoSQL Database Types
[https://www.mongodb.com/nosql-explained]
• Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB.
• Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
• Document databases pair each key with a complex data structure known as a document.
• Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4J and triple stores like Fuseki.
Key/value stores
Features
● Simple primitive data structure
● No predefined schema
● Limited query capabilities
● Dictionary-like functionality at large scale
[Figure: keys key1, key2, key3, each mapped to a value]
Bioinformatics Use Case
● Word vectors in text mining
● Caching
Limitations
● Key lookup only, no generalized query
● Small number of attributes per entity
Key/value stores
● Store data in a schema-less way
● Store data as maps
  – HashMaps or associative arrays
  – Provide a very efficient average running time algorithm for accessing data
● Notable for:
  – Couchbase (Zynga, Vimeo, NAVTEQ, ...)
  – Redis (Craigslist, Instagram, StackOverflow, Flickr, ...)
  – Amazon Dynamo (Amazon, Elsevier, IMDb, ...)
  – Apache Cassandra (Facebook, Digg, Reddit, Twitter, ...)
  – Voldemort (LinkedIn, eBay, ...)
  – Riak (GitHub, Comcast, Mochi, ...)
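The map-like behaviour described on the key/value slides can be sketched with a plain Python dict, which is itself a hash map. This is a toy in-memory illustration, not the API of any product listed above; the class name `KeyValueStore` and the example keys are assumptions made for the sketch.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key/value store backed by a dict (a hash map)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Schema-less: any value may be stored under any key.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Average-case constant-time lookup by key.
        # There is deliberately no way to query by value.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


# Hypothetical usage matching the slide's use cases (word vectors, caching).
store = KeyValueStore()
store.put("word:gene", [0.12, -0.40, 0.88])
store.put("cache:query42", {"hits": 17})
vector = store.get("word:gene")
missing = store.get("word:protein")
```

The only access path is the key, which is what the "key lookup only, no generalized query" limitation refers to.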
Sorted ordered column-oriented stores
● Data are stored in a column-oriented way
  – Data efficiently stored
  – Avoids consuming space for storing nulls
  – Unit of data is a set of key/value pairs
    ● Identified by “row-key”
    ● Ordered and sorted based on row-key
  – Columns are grouped in column-families
  – Data isn’t stored as a single table but is stored by column-families
● Notable for:
  – Google's Bigtable (used in all Google's services)
  – HBase (Facebook, StumbleUpon, Hulu, Yahoo!, ...)
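The row-key and column-family layout can be sketched as nested dictionaries. This is a minimal in-memory illustration and not how Bigtable or HBase are implemented; the class `ColumnFamilyTable`, its methods, and the sample row keys are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, Tuple


class ColumnFamilyTable:
    """Toy wide-column table: row-key -> column family -> column -> value."""

    def __init__(self) -> None:
        # Only cells that were written exist, so missing values (nulls)
        # take up no space.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: str, family: str, column: str, value: str) -> None:
        self._rows[row_key][family][column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, str]:
        # Read one column family of one row; empty dict if nothing was stored.
        row = self._rows.get(row_key)
        if row is None:
            return {}
        return dict(row.get(family, {}))

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, dict]]:
        # Rows are returned in sorted row-key order, for start <= key < stop.
        for key in sorted(self._rows):
            if start <= key < stop:
                yield key, {fam: dict(cols) for fam, cols in self._rows[key].items()}


# Hypothetical usage: rows are inserted out of order but scanned in key order.
table = ColumnFamilyTable()
table.put("sample#002", "clinical", "age", "26")
table.put("sample#001", "clinical", "sex", "M")
table.put("sample#001", "expression", "TP53", "8.1")
keys = [key for key, _ in table.scan("sample#000", "sample#999")]
```

A real store keeps rows physically sorted rather than sorting on every scan; the sort here only stands in for that ordering.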
Column-oriented store example
Features
● Groups attributes into column families
● Column families store key-value pairs
● Implemented as sparse multi-dimensional arrays
● Denormalized
● 10^4-10^6 columns; 10^9 rows
Bioinformatics Use Case
● Large studies
● Many experiments & data types
● Simulations
Limitations
● Operationally challenging
● Suitable for large number of servers
Document databases
● Documents
  – Loosely structured sets of key/value pairs in documents, e.g., XML, JSON, BSON
  – Encapsulate and encode data in some standard formats or encodings
  – Are addressed in the database via a unique key
  – Documents are treated as a whole, avoiding splitting a document into its constituent name/value pairs
● Allow retrieving documents by key or by content
● Notable for:
  – MongoDB (used in FourSquare, GitHub, and more)
  – CouchDB (used in Apple, BBC, Canonical, CERN, and more)
Document databases, JSON
{
  "ApacheLogRecord": {
    "ip": "127.0.0.1",
    "ident": "-",
    "http_user": "frank",
    "time": "10/Oct/2000:13:55:36 -0700",
    "request_line": {
      "http_method": "GET",
      "url": "/apache_pb.gif",
      "http_vers": "HTTP/1.0"
    },
    "http_response_code": "200",
    "http_response_size": "2326",
    "referrer": "http://www.example.com/start.html",
    "user_agent": "Mozilla/4.08 [en] (Win98; I ;Nav)"
  }
}
{
  subject_id: "F8273",
  age: "26",
  sex: "M",
  date_of_death: "12-Jan-1995",
  glycohemoglobin: "10%",
  BMI: 22,
  samples: [ {type: "Thoracic Aorta", AHA_score: 1},
             {type: "Abdominal Aorta", AHA_score: 2},
             {type: "LAD", AHA_score: 5} ],
  sequence: {seq_file: "F8273_08152014.bam",
             variant_file: "F8273_08152014.vcf"}
}
Features
● JSON/XML structures
● Fields vary between docs
● No predefined schema
● Documents analogous to rows
● Collections analogous to tables
● Query capabilities
Bioinformatics Use Case
● Text mining
● Atherosclerosis
Limitations
● No joins
● No referential integrity checks
● Object-based query language
{
  id : <value>,
  <key> : <value>,
  <key> : <embedded document>,
  <key> : <array>
}
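Retrieval by content, as opposed to by key, can be sketched over a list of Python dicts. This is a simplified illustration and not MongoDB's query language; the helpers `get_path` and `find` and the second subject (`F9001`) are assumptions made up for the example.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def get_path(doc: Document, dotted_path: str) -> Any:
    """Follow a dotted path such as 'sequence.seq_file' into nested dicts."""
    current: Any = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection: List[Document], criteria: Dict[str, Any]) -> List[Document]:
    """Return the documents whose fields equal every value in `criteria`."""
    return [
        doc
        for doc in collection
        if all(get_path(doc, path) == expected for path, expected in criteria.items())
    ]


# A collection whose documents do not share the same fields.
subjects = [
    {"_id": "F8273", "sex": "M", "BMI": 22,
     "sequence": {"seq_file": "F8273_08152014.bam"}},
    {"_id": "F9001", "sex": "F",  # hypothetical second subject
     "samples": [{"type": "LAD", "AHA_score": 3}]},
]

males = find(subjects, {"sex": "M"})
by_file = find(subjects, {"sequence.seq_file": "F8273_08152014.bam"})
```

Each document is matched on its own; combining two collections would have to be done in application code, which is the "no joins" limitation.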
Graph databases
Features
● Highly normalized
● Graph-based query language (Gremlin)
● SQL-inspired query language (Cypher)
● Support for path finding and recursion
Bioinformatics Use Case
● Epidemiology simulations
● Interaction networks
Limitations
● Less suited for tabular data
Property Graph Model
[Figure: property graph with three nodes carrying the properties (name: the Doctor, age: 907, species: Time Lord), (first name: Rose, last name: Tyler), and (vehicle: tardis, model: Type 40)]
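A property graph and the path finding it supports can be sketched in plain Python. The node properties come from the slide; the node ids, the edge labels (`COMPANION_OF`, `OWNS`) and the edge directions are assumptions, since the figure's edges did not survive. A real graph database would express this in Gremlin or Cypher rather than hand-written search.

```python
from collections import deque

# Nodes carry arbitrary key/value properties.
nodes = {
    "doctor": {"name": "the Doctor", "age": 907, "species": "Time Lord"},
    "rose": {"first name": "Rose", "last name": "Tyler"},
    "tardis": {"vehicle": "tardis", "model": "Type 40"},
}

# Directed, labelled edges as (source, label, target). Labels are hypothetical.
edges = [
    ("rose", "COMPANION_OF", "doctor"),
    ("doctor", "OWNS", "tardis"),
]


def neighbours(node_id):
    """Targets reachable from node_id over one outgoing edge."""
    return [target for source, _label, target in edges if source == node_id]


def shortest_path(start, goal):
    """Breadth-first search; returns a list of node ids, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


path = shortest_path("rose", "tardis")
```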
Modeling NoSQL stores
● NoSQL data modeling often starts from the application-specific queries as opposed to relational modeling:
  – Relational modeling is typically driven by the structure of available data. The main design theme is “What answers do I have?”
  – NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is “What questions do I have?”
● Data duplication and denormalization are first-class citizens
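Query-driven modeling can be illustrated by precomputing a denormalized view for one specific question. The sketch below assumes the question is "which variants does subject X carry?"; all records, ids and field names are invented for the example.

```python
# Normalized, relational-style data: one record per entity, linked by ids.
genes = {1: {"symbol": "TP53"}, 2: {"symbol": "BRCA1"}}
variants = [
    {"variant_id": "v1", "gene_id": 1, "subject_id": "F8273"},
    {"variant_id": "v2", "gene_id": 2, "subject_id": "F8273"},
    {"variant_id": "v3", "gene_id": 1, "subject_id": "F9001"},
]


def build_variants_by_subject(genes, variants):
    """Denormalize: copy the gene symbol into each entry, grouped by subject."""
    view = {}
    for variant in variants:
        entry = {
            "variant_id": variant["variant_id"],
            # Duplicated on purpose so no lookup is needed at read time.
            "gene_symbol": genes[variant["gene_id"]]["symbol"],
        }
        view.setdefault(variant["subject_id"], []).append(entry)
    return view


variants_by_subject = build_variants_by_subject(genes, variants)
answer = variants_by_subject["F8273"]
```

The gene symbol is now stored once per variant rather than once per gene, so reads need a single key lookup, at the cost of updating every copy if a symbol changes.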
Querying NoSQL stores
● Different NoSQL stores provide different querying tools and features
  – From “simple” filtering of data based on “columns” names/values (MongoDB, HBase, Redis, ...)
  – To SQL-like languages (Google App Engine, HyperTable, Hive, ...)
NoSQL, No ACID
● RDBMSs are based on ACID (Atomicity, Consistency, Isolation, and Durability) properties
● NoSQL
  – Does not give importance to ACID properties
  – In some cases completely ignores them
● In distributed parallel systems it is difficult/impossible to ensure ACID properties
  – Even with a central coordinator
  – 2PL, 2PC and SS2PL can help
● Long-running transactions don't work because keeping resources blocked for a long time is not practical
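Two-phase commit (2PC), mentioned above, can be sketched with a coordinator function and in-process participants. This is a deliberately simplified illustration: there is no network, timeout, or crash recovery, and the names `Participant` and `two_phase_commit` are made up for the example.

```python
class Participant:
    """A resource manager that votes on, then applies, a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes only if the local work can be committed.
        # A prepared participant must keep its resources locked until phase 2.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator: commit everywhere only if every participant votes yes."""
    # Phase 1 (voting): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): a unanimous yes commits, anything else rolls back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


a, b = Participant("a"), Participant("b")
ok = two_phase_commit([a, b])

c, d = Participant("c", can_commit=False), Participant("d")
failed = two_phase_commit([d, c])
```

Between `prepare` and the coordinator's decision every participant holds its locks, which is why the slide notes that long-running transactions are impractical.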
CAP Theorem
● A congruent and logical way for assessing the problems involved in assuring ACID-like guarantees in distributed systems is provided by the CAP theorem
  – At most two of the following three can be maximized at one time
    ● Consistency - Each client has the same view of the data
    ● Availability - Each client can always read and write
    ● Partition tolerance - System works well across distributed physical networks
  – Conjectured by Eric Brewer in 2000
  – Proved by Seth Gilbert and Nancy Lynch in 2002
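The trade-off can be made concrete with a toy store of two replicas that either refuses writes during a network partition (favouring consistency) or accepts them locally (favouring availability). This is an illustrative sketch only, not a model of any real system; the class names and the "CP"/"AP" mode strings are assumptions.

```python
class Replica:
    def __init__(self):
        self.value = None


class TwoReplicaStore:
    """Two replicas that behave differently when the link between them is cut."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica(), Replica()]
        self.partitioned = False

    def write(self, replica_index, value):
        if not self.partitioned:
            # Healthy network: replicate to both nodes.
            for replica in self.replicas:
                replica.value = value
            return True
        if self.mode == "CP":
            # Stay consistent by refusing the write: availability is lost.
            return False
        # AP: accept the write locally: the replicas now disagree.
        self.replicas[replica_index].value = value
        return True

    def read(self, replica_index):
        return self.replicas[replica_index].value


cp = TwoReplicaStore("CP")
cp.write(0, "a")
cp.partitioned = True
cp_accepted = cp.write(0, "b")

ap = TwoReplicaStore("AP")
ap.write(0, "a")
ap.partitioned = True
ap_accepted = ap.write(0, "b")
```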
References
● Tiwari, Shashank. Professional NoSQL. Wrox, 2011.
● Warden, Pete. Big Data Glossary. O'Reilly Media, 2011.
● Vogels, Werner (Amazon.com's CTO). All Things Distributed. Werner Vogels' weblog on building scalable and robust distributed systems. http://www.allthingsdistributed.com/
● Katsov, Ilya. NoSQL Data Modeling Techniques. http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
● Bushik, Sergey. A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak. October 2012. Available online.
● Gilbert, Seth and Lynch, Nancy. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News 33.2 (2002): 51-59.
● Redmond, Eric, Wilson, Jim R., and Carter, Jacquelyn. Seven databases in seven weeks: a guide to modern databases and the NoSQL movement. The Pragmatic Programmers, LLC, 2012.

2018 05 08_biological_databases_no_sql

  • 1.
  • 4. Data Warehousing and Decision Support
  • 5.
  • 6. RMySQL Package No need to parse the data – the Fetch function puts the queried data directly into an R data.frame format!
  • 7. What does NoSQLmean? ● NoSQL stands for: – – – No Relational No RDBMS NonRel – Not OnlySQL ● NoSQL is – An umbrella term for all databases and data stores that don’t follow the RDBMSprinciples ● ● A class of products A collection of several (related) concepts about data storage and manipulation – Often related to large data sets
  • 8. Where does NoSQL come from? ● Non-relational DBMSs are not new ● But NoSQL represents a new incarnation – Due to massively scalable Internet applications – Based on distributed and parallel computing ● Development – Starts with Google: first research paper published in 2003 – Continues thanks to Lucene's developers/Apache (Hadoop) and Amazon (Dynamo) – Then a lot of products and interest came from Facebook, Netflix, Yahoo, eBay, Hulu, IBM, and many more
  • 9. NoSQL and big data ● NoSQL comes from the Internet, so it is often related to the “big data” concept ● How big is “big data”? – Over a few terabytes (> 10^12 ≈ 2^40 bytes) – Enough to start spanning multiple storage units ● Challenges – Efficiently storing and accessing large amounts of data is difficult, even more so when considering fault tolerance and backups – Manipulating large data sets involves running immensely parallel processes – Managing continuously evolving schema and metadata for semi-structured and unstructured data is difficult
  • 10. Why are RDBMSs not suitable for big data? ● The context is the Internet ● RDBMSs assume that data are – Dense – Largely uniform (structured data) ● Data coming from the Internet are – Massive and sparse – Semi-structured or unstructured ● With massive sparse data sets, the typical storage mechanisms and access methods get stretched
  • 11. NoSQL products' categories ● NoSQL products can be categorized in – Key/value stores – Sorted ordered column-oriented stores – Document databases (JSON/XML) – Graph databases
  • 12. The Benefits of NoSQL [https://www.mongodb.com/nosql-explained] When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address: – Geographically distributed architecture instead of expensive, monolithic architecture – Large volumes of rapidly changing structured, semi-structured, and unstructured data – Agile sprints, quick schema iteration, and frequent code pushes – Object-oriented programming that is easy to use and flexible
  • 13. NoSQL Database Types [https://www.mongodb.com/nosql-explained] ● Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or 'key'), together with its value. Examples of key-value stores are Riak and Berkeley DB. ● Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows. ● Document databases pair each key with a complex data structure known as a document. ● Graph stores are used to store information about networks of data, such as social connections. Graph stores include Neo4J and triple stores like Fuseki.
  • 14. Key/value stores ● Features – Simple primitive data structure – No predefined schema – Limited query capabilities – Dictionary-like functionality at large scale ● Bioinformatics Use Case – Word vectors in text mining – Caching ● Limitations – Key lookup only, no generalized query – Small number of attributes per entity [Diagram: keys key1, key2, key3 mapped to values]
  • 15. Key/value stores ● Store data in a schema-less way ● Store data as maps – HashMaps or associative arrays – Provide a very efficient average running time algorithm for accessing data ● Notable for: – Couchbase (Zynga, Vimeo, NAVTEQ, ...) – Redis (Craigslist, Instagram, StackOverflow, flickr, ...) – Amazon Dynamo (Amazon, Elsevier, IMDb, ...) – Apache Cassandra (Facebook, Digg, Reddit, Twitter, ...) – Voldemort (LinkedIn, eBay, ...) – Riak (Github, Comcast, Mochi, ...)
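The map-based access pattern described on the two key/value slides can be sketched in a few lines of Python. This is a toy in-memory illustration, not the API of any product listed above; the class name `KeyValueStore`, its methods, and the `word:` key prefix are all assumptions made for the example. It shows the defining trade-off: lookups by key are cheap, but there is no way to query by value.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key/value store: a thin wrapper around a hash map."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # No schema: any value can be stored under any key.
        self._data[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Average O(1) lookup; the key is the only access path.
        return self._data.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        if key in self._data:
            del self._data[key]
            return True
        return False


# Hypothetical use case from the slide: caching word vectors for text mining.
store = KeyValueStore()
store.put("word:gene", [0.12, -0.40, 0.88])
store.put("word:protein", [0.10, -0.35, 0.91])

vector = store.get("word:gene")      # found: returns the stored list
missing = store.get("word:kinase")   # not stored: returns None
```

Answering a question such as "which words have a first component above 0.11?" would require scanning every key, which is the "key lookup only" limitation named on the slide.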
  • 16. Sorted ordered column-oriented stores ● Data are stored in a column-oriented way – Data efficiently stored – Avoids consuming space for storing nulls – Unit of data is a set of key/value pairs, identified by a “row-key” and ordered and sorted based on row-key – Columns are grouped in column-families – Data isn’t stored as a single table but is stored by column-families ● Notable for: – Google's Bigtable (used in all Google's services) – HBase (Facebook, StumbleUpon, Hulu, Yahoo!, ...)
  • 18. Column-oriented stores ● Features – Groups attributes into column families – Column families store key-value pairs – Implemented as sparse multi-dimensional arrays – Denormalized – 10^4 to 10^6 columns; 10^9 rows ● Bioinformatics Use Case – Large studies – Many experiments & data types – Simulations ● Limitations – Operationally challenging – Suitable for large number of servers
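As a rough illustration of the column-family layout described above, the following Python sketch stores each row as a sparse nested map (row key, then column family, then column) and scans rows in row-key order. It is a single-process toy under stated assumptions, not how Bigtable or HBase are implemented; the class `ColumnFamilyTable`, the family names `clinical` and `expression`, and the sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, Tuple

# Layout: row_key -> column family -> column name -> value
Families = Dict[str, Dict[str, str]]


class ColumnFamilyTable:
    """Toy sparse table: absent columns take no space (no stored nulls)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Families] = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: str, family: str, column: str, value: str) -> None:
        self._rows[row_key][family][column] = value

    def get(self, row_key: str, family: str) -> Dict[str, str]:
        # Read one column family of one row; .get avoids creating entries.
        row = self._rows.get(row_key)
        if row is None:
            return {}
        return dict(row.get(family, {}))

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Families]]:
        # Range scan over row keys in sorted order, start inclusive, stop exclusive.
        for key in sorted(self._rows):
            if start <= key < stop:
                yield key, {fam: dict(cols) for fam, cols in self._rows[key].items()}


table = ColumnFamilyTable()
table.put("S001", "clinical", "age", "26")
table.put("S001", "expression", "BRCA1", "7.2")
table.put("S002", "clinical", "age", "41")   # S002 has no expression data at all
table.put("S010", "expression", "TP53", "3.1")

scanned_keys = [key for key, _ in table.scan("S001", "S010")]
```

A real store keeps the rows physically sorted and partitions them across servers; sorting on every scan, as done here, is only acceptable for a toy.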
  • 19. Document databases ● Documents – Loosely structured sets of key/value pairs in documents, e.g., XML, JSON, BSON – Encapsulate and encode data in some standard formats or encodings – Are addressed in the database via a unique key – Are treated as a whole, avoiding splitting a document into its constituent name/value pairs ● Allow retrieving documents by keys or contents ● Notable for: – MongoDB (used in FourSquare, GitHub, and more) – CouchDB (used in Apple, BBC, Canonical, CERN, and more)
  • 20. Document databases, JSON { "ApacheLogRecord": { "ip": "127.0.0.1", "ident": "-", "http_user": "frank", "time": "10/Oct/2000:13:55:36 -0700", "request_line": { "http_method": "GET", "url": "/apache_pb.gif", "http_vers": "HTTP/1.0" }, "http_response_code": "200", "http_response_size": "2326", "referrer": "http://www.example.com/start.html", "user_agent": "Mozilla/4.08 [en] (Win98; I ;Nav)" } }
  • 21. { subject_id: "F8273", age: "26", sex: "M", date_of_death: "12-Jan-1995", glycohemoglobin: "10%", BMI: 22, samples: [ {type: "Thoracic Aorta", AHA_score: 1}, {type: "Abdominal Aorta", AHA_score: 2}, {type: "LAD", AHA_score: 5} ], sequence: {seq_file: "F8273_08152014.bam", variant_file: "F8273_08152014.vcf"} }
  • 22. Document databases ● Features – JSON/XML structures – Fields vary between docs – No predefined schema – Documents analogous to rows – Collections analogous to tables – Query capabilities ● Bioinformatics Use Case – Text mining – Atherosclerosis ● Limitations – No joins – No referential integrity checks – Object-based query language ● Document template: { id : <value>, <key> : <value>, <key> : <embedded document>, <key> : <array> }
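To make "fields vary between docs" and "retrieval by contents" concrete, here is a minimal Python sketch that treats a list of dicts as a collection and filters it by field values and by values inside an embedded array. It is not a MongoDB or CouchDB API; the helpers `find` and `with_sample_score_at_least` are hypothetical, and subjects F9001 and F9002 are invented for the example (only F8273 appears in the slides).

```python
from typing import Any, Dict, List

Document = Dict[str, Any]

# A "collection": documents need not share the same fields.
subjects: List[Document] = [
    {
        "subject_id": "F8273",
        "sex": "M",
        "BMI": 22,
        "samples": [
            {"type": "Thoracic Aorta", "AHA_score": 1},
            {"type": "Abdominal Aorta", "AHA_score": 2},
            {"type": "LAD", "AHA_score": 5},
        ],
    },
    # Hypothetical documents added for illustration only.
    {"subject_id": "F9001", "sex": "F", "samples": [{"type": "LAD", "AHA_score": 2}]},
    {"subject_id": "F9002", "sex": "M"},  # no BMI, no samples
]


def find(collection: List[Document], **criteria: Any) -> List[Document]:
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def with_sample_score_at_least(collection: List[Document], threshold: int) -> List[str]:
    """Return subject IDs that have at least one embedded sample at or above the threshold."""
    return [
        doc["subject_id"]
        for doc in collection
        if any(s.get("AHA_score", 0) >= threshold for s in doc.get("samples", []))
    ]


male_ids = [doc["subject_id"] for doc in find(subjects, sex="M")]
severe_ids = with_sample_score_at_least(subjects, 5)
```

Using `doc.get(...)` with defaults is what lets the queries tolerate missing fields; nothing enforces that a `subject_id` referenced elsewhere actually exists, which is the "no referential integrity checks" limitation.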
  • 23. Graph databases ● Features – Highly normalized – Graph-based query language (Gremlin) – SQL-inspired query language (Cypher) – Support for path finding and recursion ● Bioinformatics Use Case – Epidemiology simulations – Interaction networks ● Limitations – Less suited for tabular data
  • 24. Property Graph Model [Diagram: nodes with properties] – name: the Doctor; age: 907; species: Time Lord – first name: Rose; last name: Tyler – vehicle: tardis; model: Type 40
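A property graph and the path finding mentioned for graph databases can be sketched with plain Python structures. The node properties below come from the slide, but the edge labels `TRAVELS_WITH` and `OWNS`, the node IDs, and the function `shortest_path` are assumptions made for illustration (the transcript does not preserve the diagram's edges). The search is an ordinary breadth-first search, not Gremlin or Cypher.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Nodes carry arbitrary key/value properties.
nodes: Dict[str, Dict[str, object]] = {
    "doctor": {"name": "the Doctor", "age": 907, "species": "Time Lord"},
    "rose": {"first name": "Rose", "last name": "Tyler"},
    "tardis": {"vehicle": "tardis", "model": "Type 40"},
}

# Directed, labelled edges: (source, label, target). Labels are hypothetical.
edges: List[Tuple[str, str, str]] = [
    ("rose", "TRAVELS_WITH", "doctor"),
    ("doctor", "OWNS", "tardis"),
]


def shortest_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search along edge direction; returns node IDs or None."""
    adjacency: Dict[str, List[str]] = {}
    for source, _label, target in edges:
        adjacency.setdefault(source, []).append(target)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in adjacency.get(path[-1], []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = shortest_path("rose", "tardis")
no_path = shortest_path("tardis", "rose")
```

In a relational schema the same traversal would need one self-join per hop, which is why recursion and path finding are listed as graph-store features.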
  • 26. Modeling NoSQL stores ● NoSQL data modeling often starts from the application-specific queries, as opposed to relational modeling: – Relational modeling is typically driven by the structure of available data. The main design theme is “What answers do I have?” – NoSQL data modeling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is “What questions do I have?” ● Data duplication and denormalization are first-class citizens
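Query-driven modeling with deliberate duplication can be illustrated as follows. Assume two questions must be answered quickly: "which samples does subject X have?" and "which subjects have a sample of type Y?". Instead of one normalized table, the same facts are written twice, once per question. The data values and the names `by_subject` and `by_sample_type` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Source facts: (subject_id, sample_type, AHA_score). Values are illustrative.
measurements: List[Tuple[str, str, int]] = [
    ("F8273", "Thoracic Aorta", 1),
    ("F8273", "LAD", 5),
    ("F9001", "LAD", 2),
]

# One denormalized structure per supported question.
by_subject: Dict[str, List[dict]] = defaultdict(list)      # "samples of subject X?"
by_sample_type: Dict[str, List[dict]] = defaultdict(list)  # "subjects with type Y?"

for subject_id, sample_type, score in measurements:
    # Each fact is duplicated so that both questions are a single key lookup.
    by_subject[subject_id].append({"type": sample_type, "AHA_score": score})
    by_sample_type[sample_type].append({"subject_id": subject_id, "AHA_score": score})

lad_subjects = [entry["subject_id"] for entry in by_sample_type["LAD"]]
f8273_samples = [entry["type"] for entry in by_subject["F8273"]]
```

The cost of this design is on the write path: every update must touch both structures, and a question that was not anticipated (for example, filtering by score range) has no efficient access path.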
  • 27. Querying NoSQL stores ● Different NoSQL stores provide different querying tools and features – From “simple” filtering of data based on “columns” names/values (MongoDB, HBase, Redis, ...) – To SQL-like languages (Google App Engine, HyperTable, Hive, ...)
  • 28. NoSQL, No ACID ● RDBMSs are based on ACID (Atomicity, Consistency, Isolation, and Durability) properties ● NoSQL – Does not give importance to ACID properties – In some cases completely ignores them ● In distributed parallel systems it is difficult/impossible to ensure ACID properties – Even with a central coordinator – 2PL, 2PC and SS2PL can help ● Long-running transactions don't work because keeping resources blocked for a long time is not practical
  • 29. CAP Theorem ● A congruent and logical way for assessing the problems involved in assuring ACID-like guarantees in distributed systems is provided by the CAP theorem – At most two of the following three can be maximized at one time: Consistency (each client has the same view of the data), Availability (each client can always read and write), Partition tolerance (system works well across distributed physical networks) – Conjectured by Eric Brewer in 2000 – Proved by Seth Gilbert and Nancy Lynch in 2002
  • 30. References ● Tiwari, Shashank. Professional NoSQL. Wrox, 2011. ● Warden, Pete. Big Data Glossary. O'Reilly Media, 2011. ● Vogels, Werner (Amazon.com's CTO). All Things Distributed. Werner Vogels' weblog on building scalable and robust distributed systems. http://www.allthingsdistributed.com/ ● Katsov, Ilya. NoSQL Data Modeling Techniques. http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/ ● Bushik, Sergey. A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak. October 2012. Available online. ● Gilbert, Seth and Lynch, Nancy. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News 33.2 (2002): 51-59. ● Redmond, Eric, Wilson, Jim R., and Carter, Jacquelyn. Seven databases in seven weeks: a guide to modern databases and the NoSQL movement. The Pragmatic Programmers, LLC, 2012.