The evolution of database technology (II)
Huibert Aalbers
Senior Certified Executive IT Architect
IT Insight podcast
• This podcast belongs to the IT Insight series
• You can subscribe to the podcast through iTunes.
• Additional material such as presentations in PDF format or white
papers mentioned in the podcast can be downloaded from the IT
insight section of my site at http://www.huibert-aalbers.com
• You can send questions or suggestions regarding this podcast to my
personal email, huibert_aalbers@mac.com
A brave new world
• With Web 2.0 came the need for a new
set of tools that could handle the
explosive growth of data
• Data willingly shared by users
• Data collected on users and
customers, sometimes without
their knowledge
• Sensors, IoT, etc.
• Big Data requires a new kind of data
repository
Do I need a different solution?
There are basically two situations in which you need a new
type of database solution instead of a traditional relational database:
• The architect designs a new system from the ground up on
a Big Data solution because they know the workload will require it
• The team has tried every available strategy to scale the
existing relational database and it is still not enough
• Upgrading the hardware (SSDs, networking, etc.)
• Query optimization
• Using a data caching scheme
• Partitioning the data
• Building new indices
• Denormalizing the data
• Using stored procedures, etc.
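As an illustration of one item in the list above, a data caching scheme can be sketched as a small cache-aside wrapper. This is a minimal sketch under stated assumptions, not a production design: the class name `CacheAside`, the `slow_query` function, and the `FAKE_TABLE` data are all hypothetical stand-ins for a real relational query.

```python
import time


class CacheAside:
    """Keeps recent query results in memory so repeated reads skip the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical function that hits the database
        self._ttl = ttl_seconds        # how long a cached value stays valid
        self._clock = clock
        self._entries = {}             # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Cache miss or expired entry: go to the database and remember the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after an update so the next read fetches fresh data."""
        self._entries.pop(key, None)


# Hypothetical stand-in for an expensive relational query.
FAKE_TABLE = {1: "alice", 2: "bob"}


def slow_query(customer_id):
    return FAKE_TABLE[customer_id]


cache = CacheAside(slow_query, ttl_seconds=30.0)
first = cache.get(1)    # miss: goes to the "database"
second = cache.get(1)   # hit: served from memory
```

The second read never reaches the database, which is the whole point of the strategy. The cost is that cached values can be stale until they expire or are invalidated.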
In order to solve the issue, we have to give up
something
• What can we give up?
• ACID properties
• Data normalization
• Transaction support
NoSQL Repositories
From my point of view, the name “NoSQL” is
not the right one to describe non-relational databases
• The success of NoSQL databases is
not related to developers disliking
SQL. It is due to the following reasons:
• They scale out almost linearly
• They are more flexible (schema-less)
• They are easier to manage for extremely high
volumes of data
I think it is better to call them distributed
non-relational databases
Key-Value pair databases
These data stores are also known as distributed
hash tables
• Pros
• Extremely fast; a well-understood CS problem
• Scale almost linearly
• Cons
• Running complex queries against the values
can be slow and cumbersome
• Key-value stores that also keep a timestamp
on each value for versioning are a special
case of key-value pair databases
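The timestamped variant described above can be sketched in a few lines. This is only an in-memory illustration, assuming a single process and caller-supplied timestamps; the class name `VersionedKVStore` and its methods are hypothetical and do not correspond to any specific product's API.

```python
import time
from collections import defaultdict


class VersionedKVStore:
    """Key-value store that keeps every version of a value, ordered by timestamp."""

    def __init__(self):
        # key -> list of (timestamp, value) pairs, oldest first
        self._data = defaultdict(list)

    def put(self, key, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        versions = self._data[key]
        versions.append((ts, value))
        versions.sort(key=lambda pair: pair[0])

    def get(self, key, as_of=None):
        """Return the newest value, or the newest value written at or before as_of."""
        result = None
        for ts, value in self._data.get(key, []):
            if as_of is not None and ts > as_of:
                break
            result = value
        return result


store = VersionedKVStore()
store.put("user:1", "v1", timestamp=100)
store.put("user:1", "v2", timestamp=200)

latest = store.get("user:1")
older = store.get("user:1", as_of=150)
```

Lookups by key stay simple, and old versions remain readable. Anything that needs to search inside the values still requires a full scan, which is the weakness listed under Cons.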
Document-based databases
This is a large category of data stores that work with data stored in a particular document
format. Popular document formats used to store data include:
• XML
• JSON
• YAML
In this kind of data store, documents are identified by a unique key, which allows for quick retrieval of the
information.
Although conceptually all data stores in this category are relatively similar, there are still important
differences from one product to another:
• Query methods (SQL-like, Map/Reduce, etc.)
• Replication
• Data consistency
Document-based databases are schema-less
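To make the ideas of "identified by a unique key" and "schema-less" concrete, here is a minimal in-memory sketch that stores JSON documents. The class `DocumentStore` and its methods are hypothetical names chosen for illustration; real products add indexing, replication, and persistence on top of this basic shape.

```python
import json
import uuid


class DocumentStore:
    """Stores JSON documents under a unique key, with no fixed schema."""

    def __init__(self):
        self._docs = {}  # document id -> JSON text

    def insert(self, document, doc_id=None):
        doc_id = doc_id or uuid.uuid4().hex
        self._docs[doc_id] = json.dumps(document)
        return doc_id

    def get(self, doc_id):
        """Fast path: direct retrieval by unique key."""
        raw = self._docs.get(doc_id)
        return None if raw is None else json.loads(raw)

    def find(self, **criteria):
        """Slow path: scan every document and compare top-level fields."""
        matches = []
        for raw in self._docs.values():
            doc = json.loads(raw)
            if all(doc.get(field) == wanted for field, wanted in criteria.items()):
                matches.append(doc)
        return matches


db = DocumentStore()
# The two documents do not share the same fields, and that is allowed.
ana_id = db.insert({"name": "Ana", "city": "Madrid"})
ben_id = db.insert({"name": "Ben", "languages": ["en", "nl"], "age": 41})

ana = db.get(ana_id)
in_madrid = db.find(city="Madrid")
```

Because no schema is enforced, the application has to cope with documents that lack a given field, which is why `find` uses `doc.get(field)` rather than direct indexing.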
MongoDB vs CouchDB
• MongoDB
• Very high volumes of somewhat mutable data
• Dynamic flexible queries, somewhat similar to SQL
• Very quick queries
• CouchDB
• Very high volumes of mostly immutable data
• Pre-defined queries, based on MapReduce, implemented in JavaScript
• Master-master replication
• Neither MongoDB nor CouchDB works natively with XML data; both work with JSON documents
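The difference between dynamic queries and pre-defined MapReduce queries can be illustrated conceptually. The sketch below is plain Python and does not use either product's actual API; the function names and the sample documents are hypothetical. It only contrasts a filter supplied at query time with a view that is defined in advance by a map function and a reduce function.

```python
from collections import defaultdict

docs = [
    {"_id": "1", "type": "order", "customer": "ana", "total": 30},
    {"_id": "2", "type": "order", "customer": "ben", "total": 45},
    {"_id": "3", "type": "order", "customer": "ana", "total": 25},
]


# Style 1: dynamic query. The filter is decided when the query is issued.
def dynamic_query(documents, predicate):
    return [d for d in documents if predicate(d)]


# Style 2: pre-defined view. The map function is written ahead of time
# and emits (key, value) pairs for every document it cares about.
def map_order_by_customer(doc):
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def build_view(documents, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # The reduce function collapses all values emitted for one key.
    return {key: reduce_fn(values) for key, values in grouped.items()}


big_orders = dynamic_query(docs, lambda d: d["total"] > 28)
totals_by_customer = build_view(docs, map_order_by_customer, sum)
```

The first style suits exploratory questions that change often. The second suits questions known in advance, since the view can be computed once and then looked up by key.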
Document-based databases
• Among the many document-based databases, MongoDB is currently the
most popular, closely followed by CouchDB
• The MongoDB API is currently supported by both DB2 and Informix
• That means that it is now very easy to migrate from MongoDB to either of
those databases and store both structured data and JSON documents in a
single repository
Hosted document databases
• Both MongoDB and CouchDB are popular databases, which explains why
there are many hosted and managed versions of these
products to choose from
• Cloudant is a fully managed version of BigCouch, which is in turn a highly
available, fault-tolerant version of CouchDB
• Migrating from CouchDB or BigCouch to Cloudant is totally transparent
• Both MongoDB and CouchDB scale very well by implementing sharding,
which makes them very well suited for born-on-cloud applications
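Sharding, mentioned in the last bullet, means spreading documents over several servers according to their key. The sketch below shows the simplest possible variant, hash-modulo routing, with each shard modeled as a Python dict. The names `shard_for` and `ShardedStore` are hypothetical, and this is not how either product implements sharding internally; real systems typically use range-based or consistent-hashing schemes so that adding a shard does not move most keys.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number in a stable, process-independent way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Routes every read and write to exactly one shard, chosen from the key."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate server.
        self._shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self._shards[shard_for(key, len(self._shards))][key] = value

    def get(self, key):
        return self._shards[shard_for(key, len(self._shards))].get(key)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]


sharded = ShardedStore(shard_count=4)
for i in range(1000):
    sharded.put(f"user:{i}", i)

value = sharded.get("user:42")
sizes = sharded.shard_sizes()
```

Since each key lives on one shard only, single-key reads and writes touch a single server, and capacity grows by adding servers.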
Graph databases
Social networks have become one of the
most representative applications of what is
known as Web 2.0
• Storing and processing social graphs in a
relational database is both complex and inefficient
Unlike relational databases, this new kind of
data store focuses more on relationships
than on the data itself. For social-network
projects this results in:
• Increased performance
• Simpler and more natural development
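A typical social-network question is "who is within two hops of this person?". When relationships are stored directly as an adjacency structure, this becomes a short breadth-first traversal, whereas a relational schema would usually need one self-join per hop. The sketch below assumes a tiny hypothetical `friends` graph held in memory; it illustrates the data model only, not any graph database's query language.

```python
from collections import deque

# Hypothetical social graph: each person maps to the set of direct friends.
friends = {
    "ana": {"ben", "carla"},
    "ben": {"ana", "dave"},
    "carla": {"ana", "dave"},
    "dave": {"ben", "carla", "eve"},
    "eve": {"dave"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first search returning {person: distance} up to max_hops away."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(person, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[person] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


network = within_hops(friends, "ana", 2)
```

Here `network` contains ben and carla at distance 1 and dave at distance 2; eve is three hops away and is left out.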
Hadoop
Hadoop is a framework designed to process tasks that can
be parallelized on extremely high volumes of data
distributed over a large number of server nodes belonging
to a cluster. It has four main components:
• Hadoop Common
• Hadoop Distributed File System (HDFS)
• Designed primarily to handle extremely high
volumes of immutable data
• Loading and deleting data is efficient, updating
data is not
• Hadoop YARN
• Hadoop MapReduce
Managing a complete Hadoop system is currently not for
the faint of heart
MapReduce
MapReduce is the data processing model that
sits at the very core of Hadoop
For each query, developers need to implement the
following functions:
• Map: In this phase the overall problem is
divided into smaller tasks (which can
themselves be broken down further) that are
distributed to run on different server nodes
• Reduce: In this second phase, the answers
produced by the different nodes are
grouped and combined to
produce the reply to the query
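The two phases can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not Hadoop code: the function names are hypothetical, and the `shuffle` step stands in for the grouping by key that the framework performs between the map and reduce phases.

```python
from collections import defaultdict


def map_phase(chunk):
    """Runs independently on each chunk of input; emits (word, 1) pairs."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(mapped_pairs):
    """Groups all emitted values by key so each key reaches one reducer."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combines all values emitted for one key into a single result."""
    return key, sum(values)


def run_job(chunks):
    # In a real cluster each chunk would be mapped on a different node.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data big", "data stores", "big clusters"]
word_counts = run_job(chunks)
```

Because `map_phase` looks at one chunk at a time and `reduce_phase` looks at one key at a time, both can be spread over many nodes without coordination, which is what makes the model parallelizable.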
MapReduce
Hadoop can store any kind of data
• Structured
• Unstructured
When using Hadoop to store structured data in
a data-warehouse-like environment, it is possible
to use languages that automatically generate
the code for the Map/Reduce functions
• Apache Pig (Pig Latin)
• Apache Hive (HiveQL, similar to SQL)
• IBM Big SQL
Analyzing streams of data
Sometimes the amount of stored data is so large
that it simply becomes impossible to perform
real-time analysis
• In those cases, the best alternative is to
analyze the stream of data before it is stored
in the database
• The main idea is that the data is kept outside
the database (generally in RAM) for a
certain window of time in order to detect a
combination of events within a short period
• Fraud detection
• Digital marketing
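The windowing idea can be sketched for the fraud detection case. The example below keeps only recent event timestamps in memory and raises a flag when one card produces too many events inside the window. The class name `SlidingWindowDetector`, the thresholds, and the sample events are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Flags a card when it exceeds max_events within window_seconds."""

    def __init__(self, window_seconds, max_events):
        self._window = window_seconds
        self._max = max_events
        # card id -> timestamps of events still inside the window (kept in RAM)
        self._recent = defaultdict(deque)

    def observe(self, card_id, timestamp):
        recent = self._recent[card_id]
        recent.append(timestamp)
        # Drop events that have fallen out of the time window.
        while recent and recent[0] <= timestamp - self._window:
            recent.popleft()
        return len(recent) > self._max


detector = SlidingWindowDetector(window_seconds=60, max_events=3)
events = [
    ("card-1", 0),
    ("card-1", 10),
    ("card-2", 15),
    ("card-1", 20),
    ("card-1", 30),   # fourth event for card-1 within 60 seconds
    ("card-1", 200),  # much later: the window has emptied
]
alerts = [(card, ts) for card, ts in events if detector.observe(card, ts)]
```

Only the fourth rapid event on card-1 triggers an alert. Nothing is written to a database first, so the decision can be made while the transaction is still in flight.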
Polyglot Persistence
When working with applications that require extreme scaling, there is no
single solution that fits all challenges. It is likely that after careful analysis of the
problem more than one data store will be required to obtain the best
performance.
• This is known as “Polyglot Persistence”
Contact information
On Twitter: @huibert (English), @huibert2 (Spanish)
Web site: http://www.huibert-aalbers.com
Blog: http://www.huibert-aalbers.com/blog