CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

A
Asst.Prof. M.GokilavaniProfessor um KL university
CCS334 BIG DATAANALYTICS
(R-21 III (I Sem))
Department of Artificial Intelligence and Data Science )
Session 3
by
Asst.Prof.M.Gokilavani
NIET
9/19/2023 Department of AI & DS 1
TEXT BOOKS
• Michael Minelli, Michelle Chambers, and AmbigaDhiraj, "Big Data,
Big Analytics: Emerging Business Intelligence and Analytic Trends for
Today's Businesses", Wiley, 2013.
• Eric Sammer, "Hadoop Operations", O'Reilley, 2012.
• Sadalage, Pramod J. “NoSQL distilled”, 2013.
REFERENCES
• E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive",
O'Reilley, 2012.
• Lars George, "HBase: The Definitive Guide", O'Reilley, 2011.
• Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilley, 2010.
9/19/2023 Department of AI & DS 2
Topics covered in Unit 2 session
9/19/2023 Department of AI & DS 3
UNIT II NOSQL DATA MANAGEMENT
Introduction to NoSQL – aggregate data models – key-value and
document data models – relationships – graph databases – schema
less databases – materialized views – distribution models – master-
slave replication – consistency - Cassandra – Cassandra data model
– Cassandra examples – Cassandra clients.
Distribution Models
• Depending on the distribution model the data store can give us the ability:
• To handle large Quantity of data
• To process a grater read or write traffic
• To have more availability in the case of network slowdowns of
breakages.
• There are two path for distribution:
• Sharding: Sharding distributes different data across multiple
servers, so each server acts as the single source for a subset of
data.
• Replication: Replication copies data across multiple servers,
so each bit of data can be found in multiple places.
9/19/2023 Department of AI & DS 4
Distributed Models
• Single server
• Sharding
• Master Slave Replication
• Peer-to-Peer Replication
• Combining Sharding & Replication
9/19/2023 Department of AI & DS 5
Single Server
• It is the first and simplest distribution option.
• Also if NoSQL database are designed to run on a cluster they can be
used in a single server application.
• This make sense if a NoSQL database is more suited for the
application data model.
• Graph database are the more obvious.
• If data usage is most about processing aggregates, than a key or a
document store may be useful.
9/19/2023 Department of AI & DS 6
Sharding
• Often, a data store is busy because different people are accessing
different part of the dataset.
• In this cases we can support horizontal scalability by putting different
part of the data onto different servers (Sharding)
• The concept of sharding is not new as a part of application logic.
• It consists in put all the customer with surname A-D on one shard and
E-G to another
• This complicates the programming model as the application code
needs to distributed the load across the shards.
• In the ideal setting we have each user to talk one server and the load is
balanced.
• Of course the ideal case is rare.
9/19/2023 Department of AI & DS 7
9/19/2023 Department of AI & DS 8
Sharding and NoSQL
• In general, many NoSQL databases offers auto- sharding.
• This can make much easier to use sharding in an application.
• Sharding is especially valuable for performance because it improves
read and write performances.
• It scales read and writes on the different nodes of the same cluster.
9/19/2023 Department of AI & DS 9
Master – Slave Replication
• In this setting one node is designated as the master, or primary ad the
other as slaves.
• The master is the authoritative source for the date and designed to
process updates and send them to slaves.
• The slaves are used for read operations.
• This allows us to scale in data intensive dataset.
9/19/2023 Department of AI & DS 10
9/19/2023 Department of AI & DS 11
Master – Slave Replication
• We can scale horizontally by adding more slaves.
• But, we are limited by the ability of the master in processing incoming
data.
An advantage is read resilience.
Disadvantage:
• Also if the master fails the slaves can still handle read requests.
• Anyway writes are not allowed until the master is not restored.
9/19/2023 Department of AI & DS 12
Master – Slave Replication
• Another characteristic is that a slave can be appointed as master.
• Masters can be appointed manually or automatically.
• In order to achieve resilience we need that read and write paths are
different.
• This is normally done using separate database connections.
• Disadvantage: Replication in master-slave have the analyzed
advantages but it come with the problem of inconsistency.
• The readers reading from the slaves can read data not updated.
9/19/2023 Department of AI & DS 13
Master – Slave Replication in MongoDB
9/19/2023 Department of AI & DS 14
Peer to Peer Replication
• Master-Slave replication helps with read scalability but has problems
on scalability of writes.
• Moreover, it provides resilience on read but not on writes.
• The master is still a single point of failure.
• Peer-to-Peer attacks these problems by not having a master.
• All the replica are equal (accept writes and reads).
• With a Peer-to-Peer we can have node failures without lose write
capability and losing data.
9/19/2023 Department of AI & DS 15
9/19/2023 Department of AI & DS 16
Combine Sharing
• We have multiple masters, but each data has a single master.
• Depending on the configuration we can decide the master for each
group of data.
• Peer-to-Peer and sharding is a common strategy for column-family
databases.
• This is commonly composed using replication of the shards.
9/19/2023 Department of AI & DS 17
Topics to be covered in next session 4
• Cassandra data model
9/19/2023 Department of AI & DS 18
Thank you!!!
1 von 18

Recomendados

FILE STRUCTURE IN DBMS von
FILE STRUCTURE IN DBMSFILE STRUCTURE IN DBMS
FILE STRUCTURE IN DBMSAbhishek Dutta
7.2K views59 Folien
Decision tree induction \ Decision Tree Algorithm with Example| Data science von
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceMaryamRehman6
1.4K views30 Folien
Big Data Analytics with Hadoop von
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
441.9K views64 Folien
Machine Learning von
Machine LearningMachine Learning
Machine LearningRahul Kumar
1.5K views27 Folien
ML Basics von
ML BasicsML Basics
ML BasicsSrujanaMerugu1
914 views162 Folien
Cs6703 grid and cloud computing unit 1 von
Cs6703 grid and cloud computing unit 1Cs6703 grid and cloud computing unit 1
Cs6703 grid and cloud computing unit 1RMK ENGINEERING COLLEGE, CHENNAI
18.2K views83 Folien

Más contenido relacionado

Was ist angesagt?

Machine learning von
Machine learning Machine learning
Machine learning Saurabh Agrawal
3.6K views20 Folien
Data mining techniques unit 1 von
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1malathieswaran29
835 views69 Folien
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop... von
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Simplilearn
2.1K views139 Folien
Machine Learning and Real-World Applications von
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachinePulse
23.8K views19 Folien
Architecture of data mining system von
Architecture of data mining systemArchitecture of data mining system
Architecture of data mining systemramya marichamy
3.8K views10 Folien
Data cube computation von
Data cube computationData cube computation
Data cube computationRashmi Sheikh
31.1K views14 Folien

Was ist angesagt?(20)

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop... von Simplilearn
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop...
Simplilearn2.1K views
Machine Learning and Real-World Applications von MachinePulse
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
MachinePulse23.8K views
Architecture of data mining system von ramya marichamy
Architecture of data mining systemArchitecture of data mining system
Architecture of data mining system
ramya marichamy3.8K views
Data cube computation von Rashmi Sheikh
Data cube computationData cube computation
Data cube computation
Rashmi Sheikh31.1K views
Lecture6 introduction to data streams von hktripathy
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
hktripathy3.8K views
In-Memory Big Data Analytics von Supreeth M P
In-Memory Big Data AnalyticsIn-Memory Big Data Analytics
In-Memory Big Data Analytics
Supreeth M P261 views
Intro to Big Data and NoSQL von Don Demcsak
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
Don Demcsak4.8K views
HADOOP TECHNOLOGY ppt von sravya raju
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju38.7K views
Notes from Coursera Deep Learning courses by Andrew Ng von dataHacker. rs
Notes from Coursera Deep Learning courses by Andrew NgNotes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew Ng
dataHacker. rs6.6K views
Introduction to Data Mining and Data Warehousing von Kamal Acharya
Introduction to Data Mining and Data WarehousingIntroduction to Data Mining and Data Warehousing
Introduction to Data Mining and Data Warehousing
Kamal Acharya2.7K views
Cloud File System with GFS and HDFS von Dr Neelesh Jain
Cloud File System with GFS and HDFS  Cloud File System with GFS and HDFS
Cloud File System with GFS and HDFS
Dr Neelesh Jain3.1K views
Big Data Platforms: An Overview von C. Scyphers
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
C. Scyphers28K views
5.5 graph mining von Krish_ver2
5.5 graph mining5.5 graph mining
5.5 graph mining
Krish_ver29.5K views

Similar a CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

Session 1 Introduction to NoSQL.pptx von
 Session 1 Introduction to NoSQL.pptx Session 1 Introduction to NoSQL.pptx
Session 1 Introduction to NoSQL.pptxAsst.Prof. M.Gokilavani
14 views18 Folien
Report 2.0.docx von
Report 2.0.docxReport 2.0.docx
Report 2.0.docxpinstechwork
7 views6 Folien
Introduction to NoSQL and MongoDB von
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBAhmed Farag
158 views26 Folien
NoSQLDatabases von
NoSQLDatabasesNoSQLDatabases
NoSQLDatabasesAdi Challa
395 views39 Folien
No sql database von
No sql databaseNo sql database
No sql databasevishal gupta
365 views25 Folien
6 Data Modeling for NoSQL 2/2 von
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2Fabio Fumarola
12.2K views45 Folien

Similar a CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx(20)

Introduction to NoSQL and MongoDB von Ahmed Farag
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDB
Ahmed Farag158 views
NoSQLDatabases von Adi Challa
NoSQLDatabasesNoSQLDatabases
NoSQLDatabases
Adi Challa395 views
6 Data Modeling for NoSQL 2/2 von Fabio Fumarola
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2
Fabio Fumarola12.2K views
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt... von PascalDesmarets1
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
Hackolade Tutorial - part 3 - Query-driven data modeling based on access patt...
PascalDesmarets159 views
A survey on data mining and analysis in hadoop and mongo db von Alexander Decker
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
Alexander Decker307 views
A survey on data mining and analysis in hadoop and mongo db von Alexander Decker
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
Alexander Decker542 views
Cloud Computing: The Hard Problems Never Go Away von ZendCon
Cloud Computing: The Hard Problems Never Go AwayCloud Computing: The Hard Problems Never Go Away
Cloud Computing: The Hard Problems Never Go Away
ZendCon2.4K views
Relational and non relational database 7 von abdulrahmanhelan
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
abdulrahmanhelan230 views
Objectivity/DB: A Multipurpose NoSQL Database von InfiniteGraph
Objectivity/DB: A Multipurpose NoSQL DatabaseObjectivity/DB: A Multipurpose NoSQL Database
Objectivity/DB: A Multipurpose NoSQL Database
InfiniteGraph5.1K views

Más de Asst.Prof. M.Gokilavani

AI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdf von
AI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdfAI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdf
AI3391 ARTIFICIAL INTELLIGENCE Assignment 1 questions.pdfAsst.Prof. M.Gokilavani
11 views16 Folien
AI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdf von
AI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdfAI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdf
AI3391 ARTIFICIAL INTELLIGENCE UNIT II notes.pdfAsst.Prof. M.Gokilavani
21 views34 Folien
AI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdf von
AI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdfAI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdf
AI3391 ARTIFICIAL INTELLIGENCE Unit I notes.pdfAsst.Prof. M.Gokilavani
16 views33 Folien
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ... von
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...Asst.Prof. M.Gokilavani
10 views17 Folien
AI3391 Session 12 Local search in continious space.pptx von
AI3391 Session 12 Local search in continious space.pptxAI3391 Session 12 Local search in continious space.pptx
AI3391 Session 12 Local search in continious space.pptxAsst.Prof. M.Gokilavani
7 views16 Folien
AI3391 Session 11 Hill climbing algorithm.pptx von
AI3391 Session 11 Hill climbing algorithm.pptxAI3391 Session 11 Hill climbing algorithm.pptx
AI3391 Session 11 Hill climbing algorithm.pptxAsst.Prof. M.Gokilavani
4 views22 Folien

Más de Asst.Prof. M.Gokilavani(20)

AI3391 Session 13 searching with Non-Deterministic Actions and partial observ... von Asst.Prof. M.Gokilavani
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...
AI3391 Session 13 searching with Non-Deterministic Actions and partial observ...
AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect... von Asst.Prof. M.Gokilavani
AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect...AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect...
AI3391 ARTIFICIAL INTELLIGENCE Session 8 Iterative deepening DFS and Bidirect...
AI3391 ARTIFICIAL INTELLIGENCE Session 7 Uniformed search strategies.pptx von Asst.Prof. M.Gokilavani
AI3391 ARTIFICIAL INTELLIGENCE Session 7 Uniformed search strategies.pptxAI3391 ARTIFICIAL INTELLIGENCE Session 7 Uniformed search strategies.pptx
AI3391 ARTIFICIAL INTELLIGENCE Session 7 Uniformed search strategies.pptx
AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f... von Asst.Prof. M.Gokilavani
AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f...AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f...
AI3391 ARTIFICAL INTELLIGENCE Session 5 Problem Solving Agent and searching f...

Último

REPORT Data Science EXPERT LECTURE.doc von
REPORT Data Science EXPERT LECTURE.docREPORT Data Science EXPERT LECTURE.doc
REPORT Data Science EXPERT LECTURE.docParulkhatri11
7 views9 Folien
Here comes the Loom - Ya!vaConf.pdf von
Here comes the Loom - Ya!vaConf.pdfHere comes the Loom - Ya!vaConf.pdf
Here comes the Loom - Ya!vaConf.pdfKrystian Zybała
7 views134 Folien
Plant Design Report-Oil Refinery.pdf von
Plant Design Report-Oil Refinery.pdfPlant Design Report-Oil Refinery.pdf
Plant Design Report-Oil Refinery.pdfSafeen Yaseen Ja'far
9 views10 Folien
Pitchbook Repowerlab.pdf von
Pitchbook Repowerlab.pdfPitchbook Repowerlab.pdf
Pitchbook Repowerlab.pdfVictoriaGaleano
10 views12 Folien
unit 1.pptx von
unit 1.pptxunit 1.pptx
unit 1.pptxrrbornarecm
6 views53 Folien
dummy.pptx von
dummy.pptxdummy.pptx
dummy.pptxJamesLamp
7 views2 Folien

Último(20)

REPORT Data Science EXPERT LECTURE.doc von Parulkhatri11
REPORT Data Science EXPERT LECTURE.docREPORT Data Science EXPERT LECTURE.doc
REPORT Data Science EXPERT LECTURE.doc
Parulkhatri117 views
Programmable Logic Devices : SPLD and CPLD von Usha Mehta
Programmable Logic Devices : SPLD and CPLDProgrammable Logic Devices : SPLD and CPLD
Programmable Logic Devices : SPLD and CPLD
Usha Mehta44 views
Ansari: Practical experiences with an LLM-based Islamic Assistant von M Waleed Kadous
Ansari: Practical experiences with an LLM-based Islamic AssistantAnsari: Practical experiences with an LLM-based Islamic Assistant
Ansari: Practical experiences with an LLM-based Islamic Assistant
M Waleed Kadous13 views
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R... von IJCNCJournal
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
Trust Metric-Based Anomaly Detection via Deep Deterministic Policy Gradient R...
IJCNCJournal5 views
Integrating Sustainable Development Goals (SDGs) in School Education von SheetalTank1
Integrating Sustainable Development Goals (SDGs) in School EducationIntegrating Sustainable Development Goals (SDGs) in School Education
Integrating Sustainable Development Goals (SDGs) in School Education
SheetalTank120 views
MongoDB.pdf von ArthyR3
MongoDB.pdfMongoDB.pdf
MongoDB.pdf
ArthyR353 views
2023-12 Emarei MRI Tool Set E2I0501ST (TQ).pdf von Philipp Daum
2023-12 Emarei MRI Tool Set E2I0501ST (TQ).pdf2023-12 Emarei MRI Tool Set E2I0501ST (TQ).pdf
2023-12 Emarei MRI Tool Set E2I0501ST (TQ).pdf
Philipp Daum6 views
GDSC Mikroskil Members Onboarding 2023.pdf von gdscmikroskil
GDSC Mikroskil Members Onboarding 2023.pdfGDSC Mikroskil Members Onboarding 2023.pdf
GDSC Mikroskil Members Onboarding 2023.pdf
gdscmikroskil75 views
Web Dev Session 1.pptx von VedVekhande
Web Dev Session 1.pptxWeb Dev Session 1.pptx
Web Dev Session 1.pptx
VedVekhande23 views
Programmable Switches for Programmable Logic Devices von Usha Mehta
Programmable Switches for Programmable Logic DevicesProgrammable Switches for Programmable Logic Devices
Programmable Switches for Programmable Logic Devices
Usha Mehta37 views
DevFest 2023 Daegu Speech_이재규, Implementing easy and simple chat with gol... von JQLEE6
DevFest 2023 Daegu Speech_이재규,  Implementing easy and simple chat with gol...DevFest 2023 Daegu Speech_이재규,  Implementing easy and simple chat with gol...
DevFest 2023 Daegu Speech_이재규, Implementing easy and simple chat with gol...
JQLEE616 views
Solution Challenge Introduction.pptx von GDSCCEC
Solution Challenge Introduction.pptxSolution Challenge Introduction.pptx
Solution Challenge Introduction.pptx
GDSCCEC13 views
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf von AlhamduKure
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdfASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
ASSIGNMENTS ON FUZZY LOGIC IN TRAFFIC FLOW.pdf
AlhamduKure11 views

CCS334 BIG DATA ANALYTICS Session 3 Distributed models.pptx

  • 1. CCS334 BIG DATAANALYTICS (R-21 III (I Sem)) Department of Artificial Intelligence and Data Science ) Session 3 by Asst.Prof.M.Gokilavani NIET 9/19/2023 Department of AI & DS 1
  • 2. TEXT BOOKS • Michael Minelli, Michelle Chambers, and AmbigaDhiraj, "Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses", Wiley, 2013. • Eric Sammer, "Hadoop Operations", O'Reilley, 2012. • Sadalage, Pramod J. “NoSQL distilled”, 2013. REFERENCES • E. Capriolo, D. Wampler, and J. Rutherglen, "Programming Hive", O'Reilley, 2012. • Lars George, "HBase: The Definitive Guide", O'Reilley, 2011. • Eben Hewitt, "Cassandra: The Definitive Guide", O'Reilley, 2010. 9/19/2023 Department of AI & DS 2
  • 3. Topics covered in Unit 2 session 9/19/2023 Department of AI & DS 3 UNIT II NOSQL DATA MANAGEMENT Introduction to NoSQL – aggregate data models – key-value and document data models – relationships – graph databases – schema less databases – materialized views – distribution models – master- slave replication – consistency - Cassandra – Cassandra data model – Cassandra examples – Cassandra clients.
  • 4. Distribution Models • Depending on the distribution model the data store can give us the ability: • To handle large Quantity of data • To process a grater read or write traffic • To have more availability in the case of network slowdowns of breakages. • There are two path for distribution: • Sharding: Sharding distributes different data across multiple servers, so each server acts as the single source for a subset of data. • Replication: Replication copies data across multiple servers, so each bit of data can be found in multiple places. 9/19/2023 Department of AI & DS 4
  • 5. Distributed Models • Single server • Sharding • Master Slave Replication • Peer-to-Peer Replication • Combining Sharding & Replication 9/19/2023 Department of AI & DS 5
  • 6. Single Server • It is the first and simplest distribution option. • Also if NoSQL database are designed to run on a cluster they can be used in a single server application. • This make sense if a NoSQL database is more suited for the application data model. • Graph database are the more obvious. • If data usage is most about processing aggregates, than a key or a document store may be useful. 9/19/2023 Department of AI & DS 6
  • 7. Sharding • Often, a data store is busy because different people are accessing different part of the dataset. • In this cases we can support horizontal scalability by putting different part of the data onto different servers (Sharding) • The concept of sharding is not new as a part of application logic. • It consists in put all the customer with surname A-D on one shard and E-G to another • This complicates the programming model as the application code needs to distributed the load across the shards. • In the ideal setting we have each user to talk one server and the load is balanced. • Of course the ideal case is rare. 9/19/2023 Department of AI & DS 7
  • 9. Sharding and NoSQL • In general, many NoSQL databases offers auto- sharding. • This can make much easier to use sharding in an application. • Sharding is especially valuable for performance because it improves read and write performances. • It scales read and writes on the different nodes of the same cluster. 9/19/2023 Department of AI & DS 9
  • 10. Master – Slave Replication • In this setting one node is designated as the master, or primary ad the other as slaves. • The master is the authoritative source for the date and designed to process updates and send them to slaves. • The slaves are used for read operations. • This allows us to scale in data intensive dataset. 9/19/2023 Department of AI & DS 10
  • 12. Master – Slave Replication • We can scale horizontally by adding more slaves. • But, we are limited by the ability of the master in processing incoming data. An advantage is read resilience. Disadvantage: • Also if the master fails the slaves can still handle read requests. • Anyway writes are not allowed until the master is not restored. 9/19/2023 Department of AI & DS 12
  • 13. Master – Slave Replication • Another characteristic is that a slave can be appointed as master. • Masters can be appointed manually or automatically. • In order to achieve resilience we need that read and write paths are different. • This is normally done using separate database connections. • Disadvantage: Replication in master-slave have the analyzed advantages but it come with the problem of inconsistency. • The readers reading from the slaves can read data not updated. 9/19/2023 Department of AI & DS 13
  • 14. Master – Slave Replication in MongoDB 9/19/2023 Department of AI & DS 14
  • 15. Peer to Peer Replication • Master-Slave replication helps with read scalability but has problems on scalability of writes. • Moreover, it provides resilience on read but not on writes. • The master is still a single point of failure. • Peer-to-Peer attacks these problems by not having a master. • All the replica are equal (accept writes and reads). • With a Peer-to-Peer we can have node failures without lose write capability and losing data. 9/19/2023 Department of AI & DS 15
  • 17. Combine Sharing • We have multiple masters, but each data has a single master. • Depending on the configuration we can decide the master for each group of data. • Peer-to-Peer and sharding is a common strategy for column-family databases. • This is commonly composed using replication of the shards. 9/19/2023 Department of AI & DS 17
  • 18. Topics to be covered in next session 4 • Cassandra data model 9/19/2023 Department of AI & DS 18 Thank you!!!