NoSQL and the Semantic/ Social Web
                                         Irina Hutanu

                   Alexandru Ioan Cuza University, Computer Science,
                           Computational Linguistics (2nd year)
                               Faculty of Letters graduate

                           {Irina Hutanu, irina.hutanu@gmail.com}


         Abstract. NoSQL is a new and promising approach to storing and managing information
         at web scale. “Not only SQL”[5], as many define it, is spreading rapidly because of its
         non-relational design, which allows data to be distributed horizontally across many
         machines. This paper tries to clarify what this newborn movement stands for.




1 Introduction

This type of database can handle large amounts of information thanks to a few features that
increase its storage capacity:

  • Relaxed consistency. According to the CAP theorem, a distributed system cannot guarantee
    Consistency, Availability and Partition tolerance all at the same time.
  • Key/value storage, a deliberately simple way to store and retrieve data.
  • Distribution across a large number of machines, with the data replicated and
    partitioned among them.

Some of the best-known and most highly rated database systems that work in this manner are
Google Bigtable, HBase, Hypertable, Amazon Dynamo, Voldemort, Cassandra, Riak, CouchDB,
MongoDB and Redis.

Data-driven sites such as Amazon.com, Google and Facebook work with terabytes of information
that must be scaled and partitioned quickly and efficiently. These Internet giants also run tens
of thousands of servers located all around the world. At that scale, faults and failures occur
constantly, yet the service must stay “always-on”. Any minor problem occurring while a customer
or user queries the database, causing him or her to lose access to the requested information,
may lead to serious financial loss. Such risks cannot be ignored, which is why systems like
Dynamo and Bigtable emerged. Their non-relational architecture, incremental scalability and
decentralized character offer a robust data storage system.


2 Architecture
2.1 Partitioning Process

One important requirement of a NoSQL system is that it scales incrementally as data and load
grow. To do this quickly and consistently, Dynamo, for example, uses the idea of virtual nodes
in the partitioning process. A physical node is not mapped to a single position on the hash ring
but to several, so that non-uniform distribution of data and load is no longer a problem. Also,
if a node becomes unreachable or disappears because of a system failure, the load it carried is
spread across the remaining, properly working nodes.
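
The virtual-node idea can be illustrated with a small consistent-hashing ring. This is only a minimal sketch under stated assumptions, not Dynamo's implementation: the class name `Ring`, the `vnodes_per_node` parameter, the `node#i` labelling of virtual nodes and the use of MD5 to place labels on the ring are all choices made for this example.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a position on the ring (MD5 is used only to spread labels)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hashing ring where each physical node owns many positions."""

    def __init__(self, vnodes_per_node: int = 64):
        self.vnodes_per_node = vnodes_per_node
        self._positions = []  # sorted ring positions of all virtual nodes
        self._owner = {}      # ring position -> physical node name

    def add_node(self, node: str) -> None:
        # One physical node is placed at several positions (its virtual nodes).
        for i in range(self.vnodes_per_node):
            pos = ring_position(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        # Dropping a node removes only its own positions from the ring.
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        # A key belongs to the first virtual node found clockwise from its position.
        pos = ring_position(key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = Ring()
for name in ("A", "B", "C"):
    ring.add_node(name)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("B")  # simulate the failure of node B
after = {k: ring.node_for(k) for k in keys}
```

In this sketch, removing node B only reassigns the keys that B owned; every other key stays where it was, which is what makes incremental scaling cheap.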
Bigtable, another non-relational storage system, partitions and organizes its data differently.
Being “a sparse, distributed, persistent multi-dimensional sorted map”[1], it indexes data by
rows, columns and timestamps. Partitioning takes place dynamically and is applied to ranges
of rows.
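
To make the data model more concrete, the following sketch keeps a map indexed by row, column and timestamp and cuts the sorted rows into contiguous ranges. It is a toy illustration, not Bigtable's API: the class `SortedTable`, its methods, the `max_rows` splitting rule and the sample row and column names are all assumptions made for this example.

```python
import bisect
from collections import defaultdict


class SortedTable:
    """Toy (row, column, timestamp) -> value map with rows kept in sorted order."""

    def __init__(self):
        self._rows = []                  # sorted list of row keys
        self._cells = defaultdict(dict)  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        idx = bisect.bisect_left(self._rows, row)
        if idx == len(self._rows) or self._rows[idx] != row:
            self._rows.insert(idx, row)  # keep the row keys sorted
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        versions = self._cells.get((row, column), {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)    # default to the newest version
        return versions.get(timestamp)

    def split_into_ranges(self, max_rows):
        """Partition the sorted row keys into contiguous row ranges."""
        return [self._rows[i:i + max_rows]
                for i in range(0, len(self._rows), max_rows)]


table = SortedTable()
table.put("com.example.www", "contents:", 1, "<html>v1</html>")
table.put("com.example.www", "contents:", 2, "<html>v2</html>")
table.put("org.example.blog", "contents:", 1, "<html>blog</html>")
table.put("com.example.mail", "contents:", 1, "<html>mail</html>")

latest = table.get("com.example.www", "contents:")
older = table.get("com.example.www", "contents:", timestamp=1)
ranges = table.split_into_ranges(max_rows=2)
```

Because the rows are sorted, each range holds neighbouring keys, so rows with a common prefix tend to end up in the same partition.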


2.2 Replication

Continuous data availability is also ensured by replication. These systems generally replicate
the data they store across multiple hosts, both to avoid losing information and to provide
durability.
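
The effect of replication on availability can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the mechanism of any particular system: the class `ReplicatedStore`, the `replicas` parameter, the byte-sum placement rule and the host names are hypothetical.

```python
class ReplicatedStore:
    """Toy key/value store that writes every key to several hosts."""

    def __init__(self, hosts, replicas=3):
        # Assumes replicas <= len(hosts) so that the chosen hosts are distinct.
        self.order = list(hosts)
        self.data = {h: {} for h in self.order}  # host -> local key/value dict
        self.replicas = replicas
        self.down = set()                        # hosts currently unreachable

    def replica_hosts(self, key):
        # Deterministic placement: start at a host derived from the key bytes
        # and take the next hosts in order.
        start = sum(key.encode("utf-8")) % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def put(self, key, value):
        for host in self.replica_hosts(key):
            if host not in self.down:
                self.data[host][key] = value

    def get(self, key):
        # Any reachable replica that holds the key can answer the read.
        for host in self.replica_hosts(key):
            if host not in self.down and key in self.data[host]:
                return self.data[host][key]
        raise KeyError(key)


store = ReplicatedStore(["h1", "h2", "h3", "h4", "h5"], replicas=3)
store.put("cart:42", "book")

owners = store.replica_hosts("cart:42")
store.down.update(owners[:2])  # two of the three replicas fail
value_after_failures = store.get("cart:42")
```

With three copies, the read still succeeds after two hosts fail; only the loss of all three replicas makes the key unavailable.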

Bigtable, for instance, uses a replication process that duplicates information across different
clusters, which reduces latency and protects the data against loss: “The Personalized
Search data is replicated across several Bigtable clusters to increase availability and to reduce
latency due to distance from clients. The Personalized Search team originally built a client-side
replication mechanism on top of Bigtable that ensured eventual consistency of all replicas. The
current system now uses a replication subsystem that is built into the servers.”[1]




Fig. 1. Partitioning and replication in Dynamo.¹


2.3 Consistency versus Availability

If multiple versions of the same data exist, they must be reconciled. In a system that trades
consistency for availability, however, divergent versions cannot always be reconciled
automatically. Dynamo, for example, uses vector clocks to track the causal order between
versions of the same object: when one version descends from another, the older one can simply
be discarded. When two or more versions are concurrent, the clocks alone cannot decide between
them, so semantic reconciliation is used instead. This approach places extra load on the system,
so it is reserved for the cases that really require it.
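
The comparison rule behind vector clocks fits in a short sketch. The helper names (`increment`, `descends`, `compare`, `merge`) and the node names Sx, Sy and Sz are illustrative assumptions, and the code shows the general technique rather than Dynamo's actual implementation.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with the counter of `node` advanced by one."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"       # a supersedes b, so b can be discarded
    if b_ge_a:
        return "before"      # b supersedes a, so a can be discarded
    return "concurrent"      # neither dominates: reconciliation is needed


def merge(a: dict, b: dict) -> dict:
    """Component-wise maximum, used once the values have been reconciled."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


d1 = increment({}, "Sx")             # first write, handled by Sx
d2 = increment(d1, "Sx")             # second write, again handled by Sx
d3 = increment(d2, "Sy")             # one client updates d2 through Sy
d4 = increment(d2, "Sz")             # another client updates d2 through Sz
d5 = increment(merge(d3, d4), "Sx")  # reconciled version written through Sx
```

Here `d3` and `d4` both descend from `d2` but not from each other, so they are reported as concurrent; the reconciled `d5` supersedes both.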

Apart from some minor issues that might cause problems such as overloading, the choice of
availability over consistency gave rise to some interesting and unexpected results, marking,
to some extent, a real success: “The production use of Dynamo for the past year
demonstrates that decentralized techniques can be combined to provide a single highly-available
system. Its success in one of the most challenging application environments shows that an
eventual-consistent storage system can be a building block for highly-available applications.” [2]

¹ Image from Dynamo: Amazon’s Highly Available Key-value Store




Fig. 2. Version evolution of an object over time.²



2.4 Gossip Protocol

This protocol is used both to propagate updates and to detect failures. When a node joins or
leaves the system, the change is passed from node to node, allowing data to be reorganized
among the functioning nodes. To this end, every second each node contacts another node chosen
at random, and the two synchronize their histories of membership changes.
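
A gossip exchange of this kind can be simulated in a few lines. This is a minimal sketch under stated assumptions: each node's membership history is modelled as a plain set of events, one loop iteration stands for one gossip interval, and the function names, the event strings and the seed are all chosen for this example.

```python
import random


def gossip_round(views: dict, rng: random.Random) -> None:
    """Each node contacts one random peer; both keep the union of their histories."""
    names = list(views)
    for node in names:
        peer = rng.choice([n for n in names if n != node])
        merged = views[node] | views[peer]
        views[node] = merged
        views[peer] = set(merged)


def rounds_until_converged(views: dict, rng: random.Random, max_rounds: int = 1000):
    """Run gossip rounds until every node holds the same history."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        reference = next(iter(views.values()))
        if all(view == reference for view in views.values()):
            return round_no
    return None  # did not converge within max_rounds


rng = random.Random(7)  # fixed seed so that the run is repeatable
# Initially every node knows only about its own arrival.
views = {f"n{i}": {f"joined:n{i}"} for i in range(8)}
rounds = rounds_until_converged(views, rng)
```

No node plays a coordinating role in this simulation: the full membership picture spreads purely through pairwise exchanges.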

Failure detection relies on the same gossip protocol. A node is considered unavailable by
another node if it does not respond to that node's messages. The latter node then obtains the
required information from an alternative node and periodically retries the first one to check
whether it has recovered.
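
The behaviour described above (suspect a silent peer, fall back to another one, retry later) can be sketched as follows. The classes `Peer` and `LocalFailureDetector` and their methods are hypothetical, and a real system would use timeouts on network messages rather than the boolean flag used here.

```python
class Peer:
    """Stand-in for a remote node; `alive` simulates whether it answers."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True

    def ping(self) -> bool:
        return self.alive


class LocalFailureDetector:
    """Tracks which peers this node currently considers unavailable."""

    def __init__(self, peers):
        self.peers = {p.name: p for p in peers}
        self.suspected = set()

    def send(self, name: str) -> bool:
        # A missing response marks the peer as suspected; a response clears it.
        ok = self.peers[name].ping()
        if ok:
            self.suspected.discard(name)
        else:
            self.suspected.add(name)
        return ok

    def pick_target(self, preferred):
        # Use the first peer in preference order that actually answers.
        for name in preferred:
            if name not in self.suspected and self.send(name):
                return name
        return None

    def retry_suspected(self) -> None:
        # Periodic retry: check whether suspected peers have recovered.
        for name in sorted(self.suspected):
            self.send(name)


b, c = Peer("B"), Peer("C")
detector = LocalFailureDetector([b, c])

b.alive = False
first_target = detector.pick_target(["B", "C"])   # B is silent, so C is used
suspected_after_failure = set(detector.suspected)

b.alive = True
detector.retry_suspected()                         # B answers again
second_target = detector.pick_target(["B", "C"])
```

Each node keeps its own list of suspected peers in this sketch, which matches the decentralized character of the detection.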

This is in fact a decentralized manner of detection, because there is no higher-level entity
that points out the defective nodes. Instead, a gossip process enables each node to
“hear” about the arrival or departure of other nodes: “Dynamo adopts a full membership
model where each node is aware of the data hosted by its peers. To do this, each
node actively gossips the full routing table with other nodes in the system. This model works well
for a system that contains couple of hundreds of nodes.”[2]




² Image from Dynamo: Amazon’s Highly Available Key-value Store

Fig. 3. Gossip-style process.³




3 Final Remarks
A relatively new movement in the storage domain, NoSQL is succeeding in dethroning classical
SQL systems based on relational, centralized information processing. Today's web requires
coordinating, manipulating and gathering vast quantities of data and knowledge. As a result,
traditional database applications seem to be losing ground to non-relational systems that avoid
join operations and fixed schemas and, to some extent, even relax the ACID guarantees by
offering processes that are only “eventually consistent”[3].



4 References
[1] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach,
Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber. Bigtable: A Distributed Storage
System for Structured Data. In: OSDI'06: Seventh Symposium on Operating System
Design and Implementation, Seattle, WA, November 2006.

[2] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati,
Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall
and Werner Vogels. Dynamo: Amazon’s Highly Available Key-value Store. 2007.

[3] Werner Vogels. Eventually Consistent - Revisited. 2008.

[4] SQL Databases Don't Scale.

[5] http://nosql-databases.org/



³ Image from Pragmatic Programming Techniques

Weitere ähnliche Inhalte

Was ist angesagt?

DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTINGDISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTINGijcsit
 
Schema migrations in no sql
Schema migrations in no sqlSchema migrations in no sql
Schema migrations in no sqlDr-Dipali Meher
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingIJCSIS Research Publications
 
Measuring Resources & Workload Skew In Micro-Service MPP Analytic Query Engine
Measuring Resources & Workload Skew In Micro-Service MPP Analytic Query EngineMeasuring Resources & Workload Skew In Micro-Service MPP Analytic Query Engine
Measuring Resources & Workload Skew In Micro-Service MPP Analytic Query Engineparekhnikunj
 
The Distributed Cloud
The Distributed CloudThe Distributed Cloud
The Distributed CloudWowd
 
Applications of SOA and Web Services in Grid Computing
Applications of SOA and Web Services in Grid ComputingApplications of SOA and Web Services in Grid Computing
Applications of SOA and Web Services in Grid Computingyht4ever
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudRightScale
 
Grid computing dis
Grid computing disGrid computing dis
Grid computing disgopishna09
 
iaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storageiaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storageIaetsd Iaetsd
 
Cassandra advanced-I
Cassandra advanced-ICassandra advanced-I
Cassandra advanced-Iachudhivi
 
Data Ware House System in Cloud Environment
Data Ware House System in Cloud EnvironmentData Ware House System in Cloud Environment
Data Ware House System in Cloud EnvironmentIJERA Editor
 
Advantage of distributed database over centralized database
Advantage of distributed database over centralized databaseAdvantage of distributed database over centralized database
Advantage of distributed database over centralized databaseAadesh Shrestha
 
An-Insight-about-Glusterfs-and-it's-Enforcement-Techniques
An-Insight-about-Glusterfs-and-it's-Enforcement-TechniquesAn-Insight-about-Glusterfs-and-it's-Enforcement-Techniques
An-Insight-about-Glusterfs-and-it's-Enforcement-TechniquesManikandan Selvaganesh
 

Was ist angesagt? (16)

DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTINGDISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
DISTRIBUTED SCHEME TO AUTHENTICATE DATA STORAGE SECURITY IN CLOUD COMPUTING
 
Schema migrations in no sql
Schema migrations in no sqlSchema migrations in no sql
Schema migrations in no sql
 
WJCAT2-13707877
WJCAT2-13707877WJCAT2-13707877
WJCAT2-13707877
 
Using BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined NetworkingUsing BIG DATA implementations onto Software Defined Networking
Using BIG DATA implementations onto Software Defined Networking
 
Cassandra architecture
Cassandra architectureCassandra architecture
Cassandra architecture
 
Measuring Resources & Workload Skew In Micro-Service MPP Analytic Query Engine
Measuring Resources & Workload Skew In Micro-Service MPP Analytic Query EngineMeasuring Resources & Workload Skew In Micro-Service MPP Analytic Query Engine
Measuring Resources & Workload Skew In Micro-Service MPP Analytic Query Engine
 
The Distributed Cloud
The Distributed CloudThe Distributed Cloud
The Distributed Cloud
 
Consistency in NoSQL
Consistency in NoSQLConsistency in NoSQL
Consistency in NoSQL
 
Applications of SOA and Web Services in Grid Computing
Applications of SOA and Web Services in Grid ComputingApplications of SOA and Web Services in Grid Computing
Applications of SOA and Web Services in Grid Computing
 
CodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the CloudCodeFutures - Scaling Your Database in the Cloud
CodeFutures - Scaling Your Database in the Cloud
 
Grid computing dis
Grid computing disGrid computing dis
Grid computing dis
 
iaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storageiaetsd Controlling data deuplication in cloud storage
iaetsd Controlling data deuplication in cloud storage
 
Cassandra advanced-I
Cassandra advanced-ICassandra advanced-I
Cassandra advanced-I
 
Data Ware House System in Cloud Environment
Data Ware House System in Cloud EnvironmentData Ware House System in Cloud Environment
Data Ware House System in Cloud Environment
 
Advantage of distributed database over centralized database
Advantage of distributed database over centralized databaseAdvantage of distributed database over centralized database
Advantage of distributed database over centralized database
 
An-Insight-about-Glusterfs-and-it's-Enforcement-Techniques
An-Insight-about-Glusterfs-and-it's-Enforcement-TechniquesAn-Insight-about-Glusterfs-and-it's-Enforcement-Techniques
An-Insight-about-Glusterfs-and-it's-Enforcement-Techniques
 

Ähnlich wie NoSql And The Semantic Web

Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesEditor Jacotech
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic WebStefan Ceriu
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebStefan Prutianu
 
Real time eventual consistency
Real time eventual consistencyReal time eventual consistency
Real time eventual consistencyijfcstjournal
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESijccsa
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESijccsa
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESijccsa
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...IJCERT JOURNAL
 
Cluster computing pptl (2)
Cluster computing pptl (2)Cluster computing pptl (2)
Cluster computing pptl (2)Rohit Jain
 
Clustercomputingpptl2 120204125126-phpapp01
Clustercomputingpptl2 120204125126-phpapp01Clustercomputingpptl2 120204125126-phpapp01
Clustercomputingpptl2 120204125126-phpapp01Ankit Soni
 
CLOUD COMPUTING CHANTI-130 ( FOR THE COMPUTING2).pdf
CLOUD COMPUTING CHANTI-130 ( FOR THE COMPUTING2).pdfCLOUD COMPUTING CHANTI-130 ( FOR THE COMPUTING2).pdf
CLOUD COMPUTING CHANTI-130 ( FOR THE COMPUTING2).pdfyadavkarthik4437
 
Cidr11 paper32
Cidr11 paper32Cidr11 paper32
Cidr11 paper32jujukoko
 
Megastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive servicesMegastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive servicesJoão Gabriel Lima
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataDebajani Mohanty
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Dataneirew J
 

Ähnlich wie NoSql And The Semantic Web (20)

Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
 
Real time eventual consistency
Real time eventual consistencyReal time eventual consistency
Real time eventual consistency
 
NOSQL
NOSQLNOSQL
NOSQL
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
 
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUESANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
ANALYSIS OF ATTACK TECHNIQUES ON CLOUD BASED DATA DEDUPLICATION TECHNIQUES
 
ICICCE0298
ICICCE0298ICICCE0298
ICICCE0298
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
 
Cluster computing pptl (2)
Cluster computing pptl (2)Cluster computing pptl (2)
Cluster computing pptl (2)
 
Clustercomputingpptl2 120204125126-phpapp01
Clustercomputingpptl2 120204125126-phpapp01Clustercomputingpptl2 120204125126-phpapp01
Clustercomputingpptl2 120204125126-phpapp01
 
No Sql Databases
No Sql DatabasesNo Sql Databases
No Sql Databases
 
CLOUD COMPUTING CHANTI-130 ( FOR THE COMPUTING2).pdf
CLOUD COMPUTING CHANTI-130 ( FOR THE COMPUTING2).pdfCLOUD COMPUTING CHANTI-130 ( FOR THE COMPUTING2).pdf
CLOUD COMPUTING CHANTI-130 ( FOR THE COMPUTING2).pdf
 
Cloud Computing
Cloud Computing Cloud Computing
Cloud Computing
 
Cidr11 paper32
Cidr11 paper32Cidr11 paper32
Cidr11 paper32
 
Megastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive servicesMegastore providing scalable, highly available storage for interactive services
Megastore providing scalable, highly available storage for interactive services
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Data
 
The NoSQL Movement
The NoSQL MovementThe NoSQL Movement
The NoSQL Movement
 

Kürzlich hochgeladen

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Kürzlich hochgeladen (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

NoSql And The Semantic Web

  • 1. NoSQL and the Semantic/ Social Web

Irina Hutanu
Alexandru Ioan Cuza University, Computer Science, Computational Linguistics (2nd year), Faculty of Letters graduate
{Irina Hutanu, irina.hutanu@gmail.com}

Abstract. NoSQL is a new and promising approach to storing and managing the world's information. “Not only SQL”[5], as many define it, is spreading rapidly because of its non-relational principle, which allows better distribution on a horizontal scale. In what follows we try to clarify what this newborn movement is about.

1 Introduction.

This type of database can handle a large amount of information because of several features that increase its storage capacity:

- The consistency requirement is relaxed. According to the CAP theorem, a distributed system cannot guarantee Consistency, Availability and Partition tolerance all at the same time.
- Key/value storage, a deliberately simple data model.
- It runs on a large number of machines, with the information replicated and partitioned among them.

Some of the most important and highly rated database applications that work along these lines are Google Bigtable, HBase, Hypertable, Amazon Dynamo, Voldemort, Cassandra, Riak, CouchDB, MongoDB and Redis.

Data-driven sites like Amazon.com, Google and Facebook work with terabytes of information that must be scaled and partitioned efficiently. These Internet giants also run tens of thousands of servers located all around the world. Consequently, failures happen continuously, but the service must stay “always-on”. Any minor problem that occurs while a customer or user queries the database, and that cuts them off from the information they are looking for, may lead to serious financial loss. Such risks cannot be ignored, which is why systems like Dynamo and Bigtable emerged. Their non-relational architecture, incremental scalability and decentralized character offer a robust data storage system.

2 Architecture

2.1 Partitioning Process

One important feature of a NoSQL system is that it has to scale incrementally. To do this quickly and consistently, Dynamo, for example, uses virtual nodes in the partitioning process. A physical node is mapped not to a single position on the hash ring but to several, so that a non-uniform distribution of data and load is no longer a problem. Also, if a node becomes unreachable or disappears because of a system failure, the load it carried is taken over by the other, properly working nodes.
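The virtual-node idea from section 2.1 can be illustrated with a small consistent-hashing sketch. This is not Dynamo's implementation: the class `HashRing`, the node names, the key format and the choice of 100 virtual nodes per machine are assumptions made for illustration only. What it shows is that each physical node owns many ring positions, and that removing one node only moves the keys that node used to own.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (MD5 digest read as an integer)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; each physical node owns several positions."""

    def __init__(self, nodes, vnodes_per_node=100):
        self.vnodes_per_node = vnodes_per_node  # illustrative value, not from the paper
        self._positions = []                    # sorted ring positions
        self._owner = {}                        # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place the node at many pseudo-random positions (its virtual nodes).
        for i in range(self.vnodes_per_node):
            pos = ring_position(f"{node}#vnode{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node):
        # Drop every virtual node that belongs to the failed machine.
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        """Walk clockwise from the key's position to the first virtual node."""
        if not self._positions:
            raise ValueError("ring is empty")
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(3000)]

before = {k: ring.node_for(k) for k in keys}
ring.remove_node("node-b")                      # simulate a machine failure
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]

print("load before failure:", Counter(before.values()))
print("load after failure: ", Counter(after.values()))
print("keys that changed owner:", len(moved))
```

Because the failed node's positions are scattered around the ring, its keys fall to whichever surviving node comes next at each position, so the extra load is shared rather than dumped on a single neighbour.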
  • 2. Bigtable, another non-relational storage system, partitions and organizes its data differently. Being “a sparse, distributed, persistent multi-dimensional sorted map”[1], it indexes data by row, column and timestamp. Partitioning takes place dynamically and is applied to row ranges.

2.2 Replication

Continuous data availability is also ensured by replication. In general, these systems replicate the information they acquire on multiple hosts in order to avoid data loss and to offer durability. Bigtable, for instance, is used with a replication process that duplicates information across different clusters, so latency is reduced and the data is protected against loss: “The Personalized Search data is replicated across several Bigtable clusters to increase availability and to reduce latency due to distance from clients. The Personalized Search team originally built a client-side replication mechanism on top of Bigtable that ensured eventual consistency of all replicas. The current system now uses a replication subsystem that is built into the servers.”[1]

Fig. 1. Partitioning and Replication in Dynamo¹

2.3 Consistency versus Availability

If multiple versions of the same data exist, they must be reconciled to avoid inconsistencies. Unfortunately, in a system that trades consistency for availability, reconciling divergent versions automatically is hard to achieve. Dynamo, for example, uses vector clocks to detect when two or more different versions of the same object have emerged. When one version descends from another, the older one can simply be discarded. When the versions were written concurrently, vector clocks alone cannot choose between them, so semantic reconciliation by the application is used. However, this approach places extra load on the system, so it is used only when it is really needed.
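The role of vector clocks in section 2.3 can be sketched in a few lines. This is a minimal illustration, not Dynamo's code: the helper names (`increment`, `descends`, `compare`, `merge`) and the node names `Sx`, `Sy`, `Sz` are assumptions chosen for the example. A clock is modelled as a mapping from node name to an update counter.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> number of updates coordinated by that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock after `node` handles one more write."""
    new_clock = dict(clock)
    new_clock[node] = new_clock.get(node, 0) + 1
    return new_clock


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if version a has seen every update recorded in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify two versions as equal, ordered, or concurrent."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a_newer"      # b is an ancestor of a and can be dropped
    if descends(b, a):
        return "b_newer"
    return "concurrent"       # divergent versions: semantic reconciliation needed


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, used as the clock of a reconciled version."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


d1 = increment({}, "Sx")             # first write, handled by Sx
d2 = increment(d1, "Sx")             # second write, also handled by Sx
d3 = increment(d2, "Sy")             # one client updates d2 through Sy
d4 = increment(d2, "Sz")             # another client updates d2 through Sz
d5 = increment(merge(d3, d4), "Sx")  # client reconciles d3 and d4, Sx writes the result

print(compare(d2, d1))
print(compare(d3, d4))
print(compare(d5, d3))
```

Only the `concurrent` case requires the application to step in; in the ordered cases the store can discard the older version on its own.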
¹ Image from Dynamo: Amazon’s Highly Available Key-value Store

Apart from some minor issues such as overloading, the choice of availability over consistency has produced some interesting and unexpected results and has been, to some extent, a real success: “The production use of Dynamo for the past year
  • 3. demonstrates that decentralized techniques can be combined to provide a single highly-available system. Its success in one of the most challenging application environments shows that an eventual-consistent storage system can be a building block for highly-available applications.”[2]

Fig. 2. Version evolution of an object over time².

2.4 Gossip Protocol

This protocol is used both for propagating updates and for detecting failures. When a node joins or leaves the system, the change is passed on from node to node, allowing the data to be reorganized among the functioning nodes. To this end, every second each node contacts a peer chosen at random, and the two synchronize their histories of membership changes.

Failure detection goes through the same gossip protocol. A node is considered unavailable if it does not respond to the messages of another node. The latter then gets the required information from a different node and periodically retries the first one to check whether it has recovered. This is a decentralized manner of detection, because there is no superior entity that points out the defective nodes. Instead, a gossip process enables each node to “hear” about the arrival or departure of other nodes: “Dynamo adopts a full membership model where each node is aware of the data hosted by its peers. To do this, each node actively gossips the full routing table with other nodes in the system. This model works well for a system that contains couple of hundreds of nodes.”[2]

² Image from Dynamo: Amazon’s Highly Available Key-value Store
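The membership exchange described in section 2.4 can be simulated with a short sketch. It is a toy model under stated assumptions, not Dynamo's protocol: the `Node` class, the `(version, status)` entries and the rule "higher version wins" are invented for illustration, and peers are drawn from a shared list instead of being discovered through seed nodes.

```python
import random


class Node:
    """Hypothetical cluster member that keeps its own view of the membership."""

    def __init__(self, name):
        self.name = name
        # Membership view: member name -> (version, status).
        self.view = {name: (1, "up")}

    def record_change(self, member, status):
        """Register a membership change locally with a higher version number."""
        version, _ = self.view.get(member, (0, None))
        self.view[member] = (version + 1, status)

    def merge(self, other_view):
        """Keep, for every member, the entry with the highest version."""
        for member, entry in other_view.items():
            if member not in self.view or entry[0] > self.view[member][0]:
                self.view[member] = entry


def gossip_round(nodes, rng):
    """Each node contacts one random peer and the two reconcile their views."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        node.merge(peer.view)   # pull what the peer knows
        peer.merge(node.view)   # push the combined view back


def converged(nodes):
    return all(n.view == nodes[0].view for n in nodes)


def run_until_converged(nodes, rng, max_rounds=1000):
    """Gossip until every node holds the same view; return the rounds needed."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if converged(nodes):
            return round_no
    raise RuntimeError("views did not converge")


rng = random.Random(42)                      # fixed seed, so runs are repeatable
cluster = [Node(f"node-{i}") for i in range(8)]

rounds = run_until_converged(cluster, rng)
print("full membership known everywhere after", rounds, "rounds")

cluster[0].record_change("node-7", "removed")  # one node learns of a departure
rounds = run_until_converged(cluster, rng)
print("departure known everywhere after", rounds, "rounds")
```

No coordinator is involved at any point: a change recorded at a single node reaches all the others purely through pairwise exchanges.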
  • 4. Fig. 3. Gossip-style process³.

3. Final Remarks

A fairly new movement in the storage domain, NoSQL is succeeding in dethroning classical SQL systems based on relational, centralized information processing. Today's web requires the coordination, manipulation and gathering of vast quantities of data and knowledge. Traditional database applications therefore seem to be losing ground to non-relational systems that avoid join operations and fixed schemas and, to some extent, even give up the ACID guarantees in favor of processes that are only “eventually consistent”[3].

4. References

[1] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber, Bigtable: A Distributed Storage System for Structured Data, in OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November 2006.
[2] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels, Dynamo: Amazon’s Highly Available Key-value Store, 2007.
[3] Werner Vogels, Eventually Consistent - Revisited, 2008.
[4] SQL Databases Don't Scale
[5] http://nosql-databases.org/

³ Image from Pragmatic Programming Techniques