Achieving High Availability, Scalable Storage and Performance at Portal do Aluno
- Distributed Databases Overview Study -

Luis Carlos Dill Junges¹, Ivan Linhares Martins¹

¹ Certi Foundation – Federal University of Santa Catarina (UFSC)
Postal Box 5053 – 88.040-970 – Florianópolis – SC – Brazil
luis.junges@gmail.com, ilm@certi.org.br

        Abstract. This document consolidates a study carried out at the Certi Foundation for
        the federal project called Portal do Aluno. This project will be an Internet portal
        whose main objective is to spread knowledge among students between 12 and 18
        years old from Brazilian elementary schools. Since around 5 million students are
        expected to use it monthly, some problems with the storage system and with the
        availability of the portal are inevitable. With this in mind, a comprehensive study
        has been made of the new generation of distributed databases available on the
        market. The results of this study are presented in this document, together with
        some considerations on each system.

        Resumo. This document is a consolidation of a study carried out at the
        Certi Foundation for the Federal Government project called Portal do Aluno.
        This project will be aimed at students between 12 and 18 years old in Brazil's
        basic education network, with the goal of becoming a portal for the dissemination
        and generation of subjects related to the students' education.
        The project will have around 5 million users, which will inevitably bring some
        problems to the system's backend in terms of data scalability and high availability
        of the portal. With this in mind, a detailed study of the current solutions among
        the new distributed database systems was carried out, and its results are
        presented in this document.

1. Introduction
This study arose from the problems faced by the Portal do Aluno project, which consists
of a social network focused on spreading knowledge among students between 12 and 18
years old from Brazilian elementary schools.
        These problems are related to the availability, the storage capacity and the
performance of the overall system. Although smaller projects were developed using standard
relational databases, which inevitably became the SPOF¹ of the system, this project
requires a better solution in order to meet its new requirements. This study shows a way
to overcome such problems by using a new kind of Open Source tools available in the
developer community.
       These new tools have been driven by the NoSQL movement, which began around 2009
to address the limitations found when handling big data volumes and workloads. This
movement aims to redirect database development toward horizontal scalability by relaxing
some guarantees. For example, such systems often provide only eventual consistency and
are therefore not fully compliant with the ACID² properties.
   ¹ Single Point of Failure
        This article is organized as follows: Section 2 describes the project. Section 3
presents the problems and the motivation to study a new approach. Section 4 describes the
general characteristics of these distributed systems, and Section 5 gives a brief overview
of the major Open Source players. Section 6 shows a comparative table of the properties
of each system. Section 7 presents the most prominent solutions for the Portal do Aluno's
requirements. Finally, Section 8 presents the conclusion.

2. Portal do Aluno
Portal do Aluno is a social learning environment project from the Ministry of Education of
Brazil (known as MEC). It has the characteristics of a social network and aims to provide
an educational portal with collaborative tools for school tasks. It will be an extension
of elementary schools on the Internet, promoting integration among schools, students and
teachers across Brazil by allowing groups for research, discussions and other common
tasks.
        The portal is subdivided into modules with specific content. Some of them allow
users to upload files such as images and any other type of document, including videos.
As the number of users of this portal is potentially high from the beginning, scalability
and availability are essential, which leads to the problem described in the next section.

3. Problem
Relational databases are powerful and robust, so much so that a wide range of applications
and systems use them. However, they show limitations when large data sets need to be
stored and when high availability of the system is mandatory. Regarding the first issue, it
is provably impossible³ for a system spread across multiple machines to keep strong
consistency and availability at the same time when the network is partitioned, so full
ACID guarantees cannot be kept while scaling out. Until now, this has typically been solved
by high-end RDBMSs⁴ through a replication system with a master-slave architecture, as shown
in Figure 1. Although it is a working approach, this model has a prominent SPOF⁵ in the
master: if the master fails, the system goes down. The approach achieves scalability by
having a load balancer forward reads to free slaves while all writes go to the master,
which again becomes the bottleneck of the data flow.
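The read/write split described above can be sketched as a small router. This is a minimal illustration under stated assumptions: node names are hypothetical strings standing in for database connections, reads are recognized naively by a leading `SELECT`, and slaves are picked round-robin. It is not the behavior of any specific RDBMS or load balancer.

```python
from itertools import cycle
from typing import List


class MasterSlaveRouter:
    """Toy router: writes go to the master, reads rotate over the slaves.

    Node names are plain strings standing in for database connections
    (hypothetical; no real driver or load balancer is involved).
    """

    def __init__(self, master: str, slaves: List[str]) -> None:
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = cycle(slaves)  # round-robin over the read replicas

    def route(self, statement: str) -> str:
        # Naive classification: SELECT statements are reads, everything else is a write.
        is_read = statement.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master


router = MasterSlaveRouter("master", ["slave-1", "slave-2"])
print(router.route("SELECT * FROM students"))           # slave-1
print(router.route("SELECT * FROM schools"))            # slave-2
print(router.route("INSERT INTO students VALUES (1)"))  # master
```

However many slaves are added, every write still reaches `master`, which is exactly the bottleneck and SPOF described above.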
         The second issue is usually solved by hardware solutions based on RAID⁶. At a
glance, this model achieves its goal by replicating the data among several hard drives and
swapping them accordingly on a failure. RAID systems, however, are not a complete safety
solution, because without a backup they cannot survive the loss of the server holding
them by fire, flooding or any other reason.
   ² Atomicity, Consistency, Isolation, Durability
   ³ See Section 4.1 - CAP Theorem
   ⁴ Relational Database Management System
   ⁵ Single Point of Failure
   ⁶ Redundant Array of Independent Drives
Figure 1. Traditional Relational Database Scaling


       These relational issues are claimed to be solved (or at least on the way to being
solved) by a new flavor of distributed databases that is relatively new in the developer
community. These systems promise to overcome the shortcomings of relational systems
efficiently by relaxing some characteristics, such as consistency, and by treating node
failures as a primary concern. The next section introduces such systems.

4. Distributed Databases
One of the major advantages of using a distributed database over a traditional relational
database is the ability to scale reads and writes easily, just by adding new nodes to the
cluster. Relational databases can solve this issue for reads, but scaling writes is
virtually impossible, or ends up being too expensive.
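The text does not say how a distributed database decides which node stores a key. One widely used technique, adopted by Dynamo [DeCandia et al. 2007], is consistent hashing: nodes and keys are hashed onto the same ring and each key belongs to the next node clockwise, so a new node only takes over part of the keys. The sketch below assumes SHA-256 as the ring hash, 64 virtual nodes per server and hypothetical node names; it illustrates the idea and is not the placement scheme of any particular product reviewed here.

```python
import bisect
import hashlib
from typing import Dict, List


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes               # virtual nodes per server even out the load
        self._positions: List[int] = []    # sorted ring positions
        self._owners: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            self._owners[pos] = node
            bisect.insort(self._positions, pos)

    def node_for(self, key: str) -> str:
        # The owner is the first virtual node clockwise from the key's position.
        idx = bisect.bisect_right(self._positions, ring_hash(key))
        if idx == len(self._positions):
            idx = 0  # wrap around the end of the ring
        return self._owners[self._positions[idx]]


keys = [f"student:{i}" for i in range(10_000)]
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one node
moved = [k for k in keys if ring.node_for(k) != before[k]]

# Only keys taken over by the new node move; all other keys stay where they were.
print(f"{len(moved)} of {len(keys)} keys moved")
print(all(ring.node_for(k) == "node-d" for k in moved))  # True
```

With a naive `hash(key) % number_of_nodes` placement, growing from three to four nodes would instead reassign most keys, which is why ring-based placement suits clusters that grow by adding nodes.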
       A brief comparison between relational databases and these new systems is given
in Table 1.

Relational Databases     Table, columns, rows
                         ACID properties fully satisfied
                         Normalized to avoid data duplication
                         Strong storage schema
                         Queries fully supported
Distributed Databases    Table-like domain
                         Data identified only by a key
                         Schema-less
                         Data integrity handled in the application's code
                         Eventual consistency
                         Limited query support

                       Table 1. Relational vs Distributed Databases

       These systems adopt either a key-value model or a document-oriented approach:
Key-value Basically, the data is associated with a key, as in a map. The data can only be
      retrieved by knowing its key. These systems are usually able to retrieve the data in
      constant time, regardless of how many entries have been stored.
Document-Oriented The data is stored in a format that represents a document. There is
      no schema, and some fields present in one document may not exist in other
      documents. Some implementations use JSON or XML as the protocol layer for the
      data.
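To make the two models concrete, the toy classes below keep everything in Python dictionaries (no real database client is involved; keys, field names and values are invented for illustration). The key-value store can only look values up by key, while the document store accepts documents with different fields and can filter on a field.

```python
from typing import Any, Dict, List, Optional


class KeyValueStore:
    """Opaque values reachable only through their key (hash lookup, O(1) on average)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class DocumentStore:
    """Schema-less documents: each one may carry a different set of fields."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, doc_id: str, doc: Dict[str, Any]) -> None:
        self._docs[doc_id] = doc

    def find(self, field: str, value: Any) -> List[str]:
        # Full scan; documents that lack the field simply do not match.
        return [doc_id for doc_id, doc in self._docs.items()
                if field in doc and doc[field] == value]


kv = KeyValueStore()
kv.set("avatar:42", b"<image bytes>")
print(kv.get("avatar:42"))  # b'<image bytes>'

docs = DocumentStore()
docs.put("s1", {"name": "Ana", "school": "Escola A", "age": 14})
docs.put("s2", {"name": "Bruno", "age": 16, "interests": ["chess"]})  # different fields
print(docs.find("age", 16))  # ['s2']
```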

4.1. CAP Theorem
The CAP theorem [Gilbert and Lynch 2002] formalizes three properties among which a
shared-data system must choose. These properties are as follows:
     • Strong Consistency: all clients see the same view of the data, even in the presence
       of updates.
     • High Availability: all clients can find some replica of the data, even in the presence
       of failures.
     • Partition Tolerance: the system properties hold even when the system is partitioned
       by node failures, network problems or any other reason.
       The theorem states that a distributed system can have at most two of the three
CAP properties at the same time. Distributed databases usually choose Availability and
Partition Tolerance. To handle consistency, some of them use versioning systems
[Manassiev and Amza 2005] [Amza et al. 2003] to resolve conflicts between updates.
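The cited versioning schemes are not detailed in this study. As an assumed example, the sketch below uses vector clocks, the mechanism Dynamo [DeCandia et al. 2007] uses to detect conflicting updates: each version carries one counter per node, and comparing two clocks tells whether one update supersedes the other or whether they are concurrent and need reconciliation. Node names are hypothetical.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node id -> number of updates coordinated by that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a new clock recording one more update coordinated by `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'equal', 'before' (a happened before b), 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # conflicting updates: the application must reconcile them


v1 = increment({}, "node-a")  # first write, coordinated by node-a
v2 = increment(v1, "node-b")  # later write on node-b that had seen v1
v3 = increment(v1, "node-c")  # independent write on node-c that had also seen v1
print(compare(v1, v2))  # before
print(compare(v2, v3))  # concurrent
```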

5. Available Solutions
This section presents some distributed database approaches, explaining the singular
characteristics of each one and giving a practical example of where each is currently in
use. These new systems are based on a large set of Open Source technologies
[Bortnikov 2009], which makes them very attractive, and although there is not yet a
consolidated benchmark accepted by the community [Binnig et al. 2009]
[Cryans et al. 2008], some points can still be made about each solution.

5.1. Voldemort
Voldemort is a relatively new Open Source project in the community, released at the
beginning of this year. It is written entirely in Java and is based on the key-value model,
with just 2 functions to interact with (set and get). In its developers' own words,
Voldemort is "basically just a big, distributed, persistent, fault-tolerant hash table". For
data persistence, it uses MySQL or BDB as the backend on each node. As it follows the
concept of eventual consistency [Vogels 2008], it uses a simple incremental versioning
system for each update of the data. The application is responsible for fixing integrity
problems and other issues that may affect the stored data.
        This project is currently in production at LinkedIn.com, in some parts that require
high availability. The throughput observed in the production environment is on the order
of 19,384 requests per second (req/sec) for reads and 16,559 req/sec for writes.
         Among the good points of this project is its well-written documentation. There
is also a good data replication scheme that can be configured manually in terms of how
many writes and reads must succeed in order to validate a store or a read operation,
respectively. For example, an exception will be thrown if the store is configured to write
on at least 3 nodes and only 2 nodes are up.
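The replication settings described above behave like quorums. The following is a simplified, assumed model rather than Voldemort's client API or configuration format: nodes are in-memory objects, `required_writes` is the minimum number of replicas that must accept a write, and the error is raised before writing when too few nodes are up, as in the 3-required/2-up example. The helper `quorums_overlap` checks the usual condition R + W > N, under which every read quorum intersects every write quorum.

```python
from typing import Dict, List


class InsufficientNodesError(Exception):
    """Raised when fewer nodes are up than a store requires (hypothetical name)."""


class Node:
    def __init__(self, name: str, up: bool = True) -> None:
        self.name = name
        self.up = up
        self.data: Dict[str, str] = {}


def quorum_put(nodes: List[Node], key: str, value: str, required_writes: int) -> int:
    """Write to every live replica; fail if fewer than `required_writes` are up."""
    live = [n for n in nodes if n.up]
    if len(live) < required_writes:
        raise InsufficientNodesError(
            f"{required_writes} writes required but only {len(live)} nodes are up"
        )
    for node in live:
        node.data[key] = value
    return len(live)  # number of successful writes


def quorums_overlap(replicas: int, required_reads: int, required_writes: int) -> bool:
    """With R + W > N, every read quorum shares at least one node with every write quorum."""
    return required_reads + required_writes > replicas


cluster = [Node("n1"), Node("n2"), Node("n3", up=False)]
print(quorum_put(cluster, "profile:42", "v1", required_writes=2))  # 2
try:
    quorum_put(cluster, "profile:42", "v2", required_writes=3)
except InsufficientNodesError as err:
    print(err)  # 3 writes required but only 2 nodes are up
print(quorums_overlap(3, 2, 2))  # True
```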
       The project’s design also takes into account working properly behind load
balancers in clustered deployments, as shown in Figure 2.
        As a major advantage, this project has no SPOF and therefore delivers a highly
available system for critical applications.
        Its drawbacks are the impossibility of adding a new node to a live cluster, which
means the entire system has to be shut down in order to configure a new node. Another
point is that all code processes one value at a time in memory (no cursor or streaming),
meaning that values need to fit comfortably in memory.




                           Figure 2. Voldemort’s clustering architecture


5.2. HBase
HBase is the official database of the Hadoop project. It is an Open Source, distributed,
column-oriented store modeled after Google’s BigTable [Chang et al. 2008]. Just as
BigTable leverages the distributed data storage provided by the Google File System
[Ghemawat et al. 2003], HBase provides BigTable-like capabilities on top of Hadoop
using HDFS⁷.
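Chang et al. [2008] describe BigTable's data model, which HBase follows, as a sparse, sorted map indexed by row key, column key and timestamp. The toy class below (plain Python, not the HBase client API; the row keys, the `info:name` column and the values are invented) shows what that means: each cell keeps timestamped versions, reads return the newest one, and rows can be scanned in key order.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Cell = List[Tuple[int, str]]  # (timestamp, value) versions of one cell, newest first


class ToyBigTable:
    """Sparse map: row key -> column key ("family:qualifier") -> timestamped versions."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Cell]] = {}
        self._row_keys: List[str] = []  # kept sorted so range scans follow key order

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        if row not in self._rows:
            self._rows[row] = {}
            bisect.insort(self._row_keys, row)
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(reverse=True)  # newest version first

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest value of a cell, or None if the cell was never written."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str) -> List[str]:
        """Return the row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        return self._row_keys[lo:hi]


table = ToyBigTable()
table.put("student:002", "info:name", 1, "Bruno")
table.put("student:001", "info:name", 1, "Ana")
table.put("student:001", "info:name", 2, "Ana Maria")  # newer version, same cell
print(table.get("student:001", "info:name"))           # Ana Maria
print(table.scan("student:000", "student:999"))        # ['student:001', 'student:002']
```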
        HBase is a very good and powerful project that gives users the opportunity to
run parallel processing on the cluster through MapReduce jobs. The current release
(0.20) has removed its major drawbacks: the SPOF and the high read latency.
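As a reminder of the programming model only (this is not Hadoop's Java API), a MapReduce job consists of a map function that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce function that combines each group. The single-process sketch below counts words; on a Hadoop/HBase cluster the map and reduce phases would run in parallel across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between the two phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values of one key."""
    return key, sum(values)


def run_job(docs: List[str]) -> Dict[str, int]:
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(run_job(["Portal do Aluno", "portal social"]))
# {'portal': 2, 'do': 1, 'aluno': 1, 'social': 1}
```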
        Its architecture distributes masters and region servers across the cluster’s
machines, as shown in Figure 3.
   ⁷ Hadoop Distributed File System
HBase is currently in use at several places, including a Yahoo cluster with 10,000 PCs.
There are also some companies testing HBase running on Amazon Elastic Compute Cloud
(known as Amazon EC2).




                              Figure 3. HBase’s architecture


5.3. Redis
Redis is a distributed key-value solution with the advantage of offering more operations
than just the traditional set and get API. These operations include handling multiple sets
and some simple queries on the stored dataset, some of them with a guarantee of being
atomic. It also supports storing more data types than just strings or binaries, including
lists, sets and ordered sets.
       Another major point is that it is incredibly fast, able to perform around 110,000
SETs/second and around 81,000 GETs/second according to the developers’ test case. It
persists data asynchronously, which means data can be lost between the time a write was
requested and the time it was actually persisted. There is also the constraint that the
entire dataset needs to fit on a single device.

5.4. Cassandra
Cassandra was born to solve Facebook’s problems. It is a more complete key-value
database, based on Dynamo’s fully distributed design [DeCandia et al. 2007]
and BigTable’s column-family-based data model [Chang et al. 2008].
       This project offers high availability without a SPOF and incremental scalability,
since new nodes can be added to a live environment without disturbing the applications
currently running on the database. It also guarantees atomicity for operations on a single
column family.
        Drawbacks include poor or nonexistent documentation and a very obscure, difficult
API that will undergo heavy remodelling in the next releases.
        Cassandra is currently used at Facebook for Inbox Search, which is claimed to hold
40 TB of data distributed across 120 machines in separate data centers. It is also in use
at Rackspace and Digg.com.

5.5. MongoDB
MongoDB is a document-oriented approach to scalable distributed databases. It is an Open
Source implementation written entirely in C++, with commercial support available. Its major
advantage is its query support, which makes it unique among these systems. It stores data
in BSON (binary JSON) format, handles large data such as photos and videos, and supports
MapReduce jobs.




                        Figure 4. MongoDB’s Architecture Design

        As a drawback, it has an intricate cluster scheme, shown in Figure 4, which has
several SPOFs. It is subdivided into config servers (which store metadata on which shard
holds the data) and shards that store the data. There are also routing instances (mongos)
that act as entry points for clients. Right now it is a relatively new implementation
without full support for sharding, and data replication is limited in the number of nodes
that can be used (2 nodes only).
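The role of the config servers can be sketched as a sorted table of key ranges consulted by the router. In this minimal model the split points and shard names are hypothetical, and MongoDB's real metadata format and protocol are not reproduced: a key is simply sent to the shard whose range contains it.

```python
import bisect
from typing import List


class RangeRouter:
    """Map a shard key to a shard using sorted split points (the config metadata)."""

    def __init__(self, split_points: List[str], shards: List[str]) -> None:
        # Shard i owns keys in [split_points[i-1], split_points[i]); n split points -> n+1 shards.
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = split_points
        self.shards = shards

    def shard_for(self, key: str) -> str:
        return self.shards[bisect.bisect_right(self.split_points, key)]


router = RangeRouter(["g", "p"], ["shard-0", "shard-1", "shard-2"])
print(router.shard_for("ana"))    # shard-0
print(router.shard_for("maria"))  # shard-1
print(router.shard_for("zeca"))   # shard-2
```

Every client request depends on this routing metadata, which is why the components that store and serve it matter for the availability of the whole cluster.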

5.6. Tokyo Cabinet/Tyrant
Tokyo Cabinet/Tyrant is an Open Source project claimed to be in use at mixi.jp, a Japanese
social network handling 10,000 updates/second through MemCache. There, this tool
apparently handles 20 million data entries (20 bytes each).
       Although it has not been tested in this study, this solution claims to be very fast at
writing and reading, performing around 58,000 req/second. It also seems to support
ACID properties, with several different storage approaches (hash, B-tree) for each type
of data being stored.
        As drawbacks, it lacks good documentation and few projects are using it.

5.7. CouchDB
CouchDB is a very easy-to-run project with a document-oriented approach. It has a
totally unstructured, schema-less storage backend that uses the JSON format for data
handling. It is very similar to Amazon’s SimpleDB, with asynchronous data replication.
It also has a browser-based administration console where it is possible to create
MapReduce jobs, run backup operations and define views similar to those found in
relational systems.
       It uses HTTP requests to manage the dataset, which makes it accessible from any
software able to perform HTTP requests. CouchDB has a major advantage because it provides
a query-like engine that enables users to build their own queries tailored to the
application being developed.
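Because CouchDB is driven by plain HTTP, a standard HTTP client is enough to use it. The sketch below only builds the requests, with Python's standard `urllib.request`, for creating and reading a JSON document. It assumes a CouchDB server at `http://localhost:5984` (CouchDB's default port) and a hypothetical database named `portal`. Nothing is sent until a request is passed to `urllib.request.urlopen`, and updating an existing document would additionally require its current `_rev`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed local CouchDB instance
DB = "portal"                       # hypothetical database name


def put_document(doc_id: str, doc: dict) -> urllib.request.Request:
    """Build a PUT /{db}/{doc_id} request that creates a JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{DB}/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document(doc_id: str) -> urllib.request.Request:
    """Build a GET /{db}/{doc_id} request that reads a document back."""
    return urllib.request.Request(f"{BASE_URL}/{DB}/{doc_id}", method="GET")


req = put_document("comment-1", {"author": "ana", "text": "Great work!"})
print(req.get_method(), req.full_url)  # PUT http://localhost:5984/portal/comment-1
# With a server running, urllib.request.urlopen(req) would send it and return the reply.
```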
        As a big drawback, it does not satisfy the concept of scalability, because all the
stored data needs to fit on a single device. The availability of the system is achieved
by a client router that forwards queries to the desired backend service. So CouchDB
is not a distributed database at the moment, but it has some interesting features that
make it eligible for this list. One of those features is the MapReduce support. Another
is the possibility of having the entire dataset, or part of it, stored directly on the
client’s computer with asynchronous replication. By doing this, the workload on the
backend can be reduced, because replication will happen at an appropriate moment. This
feature could also be used for mobile devices that get synchronized at a base station
(Bluetooth, wireless, cable) and can then access a website without having to connect to
the Internet. This spares users from spending money on data plans or enduring low
connection speeds, which leads to a much better user experience. There are some open
issues regarding the type of data that can be handled with such an approach, and whether
modifications made by the user can later be synchronized with the main data server
without consistency problems. Despite those issues, this approach seems interesting for
delivering fast content on mobile devices. At Portal do Aluno, this feature could be used
to connect users to the dashboard, with the option of editing comments on mobile devices
and later synchronizing them with the main server.
        CouchDB is in use in several projects and websites because of the ease of use it
provides through HTTP requests. At the moment it is a very young project, with serious
security problems and still in alpha development. Even with such issues, it is a project
to keep an eye on.

5.8. MemCache
According to its developers, MemCache is "an Open Source, high-performance, distributed
memory object caching system, generic in nature, but intended for use in speeding up
dynamic web applications by alleviating database load".
        It is a very simple and robust solution for improving the performance of web
applications: it reduces read time by accessing the cache layer instead of the database
itself. As a real example, it has reached 38,000 req/sec at Flickr.com.
        It has no persistence layer, but it balances load properly just by adding new nodes
to the cluster. It is in use at several big projects on the Internet and can be used directly
through the API of some high-end RDBMSs as a cache layer (MySQL and PostgreSQL).




                   Figure 5. Solution using intermediate storage layers


        Although MemCache does not provide persistence, it can be very useful together
with solutions that do provide storage, such as commercial services available on the
market like Amazon Simple Storage Service (Amazon S3). By using MemCache, there is a
considerable drop in the number of read operations that hit Amazon S3 and, consequently,
in the monthly bill. Figure 5 shows this approach, using MemCache as an intermediate layer
for applications that require high availability but do not have the capacity to set up a
private cluster for it. Instead, they use commercial storage solutions for the cluster
[Brantner et al. 2008] [Palankar et al. 2008] and reduce the monthly bill by adding extra
storage layers (disk and MemCache).
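The layered approach of Figure 5 is essentially a read-through lookup: try the in-memory cache, then the local disk layer, and only when both miss fetch the object from the paid storage service and populate the faster layers. The sketch below models every layer as a Python dictionary (no memcached client or Amazon S3 SDK is used; all names are hypothetical) and counts the reads that reach the remote store, since those are the billable requests.

```python
from typing import Dict, Optional


class TieredStore:
    def __init__(self, remote: Dict[str, bytes]) -> None:
        self.memory: Dict[str, bytes] = {}  # stands in for MemCache
        self.disk: Dict[str, bytes] = {}    # stands in for a local disk layer
        self.remote = remote                # stands in for Amazon S3
        self.remote_reads = 0               # requests that would be billed

    def get(self, key: str) -> Optional[bytes]:
        if key in self.memory:              # fastest layer first
            return self.memory[key]
        if key in self.disk:
            value = self.disk[key]
        else:
            self.remote_reads += 1          # only here do we pay for a request
            value = self.remote.get(key)
            if value is None:
                return None
            self.disk[key] = value
        self.memory[key] = value            # promote to the fastest layer
        return value


store = TieredStore({"video:1": b"...", "photo:7": b"..."})
for _ in range(100):
    store.get("photo:7")
print(store.remote_reads)  # 1
```

Repeated reads of a popular object cost one remote request instead of one per page view; the trade-off, not handled here, is invalidating cached copies when the object changes.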

5.9. Others
There are many more distributed database projects with different approaches. Most
of them seem to be in early development, without enough documentation and robustness,
or without a persistence layer (in-memory only). Some of them include:

              •   ThruDB                    •   LightCloud                 •   Kai
              •   MemcachedDB               •   Scalaris                   •   NMDB
              •   Disco                     •   Riak                       •   Hazelcast
              •   KeySpace                  •   Dynomite                   •   Mnesia
              •   Ringo                     •   Hypertable


6. Solutions Benchmark
So far there is no accepted benchmark for these new systems [Binnig et al. 2009]
[Cryans et al. 2008], and the decision whether or not to use a system is based on its
properties. Table 2 compares some properties of each system. This table is a snapshot
of the systems as of December 2009.
Name          Language  Fault Tolerance            Persistence             Client               Data Model              Documentation  Production
Voldemort     Java      Replication, Partitioning  Berkeley DB, MySQL      Java API             Structured, Blob, Text  Good           LinkedIn
HBase         Java      Replication, Partitioning  Custom on-disk          Custom API, Thrift   BigTable                Good           Yahoo
Cassandra     Java      Replication, Partitioning  Custom on-disk          Thrift               BigTable, Dynamo        Poor           Facebook
CouchDB       Erlang    Replication                Custom on-disk          HTTP, JSON           Document-Oriented       Good           Ubuntu One
MongoDB       C++       Replication                Custom on-disk, GridFS  Java, C++ Drivers    Document-Oriented       Good           SourceForge
Hypertable    C++       Replication, Partitioning  Custom on-disk          Java, Thrift         BigTable                Good           Baidu
ThruDB        C++       Replication                Custom on-disk          Thrift               Document-Oriented       Medium         -
Ringo         Erlang    Replication, Partitioning  Custom on-disk          HTTP                 Blob                    Medium         Nokia
Tokyo Tyrant  C         -                          B-Tree, Hash            ANSI C               Document-Oriented       Poor           Mixi.jp
Scalaris      Erlang    Replication, Partitioning  In-Memory               Java, Erlang, HTTP   Blob                    Medium         OnScale
MemCache      C         Partitioning               In-Memory               Python, Java, Ruby   -                       Good           Several Projects
Dynomite      Erlang    Replication, Partitioning  -                       Custom, Thrift       Blob                    Poor           PowerSet
Kai           Erlang    Partitioning               -                       -                    Blob                    Poor           -

                     Table 2. Comparison of Distributed Database Properties


7. Adopted Solution
For Portal do Aluno, some requirements need to be met: the storage must be easily scalable,
highly available and provide fast content retrieval.
        Among the presented solutions, HBase (Hadoop project) and Voldemort have proven
to be the most robust solutions available at the moment, and they fully meet the
easy-scalability goal proposed by the NoSQL movement. HBase had the problem of high
latency and a SPOF on the master, which made it inappropriate for serving web pages in
real time. In the current version (≥ 0.20), those problems seem to be solved. Voldemort
is a very robust approach, but it is not optimized for the large data sets that Portal do
Aluno needs to store (videos, photos). This limitation arises because it uses MySQL or
BDB as its storage backend.
        MongoDB has some problems with the scalability it can provide and with its SPOFs.
However, it does have strong query support, which makes it a candidate to be tested as
the backend of Portal do Aluno. It also has its own binary format (BSON), which makes it
relatively fast.
        As a result, HBase could be used in future tests, based on its good documentation,
ease of use and robustness. Considering that some intricate and complex joins have to be
made at Portal do Aluno, MongoDB could also be tested due to its query support. Together
with one of those solutions, MemCache could be used to speed up the performance of the
Portal.

8. Conclusion
It is undeniable that a new class of databases is now being developed. These databases
started as competitive advantages of private companies solving Internet-related problems.
Assuming that they will replace the old-fashioned relational database model is naive,
because only a few special applications require their power. RDBMSs also offer more
features, whose implementation and use are well understood, not to mention the fact that
they can organize data as it exists in the real world, with strong integrity that makes the
data independent of the application.
        The new systems, however, have shown great promise in terms of scalability and
availability, using ordinary hardware as a cheap solution. Businesses that rely entirely on
a single point of access to their clients will see these new tools as mandatory in order to
increase the availability of their applications.
          High availability is mandatory for some applications. Until now, this property was
satisfied through huge investments in backup systems with RAID and other devices and
software, which have only increased the number of SPOFs. Of course, these new systems
are not completely trustworthy at the moment, because they are relatively new and may
fail, but they could be considered an option to fulfill this requirement.
        The use of these systems should be analyzed for each application, since the entire
storage logic will have to be glued into the application’s code. The ability to handle
complicated queries (search, insert, update, delete) is, at the moment, very limited or
nonexistent. Also, the normalization theory usually applied to relational databases to
avoid data duplication does not apply to them at all. This new approach keeps several
copies of the data in several places, which must be synchronized and kept up to date
entirely by the application’s code.
        As a rule of thumb, these new tools are recommended when the application’s
requirements match at least one of the following statements:
     • There is a huge amount of data that needs to be stored;
     • The data set has a simple representation that does not require complex joins or
       queries, and it naturally fits the key-value model;
     • The application will face high-demand access in the future, which will lead to
       performance problems without clustering.

References
Amza, C., Cox, A. L., and Zwaenepoel, W. (2003). Distributed versioning: consistent
  replication for scaling back-end databases of dynamic content web sites. In Middle-
  ware ’03: Proceedings of the ACM/IFIP/USENIX 2003 International Conference on
  Middleware, pages 282–304, New York, NY, USA. Springer-Verlag New York, Inc.
Binnig, C., Kossmann, D., Kraska, T., and Loesing, S. (2009). How is the weather to-
  morrow?: towards a benchmark for the cloud. In DBTest ’09: Proceedings of the
  Second International Workshop on Testing Database Systems, pages 1–6, New York,
  NY, USA. ACM.
Bortnikov, E. (2009). Open-source grid technologies for web-scale computing. SIGACT
  News, 40(2):87–93.
Brantner, M., Florescu, D., Graf, D., Kossmann, D., and Kraska, T. (2008). Building
  a database on S3. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD inter-
  national conference on Management of data, pages 251–264, New York, NY, USA.
  ACM.
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra,
  T., Fikes, A., and Gruber, R. E. (2008). Bigtable: A distributed storage system for
  structured data. ACM Trans. Comput. Syst., 26(2):1–26.
Cryans, J.-D., April, A., and Abran, A. (2008). Criteria to compare cloud computing
  with current database technology. In IWSM/Metrikon/Mensura ’08: Proceedings of
  the International Conferences on Software Process and Product Measurement, pages
  114–126, Berlin, Heidelberg. Springer-Verlag.
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A.,
  Sivasubramanian, S., Vosshall, P., and Vogels, W. (2007). Dynamo: Amazon’s highly
  available key-value store. In SOSP ’07: Proceedings of twenty-first ACM SIGOPS
  symposium on Operating systems principles, pages 205–220, New York, NY, USA.
  ACM.
Ghemawat, S., Gobioff, H., and Leung, S.-T. (2003). The Google file system. In SOSP
  ’03: Proceedings of the nineteenth ACM symposium on Operating systems principles,
  pages 29–43, New York, NY, USA. ACM.
Gilbert, S. and Lynch, N. (2002). Brewer’s conjecture and the feasibility of consistent,
  available, partition-tolerant web services. SIGACT News, 33(2):51–59.
Manassiev, K. and Amza, C. (2005). Scalable database replication through dynamic mul-
  tiversioning. In CASCON ’05: Proceedings of the 2005 conference of the Centre for
  Advanced Studies on Collaborative research, pages 141–154. IBM Press.
Palankar, M. R., Iamnitchi, A., Ripeanu, M., and Garfinkel, S. (2008). Amazon S3 for
  science grids: a viable solution? In DADC ’08: Proceedings of the 2008 international
  workshop on Data-aware distributed computing, pages 55–64, New York, NY, USA.
  ACM.
Vogels, W. (2008). Eventually consistent - revisited.
  http://www.allthingsdistributed.com/2008/12/eventually_consistent.html,
  visited in December 2009.

Distributed Databases Overview

  • 1. Achieving High Availability, Scalable Storage and Performance at Portal do Aluno - Distributed Databases Overview Study - Luis Carlos Dill Junges1 , Ivan Linhares Martins1 1 Certi Foundation – Federal University of Santa Catarina (UFSC) Postal Box 5053 – 88.040-970 – Florian´ polis – SC – Brazil o luis.junges@gmail.com, ilm@certi.org.br Abstract. This document is a consolidation study made at Certi Foundation for the federal project called Portal do Aluno. This project will be an internet portal with the main objective to spread knowledge among kids between 12 and 18 years old from Brazilian elementary schools. Considering the fact that there will be around 5 millions students using it montlhy, some problems are inevitable on the storage system and at the availability of the portal. With this problem in mind, a comprehensive study has been made on the new flavor of distributed databases available at the market. The results of such study has been published on this document for appreciation with some considerations on each one. Resumo. Este documento e uma consolidacao de um estudo realizado na ´ ¸˜ fundacao Certi para o projeto do governo Federal chamado Portal do Aluno. ¸˜ Este projeto ser´ voltado para os estudantes entre 12 e 18 anos da rede de en- a sino b´ sico do Brasil com o objetivo de ser tornar um portal para divulgacao e a ¸˜ geracao de assuntos relacionados a formacao dos estudantes. ¸˜ ` ¸˜ Tal projeto ter´ algo em torno de 5 milh˜ es de usu´ rios que inevitavelmente a o a trar˜ o alguns problemas em relacao ao backend do sistema nos aspectos de a ¸˜ escalabilidade de dados e alta disponibilidade do portal. Com isto em mente, um estudo elaborado das solucoes atuais dos novos sistemas de banco de dados ¸˜ distribu´dos foi feito e os seus resultados s˜ o apresentados neste documento. ı a 1. Introduction This study was born due the problem being faced at Portal do Aluno project. 
The project consists of a social network focused on spreading the knowledge among students between 12 and 18 years old from the elementary schools of Brazil. Those problems are related to the availability, the storage capacity and also the per- formance of the overall system. Although minor projects were developed using standard relational databases which inevitable have become the SPOF1 of the system, this project had required a better solution in order to meet new requirements. This study shows a way to overcome such problems by using a new kind of Open Source tools available at the developing community. This new set of tools have been driven by the NoSQL movement which had began around 2009 to solve the limitations found on handling big data volumes and workloads. 1 Single Point of Failure
  • 2. This group has the aim to redirect the database development to horizontal scalability by relaxing on some aspects. One of those aspects could be shown on the fact that such system often provide eventually consistency and, therefore, are not fully compliant with the ACID2 properties. This article is organized as follow: Section 2 describes the project. Section 3 presents the problems and the motivation to study a new approach. Section 4 describes the general characteristics of those distributed systems and Section 5 gives a brief overview of the major Open Source players. Section 6 shows a comparative table of the properties of each system. Section 7 presents the most prominent solution that best meets the Portal do Aluno’s requirements. Finally Section 8 gives the conclusion. 2. Portal do Aluno Portal do Aluno is a social learning environment project from the Ministry of Education of Brazil(known as MEC). It has characteristics of social network and has the aim to provide an educational portal with colaborative tools for schools tasks. It will be an extension of elementary schools on the internet trying to promote the integration among schools, students and teachers around Brazil by the possibility of having groups for researchs, discussions and others common tasks. This portal is subdivided into modules with specific content. On some of them, there is the possibility of uploading files like images and any other type of document in- cluding video. As the number of users of this portal is potentially high from the begining, scalability and availability are essencial and lead to the problem described at the next section. 3. Problem Relational databases are powerful and robust in such way that there is widespread of applications and systems using them. However, they show limitations when large sets of data need to be stored and when high availability of the system is mandatory. 
On the first issue is provably impossible3 to keep the ACID properties while scaling across multiple machines. Until now it has been tipically solved by high end RDMS4 through the use of replication system with master-slave architecture as shown on figure 1. Even being a working approach, this model has a prominent SPOF5 on the master. If it fails, the system goes down. This approach reachs scalabilty by forwarding the reads to free slaves (load balancer) and all writes on master, being again the bottleneck of the data flow. The second issue is usually solved by hardware solutions based on RAID6 . At a glance, the goal of this model is achieved by replicating the data among several hard drives and swapping them accordly on a failure. RAID systems, however, are not a complete safety solution because they can not survive without a backup if the server holding them is lost by fire or flooding or any other reason. 2 Atomicity, Consistency, Isolation, Durability 3 See Section 4.1 - CAP Theorem 4 Relational Database Management System 5 Single Point of Failure 6 Redundant Array of Independent Drives
  • 3. Figure 1. Tradicional Relational Database Scaling Those relational issues have been claimed to be solved (or at least on the road) by a new flavor of distributed databases relatively new at the developing community. Those systems promise to overcome efficiently the lacks found at relational systems by relaxing on some characteristics like consistency and strong consideration on nodes failures. The next section introduces such systems. 4. Distributed Databases One of the major advantages of using a distributed database over a traditional relational database is the possibility to scale the reads and writes easily by just adding new nodes on the cluster. Relational databases can have this issue solved with the reads but scale the writes are virtually impossible and at the end it becomes too expensive. A brief comparison between relational databases and those new systems is de- scribed at table 1. Table, Columns, Rows ACID properties fully satisfied Relational Databases Normalized to avoid data duplication Strong storage schema Queries fully supported Table like domain Data identified only by a key Distributed Databases Schema-less Data integrity on application’s code Eventual Consistency Support for queries is limited Table 1. Relational vs Distributed Database Those systems also adopt a key-value model or a document-oriented approach:
  • 4. Key-value Basically the data is associated with a key like a map. It is only possible to retrieve the data by knowing the key. They usually are able to retrieve the data at a constant time independet of how many entries have been stored. Document-Oriented The data is stored in a format which represents a document. It does not have any schema and some fields present at some document may not exist on others documents. Some implentations use JSON or XML as protocol layer for the data. 4.1. CAP Theorem The CAP theorem [Gilbert and Lynch 2002] was born as some properties that shared sys- tem must choose from. Their properties are as follow: • Strong Consistency:All Clients see the same view even in presence of updates. • High Availability: All Clients can find some replica of the data, even in presence of failures. • Partition-Tolerance: The system properties hold event when the system is parti- tioned by node failures, network problems or any other reason. The theorem states that a distributed system can always have only two of three CAP properties at the same time. At distributed databases, it is usually used Availability and Partition Tolerance. In order to handle the consistency, some of them use versioning systems [Manassiev and Amza 2005] [Amza et al. 2003] for update’s conflicts resolution. 5. Available Solutions On this section some approaches of distributed databases are shown explaining the sin- gular characteristics of each one and a practical example where they are being used at the moment. Those new systems are based on a large set of Open Source Tec- nhologies [Bortnikov 2009] which makes them really atractive and although there is not a consolidated benchmark already accepted by the community [Binnig et al. 2009] [Cryans et al. 2008], some points still can be made on each solution. 5.1. Voldemort Voldemort is a relatively new Open Source project at the community as it has been re- leased at the beginning of this year. 
Voldemort is written entirely in Java and is based on the key-value model, with essentially two operations to interact with it (set and get). In its developers' own words, Voldemort is "basically just a big, distributed, persistent, fault-tolerant hash table". For data persistence, it uses MySQL or BDB (Berkeley DB) as the backend on each node. Because it follows the concept of eventual consistency [Vogels 2008], it uses a simple incremental versioning system for each update to the data. The application is responsible for fixing integrity problems and any other issues that may affect the stored data.

The project is currently used in production at LinkedIn.com, in some parts that require high availability. The access rates observed in that production environment are on the order of 19,384 requests per second (req/sec) for reads and 16,559 req/sec for writes.

One strength of this project is its well-written documentation. It also has a good data replication scheme that can be configured manually in terms of how many writes and reads must succeed in order to validate a store or a read operation,
respectively. For example, an exception will be thrown if the system is configured to write to at least 3 nodes and only 2 nodes are up. The project's design also takes into account working properly behind load balancers in clustered setups, as shown in Figure 2. A major advantage is that the project has no single point of failure (SPOF), so it delivers a highly available system for critical applications.

The main drawback is that a new node cannot be added to a live cluster, which means the entire system has to be shut down in order to configure a new node. In addition, all code processes one value at a time in memory (there is no cursor or streaming), so values need to fit comfortably in memory.

Figure 2. Voldemort's clustering architecture

5.2. HBase

HBase is the official database of the Hadoop project. It is an Open Source, distributed, column-oriented store modeled after Google's BigTable [Chang et al. 2008]. Just as BigTable leverages the distributed data storage provided by the Google File System [Ghemawat et al. 2003], HBase provides BigTable-like capabilities on top of Hadoop using HDFS (the Hadoop Distributed File System). HBase is a powerful project that lets users run parallel processing on the cluster through MapReduce jobs. The current release (0.20) has removed its major drawbacks: a SPOF and high read latency. Its architecture distributes masters and region servers across the cluster's machines, as shown in Figure 3.
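HBase, like Hadoop, runs MapReduce jobs. A map function turns each input record into key-value pairs, the framework groups those pairs by key across the cluster, and a reduce function combines each group. The single-process sketch below shows only this programming model. It does not use the Hadoop or HBase APIs, and the page-view records and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit (page, 1) for one page-view record (hypothetical format)."""
    _student, page = record
    yield page, 1


def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


def run_mapreduce(records, mapper, reducer):
    # Shuffle: group mapper output by key, as the framework would across nodes.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    views = [("ana", "/math"), ("bruno", "/math"), ("ana", "/history")]
    print(run_mapreduce(views, map_phase, reduce_phase))
```

In a real cluster, the map calls run in parallel on the nodes that hold the data, and the grouping step moves data over the network. Here, everything runs sequentially in one process.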
HBase is currently in use at several places, including a Yahoo! cluster with 10,000 PCs. Some companies are also testing HBase running on the Amazon Elastic Compute Cloud (known as Amazon EC2).

Figure 3. HBase's architecture

5.3. Redis

Redis is a distributed key-value solution whose advantage is offering more operations than the traditional set and get API. These include handling multiple sets and some simple queries on the stored dataset, and some of these operations are guaranteed to be atomic. It also supports data types beyond plain strings or binaries, including lists, sets and ordered sets. Another major point is that it is extremely fast: according to the developers' own tests, it performs around 110,000 SETs/second and around 81,000 GETs/second. However, it works through asynchronous calls, which means data can be lost between the time a write is requested and the time it actually takes effect (these operations are not atomic). There is also the constraint that the whole dataset needs to fit on a single device.

5.4. Cassandra

Cassandra was created to solve Facebook's problems. It is a more complete key-value database, based on Dynamo's fully distributed design [DeCandia et al. 2007] and BigTable's column-family data model [Chang et al. 2008]. The project provides high availability without a SPOF, together with incremental scalability: new nodes can be added to a live environment without disturbing the
applications currently running on the database. It also guarantees that operations on a single column family are atomic. Its drawbacks include poor, almost nonexistent documentation and a very obscure, difficult API that is expected to undergo heavy remodeling in the next releases. Cassandra is currently used at Facebook for inbox search, where it reportedly holds 40 TB of data distributed across 120 machines in separate data centers. It is also in use at Rackspace and Digg.com.

5.5. MongoDB

MongoDB is a document-oriented approach to scalable distributed databases. It is an Open Source implementation written entirely in C++, with commercial support available. Its major advantage is its query support, which makes it unique among these solutions. It works with the BSON (binary JSON) format, handles large data such as photos and videos, and supports MapReduce jobs.

Figure 4. MongoDB's Architecture Design

As a drawback, it has an intricate cluster scheme with several SPOFs, as shown in Figure 4. The cluster is subdivided into config servers, which store metadata about which mongo shard holds each piece of data, and mongo shards, which store the data itself. There are also mongo instances that serve as entry points for clients. At the moment it is a relatively new implementation without full sharding support. In addition, data replication is limited in the number of nodes that can be used (only 2 nodes).

5.6. Tokyo Cabinet/Tyrant

Tokyo Cabinet/Tyrant is an Open Source project reportedly in use at mixi.jp, a Japanese social network, with 10,000 updates/second through MemCache. There, it appears to handle 20 million data entries (20 bytes each). We have not tested it ourselves, but the project claims to be very fast at writing and reading, performing around 58,000 req/sec. It also seems to support
ACID properties, with several different storage approaches (hash, B-tree) depending on the type of data being stored. As drawbacks, its documentation is poor and few projects are using it.

5.7. CouchDB

CouchDB is a very easy-to-run project with a document-oriented approach. Its storage backend is totally unstructured and schema-less, and it uses the JSON format for data handling. It is very similar to Amazon's SimpleDB, with asynchronous replication of data. It also has a browser-based administration console where users can create MapReduce jobs, run backup operations and define views similar to those found in relational systems.

CouchDB uses HTTP requests to manage the dataset, so any software able to perform HTTP requests can connect to it. A major advantage of CouchDB is that it provides a query-like engine, which lets users build their own queries tailored to the application being developed.

A big drawback is that it does not scale, because all stored data needs to fit on a single device. The availability of the system is achieved by a client router that forwards queries to the desired backend service. So, CouchDB is not currently a distributed database, but some of its features justify including it in this list. One of these features is MapReduce support. Another is the ability to store the entire dataset, or part of it, directly on the client's computer with asynchronous replication. This can reduce the workload on the backend, because replication happens at a convenient moment. The same feature could be used for mobile devices that are synchronized at a base station (Bluetooth, wireless, cable) and can then access a website without having to connect to the Internet.
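CouchDB is managed through plain HTTP requests, so any HTTP client can store a JSON document in it. The sketch below only builds the requests with Python's standard library. It assumes a CouchDB server at its default address `http://localhost:5984` and a hypothetical database named `portal_do_aluno`, and it sends nothing unless `urlopen` is called against a running server. In CouchDB's HTTP API, a PUT to `/{database}/{document id}` creates a document and a GET to the same path reads it back. Updating an existing document also requires its current `_rev` value in the body.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: CouchDB on its default port and a hypothetical database name.
COUCHDB_URL = "http://localhost:5984"
DATABASE = "portal_do_aluno"


def _document_url(doc_id):
    # Quote the id so characters such as "/" cannot alter the path.
    return f"{COUCHDB_URL}/{DATABASE}/{urllib.parse.quote(doc_id, safe='')}"


def put_document(doc_id, document):
    """Build a PUT request that stores a JSON document under doc_id."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        _document_url(doc_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document(doc_id):
    """Build a GET request that reads the document back."""
    return urllib.request.Request(_document_url(doc_id), method="GET")


if __name__ == "__main__":
    # Schema-less: this comment document needs no table definition beforehand.
    comment = {"type": "comment", "student": "ana", "text": "Great lesson!"}
    request = put_document("comment-001", comment)
    print(request.get_method(), request.full_url)
    # Against a running server, urllib.request.urlopen(request) would send it.
```

Because the interface is ordinary HTTP, the same two requests could equally come from a browser, a mobile client or a shell tool. This is the ease of integration the section attributes to CouchDB.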
Working offline in this way lets users avoid spending money on mobile data and avoid slow connections, which invariably leads to a better user experience on the website. Open issues remain regarding the types of data that can be handled with this approach. It is also unclear whether modifications made by the user can later be synchronized with the main data server without consistency problems. Despite these issues, the approach seems attractive for delivering fast content to mobile devices. At Portal do Aluno, this feature could give users access to the dashboard, with the option of editing comments on mobile devices and synchronizing them later with the main server.

CouchDB is used in several projects and websites because of the ease of use its HTTP interface provides. At the moment it is a very young project, still in alpha development, with serious security problems. Even so, it is a project to keep an eye on.

5.8. MemCache

According to its developers, MemCache is an "Open Source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load."

It is a very simple and robust way to improve the performance of web applications, reducing read times by accessing the cache layer instead of the database
itself. As a real-world example, it has reached 38,000 req/sec at Flickr.com. It has no persistence layer, but it handles load balancing properly simply by adding new nodes to the cluster. It is used by several large projects on the Internet. Some high-end RDBMSs (MySQL and PostgreSQL) can also use it directly through their APIs as a cache layer.

Figure 5. Solution using intermediate storage layers

Although MemCache does not provide persistence, it can be very useful when combined with solutions that do provide storage, such as commercial services like the Amazon Simple Storage Service (Amazon S3). Using MemCache considerably reduces the number of read operations that hit Amazon S3 and, consequently, the monthly bill. Figure 5 shows this approach. MemCache serves as an intermediate layer for applications that require high availability but lack the capacity to set up a private cluster. Instead, these applications use commercial storage solutions as the cluster [Brantner et al. 2008] [Palankar et al. 2008] and reduce the monthly bill by adding further storage layers (disk and MemCache).

5.9. Others

There are many more distributed database projects with different approaches. Most of them seem to be at an early stage of development. They lack sufficient documentation and robustness, or lack a persistence layer (in-memory only). Some of them include:
• ThruDB
• LightCloud
• Kai
• MemcachedDB
• Scalaris
• NMDB
• Disco
• Riak
• Hazelcast
• KeySpace
• Dynomite
• Mnesia
• Ringo
• Hypertable

6. Solution Benchmark

So far there is no accepted benchmark for these new systems [Binnig et al. 2009] [Cryans et al. 2008], so the decision whether to use a system is based on its properties. Table 2 shows a comparative listing of some properties of each system. The table is a snapshot of the systems as of December 2009.
Name | Language | Fault-Tolerance | Persistence | Client | Data Model | Documentation | Production
Voldemort | Java | Partitioned, Replicated | Berkeley DB, MySQL | Java API | Structured, Blob, Text | Good | LinkedIn
HBase | Java | Replication, Partitioning | Custom on-disk | Custom API, Thrift | BigTable | Good | Yahoo
Cassandra | Java | Replication, Partitioning | Custom on-disk | Thrift | BigTable, Dynamo | Poor | Facebook
CouchDB | Erlang | Replication | Custom on-disk | HTTP, JSON | Document-Oriented | Good | UbuntuOne
MongoDB | C++ | Replication | Custom on-disk, GridFS | Java, C++ Drivers | Document-Oriented | Good | SourceForge
Hypertable | C++ | Replication, Partitioning | Custom on-disk | Java, Thrift | BigTable | Good | Baidu
ThruDB | C++ | Replication | Custom on-disk | Thrift | Document-Oriented | Medium | -
Ringo | Erlang | Replication, Partitioning | Custom on-disk | HTTP | Blob | Medium | Nokia
Tokyo Tyrant | C | - | B-Tree, Hash | ANSI C | Document-Oriented | Poor | Mixi.jp
Scalaris | Erlang | Replication, Partitioning | In-Memory | Java, Erlang, HTTP | Blob | Medium | OnScale
MemCache | C | Partitioning | In-Memory | Python, Java, Ruby | - | Good | Several Projects
Dynomite | Erlang | Replication, Partitioning | - | Custom, Thrift | Blob | Poor | PowerSet
Kai | Erlang | Partitioning | - | - | Blob | Poor | -
Table 2. Comparative List of Distributed Database Properties

7. Adopted Solution

Portal do Aluno has several requirements that must be met: its storage must be easily scalable, highly available and fast at retrieving content. Among the solutions presented, HBase (Hadoop project) and Voldemort have proven to be the most robust available at the moment. Both fully meet the goal of easy scalability proposed by the NoSQL movement. HBase used to suffer from high latency and a SPOF on the master, which made it inappropriate for serving web pages in real time. In the current version (≥ 0.20), these problems seem to be solved. Voldemort is a very robust approach, but it is not optimized for the large data items Portal do Aluno needs to store (videos, photos). This limitation arises because it uses MySQL or BDB as its storage backend.
MongoDB has some problems with the scalability it can provide, and it also has SPOFs. However, its strong query support makes it a candidate for testing as a backend for Portal do Aluno. It also has its own binary format (BSON), which makes it relatively fast.

As a result, HBase could be used in future tests because of its good documentation, ease of use and robustness. Some intricate and complex joins have to be performed in Portal do Aluno, so MongoDB could also be tested because of its query support. Together with either of these solutions, MemCache could be used to speed up the Portal.

8. Conclusion

It cannot be denied that a new set of databases is being developed. They started as competitive advantages of private companies solving Internet-related problems. Assuming that they will replace the old-fashioned relational database model is naive, because only a few special applications require their power. RDBMSs also have more features, and those features are well understood to implement and work with. Relational systems can also organize data as it exists in the real world, with strong integrity, which makes them independent of the application. The new flavor of databases, however, has shown good promise in terms of scalability and availability, using ordinary hardware as a cheap solution. Businesses that rely completely on a single point of access to their clients will see these new tools as mandatory for increasing the availability of their applications.

High availability is mandatory for some applications. Until now, this property has been achieved through huge investments in backup systems with RAID and other devices and software, which have only increased the number of SPOFs. Of course, these new systems are not yet completely trustworthy, because they are relatively new and may fail, but they could be considered as an option to fulfill this requirement.

The use of these systems should be analyzed for each application, since the entire storage logic will have to be glued to the application's code. Support for complicated queries (search, insert, update, delete) is currently very limited or nonexistent. Also, the normalization theory that relational databases use to avoid data duplication does not apply to them at all. This new approach keeps several replicas of the data in several places. The application's code alone must keep these replicas synchronized and up to date.

As a rule of thumb, these new tools are recommended when the requirements of the application match at least one of the following statements:
• There is a huge amount of data that needs to be stored;
• The data set has a simple representation that does not require complex joins or queries and naturally fits the key-value model;
• The application is expected to face high-demand access in the future, which would lead to performance problems without clustering.

References

Amza, C., Cox, A. L., and Zwaenepoel, W. (2003). Distributed versioning: consistent replication for scaling back-end databases of dynamic content web sites. In Middleware '03: Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware, pages 282–304, New York, NY, USA. Springer-Verlag New York, Inc.

Binnig, C., Kossmann, D., Kraska, T., and Loesing, S. (2009). How is the weather tomorrow?: towards a benchmark for the cloud. In DBTest '09: Proceedings of the Second International Workshop on Testing Database Systems, pages 1–6, New York, NY, USA. ACM.

Bortnikov, E. (2009). Open-source grid technologies for web-scale computing. SIGACT News, 40(2):87–93.

Brantner, M., Florescu, D., Graf, D., Kossmann, D., and Kraska, T. (2008). Building a database on S3. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 251–264, New York, NY, USA. ACM.

Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., and Gruber, R. E. (2008). Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):1–26.

Cryans, J.-D., April, A., and Abran, A. (2008). Criteria to compare cloud computing with current database technology. In IWSM/Metrikon/Mensura '08: Proceedings of the International Conferences on Software Process and Product Measurement, pages 114–126, Berlin, Heidelberg. Springer-Verlag.

DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. (2007). Dynamo: Amazon's highly available key-value store. In SOSP '07: Proceedings of the Twenty-First ACM SIGOPS Symposium on Operating Systems Principles, pages 205–220, New York, NY, USA. ACM.

Ghemawat, S., Gobioff, H., and Leung, S.-T. (2003). The Google file system. In SOSP '03: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pages 29–43, New York, NY, USA. ACM.

Gilbert, S. and Lynch, N. (2002). Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59.

Manassiev, K. and Amza, C. (2005). Scalable database replication through dynamic multiversioning. In CASCON '05: Proceedings of the 2005 Conference of the Centre for Advanced Studies on Collaborative Research, pages 141–154. IBM Press.

Palankar, M. R., Iamnitchi, A., Ripeanu, M., and Garfinkel, S. (2008). Amazon S3 for science grids: a viable solution? In DADC '08: Proceedings of the 2008 International Workshop on Data-Aware Distributed Computing, pages 55–64, New York, NY, USA. ACM.

Vogels, W. (2008). Eventually consistent - revisited. http://www.allthingsdistributed.com/2008/12/eventually_consistent.html, visited in December 2009.