SlideShare ist ein Scribd-Unternehmen logo
1 von 6
#105
 Get More Refcardz! Visit refcardz.com




                                                                                                      Getting Started with
                                                    CONTENTS INCLUDE:
                                                    n	
                                                         Introduction
                                                    n	
                                                         Scalable Data Architecture


                                                                                                NoSQL and Data Scalability
                                                    n	
                                                         Is NoSQL For You?
                                                    n	
                                                         mongoDB
                                                    n	
                                                         GigaSpaces XAP
                                                         Google App Engine Datastore and more...
                                                                                                                                                                       By Eugene Ciurana
                                                    n	




                                                                                                                                    Implementations of either often share characteristics from the
                                                           INTRODUCTION                                                             other.

                                                    The DZone Refcard #43 is an introduction to system high                         Data Grids
                                                    availability and scalability terminology and techniques                         Data grids process workloads defined as independent jobs
                                                    (http://refcardz.dzone.com/refcardz/scalability). The next                      that don’t require data sharing among processes. Storage
                                                    logical step is the scalable handling of massive data volumes                   or network may be shared across all nodes of the grid, but
                                                    resulting from having these powerful processing capabilities.                   intermediate results have no bearing on other jobs progress or
                                                                                                                                    on other nodes in the grid, such as a MapReduce cluster.
                                                    This Refcard demystifies NoSQL and data scalability
                                                    techniques by introducing some core concepts. It also offers
                                                    an overview of current technologies available in this domain
                                                                                                                                                                  Consumer
                                                    and suggests how to apply them.

                                                    What is Data Scalability?
                                                                                                                                                                    Master
                                                    Data scalability is the ability of a system to store, manipulate,
                                                    analyze, and otherwise process ever increasing amounts of
 www.dzone.com




                                                    data without reducing overall system availability, performance,                                              Load Balancer

                                                    or throughput.
                                                    Data scalability is achieved by a combination of more powerful
                                                                                                                                           Node           Node                   Node    Node
                                                    processing capabilities and larger but efficient storage
                                                    mechanisms.

                                                    Relational and hierarchical databases scale up by adding more                                                Load Balancer
                                                    processors, more storage, caching systems, and such. Soon
                                                    they hit either a cost or a practical scalability limit because
                                                    they are difficult or impossible to scale out. These database                         Node           Node                    Node   Node
                                                    management systems are designed as single units that must
                                                    maintain data integrity, and enforce schema rules to guarantee                                         Figure 1 - Data Grid
                                                    it. This rigidity is what promotes upward, but not outward,
  Getting Started with NoSQL and Data Scalability




                                                    scalability.                                                                    Data grids expose their functionality through a single API
                                                                                                                                    (either a Web service or native to the application programming
                                                                    Oracle RAC is a cluster of multiple computers with              language) that abstracts its topology and implementation from
                                                                    access to a common database. This is considered                 its data processing consumer.
                                                            Hot     only vertically scalable because the processing
                                                            Tip     (usually in the form of stored procedures) may scale
                                                                    out, but the shared storage facilities don’t scale with
                                                                    the cluster.
                                                                                                                                                       Get over 90 DZone Refcardz
                                                    Data integrity and schemas are suited for handling
                                                    transactional, normalized, uniform data. They handle                                                  FREE from Refcardz.com!
                                                    unstructured or rapidly evolving data structures with difficulty
                                                    or exponentially larger costs.


                                                            Hot     Data replication is not the same as data scalability!
                                                            Tip

                                                           SCALABLE DATA ARCHITECTURES

                                                    There are two general kinds of architectures used for building
                                                    scalable data systems: data grids and NoSQL systems.

                                                                                                                  DZone, Inc.   |   www.dzone.com
2
                                                                                                                   Getting Started with NoSQL and Data Scalability



Areas of Application                                                                                    NoSQL vs RDBMS vs OO Analogies
  • Financial modeling                                                                                   NoSQL                                 RDBMS                                    OO
  • Data mining
                                                                                                         Kind                                  Table                                    Class
  • Click stream analytics
                                                                                                         Entity                                Record                                   Object
  • Document clustering
  • Distributed sorting or grepping                                                                      Attribute                             Column                                   Property
  • Simulations
  • Inverted index construction
                                                                                                            IS NoSQL FOR YOU?
  • Protein folding

NoSQL                                                                                                   Preparation:
NoSQL describes a horizontally scalable, non-relational
                                                                                                           • Don’t fall prey to the NoSQL fad
database with built-in replication support. Applications
                                                                                                           • Don’t be stubborn; neither NoSQL nor traditional
interact with it through a simple API, and the data is stored in a
                                                                                                             databases apply to all cases
“schema-free”, flat addressing repository, usually as large files
                                                                                                           • Apply the CAP Theorem to your use cases to
or data blocks. The repository is often a custom file system
                                                                                                             determine feasibility
designed to support NoSQL operations.
                                                                                                        Brewer’s (CAP) Theorem
                NoSQL is intended as shorthand for “not only                                            It’s impossible for a distributed computer system to
    Hot         SQL.” Complete architectures almost always mix                                          simultaneously provide all three of these guarantees:
    Tip
                traditional and NoSQL databases.                                                           • Consistency (all nodes see the same data at the same time)
                                                                                                           • Availability (node failures don’t prevent survivors from
NoSQL setups are best suited for non-OLTP applications that                                                  continuing to operate)
process massive amounts of structured and unstructured data                                                • Partition tolerance (no failures less than total network
at a lower cost and with higher efficiency than RDBMs and                                                    failures cause the system to fail)
stored procedures.
                                                                                                        Since only two of these characteristics are guaranteed for any
                                                                                                        given scalable system, use your functional specification and
                                           Consumer                                                     business SLA (service level agreement) to determine what your
                                                                                                        minimum and target goals for CAP are, pick the two that meet
                                                                                                        your requirements, and proceed to implement the appropriate
         Node                    Node                     Node                     Node                 technology.


                                                                                                                              Rule of Thumb: NoSQL’s primary goal is to achieve
                                      Virtual File System
                  logical table management, load balancing, garbage collection
                                                                                                             Hot              horizontal scalability. It attains this by reducing
                                  (HDFS, mongoFS, Hypertable)                                                Tip
                                                                                                                              transactional semantics and referential integrity.
            Tablet           Tablet                                        Tablet
           Server 0         Server 1                                      Server n

                                                                                                        Use Figure 3 to identify the best match between your
                                     Distributed File System
                                                                                                        application’s CAP requirements and the suggested SQL and
                                                                                                        NoSQL systems listed.
                FS 0          FS 1            FS 2                          FS n


                              Figure 2 - NoSQL Topology                                                                                                      Relational
                                                                                                                                                             Key-Value
                                                                                                                                                          Column-Oriented
NoSQL storage is highly replicated (a commit doesn’t occur                                                                                               Document-Oriented
until the data is successfully written to at least two separate
                                                                                                            Consistency                                                                               Availability
storage devices) and the file systems are optimized for write-
                                                                                                                  C                                                                                         A
                                                                                                                              RDBMs (Oracle, MySQL), Aster Data, Green Plum, Vertica
only commits. The storage devices are formatted to handle
large blocks (32 MB or more). Caching and buffering are
                                                                                                                                                                                                               a,
                                                                                                                  mo




designed for high I/O throughput. The NoSQL database
                                                                                                                                                                                                            dr
                                                                                                                  ng edi




                                                                                                                                                                                                          an
                                                                                                                    oD s,




                                                                                                                                                                                                             s
                                                                                                                     R




is implemented as a data grid for processing (mapReduce,
                                                                                                                                                                                                          as
                                                                                                                       B, Be




                                                                                                                                                                                                     ak C




                                                                                                                                               Pick Any Two
                                                                                                                         Te rke




                                                                                                                                                                                                   Ri I,




queries, CRUD, etc.)
                                                                                                                                                                                                 B, , KA
                                                                                                                           rra ley
                                                                                                                              sto D




                                                                                                                                                                                               hD et
                                                                                                                                 re B,




                                                                                                                                                                                             uc bin




Areas of Application
                                                                                                                                   ,D M




                                                                                                                                                                                          Co Ca
                                                                                                                                     ata em




                                                                                                                                                                                        B, yo




  • Document storage
                                                                                                                                        sto cac




                                                                                                                                                                                     leD Tok
                                                                                                                                           re he




  • Object databases
                                                                                                                                                                                  mp t,
                                                                                                                                             , H DB




                                                                                                                                                                               Si mor
                                                                                                                                                yp , S




                                                                                                                                                                                 lde
                                                                                                                                                  er




  • Graph databases
                                                                                                                                                     tab cala




                                                                                                                                                                              Vo
                                                                                                                                                        le, ris




  • Key/value stores
                                                                                                                                                                           o,
                                                                                                                                                                            m
                                                                                                                                                           Hb




                                                                                                                                                                         na




                                                                                                                                                                     P
                                                                                                                                                              as




  • Eventually consistent key/value stores
                                                                                                                                                                         Dy
                                                                                                                                                             ,   e




This list is the implementation counterpart to the data grid                                                                                            Partition tolerance
areas of application; the data store feeds the computational                                                                                    Figure 3 - CAP Selection Chart
network and, together, they form the NoSQL database.                                                                                             (source: Nathan Hurst's Blog)


                                                                                      DZone, Inc.   |   www.dzone.com
3
                                                                                                                   Getting Started with NoSQL and Data Scalability



                                                                                                         encoded JSON representation. BSON is designed to be
   mongoDB                                                                                               traversable, lightweight and efficient. Applications can map
                                                                                                         BSON/JSON documents using native representations like
mongoDB is a document-based NoSQL database that bridges                                                  dictionaries, lists, and arrays, leaving the BSON translation
the gap between scalable key-value stores like Datastore                                                 to the native mongoDB driver specific to each programming
and Memcache DB, and RDBMS’s querying and robustness                                                     language.
capabilities. Some of its main features include:
                                                                                                         BSON Example
   • Document-oriented storage - data is manipulated as                                                   {
     JSON-like documents                                                                                      'name' : 'Tom',
                                                                                                              'age' : 42
   • Querying - uses JavaScript and has APIs for submitting                                               }
     queries in every major programming language
   • In-place updates - atomicity                                                                         Language         Representation
   • Indexing - any attribute in a document may be used for                                               Python           {
                                                                                                                                'name' : 'Tom',
     indexing and query optimization                                                                                            'age' : 42
   • Auto-sharding - enables horizontal scalability                                                                        }

   • Map/reduce - the mongoDB cluster may run smaller                                                     Ruby             {
                                                                                                                                "name" => "Tom",
     MapReduce jobs than a Hadoop cluster with significant                                                                      "age" => 42
                                                                                                                           }
     cost and efficiency improvements
                                                                                                          Java             BasicDBObject d;
mongoDB implements its own file system to optimize I/O                                                                     d = new BasicObject();
                                                                                                                           d.put("name", "Tom");
throughput by dividing larger objects into smaller chunks.                                                                 d.put("age", 42);
Documents are stored in two separate collections: files                                                   PHP              array( "name" => "Tom",
containing the object meta-data, and chunks that form a                                                                      "age" => 42);

larger document when combined with database accounting                                                   Dynamic languages offer a closer object mapping to BSON/
information. The mongoDB API provides functions for                                                      JSON than compiled languages.
manipulating files, chunks, and indices directly. The
administration tools enable GridFS maintenance.                                                          The complete BSON specification is available from:
                                                                                                         http://bsonspec.org/
                                          Consumer                                                       mongoDB Programming
                                                                                                         Programming in mongoDB requires an active server running
                                                        fail-over
                                                                                                         the mongod and the mongos database daemons (see Figure
                                                                                                         5), and a client application that uses one of the language-
           mongoDB Server (master)                            mongoDB Server (slave)
       mongod                                         mongod
                                                                                                         specific drivers.
       Database                                       Database
       daemon                                         daemon
                              mongos                                             mongos
                              Sharding
                              daemon
                                                                                 Sharding
                                                                                 daemon                          Hot     All the examples in this Refcard are written in Python
                   Data                                              Data
                                                                                                                 Tip     for conciseness.
                  Storage                                           Storage


                                                                                                         Starting the Server
                                 Figure 5 - mongoDB Cluster
                                                                                                         Log on to the master server and execute:
A mongoDB cluster consists of a master and a slave. The slave                                             [servername:user] ./mongod
may become the master in a fail-over scenario, if necessary.
Having a master/slave configuration (also known as Active/                                               The server will display its status messages to the console
Passive or A/P cluster) helps ensure data integrity since only                                           unless stdout is redirected elsewhere.
the master is allowed to commit changes to the store at any                                              Programming Example
given time. A commit is successful only if the data is written to                                        This example allocates a database if one doesn’t already exist,
GridFS and replicated in the slave.                                                                      instantiates a collection on the server, and runs a couple of
                                                                                                         queries.
                  mongoDB also supports a limited master/master
                  configuration. It’s useful only for inserts, queries,                                  The mongoDB Developer Manual is available from:
    Hot           and deletions by specific object ID. It must not                                       http://www.mongodb.org/display/DOCS/Manual
    Tip
                  be used if updates of a single object may occur                                         #!/usr/bin/env jython
                  concurrently.                                                                           import pymongo

                                                                                                          from pymongo import Connection
Caching
                                                                                                          connection = Connection('servername', 27017)
mongoDB has a built-in cache that runs directly in the cluster
without external requirements. Any query is transparently                                                 db = connection['people_database']

cached in RAM to expedite data transfer rates and to reduce                                               peopleList = db['people_list']
disk I/O.                                                                                                 person = {
                                                                                                            'name' : 'Tom',
Document Format                                                                                             'age' : 42 }
mongoDB handles documents in BSON format, a binary-

                                                                                       DZone, Inc.   |   www.dzone.com
4
                                                                                       Getting Started with NoSQL and Data Scalability



 peopleList.insert(person)                                                         • JSON data and program objects storage - many RESTful
                                                                                     web services provide JSON data; they can be stored in
 person = {
   'name' : 'Nancy',                                                                 mongoDB without language serialization overhead
   'age' : 69 }
                                                                                     (especially when compared against XML documents)
 peopleList.insert(person)                                                         • Content management systems - JSON/BSON objects can
 # find first entry:                                                                 represent any kind of document, including those with a
 person = peopleList.find_one()
                                                                                     binary representation
 # find a specific person:
                                                                                mongoDB Drawbacks
 person = peopleList.find_one({ 'name' : 'Joe'})
                                                                                   • No JOIN operations - each document is stand-alone
 if person is None:                                                                • Complex queries - some complex queries and indices are
   print "Joe isn’t here!"
 else:                                                                               better suited for SQL
   print person['age']
                                                                                   • No row-level locking - unsuitable for transactional data
 # bulk inserts                                                                      without error prone application-level assistance
 persons = [{ 'name' : 'Joe' }, {'name' : 'Sue'}]

 peopleList.insert(persons)                                                     If any of these is part of the functional requirements, a SQL
                                                                                database would be better suited for the application.
 # queries with multiple results

 for person in peopleList.find():
   print person['name']                                                            GIGASPACES XAP
 for person in peopleList.find({'age' : {'$ge' : 21}}).sort('name'):
   print person['name']                                                         The GigaSpaces eXtreme Application Platform is a data
 # count:                                                                       grid designed to replace traditional application servers. It
 nDrinkingAge = peopleList.find({'age' : {'$ge' : 21}}).count()
                                                                                operates based on an event-processing model where the
                                                                                application dispatches objects to the processing nodes
 # indexing
                                                                                associated with a given data partition. The system may be
 from pymongo import ASCENDING, DESCENDING
                                                                                configured so that data state on the grid may trigger events, or
 peopleList.create_index([('age', DESCENDING), ('name', ASCENDING)])            an application may dispatch specific commands imperatively.
                                                                                GigaSpaces XAP also manages all threading and execution
The PyMongo documentation is available at:                                      aspects of the operation, including thread and connection
http://api.mongodb.org/python - guides for other languages                      pools. GigaSpaces XAP implements Spring transactions and
are also available from this web site.                                          auto-recovery. The system detects any failed operations
The code in the previous example performs these operations:                     in a computational node and automatically rolls back the
                                                                                transaction; it then places it back in the space where another
     • Connect to the database server started in the previous
                                                                                node picks it up to complete processing.
       section
     • Attach a database; notice that the database is treated like
                                                                                                      Application Frameworks
      an associative array
     • Get a collection (loosely equivalent to a table in a                                Java          C++               .Net           Groovy
       relational database), treated like an associative array
                                                                                           Mule         Spring             JEE              Jetty
     • Insert one or more entities
     • Query for one or more entites

Although mongoDB treats all these data as BSON internally,
most of the APIs allow the use of dictionary-style objects to
streamline the development process.                                                                              XAP Deployment Virtualization
                                                                                           XAP
Object ID                                                                               Management
A successful insertion into the database results in a valid                                and                   XAP Middleware Virtualization
                                                                                         Monitoring
Object ID. This is the unique identifier in the database for a                                                    (Virtualized Clustering Layer)
given document. When querying the database, a return value
will include this attribute:
 {
     "name" : "Tom",
     "age" : 42,
     "_id" : ObjectId('999999')
 }                                                                                        RDBMS            Memcache DB                   mongoDB

Users may override this Object ID with any argument as long as
it’s unique, or allow mongoDB to assign one automatically.
                                                                                              Figure 4 - GigaSpaces XAP Data Grid
Common Use Cases
     • Caching - more robust capabilities, plus persistence, when               GigaSpaces provides data persistence, distributed processing,
       compared against a pure caching system                                   and caching by interfacing with SQL and NoSQL data stores.
     • High volume processing - RDBMS may be too expensive                      The GigaSpaces API abstracts all back-end operations (job
       or slow to run in comparison                                             dispatching, data persistence, and caching) and makes

                                                              DZone, Inc.   |   www.dzone.com
5
                                                                                    Getting Started with NoSQL and Data Scalability



it transparent to the application. GigaSpaces XAP may
implement distributed computing operations like MapReduce                       GOOGLE APP ENGINE DATASTORE
and run them in its nodes, or it may dispatch them for
processing to the underlying subsystem if the functionality                 The Datastore is the main scalability feature of Google App
is available (e.g. mongoDB, Hadoop). The application may                    Engine applications. It’s not a relational database or a façade
implement transactions using the Spring API, the GigaSpaces                 for one. Datastore is a public API for accessing Google’s
XPA transactional facilities, or by implementing a workflow                 Bigtable high-performance distributed database system.
where specific NoSQL stores handle entities or groups of                    Think of it as a sparse array distributed across multiple servers
entities. This must be implemented explicitly in systems like               that also allows an infinite number of columns and rows.
mongoDB, but may exist in other NoSQL systems like Google                   Applications may even define new columns “on the fly”. The
App Engine’s Datastore.                                                     Datastore scales by adding new servers to a cluster; Google
                                                                            provides this functionality without user participation.
GigaSpaces XAP Programming                                                                                      Google                              Your
                                                                                                              Applications                       Applications
The GigaSpaces XAP API is very rich and covers many aspects
beyond NoSQL and data scalability areas like configuration                                     Datastore                         API 1            Datastore
                                                                                            Java (JDO, JPA)                  Other language        Python
management, deployment, web services, third-party product
                                                                                                                       Bigtable
integration, etc.                                                                                                   Master Server
                                                                                             (Logical table management, load balancing, garbage collection)
The GigaSpaces XAP documentation is at:
                                                                                                Tablet         Tablet                            Tablet
http://www.gigaspaces.com/wiki/display/XAP71/Programmer%27s+Guide                              Server 0       Server 1                          Server n


NoSQL operations may be implemented over these APIs:                                                                  Google File System


   • SQLQuery - allows querying a space using a SQL-like                                         FS 0          FS 1           FS 2                FS n

     syntax and regular expressions; do not confuse it with
                                                                                                          Figure 6 - Datastore Architecture
     JDBC support.
                                                                            Datastore operations are defined around entities. Entities
   • Persistency - mostly supports RDBMSs but may implement
                                                                            can have one-to-many or many-to-many relationships. The
     other persistency mechanisms through the External Data
                                                                            Datastore assigns unique IDs unless the application specifies
     Source Components API.
                                                                            a unique key. Datastore also disallows some property names
   • memcached - support for key/value pair distributed                     that it uses for housekeeping. The complete Datastore
     dictionaries available to any client in the grid; entities             documentation is available from:
     are automatically made available across all nodes. The                 http://code.google.com/appengine/docs/python/datastore/
     memcached API is implemented on top of the data
     grid, and it’s interchangeable with other memcached                                    Did you notice the parallels between Datastore
     implementations.                                                           Hot         and mongoDB so far? Many NoSQL database
   • Task Execution - allows synchronous and asynchronous job                   Tip         implementations have solved similar problems in
     execution on specific nodes or clusters                                                similar ways.

The GigaSpaces XAP API is in a minority of stateful NoSQL                   Transactions and Entity Groups
systems. Most NoSQL systems strive to achieve statelessness                 Datastore supports transactions. These transactions are only
to increase scalability and data consistency.                               effective on entities that belong to the same entity group.
                                                                            Entities in a given group are guaranteed to be stored on the
Common Use Cases                                                            same server.
   • Real-time analytics - dynamic data analysis and reporting
                                                                            Datastore Programming
     based on data entered into a system less than a minute
                                                                            The programming model is based on inheritance of basic
     before effective time of use
                                                                            entities, db.Model and db.Expando. Persistent data is
   • Map/reduce - distributed data processing of large data                 mapped onto an entity specialization of either of these classes.
     sets across a computational grid                                       The API provides persistence and querying instance methods
   • Near-zero downtime - allows for database schema                        for every entity managed by the Datastore.
     changes without homebrew master/slave configurations or
                                                                            Programming Example
     proprietary RDBMS dependencies
                                                                            The Datastore API is simpler than other NoSQL APIs and is
GigaSpaces XAP NoSQL Drawbacks                                              highly optimized to work in the App Engine environment. In
   • Complexity - the server, transactional, and grid model are             this example we insert data into the data store, then run a query:
     more complex than for other NoSQL systems                               from google.appengine.ext import db

   • Application server model - the API and components are                   class Person(db.Model):
                                                                               name = db.StringProperty(required=True)
     geared toward building applications and transactional                     age = db.IntegerProperty(require=True)
     logic
                                                                             person = Person(name = 'Tom', age = 42)
   • Steeper learning curve
                                                                             person.put()
   • Higher TCO - brings a requirement of a specialized,
                                                                             person = Person(name = 'Sue', age = 69)
     well-trained system administration team with higher
     requirements than other NoSQL systems                                   person.put()



                                                          DZone, Inc.   |   www.dzone.com
Getting Started with NoSQL and Data Scalability: An Introduction to Scalable Data Architecture Implementations and Technologies

Weitere ähnliche Inhalte

Was ist angesagt?

How to Develop True Distributed Simulations? HLA & DDS Interoperability
How to Develop True Distributed Simulations? HLA & DDS InteroperabilityHow to Develop True Distributed Simulations? HLA & DDS Interoperability
How to Develop True Distributed Simulations? HLA & DDS InteroperabilityJose Carlos Diaz
 
DBArtisan® vs Quest Toad with DB Admin Module
DBArtisan® vs Quest Toad with DB Admin ModuleDBArtisan® vs Quest Toad with DB Admin Module
DBArtisan® vs Quest Toad with DB Admin ModuleEmbarcadero Technologies
 
Persistence of memory: In-memory Is Not Often the Answer
Persistence of memory: In-memory Is Not Often the AnswerPersistence of memory: In-memory Is Not Often the Answer
Persistence of memory: In-memory Is Not Often the AnswerNeil Raden
 
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...IDES Editor
 
NetApp Unified Scale-Out/Clustered Storage
NetApp Unified Scale-Out/Clustered StorageNetApp Unified Scale-Out/Clustered Storage
NetApp Unified Scale-Out/Clustered StorageNetApp
 
Manage rising disk prices with storage virtualization webinar
Manage rising disk prices with storage virtualization webinarManage rising disk prices with storage virtualization webinar
Manage rising disk prices with storage virtualization webinarHitachi Vantara
 
Cost model for RFID-based traceability information systems
Cost model for RFID-based traceability information systemsCost model for RFID-based traceability information systems
Cost model for RFID-based traceability information systemsMiguel Pardal
 
ANN based STLF of Power System
ANN based STLF of Power SystemANN based STLF of Power System
ANN based STLF of Power SystemYousuf Khan
 
Seneca, Pittsburgh Supercomputer, and LSI
Seneca, Pittsburgh Supercomputer, and LSI Seneca, Pittsburgh Supercomputer, and LSI
Seneca, Pittsburgh Supercomputer, and LSI Jan Robin
 
DEISA Distributed European Infrastructure for Supercomputing Applications
DEISA Distributed European Infrastructure for Supercomputing ApplicationsDEISA Distributed European Infrastructure for Supercomputing Applications
DEISA Distributed European Infrastructure for Supercomputing ApplicationsWolfgang Gentzsch
 
Knowledge defined networking
Knowledge defined networkingKnowledge defined networking
Knowledge defined networkingPavel Ivanov
 
StruxureWare for data centers
StruxureWare for data centersStruxureWare for data centers
StruxureWare for data centersRogier den Boer
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
An asynchronous replication model to improve data available into a heterogene...
An asynchronous replication model to improve data available into a heterogene...An asynchronous replication model to improve data available into a heterogene...
An asynchronous replication model to improve data available into a heterogene...Alexander Decker
 
New Generation of Storage Tiering
New Generation of Storage TieringNew Generation of Storage Tiering
New Generation of Storage TieringTony Pearson
 
High performance operating system controlled memory compression
High performance operating system controlled memory compressionHigh performance operating system controlled memory compression
High performance operating system controlled memory compressionMr. Chanuwan
 

Was ist angesagt? (18)

How to Develop True Distributed Simulations? HLA & DDS Interoperability
How to Develop True Distributed Simulations? HLA & DDS InteroperabilityHow to Develop True Distributed Simulations? HLA & DDS Interoperability
How to Develop True Distributed Simulations? HLA & DDS Interoperability
 
DBArtisan® vs Quest Toad with DB Admin Module
DBArtisan® vs Quest Toad with DB Admin ModuleDBArtisan® vs Quest Toad with DB Admin Module
DBArtisan® vs Quest Toad with DB Admin Module
 
Persistence of memory: In-memory Is Not Often the Answer
Persistence of memory: In-memory Is Not Often the AnswerPersistence of memory: In-memory Is Not Often the Answer
Persistence of memory: In-memory Is Not Often the Answer
 
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks ...
 
Amazon SimpleDB
Amazon SimpleDBAmazon SimpleDB
Amazon SimpleDB
 
1 ieee98
1 ieee981 ieee98
1 ieee98
 
NetApp Unified Scale-Out/Clustered Storage
NetApp Unified Scale-Out/Clustered StorageNetApp Unified Scale-Out/Clustered Storage
NetApp Unified Scale-Out/Clustered Storage
 
Manage rising disk prices with storage virtualization webinar
Manage rising disk prices with storage virtualization webinarManage rising disk prices with storage virtualization webinar
Manage rising disk prices with storage virtualization webinar
 
Cost model for RFID-based traceability information systems
Cost model for RFID-based traceability information systemsCost model for RFID-based traceability information systems
Cost model for RFID-based traceability information systems
 
ANN based STLF of Power System
ANN based STLF of Power SystemANN based STLF of Power System
ANN based STLF of Power System
 
Seneca, Pittsburgh Supercomputer, and LSI
Seneca, Pittsburgh Supercomputer, and LSI Seneca, Pittsburgh Supercomputer, and LSI
Seneca, Pittsburgh Supercomputer, and LSI
 
DEISA Distributed European Infrastructure for Supercomputing Applications
DEISA Distributed European Infrastructure for Supercomputing ApplicationsDEISA Distributed European Infrastructure for Supercomputing Applications
DEISA Distributed European Infrastructure for Supercomputing Applications
 
Knowledge defined networking
Knowledge defined networkingKnowledge defined networking
Knowledge defined networking
 
StruxureWare for data centers
StruxureWare for data centersStruxureWare for data centers
StruxureWare for data centers
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
An asynchronous replication model to improve data available into a heterogene...
An asynchronous replication model to improve data available into a heterogene...An asynchronous replication model to improve data available into a heterogene...
An asynchronous replication model to improve data available into a heterogene...
 
New Generation of Storage Tiering
New Generation of Storage TieringNew Generation of Storage Tiering
New Generation of Storage Tiering
 
High performance operating system controlled memory compression
High performance operating system controlled memory compressionHigh performance operating system controlled memory compression
High performance operating system controlled memory compression
 

Andere mochten auch

Klondike wilderness rescue company
Klondike wilderness rescue companyKlondike wilderness rescue company
Klondike wilderness rescue companyLashawn Carmichael
 
to analyze the shifting people towards super markets
to analyze the shifting people towards super marketsto analyze the shifting people towards super markets
to analyze the shifting people towards super marketsMohammad Waseem
 
windowsAlia's Summer 2016 Presentation
windowsAlia's Summer 2016 PresentationwindowsAlia's Summer 2016 Presentation
windowsAlia's Summer 2016 PresentationAlia Alshammary
 
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012Bhaskar Ghosh
 
Corrosion in water industry and wind turbines
Corrosion in water industry and wind turbinesCorrosion in water industry and wind turbines
Corrosion in water industry and wind turbinesGuillaume R. Fery
 
I Am A Red Raider: A Marketing Campaign Designed with Engagement in Mind
I Am A Red Raider: A Marketing Campaign Designed with Engagement in MindI Am A Red Raider: A Marketing Campaign Designed with Engagement in Mind
I Am A Red Raider: A Marketing Campaign Designed with Engagement in MindAllison Matherly
 
Hombres que vencieron la adversidad
Hombres que vencieron la adversidadHombres que vencieron la adversidad
Hombres que vencieron la adversidadRUBEN ORELLANA
 
Elementos de la comunicación oral
Elementos de la comunicación oralElementos de la comunicación oral
Elementos de la comunicación oralRUBEN ORELLANA
 
Swiss elearning Institute Certificate
Swiss elearning Institute CertificateSwiss elearning Institute Certificate
Swiss elearning Institute CertificateVinay Prakash Oommen
 
Polimeret sintetike
Polimeret sintetikePolimeret sintetike
Polimeret sintetikeD. Sh
 
Treated wastewater for Irrigation
Treated wastewater for IrrigationTreated wastewater for Irrigation
Treated wastewater for IrrigationVignesh Sekar
 
Metoda Kërkimi Shkencor - Kapitulli 4 - Zgjedhja
Metoda Kërkimi Shkencor - Kapitulli 4 - ZgjedhjaMetoda Kërkimi Shkencor - Kapitulli 4 - Zgjedhja
Metoda Kërkimi Shkencor - Kapitulli 4 - ZgjedhjaSokol Luzi
 
Fieldking farm Implements, Rotavator, Power Harrow, Cultivator, Laser Levelle...
Fieldking farm Implements, Rotavator, Power Harrow, Cultivator, Laser Levelle...Fieldking farm Implements, Rotavator, Power Harrow, Cultivator, Laser Levelle...
Fieldking farm Implements, Rotavator, Power Harrow, Cultivator, Laser Levelle...DINESH GARG
 
Python for the Network Nerd
Python for the Network NerdPython for the Network Nerd
Python for the Network NerdMatt Bynum
 
Barnat kunder dhimbjes (Analgjeziket)
Barnat kunder dhimbjes (Analgjeziket)Barnat kunder dhimbjes (Analgjeziket)
Barnat kunder dhimbjes (Analgjeziket)Ardian Z. Kryeziu
 

Andere mochten auch (19)

Klondike wilderness rescue company
Klondike wilderness rescue companyKlondike wilderness rescue company
Klondike wilderness rescue company
 
to analyze the shifting people towards super markets
to analyze the shifting people towards super marketsto analyze the shifting people towards super markets
to analyze the shifting people towards super markets
 
windowsAlia's Summer 2016 Presentation
windowsAlia's Summer 2016 PresentationwindowsAlia's Summer 2016 Presentation
windowsAlia's Summer 2016 Presentation
 
La palabra
La palabraLa palabra
La palabra
 
PArty Bus
PArty BusPArty Bus
PArty Bus
 
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
 
Corrosion in water industry and wind turbines
Corrosion in water industry and wind turbinesCorrosion in water industry and wind turbines
Corrosion in water industry and wind turbines
 
I Am A Red Raider: A Marketing Campaign Designed with Engagement in Mind
I Am A Red Raider: A Marketing Campaign Designed with Engagement in MindI Am A Red Raider: A Marketing Campaign Designed with Engagement in Mind
I Am A Red Raider: A Marketing Campaign Designed with Engagement in Mind
 
Hombres que vencieron la adversidad
Hombres que vencieron la adversidadHombres que vencieron la adversidad
Hombres que vencieron la adversidad
 
Elementos de la comunicación oral
Elementos de la comunicación oralElementos de la comunicación oral
Elementos de la comunicación oral
 
Swiss elearning Institute Certificate
Swiss elearning Institute CertificateSwiss elearning Institute Certificate
Swiss elearning Institute Certificate
 
Mitos y leyendas de tubara
Mitos y leyendas de tubaraMitos y leyendas de tubara
Mitos y leyendas de tubara
 
Polimeret sintetike
Polimeret sintetikePolimeret sintetike
Polimeret sintetike
 
Treated wastewater for Irrigation
Treated wastewater for IrrigationTreated wastewater for Irrigation
Treated wastewater for Irrigation
 
Metoda Kërkimi Shkencor - Kapitulli 4 - Zgjedhja
Metoda Kërkimi Shkencor - Kapitulli 4 - ZgjedhjaMetoda Kërkimi Shkencor - Kapitulli 4 - Zgjedhja
Metoda Kërkimi Shkencor - Kapitulli 4 - Zgjedhja
 
Fieldking farm Implements, Rotavator, Power Harrow, Cultivator, Laser Levelle...
Fieldking farm Implements, Rotavator, Power Harrow, Cultivator, Laser Levelle...Fieldking farm Implements, Rotavator, Power Harrow, Cultivator, Laser Levelle...
Fieldking farm Implements, Rotavator, Power Harrow, Cultivator, Laser Levelle...
 
Python for the Network Nerd
Python for the Network NerdPython for the Network Nerd
Python for the Network Nerd
 
Barnat kunder dhimbjes (Analgjeziket)
Barnat kunder dhimbjes (Analgjeziket)Barnat kunder dhimbjes (Analgjeziket)
Barnat kunder dhimbjes (Analgjeziket)
 
Ambalazhet e plastikes
Ambalazhet e plastikesAmbalazhet e plastikes
Ambalazhet e plastikes
 

Ähnlich wie Getting Started with NoSQL and Data Scalability: An Introduction to Scalable Data Architecture Implementations and Technologies

Sybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem PerformansSybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem PerformansSybase Türkiye
 
A request skew aware heterogeneous distributed
A request skew aware heterogeneous distributedA request skew aware heterogeneous distributed
A request skew aware heterogeneous distributedJoão Gabriel Lima
 
Datacenter Design Guide UNIQUE Panduit-Cisco
Datacenter Design Guide UNIQUE Panduit-Cisco Datacenter Design Guide UNIQUE Panduit-Cisco
Datacenter Design Guide UNIQUE Panduit-Cisco jpjobard
 
NoSQL Database
NoSQL DatabaseNoSQL Database
NoSQL DatabaseSteve Min
 
Dynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationDynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationShanley Kane
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic WebStefan Ceriu
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebStefan Prutianu
 
Evaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseEvaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseDataStax
 
access.2021.3077680.pdf
access.2021.3077680.pdfaccess.2021.3077680.pdf
access.2021.3077680.pdfneju3
 
Accel Partners New Data Workshop 7-14-10
Accel Partners New Data Workshop 7-14-10Accel Partners New Data Workshop 7-14-10
Accel Partners New Data Workshop 7-14-10keirdo1
 
The MADlib Analytics Library
The MADlib Analytics Library The MADlib Analytics Library
The MADlib Analytics Library EMC
 
NoSQL overview implementation free
NoSQL overview implementation freeNoSQL overview implementation free
NoSQL overview implementation freeBenoit Perroud
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesEditor Jacotech
 
No sql databases explained
No sql databases explainedNo sql databases explained
No sql databases explainedSalil Mehendale
 
Next Generation Hadoop: High Availability for YARN
Next Generation Hadoop: High Availability for YARN Next Generation Hadoop: High Availability for YARN
Next Generation Hadoop: High Availability for YARN Arinto Murdopo
 

Ähnlich wie Getting Started with NoSQL and Data Scalability: An Introduction to Scalable Data Architecture Implementations and Technologies (20)

Sybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem PerformansSybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem Performans
 
A request skew aware heterogeneous distributed
A request skew aware heterogeneous distributedA request skew aware heterogeneous distributed
A request skew aware heterogeneous distributed
 
Datacenter Design Guide UNIQUE Panduit-Cisco
Datacenter Design Guide UNIQUE Panduit-Cisco Datacenter Design Guide UNIQUE Panduit-Cisco
Datacenter Design Guide UNIQUE Panduit-Cisco
 
NoSQL Database
NoSQL DatabaseNoSQL Database
NoSQL Database
 
Dynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 PresentationDynamo Systems - QCon SF 2012 Presentation
Dynamo Systems - QCon SF 2012 Presentation
 
No sql database
No sql databaseNo sql database
No sql database
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
 
Evaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud DatabaseEvaluating Apache Cassandra as a Cloud Database
Evaluating Apache Cassandra as a Cloud Database
 
access.2021.3077680.pdf
access.2021.3077680.pdfaccess.2021.3077680.pdf
access.2021.3077680.pdf
 
Accel Partners New Data Workshop 7-14-10
Accel Partners New Data Workshop 7-14-10Accel Partners New Data Workshop 7-14-10
Accel Partners New Data Workshop 7-14-10
 
Rise of NewSQL
Rise of NewSQLRise of NewSQL
Rise of NewSQL
 
The MADlib Analytics Library
The MADlib Analytics Library The MADlib Analytics Library
The MADlib Analytics Library
 
NoSQL overview implementation free
NoSQL overview implementation freeNoSQL overview implementation free
NoSQL overview implementation free
 
Poster for ISGC
Poster for ISGCPoster for ISGC
Poster for ISGC
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
No sql databases explained
No sql databases explainedNo sql databases explained
No sql databases explained
 
Nosql intro
Nosql introNosql intro
Nosql intro
 
Know what is NOSQL
Know what is NOSQL Know what is NOSQL
Know what is NOSQL
 
Next Generation Hadoop: High Availability for YARN
Next Generation Hadoop: High Availability for YARN Next Generation Hadoop: High Availability for YARN
Next Generation Hadoop: High Availability for YARN
 

Mehr von Roger Xia

机器学习推动金融数据智能
机器学习推动金融数据智能机器学习推动金融数据智能
机器学习推动金融数据智能Roger Xia
 
Code reviews
Code reviewsCode reviews
Code reviewsRoger Xia
 
Python introduction
Python introductionPython introduction
Python introductionRoger Xia
 
Learning notes ruby
Learning notes rubyLearning notes ruby
Learning notes rubyRoger Xia
 
Converged open platform for enterprise
Converged open platform for enterpriseConverged open platform for enterprise
Converged open platform for enterpriseRoger Xia
 
Code reviews
Code reviewsCode reviews
Code reviewsRoger Xia
 
E commerce search strategies
E commerce search strategiesE commerce search strategies
E commerce search strategiesRoger Xia
 
Indefero source code_managment
Indefero source code_managmentIndefero source code_managment
Indefero source code_managmentRoger Xia
 
Web Services Atomic Transactio
 Web Services Atomic Transactio Web Services Atomic Transactio
Web Services Atomic TransactioRoger Xia
 
Web service through cxf
Web service through cxfWeb service through cxf
Web service through cxfRoger Xia
 
Q con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancoukQ con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancoukRoger Xia
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 
Consistency-New-Generation-Databases
Consistency-New-Generation-DatabasesConsistency-New-Generation-Databases
Consistency-New-Generation-DatabasesRoger Xia
 
Java explore
Java exploreJava explore
Java exploreRoger Xia
 
Mongo db实战
Mongo db实战Mongo db实战
Mongo db实战Roger Xia
 
Ca siteminder
Ca siteminderCa siteminder
Ca siteminderRoger Xia
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Eclipse plug in mylyn & tasktop
Eclipse plug in mylyn & tasktopEclipse plug in mylyn & tasktop
Eclipse plug in mylyn & tasktopRoger Xia
 

Mehr von Roger Xia (20)

机器学习推动金融数据智能
机器学习推动金融数据智能机器学习推动金融数据智能
机器学习推动金融数据智能
 
Code reviews
Code reviewsCode reviews
Code reviews
 
Python introduction
Python introductionPython introduction
Python introduction
 
Learning notes ruby
Learning notes rubyLearning notes ruby
Learning notes ruby
 
Converged open platform for enterprise
Converged open platform for enterpriseConverged open platform for enterprise
Converged open platform for enterprise
 
Code reviews
Code reviewsCode reviews
Code reviews
 
E commerce search strategies
E commerce search strategiesE commerce search strategies
E commerce search strategies
 
Saml
SamlSaml
Saml
 
JavaEE6
JavaEE6JavaEE6
JavaEE6
 
Indefero source code_managment
Indefero source code_managmentIndefero source code_managment
Indefero source code_managment
 
Web Services Atomic Transactio
 Web Services Atomic Transactio Web Services Atomic Transactio
Web Services Atomic Transactio
 
Web service through cxf
Web service through cxfWeb service through cxf
Web service through cxf
 
Q con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancoukQ con london2011-matthewwall-whyichosemongodbforguardiancouk
Q con london2011-matthewwall-whyichosemongodbforguardiancouk
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 
Consistency-New-Generation-Databases
Consistency-New-Generation-DatabasesConsistency-New-Generation-Databases
Consistency-New-Generation-Databases
 
Java explore
Java exploreJava explore
Java explore
 
Mongo db实战
Mongo db实战Mongo db实战
Mongo db实战
 
Ca siteminder
Ca siteminderCa siteminder
Ca siteminder
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Eclipse plug in mylyn & tasktop
Eclipse plug in mylyn & tasktopEclipse plug in mylyn & tasktop
Eclipse plug in mylyn & tasktop
 

Kürzlich hochgeladen

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Kürzlich hochgeladen (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Getting Started with NoSQL and Data Scalability: An Introduction to Scalable Data Architecture Implementations and Technologies

  • 1. #105 Get More Refcardz! Visit refcardz.com Getting Started with CONTENTS INCLUDE: n Introduction n Scalable Data Architecture NoSQL and Data Scalability n Is NoSQL For You? n mongoDB n GigaSpaces XAP Google App Engine Datastore and more... By Eugene Ciurana n Implementations of either often share characteristics from the INTRODUCTION other. The DZone Refcard #43 is an introduction to system high Data Grids availability and scalability terminology and techniques Data grids process workloads defined as independent jobs (http://refcardz.dzone.com/refcardz/scalability). The next that don’t require data sharing among processes. Storage logical step is the scalable handling of massive data volumes or network may be shared across all nodes of the grid, but resulting from having these powerful processing capabilities. intermediate results have no bearing on other jobs progress or on other nodes in the grid, such as a MapReduce cluster. This Refcard demystifies NoSQL and data scalability techniques by introducing some core concepts. It also offers an overview of current technologies available in this domain Consumer and suggests how to apply them. What is Data Scalability? Master Data scalability is the ability of a system to store, manipulate, analyze, and otherwise process ever increasing amounts of www.dzone.com data without reducing overall system availability, performance, Load Balancer or throughput. Data scalability is achieved by a combination of more powerful Node Node Node Node processing capabilities and larger but efficient storage mechanisms. Relational and hierarchical databases scale up by adding more Load Balancer processors, more storage, caching systems, and such. Soon they hit either a cost or a practical scalability limit because they are difficult or impossible to scale out. These database Node Node Node Node management systems are designed as single units that must maintain data integrity, and enforce schema rules to guarantee Figure 1 - Data Grid it. This rigidity is what promotes upward, but not outward, Getting Started with NoSQL and Data Scalability scalability. Data grids expose their functionality through a single API (either a Web service or native to the application programming Oracle RAC is a cluster of multiple computers with language) that abstracts its topology and implementation from access to a common database. This is considered its data processing consumer. Hot only vertically scalable because the processing Tip (usually in the form of stored procedures) may scale out, but the shared storage facilities don’t scale with the cluster. Get over 90 DZone Refcardz Data integrity and schemas are suited for handling transactional, normalized, uniform data. They handle FREE from Refcardz.com! unstructured or rapidly evolving data structures with difficulty or exponentially larger costs. Hot Data replication is not the same as data scalability! Tip SCALABLE DATA ARCHITECTURES There are two general kinds of architectures used for building scalable data systems: data grids and NoSQL systems. DZone, Inc. | www.dzone.com
  • 2. 2 Getting Started with NoSQL and Data Scalability Areas of Application NoSQL vs RDBMS vs OO Analogies • Financial modeling NoSQL RDBMS OO • Data mining Kind Table Class • Click stream analytics Entity Record Object • Document clustering • Distributed sorting or grepping Attribute Column Property • Simulations • Inverted index construction IS NoSQL FOR YOU? • Protein folding NoSQL Preparation: NoSQL describes a horizontally scalable, non-relational • Don’t fall prey to the NoSQL fad database with built-in replication support. Applications • Don’t be stubborn; neither NoSQL nor traditional interact with it through a simple API, and the data is stored in a databases apply to all cases “schema-free”, flat addressing repository, usually as large files • Apply the CAP Theorem to your use cases to or data blocks. The repository is often a custom file system determine feasibility designed to support NoSQL operations. Brewer’s (CAP) Theorem NoSQL is intended as shorthand for “not only It’s impossible for a distributed computer system to Hot SQL.” Complete architectures almost always mix simultaneously provide all three of these guarantees: Tip traditional and NoSQL databases. • Consistency (all nodes see the same data at the same time) • Availability (node failures don’t prevent survivors from NoSQL setups are best suited for non-OLTP applications that continuing to operate) process massive amounts of structured and unstructured data • Partition tolerance (no failures less than total network at a lower cost and with higher efficiency than RDBMs and failures cause the system to fail) stored procedures. Since only two of these characteristics are guaranteed for any given scalable system, use your functional specification and Consumer business SLA (service level agreement) to determine what your minimum and target goals for CAP are, pick the two that meet your requirements, and proceed to implement the appropriate Node Node Node Node technology. Rule of Thumb: NoSQL’s primary goal is to achieve Virtual File System logical table management, load balancing, garbage collection Hot horizontal scalability. It attains this by reducing (HDFS, mongoFS, Hypertable) Tip transactional semantics and referential integrity. Tablet Tablet Tablet Server 0 Server 1 Server n Use Figure 3 to identify the best match between your Distributed File System application’s CAP requirements and the suggested SQL and NoSQL systems listed. FS 0 FS 1 FS 2 FS n Figure 2 - NoSQL Topology Relational Key-Value Column-Oriented NoSQL storage is highly replicated (a commit doesn’t occur Document-Oriented until the data is successfully written to at least two separate Consistency Availability storage devices) and the file systems are optimized for write- C A RDBMs (Oracle, MySQL), Aster Data, Green Plum, Vertica only commits. The storage devices are formatted to handle large blocks (32 MB or more). Caching and buffering are a, mo designed for high I/O throughput. The NoSQL database dr ng edi an oD s, s R is implemented as a data grid for processing (mapReduce, as B, Be ak C Pick Any Two Te rke Ri I, queries, CRUD, etc.) B, , KA rra ley sto D hD et re B, uc bin Areas of Application ,D M Co Ca ata em B, yo • Document storage sto cac leD Tok re he • Object databases mp t, , H DB Si mor yp , S lde er • Graph databases tab cala Vo le, ris • Key/value stores o, m Hb na P as • Eventually consistent key/value stores Dy , e This list is the implementation counterpart to the data grid Partition tolerance areas of application; the data store feeds the computational Figure 3 - CAP Selection Chart network and, together, they form the NoSQL database. (source: Nathan Hurst's Blog) DZone, Inc. | www.dzone.com
  • 3. 3 Getting Started with NoSQL and Data Scalability encoded JSON representation. BSON is designed to be mongoDB traversable, lightweight and efficient. Applications can map BSON/JSON documents using native representations like mongoDB is a document-based NoSQL database that bridges dictionaries, lists, and arrays, leaving the BSON translation the gap between scalable key-value stores like Datastore to the native mongoDB driver specific to each programming and Memcache DB, and RDBMS’s querying and robustness language. capabilities. Some of its main features include: BSON Example • Document-oriented storage - data is manipulated as { JSON-like documents 'name' : 'Tom', 'age' : 42 • Querying - uses JavaScript and has APIs for submitting } queries in every major programming language • In-place updates - atomicity Language Representation • Indexing - any attribute in a document may be used for Python { 'name' : 'Tom', indexing and query optimization 'age' : 42 • Auto-sharding - enables horizontal scalability } • Map/reduce - the mongoDB cluster may run smaller Ruby { "name" => "Tom", MapReduce jobs than a Hadoop cluster with significant "age" => 42 } cost and efficiency improvements Java BasicDBObject d; mongoDB implements its own file system to optimize I/O d = new BasicObject(); d.put("name", "Tom"); throughput by dividing larger objects into smaller chunks. d.put("age", 42); Documents are stored in two separate collections: files PHP array( "name" => "Tom", containing the object meta-data, and chunks that form a "age" => 42); larger document when combined with database accounting Dynamic languages offer a closer object mapping to BSON/ information. The mongoDB API provides functions for JSON than compiled languages. manipulating files, chunks, and indices directly. The administration tools enable GridFS maintenance. The complete BSON specification is available from: http://bsonspec.org/ Consumer mongoDB Programming Programming in mongoDB requires an active server running fail-over the mongod and the mongos database daemons (see Figure 5), and a client application that uses one of the language- mongoDB Server (master) mongoDB Server (slave) mongod mongod specific drivers. Database Database daemon daemon mongos mongos Sharding daemon Sharding daemon Hot All the examples in this Refcard are written in Python Data Data Tip for conciseness. Storage Storage Starting the Server Figure 5 - mongoDB Cluster Log on to the master server and execute: A mongoDB cluster consists of a master and a slave. The slave [servername:user] ./mongod may become the master in a fail-over scenario, if necessary. Having a master/slave configuration (also known as Active/ The server will display its status messages to the console Passive or A/P cluster) helps ensure data integrity since only unless stdout is redirected elsewhere. the master is allowed to commit changes to the store at any Programming Example given time. A commit is successful only if the data is written to This example allocates a database if one doesn’t already exist, GridFS and replicated in the slave. instantiates a collection on the server, and runs a couple of queries. mongoDB also supports a limited master/master configuration. It’s useful only for inserts, queries, The mongoDB Developer Manual is available from: Hot and deletions by specific object ID. It must not http://www.mongodb.org/display/DOCS/Manual Tip be used if updates of a single object may occur #!/usr/bin/env jython concurrently. import pymongo from pymongo import Connection Caching connection = Connection('servername', 27017) mongoDB has a built-in cache that runs directly in the cluster without external requirements. Any query is transparently db = connection['people_database'] cached in RAM to expedite data transfer rates and to reduce peopleList = db['people_list'] disk I/O. person = { 'name' : 'Tom', Document Format 'age' : 42 } mongoDB handles documents in BSON format, a binary- DZone, Inc. | www.dzone.com
  • 4. 4 Getting Started with NoSQL and Data Scalability peopleList.insert(person) • JSON data and program objects storage - many RESTful web services provide JSON data; they can be stored in person = { 'name' : 'Nancy', mongoDB without language serialization overhead 'age' : 69 } (especially when compared against XML documents) peopleList.insert(person) • Content management systems - JSON/BSON objects can # find first entry: represent any kind of document, including those with a person = peopleList.find_one() binary representation # find a specific person: mongoDB Drawbacks person = peopleList.find_one({ 'name' : 'Joe'}) • No JOIN operations - each document is stand-alone if person is None: • Complex queries - some complex queries and indices are print "Joe isn’t here!" else: better suited for SQL print person['age'] • No row-level locking - unsuitable for transactional data # bulk inserts without error prone application-level assistance persons = [{ 'name' : 'Joe' }, {'name' : 'Sue'}] peopleList.insert(persons) If any of these is part of the functional requirements, a SQL database would be better suited for the application. # queries with multiple results for person in peopleList.find(): print person['name'] GIGASPACES XAP for person in peopleList.find({'age' : {'$ge' : 21}}).sort('name'): print person['name'] The GigaSpaces eXtreme Application Platform is a data # count: grid designed to replace traditional application servers. It nDrinkingAge = peopleList.find({'age' : {'$ge' : 21}}).count() operates based on an event-processing model where the application dispatches objects to the processing nodes # indexing associated with a given data partition. The system may be from pymongo import ASCENDING, DESCENDING configured so that data state on the grid may trigger events, or peopleList.create_index([('age', DESCENDING), ('name', ASCENDING)]) an application may dispatch specific commands imperatively. GigaSpaces XAP also manages all threading and execution The PyMongo documentation is available at: aspects of the operation, including thread and connection http://api.mongodb.org/python - guides for other languages pools. GigaSpaces XAP implements Spring transactions and are also available from this web site. auto-recovery. The system detects any failed operations The code in the previous example performs these operations: in a computational node and automatically rolls back the transaction; it then places it back in the space where another • Connect to the database server started in the previous node picks it up to complete processing. section • Attach a database; notice that the database is treated like Application Frameworks an associative array • Get a collection (loosely equivalent to a table in a Java C++ .Net Groovy relational database), treated like an associative array Mule Spring JEE Jetty • Insert one or more entities • Query for one or more entites Although mongoDB treats all these data as BSON internally, most of the APIs allow the use of dictionary-style objects to streamline the development process. XAP Deployment Virtualization XAP Object ID Management A successful insertion into the database results in a valid and XAP Middleware Virtualization Monitoring Object ID. This is the unique identifier in the database for a (Virtualized Clustering Layer) given document. When querying the database, a return value will include this attribute: { "name" : "Tom", "age" : 42, "_id" : ObjectId('999999') } RDBMS Memcache DB mongoDB Users may override this Object ID with any argument as long as it’s unique, or allow mongoDB to assign one automatically. Figure 4 - GigaSpaces XAP Data Grid Common Use Cases • Caching - more robust capabilities, plus persistence, when GigaSpaces provides data persistence, distributed processing, compared against a pure caching system and caching by interfacing with SQL and NoSQL data stores. • High volume processing - RDBMS may be too expensive The GigaSpaces API abstracts all back-end operations (job or slow to run in comparison dispatching, data persistence, and caching) and makes DZone, Inc. | www.dzone.com
  • 5. 5 Getting Started with NoSQL and Data Scalability it transparent to the application. GigaSpaces XAP may implement distributed computing operations like MapReduce GOOGLE APP ENGINE DATASTORE and run them in its nodes, or it may dispatch them for processing to the underlying subsystem if the functionality The Datastore is the main scalability feature of Google App is available (e.g. mongoDB, Hadoop). The application may Engine applications. It’s not a relational database or a façade implement transactions using the Spring API, the GigaSpaces for one. Datastore is a public API for accessing Google’s XPA transactional facilities, or by implementing a workflow Bigtable high-performance distributed database system. where specific NoSQL stores handle entities or groups of Think of it as a sparse array distributed across multiple servers entities. This must be implemented explicitly in systems like that also allows an infinite number of columns and rows. mongoDB, but may exist in other NoSQL systems like Google Applications may even define new columns “on the fly”. The App Engine’s Datastore. Datastore scales by adding new servers to a cluster; Google provides this functionality without user participation. GigaSpaces XAP Programming Google Your Applications Applications The GigaSpaces XAP API is very rich and covers many aspects beyond NoSQL and data scalability areas like configuration Datastore API 1 Datastore Java (JDO, JPA) Other language Python management, deployment, web services, third-party product Bigtable integration, etc. Master Server (Logical table management, load balancing, garbage collection) The GigaSpaces XAP documentation is at: Tablet Tablet Tablet http://www.gigaspaces.com/wiki/display/XAP71/Programmer%27s+Guide Server 0 Server 1 Server n NoSQL operations may be implemented over these APIs: Google File System • SQLQuery - allows querying a space using a SQL-like FS 0 FS 1 FS 2 FS n syntax and regular expressions; do not confuse it with Figure 6 - Datastore Architecture JDBC support. Datastore operations are defined around entities. Entities • Persistency - mostly supports RDBMSs but may implement can have one-to-many or many-to-many relationships. The other persistency mechanisms through the External Data Datastore assigns unique IDs unless the application specifies Source Components API. a unique key. Datastore also disallows some property names • memcached - support for key/value pair distributed that it uses for housekeeping. The complete Datastore dictionaries available to any client in the grid; entities documentation is available from: are automatically made available across all nodes. The http://code.google.com/appengine/docs/python/datastore/ memcached API is implemented on top of the data grid, and it’s interchangeable with other memcached Did you notice the parallels between Datastore implementations. Hot and mongoDB so far? Many NoSQL database • Task Execution - allows synchronous and asynchronous job Tip implementations have solved similar problems in execution on specific nodes or clusters similar ways. The GigaSpaces XAP API is in a minority of stateful NoSQL Transactions and Entity Groups systems. Most NoSQL systems strive to achieve statelessness Datastore supports transactions. These transactions are only to increase scalability and data consistency. effective on entities that belong to the same entity group. Entities in a given group are guaranteed to be stored on the Common Use Cases same server. • Real-time analytics - dynamic data analysis and reporting Datastore Programming based on data entered into a system less than a minute The programming model is based on inheritance of basic before effective time of use entities, db.Model and db.Expando. Persistent data is • Map/reduce - distributed data processing of large data mapped onto an entity specialization of either of these classes. sets across a computational grid The API provides persistence and querying instance methods • Near-zero downtime - allows for database schema for every entity managed by the Datastore. changes without homebrew master/slave configurations or Programming Example proprietary RDBMS dependencies The Datastore API is simpler than other NoSQL APIs and is GigaSpaces XAP NoSQL Drawbacks highly optimized to work in the App Engine environment. In • Complexity - the server, transactional, and grid model are this example we insert data into the data store, then run a query: more complex than for other NoSQL systems from google.appengine.ext import db • Application server model - the API and components are class Person(db.Model): name = db.StringProperty(required=True) geared toward building applications and transactional age = db.IntegerProperty(require=True) logic person = Person(name = 'Tom', age = 42) • Steeper learning curve person.put() • Higher TCO - brings a requirement of a specialized, person = Person(name = 'Sue', age = 69) well-trained system administration team with higher requirements than other NoSQL systems person.put() DZone, Inc. | www.dzone.com