2011 Seventh International Conference on Semantics, Knowledge and Grids
978-0-7695-4515-8/11 $26.00 © 2011 IEEE, DOI 10.1109/SKG.2011.28

CCIndex for Cassandra: A Novel Scheme for Multi-dimensional Range Queries in Cassandra

Chen Feng #1, Yongqiang Zou *2, Zhiwei Xu #3
(#) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
(*) Tencent Corporation, Beijing, 100190, China
(1) fengchen8086@gmail.com  (2) aaronzou@tencent.com  (3) zxu@ict.ac.cn

Abstract: Multi-dimensional range queries are fundamental requirements in large-scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as its data partitioner. Cassandra supports multi-dimensional range queries, but with poor performance and with the limitation that there must be one dimension with an equal operator. Based on the success of the CCIndex scheme in Apache HBase, this paper tries to answer the question: Can CCIndex benefit multi-dimensional range queries in DOTs like Cassandra?

This paper studies the feasibility of employing CCIndex in Cassandra, proposes a new approach to estimate result size, implements CCIndex in Cassandra including recovery mechanisms, and studies the pros and cons of CCIndex for different DOTs. Experimental results show that CCIndex gains 2.4 to 3.7 times efficiency over Cassandra’s index scheme with 1% to 50% selectivity for 2 million records. This paper shows that CCIndex is a general approach for DOTs, and could gain better performance for DOTs which perform scan tasks much faster than random read. This paper reveals that Cassandra is optimized for hash tables rather than ordered tables in performing read and range queries.

I. INTRODUCTION

Multi-dimensional range queries are fundamental requirements in large-scale Internet applications and have gained increasing attention in Distributed Ordered Tables (DOTs) [1,2] like BigTable [3], PNUTS [4], and HBase [5] in recent years.

Multi-dimensional range queries are queries with less-than or greater-than operators on multiple table columns. For example, a query for yesterday’s hot photos written in SQL looks like “select * from photos where hit_counts > 100000 and create_time > now() - 86400”. When modeling resources in physical or cyber space as a multi-dimensional classification space, as in the Probabilistic Resource Space Model (P-RSM) [6], multi-dimensional range queries are basic operations.

As the data scale grows, Distributed Ordered Tables are adopted in more and more applications to store and query structured data for outstanding performance, reliability, and scalability. Naturally, Distributed Ordered Tables can support point queries and range queries on the primary key. However, their limited support for queries on non-primary keys leads to poor performance in multi-dimensional range queries involving non-primary keys.

CCIndex [7], short for Complemental Clustering Index, is proposed to support multi-dimensional range queries over DOTs with high performance, low space overhead, and high reliability. CCIndex has been implemented on HBase and gains 11.4 times scan efficiency over non-primary columns.

Apache Cassandra [8] is a highly scalable distributed database with a fully distributed design like Dynamo [9] and the column family data model of BigTable. Cassandra is a Distributed Ordered Table rather than a Distributed Hash Table (DHT) when it employs order-preserving hashing as its data partitioner instead of consistent hashing. Cassandra partitions primary keys into nodes in a circle overlay as in Chord [10], replicates keys for performance and reliability, and supports range queries on keys. Cassandra has supported multi-dimensional range queries since version 0.7, with the limitation that there must be one dimension with an equal operator in the query expression, which hinders the broad usage of these queries. Multi-dimensional range queries under Cassandra’s index scheme also encounter the efficiency problem seen in the HBase IndexedTable. When responding to the queries, the system must first scan the secondary index to get primary keys, and then issue multiple random reads to get the real data.

Can CCIndex benefit multi-dimensional range queries in DOTs like Cassandra, supporting them without such limitations and with better performance?

This paper studies the feasibility of employing CCIndex to support multi-dimensional range queries in Cassandra. We identify three differences between HBase and Cassandra when utilizing CCIndex: (1) The smallest sorted unit is the region in HBase while it is the node in Cassandra; (2) The speed of range query in Cassandra is not fast enough to accelerate the CCIndex performance; (3) The APIs of HBase and Cassandra are different. This paper proposes a new approach to estimate result size from data distribution information, implements CCIndex in Cassandra, studies the pros and cons of CCIndex for different styles of DOTs, and reveals more performance issues of Cassandra.

The contributions of this paper are summarized as follows.
1. This paper employs CCIndex to support multi-dimensional range queries, overcoming the limitations of Cassandra. The results show that CCIndex gains 2.4 times performance over Cassandra’s index scheme with 1% selectivity, and about 3.7 times performance when the selectivity is 50% for 2 million records.

2. This paper shows that CCIndex is a general approach for DOTs, which could gain better performance for DOTs with slow random read and fast sequential read. This paper shows that CCIndex improves query performance by about 2 times on DOTs with fast random read, and achieves an order of magnitude performance improvement for DOTs whose random read is significantly slower than sequential read or scan, such as HBase. This paper implements the CCIndex recovery mechanism, and the evaluation indicates that the efficiency of CCIndex recovery is 33% of that of sequential write for Cassandra.

3. This paper reveals that Cassandra is optimized for hash tables rather than ordered tables. Cassandra provides both consistent hashing and order-preserving hashing, but the read and scan operations are not optimized for order-preserving hashing, for example by considering pre-fetch for read or optimizing scan for range queries over ordered tables. Cassandra’s strategy is good for hash tables, but inefficient for ordered tables.

This paper is organized as follows. Section 2 gives the background. Section 3 illustrates the design and implementation of CCIndex in Cassandra. Section 4 shows the experimental results and the discussion of the results. Section 5 concludes the whole work.

II. BACKGROUND

A. CCIndex Analysis

CCIndex is proposed to support multi-dimensional range queries over DOTs by reorganizing data. CCIndex introduces a ComplementalTable for each index column. A ComplementalTable stores all columns except the rowkey and the corresponding index column. The ComplementalTable rowkey is a concatenation of the index column value, the original rowkey, and the length of the index column value. This way of generating the ComplementalTable rowkey ensures that all the rowkeys are unique and sorted by index column and original rowkey. The OriginalTable and the ComplementalTables are called Complemental Clustering Index Tables (CCITs). A CCIT sets the replica factor to 1 to decrease the storage overhead. CCIndex maintains the reliability of a CCIT through the other CCITs and introduces a replicated CCT (Complemental Check Table) for each CCIT to help data recovery.

In Fig. 1, there is an OriginalTable (CCIT0) with a primary id and two index columns, weight and height. CCIT-W and CCIT-H (the ComplementalTables) are ordered by key1 and key2 respectively. With these CCITs, range queries over id, weight, or height can be converted to range queries on CCIT0, CCIT-W or CCIT-H.

A CCT stores the rowkey and all index columns of a CCIT. CCTs are replicated while the CCITs are not replicated.

CCIndex creates all ComplementalTables and CCTs when the OriginalTable is created. CCIndex maintains the index through the procedures of inserting and deleting.

Fig. 1 Data layout of CCIndex.

The procedure of writing is shown in Fig. 2. When writing a record into the OriginalTable, CCIndex reads the OriginalTable by rowkey to get the old values, checks whether the index values are going to be modified, and then deletes records from the corresponding CCITs and CCTs when updating index values. After that, CCIndex writes the records to all CCITs and CCTs. When deleting a record, CCIndex reads all index values from the OriginalTable and deletes records from all CCITs and CCTs.

Fig. 2 The procedure of writing.

The procedure of multi-dimensional range queries is shown in Fig. 3. CCIndex estimates the result size for each query condition and selects the condition with the smallest result size to execute a range query on the corresponding CCIT. CCIndex employs the other conditions to filter the result of that range query and returns the final results of the multi-dimensional range query.
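The rowkey layout and the query procedure described above can be sketched in a few lines of Python. This is an illustrative in-memory model only, not the CCIndex implementation: the names `CCIT`, `complemental_rowkey`, `split_rowkey` and `query` are hypothetical, index values are assumed to be non-negative integers encoded at a fixed width so that string order matches numeric order, each table keeps the whole record for simplicity, and the result-size "estimate" is an exact count standing in for the estimators discussed in this paper.

```python
from bisect import bisect_left

WIDTH = 8  # assumed fixed width so string order matches numeric order


def encode(value: int) -> str:
    """Zero-pad an index value (assumption: non-negative integers)."""
    return str(value).zfill(WIDTH)


def complemental_rowkey(index_value: int, original_rowkey: str) -> str:
    """Index value + original rowkey + length of the index value."""
    iv = encode(index_value)
    return iv + original_rowkey + str(len(iv)).zfill(2)


def split_rowkey(key: str):
    """Recover (index value, original rowkey) using the trailing length."""
    n = int(key[-2:])
    return key[:n], key[n:-2]


class CCIT:
    """One clustering index table: records kept sorted by rowkey."""

    def __init__(self, column, records):
        self.column = column
        rows = sorted(
            ((complemental_rowkey(r[column], r["id"]), r) for r in records),
            key=lambda kv: kv[0],
        )
        self.keys = [k for k, _ in rows]
        self.rows = [r for _, r in rows]

    def _bounds(self, low, high):
        # Positions of the exclusive range low < value < high.
        return (bisect_left(self.keys, encode(low + 1)),
                bisect_left(self.keys, encode(high)))

    def estimate(self, low, high):
        # Stand-in estimator: exact count of rows in the range.
        lo, hi = self._bounds(low, high)
        return max(0, hi - lo)

    def scan(self, low, high):
        # Sequential range scan over the sorted table.
        lo, hi = self._bounds(low, high)
        return self.rows[lo:hi]


def query(ccits, conditions):
    """conditions maps column -> (low, high), both bounds exclusive."""
    # 1. Pick the condition with the smallest estimated result size.
    best = min(conditions, key=lambda c: ccits[c].estimate(*conditions[c]))
    # 2. Range-scan the CCIT clustered on that column.
    candidates = ccits[best].scan(*conditions[best])
    # 3. Filter the scanned rows with all conditions.
    return [r for r in candidates
            if all(lo < r[c] < hi for c, (lo, hi) in conditions.items())]


people = [
    {"id": "p1", "weight": 60, "height": 170},
    {"id": "p2", "weight": 75, "height": 182},
    {"id": "p3", "weight": 82, "height": 175},
    {"id": "p4", "weight": 55, "height": 160},
]
ccits = {c: CCIT(c, people) for c in ("weight", "height")}
result = query(ccits, {"weight": (58, 80), "height": (172, 190)})
print([r["id"] for r in result])
```

Because every candidate row already carries its other columns, the filter step needs no random reads back into the OriginalTable, which is the point of clustering the data under each index column.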
Fig. 3 The procedure of multi-dimensional range queries.

CCIndex for HBase uses a simple way to estimate the result size. In HBase, HMaster stores region-to-server mapping information as in Fig. 4. The mapping information can be described as a set of <startKey-regionServer>, ordered by startKey. CCIndex finds the regions covered by each range query and estimates the result size by the number of regions. When HBase has more than 1 region and a max region size S_max, each region size must be greater than S_max/2 and less than S_max. Thus CCIndex considers that the result size depends on the number of regions covered.

Fig. 4 The region-to-server mapping of HBase.

In HBase, the speed of scan is 8.2 times that of random read. The speed of multi-dimensional range query on CCIndex is 11.4 times that of IndexedTable.

The performance of CCIndex is affected by 2 issues:
    • The accuracy of result size estimation. The more accurate the estimation is, the fewer unnecessary records will be scanned.
    • The speed ratio of range query to random read. To execute a multi-dimensional range query, CCIndex executes a range query on a CCIT and then filters the result. IndexedTable executes a range query on an index table to get the original rowkeys, and then gets the records by random read on those rowkeys. Thus the speed ratio of CCIndex to IndexedTable is determined by the speed ratio of range query to random read.

B. Cassandra Analysis

Cassandra organizes nodes as a ring overlay like Chord to partition data. Each node manages a part of the data in the ring, covering keys from the previous node's token to its own token. Every record's key is mapped to the token ring by the same partitioner. The corresponding node writes records to the commitlog and then to its memtable.

A memtable is an in-memory structure that contains sorted rows. A memtable is flushed to an SSTable on disk when it is full. An SSTable is a sorted structure; SSTables are flushed one by one and cannot be modified once flushed, so records across multiple SSTables are not sorted, as in Fig. 5. Cassandra combines several old SSTables into a new SSTable by compaction to reduce the number of SSTables. Each node contains more than one SSTable in most cases.

Fig. 5 An example of memtable and SSTables in a node.

Like Dynamo, Cassandra keeps strong consistency if W + R > N, where W and R indicate respectively the minimum number of nodes that have executed the write and read operation successfully, and N is the replication factor. Cassandra uses different ConsistencyLevels to balance consistency and availability. In writing, ConsistencyLevel.ONE and QUORUM ensure that the write operation has been executed successfully on at least 1 and N / 2 + 1 node(s) respectively. In reading, ONE returns the record from the fastest responding node and QUORUM returns the most recent record among the responses from at least N / 2 + 1 nodes. Compared with ONE, QUORUM has higher latency while maintaining consistency.

Cassandra version 0.7+ provides APIs to execute multi-dimensional range queries. But there is a limitation: the APIs require at least one equal operator on a configured index column in the query expression. Cassandra also provides APIs to execute a range query over the rowkey, but the speed of range query is only 1.3 times that of random read.

In summary, there are three mismatches between HBase and Cassandra, which impose challenges when utilizing CCIndex for Cassandra.
1) The smallest sorted unit is the region in HBase while it is the node in Cassandra: In HBase, regions are sorted by the rowkey of records. In Cassandra, records are stored in SSTables and sorted between nodes, while multiple SSTables in the same node are not sorted. The difference decreases the accuracy of estimating result size.

2) The speed of range query: Cassandra executes a range query by logical scan, traversing all SSTables to find the ‘next’ record, while HBase executes physical scans on regions.

3) The differences between HBase and Cassandra on APIs: To implement CCIndex for Cassandra, the API issue must be considered, namely how to utilize the different APIs given by HBase and Cassandra and unify the APIs that CCIndex provides to the application level.

III. DESIGN AND IMPLEMENTATION

CCIndex for Cassandra uses different methods to deal with each of these differences.

A. The smallest sorted unit issue

As the record size between nodes might be unbalanced, the way CCIndex for HBase estimates result size by the number of covered regions cannot work on Cassandra. This paper uses a different way to estimate result size, which relies on data distribution information of Cassandra.

1) Data distribution information gathering: CCIndex for Cassandra first adds an API in CassandraClient to gather SSTable information of a certain node, and then adds a daemon thread Listener in CassandraDaemon. Listener gets token ring information from StorageService every other minute. With the token-IP mapping, Listener uses the API above to get SSTable information from every node. Thus each node saves the data distribution information of all nodes. The Cassandra kernel code is modified without performance degradation.

2) The estimation of result size: The CCIndex client uses a thread Refiner to get data distribution information and token ring information from Listener, then CCIndex estimates the result size for every query condition:
    • Calculate the nodes covered by the range. Count the number of nodes as N3.
    • For every node covered, read the SSTable data file total size S and file number C.
    • Sum S and C over all covered nodes to get N1 and N2 respectively.

Each search condition has a tuple [N1, N2, N3]. N1 has higher priority than N2, and N2 has higher priority than N3. CCIndex for Cassandra executes the range query on the CCIT whose condition has the smallest tuple.

B. The speed of range query

The speed of range query is determined by the Cassandra system. The aim of CCIndex for Cassandra is to implement CCIndex while making as few changes as possible. The low speed of range query affects the speed of multi-dimensional range queries but does not restrict the implementation.

C. The API issue

CCIndex encapsulates the APIs of HBase and Cassandra, and exposes the same CCIndex APIs to applications.

D. Data recovery

CCIndex introduces replicated CCTs to help recover damaged data. This paper implements the data recovery module with CCT in Cassandra.

To recover a record of the OriginalTable, CCIndex first reads the CCTs by rowkey to get all index columns. Then CCIndex concatenates the original rowkey and the index column value to form the rowkey of a certain ComplementalTable. CCIndex tries to read the record by the concatenated rowkey and write the corresponding record into the OriginalTable. If the recovery fails, CCIndex tries to recover the data from another ComplementalTable.

To recover a record of a ComplementalTable, CCIndex gets the rowkey of the OriginalTable by splitting the given rowkey. Then CCIndex tries to read the record from the OriginalTable. If the read fails, CCIndex uses other index column values obtained from the CCT to recover the data from other ComplementalTables.

To recover a certain range of a table, CCIndex scans the corresponding CCT and uses the methods above to recover records one by one. A range can be split into several parts for multi-threaded recovery to increase efficiency.

E. Implementation

The CCIndex for Cassandra prototype uses Cassandra v0.7.2 as its code base and is written in Java.

As the replica factor in Cassandra is associated with the keyspace, it is easy for CCIndex for Cassandra to replicate CCTs by putting them into a separate keyspace with replica factor 3. CCIndex sets the keyspace replica factor to 1 for CCITs, and creates one ComplementalTable for each index column.

Fig. 6 The architecture of CCIndex for Cassandra.

The CCIndex for Cassandra client connects to a server node to perform operations like inserting, reading and range query. As Fig. 6 shows, CCIndex for Cassandra uses a connection
pool extended from Pelops [11]. The connection pool assigns a random connection to each client to avoid hot spots.

The client gets the token ring and data distribution information by sending a query to a certain node, in order to estimate the query result size.

IV. EVALUATION

CCIndex for Cassandra is implemented and evaluated through analysis and experiments.

A. Space Overhead Analysis

For the given metrics, the performance is easy to evaluate through experiments. As to the space overhead, theoretical analysis is more suitable.

Here we denote the number of index columns by N, the replica factor of Original Cassandra and CCT by R, the average length of the key and all index columns by Ls, and the total length of a record by L.

In Original Cassandra, the space for every record is:

$$S_{ORG} = L \cdot R \qquad (1)$$

In CCIndex, the space for each record is that of the CCITs plus the CCTs. The space for the CCITs is:

$$S_{CCIT} = L \cdot (N + 1) \qquad (2)$$

The space for the CCTs is:

$$S_{CCT} = L_s \cdot (N + 1) \cdot R \qquad (3)$$

The total space for CCIndex is:

$$S_{CC} = S_{CCIT} + S_{CCT} = (N + 1)(L + L_s \cdot R) \qquad (4)$$

The space overhead ratio of CCIndex to Original Cassandra is:

$$S_{CC} / S_{ORG} - 1 = (N + 1) / R + (N + 1) \cdot L_s / L - 1 \qquad (5)$$

In Cassandra, the replica number R is often set to 3. The ratio is:

$$(N + 1) / 3 + (N + 1) \cdot L_s / L - 1 \qquad (6)$$

Equation (6) can be plotted as Fig. 7.

Fig. 7 The space overhead ratio of CCIndex to Original Cassandra. Using L/Ls values as the horizontal axis.

From Fig. 7, the overhead ratio drops significantly as Ls/L decreases and as N decreases, which indicates that to avoid huge space overhead, there should be fewer index columns in CCIndex and the data length of the index columns should be shorter. When N is smaller than 2, CCIndex would not have enough replicas for CCIT. When N changes from 2 to 4 and Ls/L changes from 1/30 to 1/10, the overhead ratio changes from 10% to 116.7%.

B. Experiment Setup

This paper introduces a benchmark to evaluate the throughput of the basic operations, including sequential read/write, random read, and range query. The workload uses a table with columns rowkey, index1, index2, index3 and data. The rowkey, index1, index2 and index3 columns are 10 bytes each while the data column is 1 KB. The throughput is defined as rows per second for all clients.

CCIndex builds indexes for index1, index2, and index3; the ConsistencyLevel is ONE for CCITs and QUORUM for CCTs.

Original Cassandra and Cassandra Indexed set the replica factor to 3 and the ConsistencyLevel to QUORUM. Original Cassandra does not build any index. Cassandra Indexed builds indexes for index1, index2, and index3.

The experimental cluster has 5 nodes. Each node has two 1.8 GHz dual-core AMD Opteron(tm) Processor 270 CPUs, with 4 GB memory. Each node in the cluster has 321 GB RAID5 SCSI disks. All nodes are connected by Gigabit Ethernet. Each node uses Red Hat CentOS release 5.3 (kernel 2.6.18), the ext3 file system, and Sun JDK 1.6.0_14. The test runs on a separate client machine, which has two 2.0 GHz Intel(R) Core(TM) Duo T5750 processors, 3 GB memory, and Broadcom Netlink(TM) Fast Ethernet (100 Mbps). The client uses Ubuntu 10.04 LTS, the ext3 file system, and Sun JDK 1.6.0_14.

The workload in the experiments has 2 million rows; the token of each node is initialized manually to keep the load balanced. Each test runs three times and the average value is reported. The client uses 25 concurrent threads for sequential write, sequential read, random read and range query, and uses 1 thread for multi-dimensional range queries.

C. Experiment Result

The result in Fig. 8 shows that ConsistencyLevel has a great effect on every test, which can be confirmed by the great differences between the throughput of Cassandra(1) and Cassandra(3), or Cassandra Indexed(1) and Cassandra Indexed(3).

The throughput of sequential write for CCIndex is significantly lower than that of Cassandra Indexed and much lower than that of Original Cassandra, because maintaining the index needs an extra random read to get row data from the OriginalTable, and if there are old index column values, further delete operations are needed to update the index.

The performance of Original Cassandra(3) and Cassandra Indexed(3) on range query, random read, and sequential read is nearly identical because they share the same implementation. Their throughput is lower than that of CCIndex because of ConsistencyLevel, which can be confirmed by the fact that Original Cassandra(1) and Cassandra Indexed(1) have nearly the same throughput as CCIndex.

CCIndex increases to 3.7 times that of Cassandra Indexed(3). In the experiment, CCIndex is about 1.8 to 2.7 times as fast as Cassandra Indexed(1).

In another test on Cassandra Indexed, when MAXVALUE is 100 and the query expression is 0 < index1 < 10000, 0 < index2 < 10000 and index3 = 0, an exception happens every time in all 10 attempts, while CCIndex performs well. We consider that it happens when many records are discarded by the ranges on the non-equal columns.

The throughput of recovery is 1819 records/s on average in Fig. 10. To recover one record, CCIndex executes a range query on the CCT, a write on a CCIT, and a random read on a CCIT. The CCT range query speed is 6013 records/s, while the write speed on CCIT is 4778 records/s and the random read speed on CCIT is 4797 records/s. The recovery speed is 1964.7 records/s in theory. Compared with 1819 records/s in practice, the recovery speed matches the theoretical analysis.

Fig. 8 Basic operations for Original Cassandra, Cassandra Indexed and CCIndex. Cassandra(1) is Cassandra with 1 replica and ConsistencyLevel ONE. Cassandra(3) is Cassandra with 3 replicas and ConsistencyLevel QUORUM. Cassandra Indexed builds indexes for the index columns.

   In this experiment, N is 4,Ls/L is 1/30, CCIndex uses 46%
more space than Original Cassandra(3) in theory. The result
shows that Original Cassandra(3) uses 1.39 GB per node
while CCIndex uses 2.12 GB per node, which has 52.6%
space overhead. Because there are memtables not flushed in
memory, we consider the storage overhead confirms the
theoretical analysis.
   The tests of multi-dimensional range query writes records
with index1 and index2 whose value is randomly generated
from 0 to 2 million and index3 is randomly generated from 0
to MAXVALUE. In this way, the test could use expression 0
< index1 < 2000000 and 0 < index2 < 2000000 and index3 =                                            Fig. 10 CCIndex recovery speed.
0 to match the requirement of Cassandra API. The
MAXVALUE of index3 is set from 100 to 1 to change the                             D. Discussion
selectivity from 1% to 100%.                                                         The results provide many insights on CCIndex and
   The results of multi-dimensional range query test on                           Cassandra.
different conditions are shown as Fig. 9. When the selectivity                       1) Overall, the results show that CCIndex is a general
is under 10%, Cassandra Indexed performs well, but when the                       approach for DOTs, successfully in improving both
selectivity raises from 20% to 100%, the latency increases                        performance and query expressiveness.
significantly.                                                                       2) The results show that in Cassandra, the sequential read
                                                                                  and random read are the same in throughput and the range
                                                                                  query throughput is only 1.3 times as fast as random read. But
                                                                                  if a client sets Cassandra’s partitioner to OrderedPartitioner, it
                                                                                  suggests that the client is probably willing to use some special
                                                                                  operations on ordered table such as sequential read and range
                                                                                  query. Cassandra could do some optimization like prefetching
                                                                                  and caching on adjacent records.
                                                                                     3) CCIndex is suitable for tables with 2 to 4 index columns.
                                                                                  CCIndex cannot guarantee the reliability with fewer than 2
                                                                                  index columns because the CCITs are not replicated. If there
                                                                                  are more than 4 index columns, the space overhead is more
                                                                                  than 2 times of the Original Cassandra. When a table has more
                                                                                  than 4 columns with query requirements, a solution is to build
                                                                                  index for 2 to 4 most frequently used columns, and to filter the
   Fig. 9 Throughput of multi-dimensional range queries by CCIndex ,              result by non-indexed conditions in applications.
            Cassandra Indexed(1) and Cassandra Indexed(3)                            4) The throughput of CCIndex is determined by the ratio of
   The throughput ratio of CCIndex to Cassandra Indexed(3)                        range query to random read. This explains why the throughput
is at least 2.4. When the selectivity grows, the throughput of                    of CCIndex for Cassandra is 2.4 to 3.7 times to Cassandra



Indexed(3), while the throughput of CCIndex for HBase is 11.4 times that of IndexedTable.
CCIndex converts random reads on the OriginalTable into a range query on the CCIT, so its
performance depends on the speed improvement from random read to range query.
   During a multi-dimensional range query, IndexedTable executes a range query and a
random read for every record before filtering, while CCIndex only needs to execute a
single range query.
   We denote the speed of range query by $S_s$ and the speed of random read by $S_r$.
   The speed for CCIndex to get records is:

$$S_{cc} = S_s \tag{7}$$

   The speed for IndexedTable is:

$$S_i = \frac{1}{1/S_s + 1/S_r} = \frac{S_s S_r}{S_s + S_r} \tag{8}$$

   The ratio of CCIndex to IndexedTable is:

$$\frac{S_{cc}}{S_i} = \frac{S_s + S_r}{S_r} = 1 + \frac{S_s}{S_r} \tag{9}$$

   So the ratio of CCIndex to IndexedTable is decided by the value of $S_s / S_r$. For
HBase, $S_s / S_r$ is equal to 8.2 and $S_{cc} / S_i$ is equal to 9.2. Because
IndexedTable has no query optimization, it filters more records as candidate results, so
the final ratio of CCIndex to IndexedTable on multi-dimensional range queries, 11.4, is
consistent with the analysis.
   From Fig. 9, the throughput of CCIndex is 1.9 and 2.4 times that of Cassandra
Indexed(1) and Cassandra Indexed(3), respectively. CCIndex performs the same as Cassandra
Indexed(1) in random read and scan.
   From Fig. 8, $S_s / S_r$ is equal to 1.2 on Cassandra Indexed(1), which predicts a
ratio of 2.2. Because CCIndex takes more time to filter the results, the final ratio of
1.9 is close to this predicted value.

                        V. CONCLUSIONS

   Cassandra is a Distributed Ordered Table supporting multi-dimensional range queries.
However, the current design and implementation of Cassandra have two problems: (1)
Cassandra's query expression is limited in that one dimension of the query expression
must use an equality operator; (2) the performance is poor. Following the success of the
CCIndex scheme in Apache HBase, this paper studies the feasibility of employing CCIndex
to improve multi-dimensional range queries in DOTs like Cassandra.
   There are three mismatches between HBase and Cassandra when applying CCIndex to
Cassandra, which impose challenges: (1) the smallest sorted unit is a region in HBase but
a node in Cassandra, so the estimation method used for HBase is not suitable for
Cassandra; (2) the range query speed of Cassandra is not fast enough to accelerate
CCIndex; (3) the APIs of HBase and Cassandra are different.
   This paper proposes a new approach to estimating the result size and exposes the same
CCIndex APIs to applications, which tackles the first and the third mismatch. The speed of
range query is determined by the Cassandra system itself; Cassandra could apply
optimizations such as prefetching and caching of adjacent records.
   The experimental results show that CCIndex gains 2.4 to 3.7 times the performance of
Cassandra's index scheme with 1% to 50% selectivity for 2 million records. This paper
shows that CCIndex is a general approach for DOTs and could gain better performance on
multi-dimensional range queries for DOTs with slow random read and fast sequential read.
This paper implements the CCIndex recovery mechanism and shows that CCIndex recovery
performance is 33% of that of sequential write in Cassandra. This paper reveals that
Cassandra is optimized for hash tables rather than ordered tables in read and range
queries. Cassandra could apply optimizations such as prefetching and caching of adjacent
records.

                        ACKNOWLEDGMENT

   This work is supported in part by the Hi-Tech Research and Development (863) Program
of China (Grant No. 2006AA01A106), and the major national science and technology special
projects (2010ZX03004-003-03).

                        REFERENCES

[1]  Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, and
     Raghu Ramakrishnan, “Efficient bulk insertion into a distributed ordered table,” in
     Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data,
     2008.
[2]  Ymir Vigfusson, Adam Silberstein, Brian F. Cooper, and Rodrigo Fonseca, “Adaptively
     parallelizing distributed range queries,” in Proc. VLDB Endow., vol. 2, pp. 682–693,
     2009.
[3]  Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike
     Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, “Bigtable: a distributed
     storage system for structured data,” in 7th USENIX Symposium on Operating Systems
     Design and Implementation, 2006.
[4]  Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip
     Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni, “PNUTS:
     Yahoo!'s hosted data serving platform,” in Proc. VLDB Endow., vol. 1, pp. 1277–1288,
     2008.
[5]  Apache HBase project. [Online]. Available: http://hbase.apache.org/.
[6]  Hai Zhuge, “Probabilistic Resource Space Model for Managing Resources in
     Cyber-Physical Society,” IEEE Transactions on Services Computing, vol. 99, no.
     PrePrints, 2011.
[7]  Yongqiang Zou, Jia Liu, Shicai Wang, Li Zha, and Zhiwei Xu, “CCIndex: a Complemental
     Clustering Index on Distributed Ordered Tables for Multi-dimensional Range Queries,”
     in 7th IFIP International Conference on Network and Parallel Computing, 2010.
[8]  Avinash Lakshman and Prashant Malik, “Cassandra: a decentralized structured storage
     system,” SIGOPS Operating Systems Review, vol. 44, issue 2, pp. 35–40, Apr. 2010.
[9]  Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash
     Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner
     Vogels, “Dynamo: amazon's highly available key-value store,” in Proceedings of 21st
     ACM SIGOPS Symposium on Operating Systems Principles, 2007.
[10] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrishnan,
     “Chord: A scalable peer-to-peer lookup service for internet applications,” in
     Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and
     Protocols for Computer Communications, 2001.
[11] Pelops project. [Online]. Available: https://github.com/s7/scale7-pelops



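As a worked illustration of the throughput model in equations (7) to (9) of the Discussion, the following minimal sketch recomputes the predicted ratios from the range query speed $S_s$ and the random read speed $S_r$. The function names are hypothetical and not part of CCIndex, and setting $S_r = 1$ is only a convenience, assuming (as equation (9) implies) that the ratio depends solely on $S_s / S_r$.

```python
def ccindex_speed(s_s: float) -> float:
    """Equation (7): CCIndex answers the query with a range query only."""
    return s_s


def indexed_table_speed(s_s: float, s_r: float) -> float:
    """Equation (8): each record costs one range-query step plus one random read."""
    return 1.0 / (1.0 / s_s + 1.0 / s_r)


def predicted_ratio(s_s: float, s_r: float) -> float:
    """Equation (9): S_cc / S_i, which simplifies to 1 + S_s / S_r."""
    return ccindex_speed(s_s) / indexed_table_speed(s_s, s_r)


if __name__ == "__main__":
    # S_s / S_r values quoted in the Discussion; S_r is normalized to 1 (assumption).
    for system, ss_over_sr in [("HBase", 8.2), ("Cassandra Indexed(1)", 1.2)]:
        ratio = predicted_ratio(ss_over_sr, 1.0)
        print(f"{system}: predicted CCIndex speedup = {ratio:.1f}")
```

Under this model the prediction is 9.2 for HBase and 2.2 for Cassandra Indexed(1), the two values the Discussion compares against the measured ratios of 11.4 and 1.9.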

[Estácio - IESAM] Automatizando Tarefas com Gulp.js
 
Hackeando a Internet das Coisas com Javascript
Hackeando a Internet das Coisas com JavascriptHackeando a Internet das Coisas com Javascript
Hackeando a Internet das Coisas com Javascript
 

Kürzlich hochgeladen

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

CCIndex for Cassandra: A Novel Scheme for Multi-dimensional Range Queries in Cassandra

  • 1. 2011 Seventh International Conference on Semantics, Knowledge and Grids

CCIndex for Cassandra: A Novel Scheme for Multi-dimensional Range Queries in Cassandra

Chen Feng#1, Yongqiang Zou*2, Zhiwei Xu#3
# Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
1 fengchen8086@gmail.com
3 zxu@ict.ac.cn
* Tencent Corporation, Beijing, 100190, China
2 aaronzou@tencent.com

Abstract: Multi-dimensional range queries are fundamental requirements in large scale Internet applications using Distributed Ordered Tables. Apache Cassandra is a Distributed Ordered Table when it employs order-preserving hashing as data partitioner. Cassandra supports multi-dimensional range queries with poor performance and with a limitation that there must be one dimension with an equal operator. Based on the success of the CCIndex scheme in Apache HBase, this paper tries to answer the question: Can CCIndex benefit multi-dimensional range queries in DOTs like Cassandra?

This paper studies the feasibility of employing CCIndex in Cassandra, proposes a new approach to estimate result size, implements CCIndex in Cassandra including recovery mechanisms, and studies the pros and cons of CCIndex for different DOTs. Experimental results show that CCIndex gains 2.4 to 3.7 times efficiency over Cassandra's index scheme with 1% to 50% selectivity for 2 million records. This paper shows that CCIndex is a general approach for DOTs, and could gain better performance for DOTs which perform scan tasks much faster than random read. This paper reveals that Cassandra is optimized for hash tables rather than ordered tables in performing read and range queries.

I. INTRODUCTION

Multi-dimensional range queries are fundamental requirements in large scale Internet applications and have gained more and more attention in Distributed Ordered Tables (DOTs) [1,2] like BigTable [3], PNUTS [4], and HBase [5] in recent years.

Multi-dimensional range queries are queries with a less-than or greater-than operator on multiple table columns. For example, a query for yesterday's hot photos written in SQL is like "select * from photos where hit_counts > 100000 and create_time > now() - 86400". When modeling resources in physical or cyber space as a multi-dimensional classification space as in the Probabilistic Resource Space Model (P-RSM) [6], multi-dimensional range queries are basic operations.

As the data scale grows, Distributed Ordered Tables are adopted in more and more applications to store and query structured data for outstanding performance, reliability, and scalability. Naturally, Distributed Ordered Tables can support point queries and range queries on the primary key. However, their limited support for queries on non-primary keys leads to poor performance in multi-dimensional range queries involving non-primary keys.

CCIndex [7], short for Complemental Clustering Index, is proposed to support multi-dimensional range queries over DOTs for high performance, low space overhead, and high reliability. CCIndex has been implemented on HBase and gains 11.4 times scan efficiency over non-primary columns.

Apache Cassandra [8] is a highly scalable distributed database with a fully distributed design like Dynamo [9] and the column family data model of BigTable. Cassandra is a Distributed Ordered Table rather than a Distributed Hash Table (DHT) when it employs order-preserving hashing as data partitioner, instead of consistent hashing. Cassandra partitions primary keys into nodes in a circle overlay as in Chord [10], replicates keys for performance and reliability, and supports range queries on keys. Cassandra has supported multi-dimensional range queries since version 0.7 with a limitation that there must be one dimension with an equal operator in the query expression, which hinders the broad usage of these queries. Multi-dimensional range queries under Cassandra's index scheme also suffer from the efficiency problem seen in the HBase IndexedTable: when responding to the queries, the system must first scan the secondary index to get primary keys, and then issue multiple random reads to get the real data.

Can CCIndex benefit multi-dimensional range queries in DOTs like Cassandra, supporting such queries without these limitations and with better performance?

This paper studies the feasibility of employing CCIndex to support multi-dimensional range queries in Cassandra. We identify three differences between HBase and Cassandra when utilizing CCIndex: (1) The smallest sorted unit is the region in HBase while it is the node in Cassandra; (2) The speed of range query in Cassandra is not fast enough to accelerate the CCIndex performance; (3) The APIs of HBase and Cassandra are different. This paper proposes a new approach to estimate result size by data distribution information, implements CCIndex in Cassandra, studies the pros and cons of CCIndex for different DOT styles, and reveals more performance issues of Cassandra.

The contributions of this paper are summarized as follows.

978-0-7695-4515-8/11 $26.00 © 2011 IEEE. DOI 10.1109/SKG.2011.28
  • 2. 1. This paper employs CCIndex to support multi-dimensional range queries, overcoming the limitations of Cassandra. The results show that CCIndex gains 2.4 times performance over Cassandra's index scheme with 1% selectivity, and about 3.7 times performance when the selectivity is 50% for 2 million records.

2. This paper shows that CCIndex is a general approach for DOTs, which could gain better performance for DOTs with slow random read and fast sequential read. This paper shows that CCIndex improves query performance by about 2 times on DOTs with fast random read, and achieves an order-of-magnitude performance improvement for DOTs whose random read is significantly slower than sequential read or scan, such as HBase. This paper implements the CCIndex recovery mechanism and indicates that the efficiency of CCIndex recovery is 33% of that of sequential write for Cassandra.

3. This paper reveals that Cassandra is optimized for hash tables rather than ordered tables. Cassandra provides both consistent hashing and order-preserving hashing, but the read and scan operations are not optimized for order-preserving hashing, for example by considering pre-fetch for read or by optimizing scan for range queries over ordered tables. Cassandra's strategy is good for hash tables, but inefficient for ordered tables.

This paper is organized as follows. Section 2 gives the background. Section 3 illustrates the design and implementation for CCIndex in Cassandra. Section 4 shows the experimental results and the discussion on the results. Section 5 concludes the whole work.

II. BACKGROUND

A. CCIndex Analysis

CCIndex is proposed to support multi-dimensional range queries over DOTs by reorganizing data. CCIndex introduces a ComplementalTable for each index column. A ComplementalTable stores all columns except the rowkey and the corresponding index column. The ComplementalTable rowkey is a concatenation of the index column value, the original rowkey, and the length of the index column value. This way of generating the rowkey of a ComplementalTable ensures that all the rowkeys are unique and sorted by index column and the original rowkey. The OriginalTable and the ComplementalTables are called Complemental Clustering Index Tables (CCITs). A CCIT sets the replica factor to 1 to decrease the storage overhead. CCIndex maintains the reliability of a CCIT by other CCITs and introduces a replicated CCT (Complemental Check Table) for each CCIT to help data recovery.

In Fig. 1, there is an OriginalTable (CCIT0) with a primary id and two index columns, weight and height. CCIT-W and CCIT-H (ComplementalTables) are ordered by key1 and key2 respectively. With these CCITs, range queries over id, weight, or height can be converted to range queries on CCIT0, CCIT-W or CCIT-H.

A CCT stores the rowkey and all index columns of a CCIT. CCTs are replicated while the CCITs are not replicated.

CCIndex creates all ComplementalTables and CCTs when the OriginalTable is created. CCIndex maintains the index by the procedures of inserting and deleting.

Fig. 1 Data layout of CCIndex.

The procedure of writing is shown in Fig. 2. When writing a record into the OriginalTable, CCIndex reads the OriginalTable by rowkey to get the old values, checks whether the index values are going to be modified, and then deletes records from the corresponding CCITs and CCTs when updating index values. After that, CCIndex writes the records to all CCITs and CCTs. When deleting a record, CCIndex reads all index values from the OriginalTable and deletes records from all CCITs and CCTs.

Fig. 2 The procedure of writing.

The procedure of multi-dimensional range queries is shown in Fig. 3. CCIndex estimates the result size for each query condition and selects the condition with the smallest result size to execute a range query on the corresponding CCIT. CCIndex employs the other conditions to filter the result of the range query and returns the final results of the multi-dimensional range query.
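The data layout and query procedure described in this section can be sketched with a toy in-memory model. This is only an illustration under stated assumptions, not the authors' implementation: the names `complemental_rowkey`, `build_ccits`, `bounds`, and `range_query` are hypothetical, index values are assumed to be non-negative integers that fit in 10 digits so that zero-padding keeps string order equal to numeric order, and the result size is counted exactly by binary search, whereas CCIndex only estimates it from region or SSTable metadata.

```python
from bisect import bisect_left

WIDTH = 10  # assumed fixed width for zero-padded integer index values


def complemental_rowkey(index_value, rowkey):
    """Concatenate index value, original rowkey, and index-value length."""
    encoded = str(index_value).zfill(WIDTH)
    return f"{encoded}{rowkey}{len(encoded):02d}"


def build_ccits(records, index_columns):
    """Return {column: sorted list of (ccit_rowkey, record)}, one CCIT per index column."""
    ccits = {}
    for column in index_columns:
        rows = [(complemental_rowkey(rec[column], rec["id"]), rec) for rec in records]
        ccits[column] = sorted(rows, key=lambda row: row[0])
    return ccits


def bounds(ccit, low, high):
    """Slice positions of rows whose index value v satisfies low < v < high."""
    keys = [key for key, _ in ccit]
    start = bisect_left(keys, str(low + 1).zfill(WIDTH))
    stop = bisect_left(keys, str(high).zfill(WIDTH))
    return start, max(start, stop)


def range_query(ccits, conditions):
    """Scan the CCIT with the smallest candidate range, then filter by all conditions."""
    spans = {col: bounds(ccits[col], lo, hi) for col, (lo, hi) in conditions.items()}
    best = min(spans, key=lambda col: spans[col][1] - spans[col][0])
    start, stop = spans[best]
    return [
        rec
        for _, rec in ccits[best][start:stop]
        if all(lo < rec[col] < hi for col, (lo, hi) in conditions.items())
    ]


# Hypothetical records shaped like the Fig. 1 example (id, weight, height).
records = [
    {"id": "r1", "weight": 50, "height": 160},
    {"id": "r2", "weight": 62, "height": 175},
    {"id": "r3", "weight": 71, "height": 181},
    {"id": "r4", "weight": 85, "height": 178},
]
ccits = build_ccits(records, ["weight", "height"])
hits = range_query(ccits, {"weight": (60, 90), "height": (170, 180)})
print([rec["id"] for rec in hits])  # ['r2', 'r4']
```

In this example the height condition matches two rows and the weight condition three, so the sketch scans the height table and filters by weight. Because every ComplementalTable row carries the full record, no random read back into the OriginalTable is needed.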
  • 3. Fig. 3 The procedure of multi-dimensional range queries.

CCIndex for HBase uses a simple way to estimate the result size. In HBase, HMaster stores region-to-server mapping information as in Fig. 4. The mapping information can be described as a set of <startKey-regionServer>, ordered by startKey. CCIndex finds the regions covered by each range query and estimates the result size by the number of regions. When HBase has more than 1 region and a maximum region size Smax, each region size must be greater than Smax/2 and less than Smax. Thus CCIndex considers that the result size depends on the number of regions covered.

Fig. 4 The region-to-server mapping of HBase.

In HBase, the speed of scan is 8.2 times that of random read. The speed of multi-dimensional range query on CCIndex is 11.4 times that of IndexedTable.

The performance of CCIndex is affected by 2 issues:
• The accuracy of result size estimation. The more accurate the estimation is, the fewer unnecessary records will be scanned.
• The speed ratio of range query to random read. To execute a multi-dimensional range query, CCIndex executes a range query on a CCIT and then filters the result. IndexedTable executes a range query on an index table to get the original rowkeys, and then gets the records by random read on those rowkeys. Thus the speed ratio of CCIndex to IndexedTable is determined by the speed ratio of range query to random read.

B. Cassandra Analysis

Cassandra organizes nodes as a ring overlay like Chord to partition data. Each node manages a part of the data in the ring, with data ids from the previous node's token to this node's token. Records use the same partitioner to map their keys to the token ring. The corresponding node writes records to the commitlog and then to its memtable. A memtable is an in-memory structure that contains sorted rows. A memtable is flushed to an SSTable on disk when it is full. An SSTable is a sorted structure; SSTables are flushed one by one and cannot be modified once flushed, so records across multiple SSTables are not sorted, as in Fig. 5. Cassandra combines several old SSTables into a new SSTable by compaction to reduce the number of SSTables. Each node contains more than one SSTable in most cases.

Fig. 5 An example of memtable and SSTables in a node.

Like Dynamo, Cassandra keeps strong consistency if W + R > N, where W and R indicate respectively the minimum number of nodes that have executed the write and read operation successfully, and N is the replication factor. Cassandra uses different ConsistencyLevels to keep the balance between consistency and availability. In writing, ConsistencyLevel.ONE and QUORUM ensure that the write operation has been executed successfully on at least 1 and N / 2 + 1 node(s) respectively. In reading, ONE returns the record from the fastest node to respond, and QUORUM returns the most recent record after at least N / 2 + 1 nodes have responded. Compared with ONE, QUORUM has higher latency while maintaining consistency.

Cassandra version 0.7+ provides APIs to execute multi-dimensional range queries, but with the limitation that the APIs require at least one equal operator on a configured index column in the query expression. Cassandra also provides APIs to execute range queries over the rowkey, but the speed of range query is only 1.3 times that of random read.

In summary, there are three mismatches between HBase and Cassandra, which impose challenges when utilizing CCIndex for Cassandra.
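As an aside on the consistency rule stated in this section, the W + R > N condition and the QUORUM size N / 2 + 1 can be checked numerically. The sketch below is plain arithmetic, not Cassandra API code; the function names `quorum` and `is_strongly_consistent` are hypothetical, and integer division is assumed for N / 2.

```python
def quorum(replication_factor: int) -> int:
    """Nodes required by QUORUM: N / 2 + 1, using integer division."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_nodes: int, read_nodes: int, replication_factor: int) -> bool:
    """Check the W + R > N condition."""
    return write_nodes + read_nodes > replication_factor


N = 3
q = quorum(N)
print(q)                                 # 2
print(is_strongly_consistent(q, q, N))   # True: QUORUM writes with QUORUM reads
print(is_strongly_consistent(1, 1, N))   # False: ONE writes with ONE reads
print(is_strongly_consistent(1, 1, 1))   # True: ONE/ONE when there is a single replica
```

The last line suggests why a table kept at replica factor 1, as CCIndex does for CCITs, can use ConsistencyLevel.ONE without violating the rule: with a single replica, every read reaches the node that took the write.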
  • 4. 1) The smallest sorted unit is the region in HBase while it is the node in Cassandra: In HBase, regions are sorted by the rowkey of records. In Cassandra, records are stored in SSTables and sorted between nodes, while multiple SSTables in the same node are not sorted. The difference decreases the accuracy of estimating result size.

2) The speed of range query: Cassandra executes a range query by logical scan, traversing all SSTables to find the 'next' record, while HBase executes physical scans on regions.

3) The differences between HBase and Cassandra on APIs: To implement CCIndex for Cassandra, the API issue must be considered, namely how to utilize the different APIs given by HBase and Cassandra and unify the APIs that CCIndex provides to the application level.

III. DESIGN AND IMPLEMENTATION

CCIndex for Cassandra uses different methods to deal with each of these differences.

A. The smallest sorted unit issue

As record size between nodes might be unbalanced, the way in which CCIndex for HBase estimates result size by the number of covered regions cannot work on Cassandra. This paper uses a different way to estimate result size, which relies on the data distribution information of Cassandra.

1) Data distribution information gathering: CCIndex for Cassandra first adds an API in CassandraClient to gather the SSTable information of a certain node, and then adds a daemon thread Listener in CassandraDaemon. Listener gets token ring information from StorageService every other minute. With the token-IP mapping, Listener uses the API above to get SSTable information from every node. Thus each node saves the data distribution information of all nodes. Cassandra kernel code is modified without performance degradation.

2) The estimation of result size: The CCIndex client uses a thread Refiner to get data distribution information and token ring information from Listener, then CCIndex estimates the result size for every query condition:
• Calculate the nodes covered by the range. Count the number of nodes as N3.
• For every node covered, read the SSTable data file total size S and file number C.
• Sum S and C over all covered nodes to get N1 and N2.

Each search condition has a tuple [N1, N2, N3]. N1 has higher priority than N2, and N2 has higher priority than N3. CCIndex for Cassandra executes range queries on the corresponding CCIT which has the smallest tuple.

B. The speed of range query

The speed of range query is determined by the Cassandra system. The aim of CCIndex for Cassandra is to implement CCIndex while making as few changes as possible. The low speed of range query affects the speed of multi-dimensional range queries but does not restrict the implementation.

C. The API issue

CCIndex encapsulates the APIs of HBase and Cassandra, and exposes the same CCIndex APIs for applications.

D. Data recovery

CCIndex introduces replicated CCTs to help recover damaged data. This paper implements the data recovery module with CCT in Cassandra.

To recover a record of the OriginalTable, CCIndex first reads the CCTs by rowkey to get all index columns. Then CCIndex concatenates the original rowkey and the index column value to form the rowkey of a certain ComplementalTable. CCIndex tries to read the record by the concatenated rowkey and write the corresponding record into the OriginalTable. If the recovery fails, CCIndex tries to recover the data from another ComplementalTable.

To recover a record of a ComplementalTable, CCIndex gets the rowkey of the OriginalTable by splitting the given rowkey. Then CCIndex tries to read the record from the OriginalTable. If the read operation fails, CCIndex uses the other index column values obtained from the CCT to recover the data from other ComplementalTables.

To recover a certain range of a table, CCIndex scans the corresponding CCT, and uses the methods above to recover records one by one. A range can be split into several parts for multi-thread recovery to increase efficiency.

E. Implementation

The CCIndex for Cassandra prototype uses Cassandra v0.7.2 as its code base and is written in Java.

As the replica factor of Cassandra is associated with the keyspace, it is easy for CCIndex for Cassandra to replicate CCTs by putting CCTs into a separate keyspace with replica factor 3. CCIndex sets the keyspace replica factor to 1 for CCITs, and creates one ComplementalTable for each index column.

Fig. 6 The architecture of CCIndex for Cassandra.

The CCIndex for Cassandra client connects with a server node to perform operations like inserting, reading and range query. As Fig. 6 shows, CCIndex for Cassandra uses a connection
  • 5. pool extends from Pelops [11]. The connection pool assigns a random connection to each client to avoid hot spots. To estimate the query result size, the client obtains the token ring and data distribution information by sending a query to a node.

IV. EVALUATION

CCIndex for Cassandra is implemented and evaluated through analysis and experiments.

A. Space Overhead Analysis

Performance is straightforward to evaluate through experiments, while space overhead is better suited to theoretical analysis.

Here we denote the number of index columns by $N$, the replication factor of Original Cassandra and of the CCTs by $R$, the average length of the key and all index columns by $L_s$, and the total length of a record by $L$.

In Original Cassandra, the space for every record is:

$$S_{ORG} = L \cdot R \tag{1}$$

In CCIndex, the space for each record is that of the CCITs plus that of the CCTs. The space for the CCITs is:

$$S_{CCIT} = L \cdot (N + 1) \tag{2}$$

The space for the CCTs is:

$$S_{CCT} = L_s \cdot (N + 1) \cdot R \tag{3}$$

The total space for CCIndex is:

$$S_{CC} = S_{CCIT} + S_{CCT} = (N + 1)(L + L_s \cdot R) \tag{4}$$

The space overhead ratio of CCIndex to Original Cassandra is:

$$S_{CC} / S_{ORG} - 1 = (N + 1)/R + (N + 1) \cdot L_s / L - 1 \tag{5}$$

In Cassandra, the replica number $R$ is often set to 3. The ratio is then:

$$(N + 1)/3 + (N + 1) \cdot L_s / L - 1 \tag{6}$$

Equation (6) is plotted in Fig. 7.

Fig. 7 The space overhead ratio of CCIndex to Original Cassandra, using $L/L_s$ values as the horizontal axis.

Fig. 7 shows that the overhead ratio drops significantly as $L_s/L$ and $N$ decrease. To avoid a large space overhead, CCIndex should therefore use fewer index columns, and the index columns should be short. When $N$ is smaller than 2, CCIndex would not have enough replicas for CCIT. When $N$ changes from 2 to 4 and $L_s/L$ changes from 1/30 to 1/10, the overhead ratio changes from 10% to 116.7%.

B. Experiment Setup

This paper introduces a benchmark to evaluate the throughput of the basic operations: sequential read/write, random read, and range query. The workload uses a table with columns rowkey, index1, index2, index3, and data. The rowkey, index1, index2, and index3 columns are 10 bytes each, while the data column is 1 KB. Throughput is defined as rows per second over all clients.

CCIndex builds indexes for index1, index2, and index3. ConsistencyLevel is ONE for CCIT and QUORUM for CCT.

Original Cassandra and Cassandra Indexed set the replica number to 3 and ConsistencyLevel to QUORUM. Original Cassandra does not build any index. Cassandra Indexed builds indexes for index1, index2, and index3.

The experimental cluster has 5 nodes. Each node has two 1.8 GHz dual-core AMD Opteron(tm) Processor 270 CPUs, 4 GB of memory, and 321 GB of RAID5 SCSI disks. All nodes are connected by Gigabit Ethernet. Each node runs Red Hat CentOS release 5.3 (kernel 2.6.18) with the ext3 file system and Sun JDK 1.6.0_14. The test runs on a separate client machine, which has two 2.0 GHz Intel(R) Core(TM) Duo T5750 processors, 3 GB of memory, and a Broadcom NetLink(TM) Fast Ethernet adapter at 100 Mbps. The client runs Ubuntu 10.04 LTS with the ext3 file system and Sun JDK 1.6.0_14.

The workload in the experiments has 2 million rows. The token of each node is initialized manually to keep the load balanced. Each test runs three times and reports the average value. The client uses 25 concurrent threads for sequential write, sequential read, random read, and range query, and 1 thread for multi-dimensional range queries.

C. Experiment Result

The results in Fig. 8 show that ConsistencyLevel has a large effect on every test, as confirmed by the large differences in throughput between Cassandra(1) and Cassandra(3), and between Cassandra Indexed(1) and Cassandra Indexed(3).

The sequential write throughput of CCIndex is significantly lower than that of Cassandra Indexed and much lower than that of Original Cassandra. Maintaining the index needs an extra random read to get the row data from OriginalTable, and if there are old index column values, further delete operations are needed to update the index.

The performance of Original Cassandra(3) and Cassandra Indexed(3) on range query, random read, and sequential read is nearly identical because they share the same implementation. Their throughput is lower than that of CCIndex because of ConsistencyLevel, as confirmed by the fact that Original Cassandra(1) and Cassandra Indexed(1) have nearly the same throughput as CCIndex.
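As a quick numerical check of the space overhead analysis in Section IV.A, Equations (1), (4), and (5) can be evaluated directly. The following is a minimal Python sketch, not part of the original implementation; the function and parameter names are hypothetical, and the sample values are the (N, Ls/L) pairs quoted in the analysis with R = 3.

```python
def space_original(record_len, replicas):
    """Equation (1): per-record space in Original Cassandra."""
    return record_len * replicas


def space_ccindex(record_len, index_len, n_index, replicas):
    """Equation (4): N + 1 unreplicated CCITs plus N + 1 CCTs replicated R times."""
    tables = n_index + 1
    return tables * (record_len + index_len * replicas)


def overhead_ratio(n_index, ls_over_l, replicas=3):
    """Equation (5): extra space of CCIndex relative to Original Cassandra."""
    tables = n_index + 1
    return tables / replicas + tables * ls_over_l - 1


if __name__ == "__main__":
    # Sample points taken from the text; R defaults to 3 as in Equation (6).
    for n_index, ls_over_l in [(2, 1 / 30), (3, 1 / 30), (4, 1 / 10)]:
        ratio = overhead_ratio(n_index, ls_over_l)
        print(f"N={n_index}, Ls/L={ls_over_l:.3f}: overhead {ratio:.1%}")
```

Under these assumptions the sketch gives an overhead of 10% for N = 2 with Ls/L = 1/30 and about 116.7% for N = 4 with Ls/L = 1/10, the two end points quoted in the analysis.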
  • 6. Fig. 8 Basic operations for Original Cassandra, Cassandra Indexed, and CCIndex. Cassandra(1) is Cassandra with 1 replica and ConsistencyLevel ONE. Cassandra(3) is Cassandra with 3 replicas and ConsistencyLevel QUORUM. Cassandra Indexed builds indexes for the index columns.

In this experiment, $N$ is 3 and $L_s/L$ is 1/30, so in theory CCIndex uses 46% more space than Original Cassandra(3). The results show that Original Cassandra(3) uses 1.39 GB per node while CCIndex uses 2.12 GB per node, a space overhead of 52.6%. Because some memtables in memory have not been flushed, we consider that the measured storage overhead confirms the theoretical analysis.

The multi-dimensional range query tests write records whose index1 and index2 values are randomly generated from 0 to 2 million and whose index3 value is randomly generated from 0 to MAXVALUE. The test can then use the expression 0 < index1 < 2000000 and 0 < index2 < 2000000 and index3 = 0, which satisfies the requirement of the Cassandra API. MAXVALUE is varied from 100 to 1 to change the selectivity from 1% to 100%.

The results of the multi-dimensional range query test under different conditions are shown in Fig. 9. When the selectivity is under 10%, Cassandra Indexed performs well, but when the selectivity rises from 20% to 100%, the latency increases significantly.

Fig. 9 Throughput of multi-dimensional range queries by CCIndex, Cassandra Indexed(1), and Cassandra Indexed(3).

The throughput ratio of CCIndex to Cassandra Indexed(3) is at least 2.4. As the selectivity grows, the throughput of CCIndex increases to 3.7 times that of Cassandra Indexed(3). In the experiment, CCIndex is about 1.8 to 2.7 times as fast as Cassandra Indexed(1).

In another test on Cassandra Indexed, with MAXVALUE set to 100 and the query expression 0 < index1 < 10000, 0 < index2 < 10000 and index3 = 0, an exception occurs in all 10 attempts, while CCIndex performs well. We believe this happens when many records are discarded by the range conditions on the non-equality columns.

The average recovery throughput is 1819 records/s, as shown in Fig. 10. To recover one record, CCIndex executes a range query on CCT, a write on CCIT, and a random read on CCIT. The CCT range query speed is 6013 records/s, the write speed on CCIT is 4778 records/s, and the random read speed on CCIT is 4797 records/s. The recovery speed is 1964.7 records/s in theory. Compared with 1819 records/s in practice, the recovery speed matches the theoretical analysis.

Fig. 10 CCIndex recovery speed.

D. Discussion

The results provide several insights on CCIndex and Cassandra.

1) Overall, the results show that CCIndex is a general approach for DOTs that successfully improves both performance and query expressiveness.

2) The results show that in Cassandra, sequential read and random read have the same throughput, and range query is only 1.3 times as fast as random read. However, a client that sets Cassandra's partitioner to OrderedPartitioner probably intends to use operations specific to ordered tables, such as sequential read and range query. Cassandra could apply optimizations such as prefetching and caching of adjacent records.

3) CCIndex is suitable for tables with 2 to 4 index columns. CCIndex cannot guarantee reliability with fewer than 2 index columns because the CCITs are not replicated. With more than 4 index columns, the space used is more than 2 times that of Original Cassandra. When a table has more than 4 columns with query requirements, one solution is to build indexes for the 2 to 4 most frequently used columns and to filter the result by the non-indexed conditions in the application.

4) The throughput of CCIndex is determined by the ratio of range query speed to random read speed. This explains why the throughput of CCIndex for Cassandra is 2.4 to 3.7 times that of Cassandra
  • 7. Indexed(3), while the throughput of CCIndex for HBase is 11.4 times that of IndexedTable. CCIndex converts random reads on OriginalTable into a range query on CCIT, so its performance depends on the speed improvement of range query over random read.

During a multi-dimensional range query, IndexedTable executes a range query and a random read for every record before filtering, while CCIndex only needs to execute a single range query.

We denote the speed of range query by $S_s$ and the speed of random read by $S_r$. The speed at which CCIndex gets records is:

$$S_{cc} = S_s \tag{7}$$

The speed for IndexedTable is:

$$S_i = 1 / (1/S_s + 1/S_r) = S_s \cdot S_r / (S_s + S_r) \tag{8}$$

The ratio of CCIndex to IndexedTable is:

$$S_{cc} / S_i = (S_s + S_r) / S_r = 1 + S_s / S_r \tag{9}$$

The ratio of CCIndex to IndexedTable is therefore decided by the value of $S_s/S_r$. For HBase, $S_s/S_r$ is equal to 8.2 and $S_{cc}/S_i$ is equal to 9.2. Because there is no optimization on the query, IndexedTable filters more records as candidate results, so the final ratio of CCIndex to IndexedTable on multi-dimensional range queries, 11.4, is consistent with the analysis.

From Fig. 9, the throughput of CCIndex is 1.9 and 2.4 times that of Cassandra Indexed(1) and Cassandra Indexed(3), respectively. CCIndex performs the same as Cassandra Indexed(1) in random read and scan. From Fig. 8, $S_s/S_r$ is equal to 1.2 on Cassandra Indexed(1), which predicts a ratio of 2.2. Because CCIndex takes more time to filter the result, the final ratio of 1.9 is close to the predicted value.

V. CONCLUSIONS

Cassandra is a Distributed Ordered Table supporting multi-dimensional range queries. However, the current design and implementation of Cassandra have two problems: (1) Cassandra's query expression is limited in that there must be one dimension with an equal operator in the query expression; (2) the performance is poor. Following the success of the CCIndex scheme in Apache HBase, this paper studies the feasibility of employing CCIndex to improve multi-dimensional range queries in DOTs like Cassandra.

There are three mismatches between HBase and Cassandra when applying CCIndex to Cassandra, each of which poses a challenge: (1) the smallest sorted unit is the region in HBase but the node in Cassandra, so the estimation method used in HBase is not suitable for Cassandra; (2) the range query speed of Cassandra is not fast enough to accelerate CCIndex; (3) the APIs of HBase and Cassandra are different. This paper proposes a new approach to estimate the result size and exposes the same CCIndex APIs to applications, which tackles the first and the third mismatch. The speed of range query is determined by the Cassandra system itself; Cassandra could apply optimizations such as prefetching and caching of adjacent records.

The experimental results show that CCIndex gains 2.4 to 3.7 times the performance of Cassandra's index scheme with 1% to 50% selectivity for 2 million records. This paper shows that CCIndex is a general approach for DOTs, and that it can achieve better performance on multi-dimensional range queries for DOTs with slow random read and fast sequential read. This paper implements the CCIndex recovery mechanism and shows that CCIndex recovery performance is 33% of that of sequential write in Cassandra. This paper also reveals that Cassandra is optimized for hash tables rather than ordered tables in read and range queries.

ACKNOWLEDGMENT

This work is supported in part by the Hi-Tech Research and Development (863) Program of China (Grant No. 2006AA01A106), and the major national science and technology special projects (2010ZX03004-003-03).

REFERENCES

[1] Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, and Raghu Ramakrishnan, "Efficient bulk insertion into a distributed ordered table," in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008.
[2] Ymir Vigfusson, Adam Silberstein, Brian F. Cooper, and Rodrigo Fonseca, "Adaptively parallelizing distributed range queries," Proc. VLDB Endow., vol. 2, pp. 682-693, 2009.
[3] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, "Bigtable: a distributed storage system for structured data," in 7th USENIX Symposium on Operating Systems Design and Implementation, 2006.
[4] Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, and Ramana Yerneni, "PNUTS: Yahoo!'s hosted data serving platform," Proc. VLDB Endow., vol. 1, pp. 1277-1288, 2008.
[5] Apache HBase project. [Online]. Available: http://hbase.apache.org/
[6] Hai Zhuge, "Probabilistic Resource Space Model for Managing Resources in Cyber-Physical Society," IEEE Transactions on Services Computing, vol. 99, no. PrePrints, 2011.
[7] Yongqiang Zou, Jia Liu, Shicai Wang, Li Zha, and Zhiwei Xu, "CCIndex: a Complemental Clustering Index on Distributed Ordered Tables for Multi-dimensional Range Queries," in 7th IFIP International Conference on Network and Parallel Computing, 2010.
[8] Avinash Lakshman and Prashant Malik, "Cassandra: a decentralized structured storage system," SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35-40, Apr. 2010.
[9] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels, "Dynamo: Amazon's highly available key-value store," in Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007.
[10] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications," in Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2001.
[11] Pelops project. [Online]. Available: https://github.com/s7/scale7-pelops
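The throughput model of Equations (7) to (9) in the Discussion can be reproduced with a few lines of Python. This is a minimal sketch for illustration only; the function names are hypothetical, and the speeds are expressed relative to a random read speed of 1 so that only the ratio Ss/Sr quoted in the text is needed.

```python
def ccindex_speed(range_speed):
    """Equation (7): CCIndex answers the query with one range query."""
    return range_speed


def indexed_table_speed(range_speed, random_read_speed):
    """Equation (8): each record costs one range step plus one random read."""
    return 1.0 / (1.0 / range_speed + 1.0 / random_read_speed)


def ccindex_speedup(range_speed, random_read_speed):
    """Equation (9): ratio of CCIndex to IndexedTable, equal to 1 + Ss/Sr."""
    return ccindex_speed(range_speed) / indexed_table_speed(
        range_speed, random_read_speed
    )


if __name__ == "__main__":
    # Ss/Sr ratios quoted in the Discussion, with Sr normalized to 1.
    for system, ss_over_sr in [("HBase", 8.2), ("Cassandra Indexed(1)", 1.2)]:
        print(f"{system}: predicted speedup {ccindex_speedup(ss_over_sr, 1.0):.1f}")
```

With Ss/Sr = 8.2 the model predicts a ratio of 9.2, and with Ss/Sr = 1.2 it predicts 2.2, the two predicted values given in the Discussion.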