This document discusses using NoSQL databases like Cassandra to store and analyze traces from the ATLAS DQ2 tracer service. Currently, aggregating the large volume of tracer data (~5 million traces per day) to generate monitoring and analysis reports takes tens of minutes to hours on the production Oracle database. Cassandra allows performing the same queries in real time by either building indexes from the raw traces or using distributed counters to pre-aggregate the data. In tests, Cassandra with pre-built indexes returned results over 100x faster than the production Oracle database for common analysis queries on tracer data.
3. DQ2 tracer service
• Records relevant information about dataset/file access and usage on the Grid
– type, status, local site, remote site, file size, time, usrdn (user DN), etc.
• Used by the dq2 client tools (dq2-get, dq2-put) and other applications (PanDA, Athena)
• Traces can be analyzed for many purposes
– dataset popularity (popularity.cern.ch by Angelos)
– DDM simulations
– User behavior analysis
– DDM system monitoring
– …
• There are ~5 million traces every day
4. Tracer monitoring use cases
• Whole system monitoring (real time)
– local‐read, local‐write, remote‐read, remote‐write
– failed‐operation
– breakdown by applications, dataset types, sites, DNs
• Dq2‐get statistics in DDM dashboard (real time)
– transfer rate in files and GB from each DDM endpoint or from each site/SE
– https://savannah.cern.ch/support/?121744
• Specific reports (monthly/yearly)
– Get the amount of data transferred with dq2-get, per dataset type, per destination, per domain, per DN
– For all end-points, get the number of dq2-get operations, broken down by distinct user
– For all groupdisk end-points, give the number of all operations, reads, writes, local-reads, remote-reads and distinct users, broken down by application
5. Problem
• All these use cases need aggregation (count, sum) queries
• On the production Oracle database, they usually take tens of minutes or hours
• These queries place a significant I/O workload on Oracle
• The aggregation metrics can be very dynamic and numerous
• We want to do the analysis in real time
6. Possible ways
1. Can we just store the traces in a table (Oracle or NoSQL) and run ad-hoc queries on it whenever we need to?
2. If not, we may need to pre-compute the traces, store indexes or counters, and query those instead
7. NoSQL ‐ Cassandra
• About Cassandra
– A distributed database, bringing together Dynamo's fully distributed design
and Bigtable's ColumnFamily‐based data model.
– Apache open source
• Some concepts
– Column based
– Replication factor (N)
– Eventual consistency (R + W > N; see the example below)
– Partitioning (order-preserving vs random)
• Order-preserving partitioning may cause data imbalance between nodes and needs manual rebalancing
• Random partitioning balances very well, but loses the ability to do range queries on keys
– MemTable && SSTable
• Memory >> Disk
• Sequential >> Random
– commitlog
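For example, with a replication factor of N = 3, writing at consistency level W = 2 and reading at R = 2 gives R + W = 4 > N, so every read quorum overlaps the most recent write quorum and is guaranteed to see the latest value.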
8. Data model in Cassandra
• Column
(name,value,timestamp)
• Row
key:{column1,column2,…}
• Column family
– Similar to a table in a relational database
• Keyspace
– Usually one application has one keyspace
• Example
Keyspace: DDMTracer
Column family:
t_traces = {
    '1311196995640667': {
        'eventType': 'get',
        'localSite': 'CERN-PROD_DATADISK',
        ...
    }
}
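As a rough sketch of how such a row could be written from Python with the pycassa client (the node address below is a placeholder; the key and columns are taken from the example above):

import pycassa

# connect to the DDMTracer keyspace (node address is an assumption)
pool = pycassa.ConnectionPool('DDMTracer', server_list=['cassandra-node1:9160'])
traces = pycassa.ColumnFamily(pool, 't_traces')

# row key is the trace id, columns are the trace fields
traces.insert('1311196995640667', {
    'eventType': 'get',
    'localSite': 'CERN-PROD_DATADISK',
})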
9. Test results ‐ write performance
• Using multi-mechanize; run time: 10 minutes, ramp-up: 5 s
• Row-by-row insertion, each row is ~3 KB
• Tried 2*5, 4*5, 8*5, 16*5 threads, 1 connection per thread
[Plots: write performance for Oracle INTR (8*5 threads), Oracle RDTEST1 (16*5 threads), MongoDB (8*5 threads) and Cassandra (16*5 threads)]
https://svnweb.cern.ch/trac/dq2/wiki/Oracle%20and%20NOSQL%20performance%20study#Writeperformance
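A minimal sketch of what the Cassandra test script for multi-mechanize might have looked like (pycassa assumed; the ~3 KB payload and node address are placeholders, not taken from the slides):

import time
import uuid
import pycassa

class Transaction(object):
    # multi-mechanize instantiates one Transaction per virtual-user thread
    def __init__(self):
        # one connection pool per thread, matching "1 connection per thread"
        self.pool = pycassa.ConnectionPool('DDMTracer',
                                           server_list=['cassandra-node1:9160'])
        self.cf = pycassa.ColumnFamily(self.pool, 't_traces')
        # dummy trace padded to roughly 3 KB
        self.row = {'eventType': 'get',
                    'localSite': 'CERN-PROD_DATADISK',
                    'payload': 'x' * 3000}
        self.custom_timers = {}

    def run(self):
        # one row-by-row insert per transaction, timed for the report
        start = time.time()
        self.cf.insert(str(uuid.uuid1()), self.row)
        self.custom_timers['insert'] = time.time() - start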
10. Test results ‐ query performance
• Migrated one month's traces (90,578,231 rows / 34 GB) to a test table
• Query 1
– Get the total number of traces
• Query 2
– For each '%GROUPDISK%' endpoint, get the "Total Traces", "Write Traces" and "Total Users" for the last month
Query     Oracle INTR   Oracle RDTEST1   Oracle RDTEST1 cache   Oracle production ADCR   Cassandra
Query 1   39 seconds    30 seconds       ~1 second              1.14 hours               2.2 minutes
Query 2   47 seconds    30 seconds       ~3 seconds             >5 hours                 28.3 minutes
• Notes on Oracle
– Thanks to Luca
– INTR and RDTEST1 use parallel sequential reads from disk
• /*+ parallel (t 16) */
– On RDTEST1 with the current I/O setup, the read speed is ~1.5 GB/s
– In the RDTEST1 cache case, the 34 GB table was held in the cache
• Notes on Cassandra
– 9 nodes, default settings
– Using the random partitioner: good for data balance between nodes, bad for range queries on keys
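To illustrate why the ad-hoc Cassandra timings are minutes rather than seconds: without an index or counter, Query 1 amounts to a full scan of t_traces, roughly like the sketch below (pycassa assumed; the node address is a placeholder).

import pycassa

pool = pycassa.ConnectionPool('DDMTracer', server_list=['cassandra-node1:9160'])
traces = pycassa.ColumnFamily(pool, 't_traces')

total = 0
# fetch only one column per row; buffer_size controls how many rows come back per round trip
for _key, _columns in traces.get_range(column_count=1, buffer_size=1024):
    total += 1
print('total traces:', total)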
11. Conclusion
• For large amounts of data, aggregation usually involves a lot of disk I/O, is very slow, and has a significant impact on Oracle
• Ad-hoc queries on both Oracle (production) and Cassandra don't satisfy our needs
• Oracle 11g on RDTEST1 performs well, and we look forward to having it in production, but
– queries would still affect Oracle performance; would we need separate instances?
– for even larger data volumes (e.g. 1 year), queries would still be slow
• I tried another approach: exploit the high insertion rate to pre-compute the data and get faster queries
– build many pre-defined indexes (slide 12)
– use distributed counters (slide 13)
12. Use column family to build index
• Query test
– Query: get the count and sum of traces, grouped by site and eventType, in a specific time period
– Use a Cassandra CF to build indexes like
{'site:eventType:traceID' : filesize}
– Cassandra data model
t_index = {
    '2011052017:remoteSite:eventType': {
        'CERN-PROD_DATADISK:put_sm:1304514380628696' : 23444,
        'CERN-PROD_DATADISK:get:1304514380628697' : 32232,
        'CERN-PROD_GROUPDISK:put_sm:1304514380628696' : 43122,
        ...
    },
    ...
}
– Query results
Oracle (production, ADCR): 48 minutes (querying t_traces)
Cassandra (CF as index): 10 seconds (querying the index)
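A minimal sketch of how such an index CF could be filled and queried (pycassa assumed; the row-key layout follows the data model above, while the node address, function names and trace field names are placeholders):

from collections import defaultdict
import pycassa

pool = pycassa.ConnectionPool('DDMTracer', server_list=['cassandra-node1:9160'])
index = pycassa.ColumnFamily(pool, 't_index')

def index_trace(trace):
    """Build side: insert one trace into its hourly index row."""
    hour = trace['timestamp'][:10]   # e.g. '2011052017'
    col = '%s:%s:%s' % (trace['remoteSite'], trace['eventType'], trace['traceID'])
    index.insert('%s:remoteSite:eventType' % hour, {col: str(trace['filesize'])})

def aggregate(hours):
    """Query side: count and sum traces grouped by (site, eventType)
    over a set of hourly buckets such as ['2011052017', '2011052018']."""
    count = defaultdict(int)
    size = defaultdict(int)
    row_keys = ['%s:remoteSite:eventType' % h for h in hours]
    for _key, columns in index.multiget(row_keys, column_count=1000000).items():
        for name, filesize in columns.items():
            site, event_type, _trace_id = name.split(':')
            count[(site, event_type)] += 1
            size[(site, event_type)] += int(filesize)
    return count, size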
13. Use distributed counters
• The process
– Agents read traces from the queue (ActiveMQ)
– Buffer for N (10) messages
– Increment the corresponding counters in Cassandra
• This structure is simple
– All components are scalable (distributed)
– Persistence is supported by the MQ server and Cassandra
– The trace messages do not need to arrive in time order
• High performance on both write and read
– Can afford >10,000 counter updates per second
– A query usually takes less than 0.1 second
– We can replay historical data to add new counters quickly
[Diagram: DQ2 tracer infrastructure — trace messages arrive on ActiveMQ, agents read them and increment counters in the Cassandra cluster]
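A minimal sketch of the counter update an agent might perform per trace (pycassa assumed; the counter column family name 't_counters' and the row-key/column layout are hypothetical, not taken from the slides):

import pycassa

pool = pycassa.ConnectionPool('DDMTracer', server_list=['cassandra-node1:9160'])
counters = pycassa.ColumnFamily(pool, 't_counters')  # a counter column family

def update_counters(trace):
    """Increment per-day counters for one trace message from the queue."""
    day = trace['timestamp'][:8]                    # e.g. '20110620'
    site, event = trace['localSite'], trace['eventType']
    counters.add('%s:eventType' % day, event)       # count per event type
    counters.add('%s:site' % day, site)             # count per site
    counters.add('%s:site:bytes' % day, site, int(trace['filesize']))  # bytes per site

Reading a whole day of counters back is then a single-row fetch, e.g. counters.get('20110620:site'), which is consistent with queries returning in well under a second.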
14. Some monitoring plots from counters
[Plots: count of dq2-get by data type (user, NTUP, other, AOD, ESD, TAG) and by destination site (CERN-PROD, ROAMING, TOKYO-LCG2, UKI-SOUTHGRID-OX-HEP, unidentified_BNL, DESY-HH), June 2011]
• Ref. Eric’s talk
• Will provide a general API for DDM Monitoring