NoSQL & DQ2 Tracer Service
Donal Zang (IHEP)
PH-ADP-DDM

ATLAS Software & Computing Workshop
July 21, 2011

ph-adp-ddm-lab@cern.ch
Contents
• DQ2 tracer service
• NoSQL experience




                  NoSQL & DQ2 Trace Service   2
DQ2 tracer service
• Records relevant information about dataset/file access and
  usage on the Grid
   – type, status, local site, remote site, file size, time, usrdn, etc.
• Used by the dq2 client tools (dq2-get, dq2-put) and other applications
  (PanDA, Athena)
• Traces can be analyzed for many purposes
   – dataset popularity (popularity.cern.ch by Angelos)
   – DDM simulations
   – User behavior analysis
   – DDM system monitoring
   – …
• There are ~5 million traces every day
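
To make the record layout and the load concrete, here is a minimal sketch. The field names follow the list above, but their spelling and types are assumptions rather than the actual DQ2 schema, and the sample values are invented for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass(frozen=True)
class Trace:
    """One trace record; field names and types are assumed, not the DQ2 schema."""
    event_type: str   # e.g. "get" or "put_sm"
    status: str
    local_site: str
    remote_site: str
    file_size: int    # bytes
    timestamp: float  # seconds since the epoch
    usrdn: str        # user distinguished name


def traces_per_second(traces_per_day: int) -> float:
    """Average arrival rate implied by a daily trace count."""
    return traces_per_day / SECONDS_PER_DAY


# Sample values are made up for illustration only.
example = Trace("get", "DONE", "CERN-PROD_DATADISK", "CERN-PROD_GROUPDISK",
                23444, 1311196995.640667, "/DC=ch/DC=cern/CN=example user")
average_rate = traces_per_second(5_000_000)
```

At ~5 million traces per day the average works out to roughly 58 traces per second. This is only an average; the slides do not state peak rates.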

Tracer monitoring use cases
•   Whole-system monitoring (real time)
     – local-read, local-write, remote-read, remote-write
     – failed operations
     – breakdown by application, dataset type, site, DN
•   dq2-get statistics in the DDM dashboard (real time)
     – transfer rate in files and GB from each DDM endpoint or from each site/SE
     – https://savannah.cern.ch/support/?121744
•   Specific reports (monthly/yearly)
     – Amount of data retrieved with dq2-get, per dataset type, per destination, per
       domain, per DN
     – For all endpoints, the number of dq2-get operations, broken down by
       distinct user
     – For all groupdisk endpoints, the number of all operations, read, write,
       local-read, remote-read and distinct users, broken down by application




Problem
• All these use cases need aggregation (count,
  sum) queries
• On the production Oracle, these queries usually take tens
  of minutes or hours
• These queries place a significant I/O workload
  on Oracle
• The aggregation metrics can be very dynamic
  and numerous
• We want to do the analysis in real time
Possible ways
1. Can we just store the traces in a table (Oracle
   or NoSQL) and run ad-hoc queries on it
   whenever we need to?
2. If not, we may need to pre-compute indexes or counters
   from the traces, store them, and query
   those instead
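
The two options can be contrasted with a small in-memory sketch. It is illustrative only: the trace tuples, the class name `PrecomputedCounters` and the function `adhoc_aggregate` are hypothetical, and plain Python dictionaries stand in for the database.

```python
from collections import defaultdict

# Hypothetical traces: (site, event_type, file_size_in_bytes).
TRACES = [
    ("CERN-PROD_DATADISK", "get", 100),
    ("CERN-PROD_DATADISK", "get", 250),
    ("CERN-PROD_DATADISK", "put_sm", 400),
    ("TOKYO-LCG2", "get", 50),
]


def adhoc_aggregate(traces):
    """Option 1: scan every stored trace each time a report is requested."""
    result = defaultdict(lambda: [0, 0])  # key -> [count, sum of bytes]
    for site, event_type, size in traces:
        entry = result[(site, event_type)]
        entry[0] += 1
        entry[1] += size
    return {key: tuple(value) for key, value in result.items()}


class PrecomputedCounters:
    """Option 2: update counters once per trace, at insertion time."""

    def __init__(self):
        self._counters = defaultdict(lambda: [0, 0])

    def record(self, site, event_type, size):
        entry = self._counters[(site, event_type)]
        entry[0] += 1
        entry[1] += size

    def query(self, site, event_type):
        # A single lookup, independent of how many traces were recorded.
        count, total = self._counters.get((site, event_type), (0, 0))
        return count, total


counters = PrecomputedCounters()
for trace in TRACES:
    counters.record(*trace)
```

Both paths give the same answer; the difference is where the cost is paid. Option 1 pays a full scan per query, while option 2 pays a small update per inserted trace and answers queries with a lookup.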



NoSQL ‐ Cassandra
•   About Cassandra
    – A distributed database that brings together Dynamo's fully distributed design
      and Bigtable's ColumnFamily-based data model
    – Apache open source
•   Some concepts
    –   Column based
    –   Replication factor (N)
    –   Eventual consistency, tunable per request: with R replicas read and W replicas written,
        a read is guaranteed to see the latest write when R + W > N
    –   Partitioning (order-preserving vs random)
         • Order-preserving partitioning may cause data imbalance between nodes, which then needs
           manual rebalancing
         • Random partitioning balances very well, but loses the ability to do a range query on keys
    – MemTable and SSTable
         • Memory is much faster than disk
         • Sequential I/O is much faster than random I/O
    – Commit log
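
The R + W > N condition can be checked with a few lines. This is a sketch of the arithmetic only, not Cassandra code; the function names are made up here, and the brute-force helper exists just to confirm that the inequality is equivalent to "every read set shares a replica with every write set".

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when any R replicas read must include one of any W replicas written."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n


def always_intersect(n: int, r: int, w: int) -> bool:
    """Brute-force check over all possible read and write replica sets."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# With N = 3: reading and writing 2 replicas overlaps, 1 and 1 does not.
strong = quorums_overlap(3, 2, 2)
weak = quorums_overlap(3, 1, 1)
```

With N = 3, choosing R = W = 2 guarantees overlap, while R = W = 1 does not, so a read may return stale data until the replicas converge.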



Data model in Cassandra
•   Column
         (name, value, timestamp)
•   Row
         key: {column1, column2, …}
•   Column family
    – Something like a table in a relational database
•   Keyspace
    – Usually one application has one keyspace
•   Example
    Keyspace: DDMTracer
    Column family:
    t_traces {
      1311196995640667: {
         'eventType' : 'get',
         'localSite' : 'CERN-PROD_DATADISK',
         ...
      }
    }
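
The nesting of keyspace, column family, row and column can be mimicked with plain Python structures. This is a toy model under stated assumptions, not a Cassandra client: the `ColumnFamily` class is hypothetical, and the only behaviour it copies is that each column carries a timestamp and the newest timestamp wins.

```python
import time


def now_micros() -> int:
    """Current time in microseconds, used as the default column timestamp."""
    return int(time.time() * 1_000_000)


class ColumnFamily:
    """Toy column family: row key -> {column name -> (name, value, timestamp)}."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert(self, row_key, columns, timestamp=None):
        ts = now_micros() if timestamp is None else timestamp
        row = self.rows.setdefault(row_key, {})
        for name, value in columns.items():
            current = row.get(name)
            # Keep the column with the newest timestamp (last write wins).
            if current is None or ts >= current[2]:
                row[name] = (name, value, ts)

    def get(self, row_key):
        stored = self.rows.get(row_key, {})
        return {name: value for name, value, _ts in stored.values()}


# One keyspace holding one column family, as in the example above.
keyspace = {"DDMTracer": {"t_traces": ColumnFamily("t_traces")}}
t_traces = keyspace["DDMTracer"]["t_traces"]
t_traces.insert("1311196995640667",
                {"eventType": "get", "localSite": "CERN-PROD_DATADISK"},
                timestamp=2)
# A write carrying an older timestamp does not replace the newer column.
t_traces.insert("1311196995640667", {"eventType": "put"}, timestamp=1)
```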


Test results ‐ write performance
•   Using multi-mechanize; run time: 10 minutes, ramp-up: 5 s
•   Row-by-row insertion, each row is ~3 KB
•   Tried 2*5, 4*5, 8*5 and 16*5 threads, 1 connection per thread

[Figures: write-performance plots for Oracle INTR (8*5 threads), Oracle RDTEST1 (16*5 threads), MongoDB (8*5 threads) and Cassandra (16*5 threads)]

    https://svnweb.cern.ch/trac/dq2/wiki/Oracle%20and%20NOSQL%20performance%20study#Writeperformance
Test results ‐ query performance
•   Migrate one month’s traces (90,578,231 rows / 34 gigabytes) to a test table
•   Query 1
      – Get the total number of traces
•   Query 2
      – For each '%GROUPDISK%' endpoint, get the "Total Traces", "Write Traces" and "Total Users" for
        the last month

      Query      Oracle INTR    Oracle RDTEST1    Oracle RDTEST1 cache    Oracle production ADCR    Cassandra
      Query 1    39 seconds     30 seconds        ~1 second               1.14 hours                2.2 minutes
      Query 2    47 seconds     30 seconds        ~3 seconds              >5 hours                  28.3 minutes

•   Notes on Oracle
      –    Thanks to Luca
      –    INTR and RDTEST1 use parallel sequential reads from disk, via the hint
           /*+ parallel (t 16) */
      –    On RDTEST1, the I/O speed with the current setup is ~1.5 GB/sec
      –    For the RDTEST1 cache result, 34 GB of cache was used
•   Notes on Cassandra
      –    9 nodes, default settings
      –    Random partitioner: good for data balance between nodes, bad for range queries on keys
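
A quick back-of-the-envelope check relates the quoted figures to each other. The inputs are the numbers stated above; treating 1 GB as 10^9 bytes and the scan as purely I/O-bound are assumptions made for this estimate.

```python
ROWS = 90_578_231          # rows in the test table
DATA_GB = 34.0             # size of the test table
SCAN_GB_PER_SEC = 1.5      # quoted RDTEST1 I/O speed

# Time to read the whole table once at the quoted I/O speed.
scan_seconds = DATA_GB / SCAN_GB_PER_SEC

# Average stored size per row, assuming 1 GB = 10**9 bytes.
bytes_per_row = DATA_GB * 1e9 / ROWS

# Measured Query 1 times from the table, converted to seconds.
oracle_rdtest1_q1 = 30.0
cassandra_q1 = 2.2 * 60
cassandra_vs_rdtest1 = cassandra_q1 / oracle_rdtest1_q1
```

A full scan at 1.5 GB/sec takes about 23 seconds, which is in line with the 30 seconds measured on RDTEST1 and suggests those queries are dominated by reading the table. On the same query, the default-configured Cassandra cluster was about 4.4 times slower than RDTEST1.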




Conclusion
• For large amounts of data, aggregation usually involves a lot
  of disk I/O, is very slow, and has a significant impact on
  Oracle
• Ad-hoc queries on both Oracle (production) and Cassandra
  do not satisfy our needs
• Oracle 11g on RDTEST1 performs well and we look forward to having it in
  production, but
   – Queries still affect Oracle performance; do we need separate instances?
   – For even larger data sets (e.g. 1 year), queries would still be slow
• I tried another way: do the work at insertion time to get
  faster queries
   – build up many pre-defined indexes (slide 12)
   – use distributed counters (slide 13)

Use column family to build index
• Query test
   – Query: get the count and sum of traces, grouped by site and
     eventType, for a specific time period
   – Use a Cassandra column family (CF) to build indexes like
      {'site:eventType:traceID' : filesize}
   – Cassandra data model
      t_index = {
          '2011052017:remoteSite:eventType': {
              'CERN-PROD_DATADISK:put_sm:1304514380628696'  : 23444,
              'CERN-PROD_DATADISK:get:1304514380628697'     : 32232,
              'CERN-PROD_GROUPDISK:put_sm:1304514380628696' : 43122,
              ...
          },
          ...
      }

   – Query results
      System                                   Time
      Oracle (production, ADCR)                48 minutes (query t_traces)
      Cassandra (use CF as index)              10 seconds (query the index)
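
The index layout can be exercised with an in-memory stand-in. This is a sketch under assumptions: a Python dictionary replaces the column family, the helper names `index_trace` and `count_and_sum` are hypothetical, and the row key is read as an hour bucket followed by the names of the indexed dimensions.

```python
from collections import defaultdict

# In-memory stand-in for t_index: row key -> {column name: file size}.
t_index = defaultdict(dict)


def index_trace(hour, remote_site, event_type, trace_id, file_size):
    """Write one index column for a trace into its hourly row."""
    row_key = f"{hour}:remoteSite:eventType"
    column_name = f"{remote_site}:{event_type}:{trace_id}"
    t_index[row_key][column_name] = file_size


def count_and_sum(hours):
    """Return {(site, eventType): (count, total bytes)} over the given hourly rows."""
    totals = defaultdict(lambda: [0, 0])
    for hour in hours:
        row = t_index.get(f"{hour}:remoteSite:eventType", {})
        for column_name, file_size in row.items():
            # The trace ID is the last field; it only keeps column names unique.
            site, event_type, _trace_id = column_name.rsplit(":", 2)
            totals[(site, event_type)][0] += 1
            totals[(site, event_type)][1] += file_size
    return {key: tuple(value) for key, value in totals.items()}


# The three sample columns from the data model above.
index_trace("2011052017", "CERN-PROD_DATADISK", "put_sm", 1304514380628696, 23444)
index_trace("2011052017", "CERN-PROD_DATADISK", "get", 1304514380628697, 32232)
index_trace("2011052017", "CERN-PROD_GROUPDISK", "put_sm", 1304514380628696, 43122)

report = count_and_sum(["2011052017"])
```

A query for a time period then only reads the hourly rows inside that period, instead of scanning every trace.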


                                                 NoSQL & DQ2 Trace Service                 12
Use distributed counters
[Figure: DQ2 Tracer Infrastructure. Labels: ActiveMQ (x2), trace message, config, agent (x3), increment, Cassandra cluster]
•   The process
     – Agents read traces from the queue
     – Buffer N (10) messages
     – Increment the corresponding counters in Cassandra
•   This structure is simple
     – All components are scalable (distributed)
     – Persistence is provided by the MQ server and Cassandra
     – Trace messages do not need to arrive in time order
•   High performance on both write and read
     – Sustains >10,000 updates per second
     – A query usually takes less than 0.1 second
     – Replay can be used to add new counters on historical data quickly
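
The agent loop described above can be sketched as follows. Everything here is a stand-in: `queue.Queue` replaces ActiveMQ, the `CounterStore` class replaces Cassandra counter columns, and the counter key layout and trace fields are invented for illustration.

```python
from collections import Counter
from queue import Empty, Queue


class CounterStore:
    """In-memory stand-in for Cassandra counters (an assumption of this sketch)."""

    def __init__(self):
        self.counters = Counter()

    def increment(self, deltas):
        # Increments are additive, so the order in which they arrive does not matter.
        self.counters.update(deltas)


def counter_keys(trace):
    """Hypothetical counter names derived from one trace message."""
    hour = trace["hour"]
    yield f"{hour}:eventType:{trace['eventType']}"
    yield f"{hour}:site:{trace['site']}"


class Agent:
    def __init__(self, queue, store, buffer_size=10):
        self.queue = queue
        self.store = store
        self.buffer_size = buffer_size
        self.flushes = 0

    def _flush(self, buffered):
        self.store.increment(buffered)
        self.flushes += 1

    def run_once(self):
        """Drain the queue, sending one batch of increments per buffer_size messages."""
        buffered, pending = Counter(), 0
        while True:
            try:
                trace = self.queue.get_nowait()
            except Empty:
                break
            for key in counter_keys(trace):
                buffered[key] += 1
            pending += 1
            if pending >= self.buffer_size:
                self._flush(buffered)
                buffered, pending = Counter(), 0
        if pending:
            self._flush(buffered)


messages = Queue()
for _ in range(25):
    messages.put({"hour": "2011052017", "eventType": "get", "site": "CERN-PROD_DATADISK"})

store = CounterStore()
agent = Agent(messages, store, buffer_size=10)
agent.run_once()
```

Because each update is an addition, several agents can run in parallel and messages can arrive out of order without changing the totals. Replaying stored traces through a new `counter_keys` function is one way to picture how new counters could be built from historical data.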
                                 NoSQL & DQ2 Trace Service                                  13
Some monitoring plots from counters
[Figure: count of dq2-get by data type, June 2011. Labels: user, NTUP, other, AOD, ESD, TAG]
[Figure: count of dq2-get by destination site, June 2011. Labels: CERN-PROD, ROAMING, TOKYO-LCG2, UKI-SOUTHGRID-OX-HEP, unidentified_BNL, DESY-HH]

• Ref. Eric's talk
• Will provide a general API for DDM Monitoring




Thanks!
Questions?




backup ‐ Test‐bed setup
•   MongoDB (2 nodes)
     –    Hardware type: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (2/8), 24098 MB (6142 MB)
     –    MongoDB version: 1.8.1 (latest stable)
•   Cassandra (9-node cluster)
     –    Hardware type: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (2/8), 24098 MB (28662 MB)
     –    Cassandra version: apache-cassandra-0.7.6-2
     –    Puppet configuration: https://github.com/ddmlab/cassandra
•   Oracle (rdtest1)
     –    Hardware type: Intel(R) Xeon(R) CPU L5640 @ 2.27GHz (2/12), 48290 MB (16387 MB)
     –    Storage: ASM and 8Gbps dual-ported HBAs. 2 storage arrays, 24 SAS disks in total. NAS on
          10GigE also available.
     –    Oracle version: 11g
     –    DB_name: rdtest1
•   Oracle (intr)
     –    Hardware type: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (23/8), 24097 M
     –    Storage: ASM and 4Gbps dual-ported HBAs. 3 storage arrays, 36 SATA disks in total.
     –    Oracle version: 10g
     –    DB_name: intr



 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 

NoSQL & DQ2 tracer service

  • 1. NoSQL & DQ2 Tracer Service
      Donal Zang (IHEP), PH-ADP-DDM
      ATLAS Software & Computing Workshop, July 21, 2011
      ph-adp-ddm-lab@cern.ch
  • 2. Content
      • DQ2 tracer service
      • NoSQL experience
  • 3. DQ2 tracer service
      • Records relevant information about dataset/file access and usage on the Grid
          – type, status, local site, remote site, file size, time, usrdn, etc.
      • Used by the dq2 client tools (dq2-get, dq2-put) and other applications (PanDA, Athena)
      • Traces can be analyzed for many purposes
          – dataset popularity (popularity.cern.ch by Angelos)
          – DDM simulations
          – user behavior analysis
          – DDM system monitoring
          – …
      • There are ~5 million traces every day
  • 4. Tracer monitoring use cases
      • Whole-system monitoring (real time)
          – local-read, local-write, remote-read, remote-write
          – failed operations
          – breakdown by application, dataset type, site, DN
      • dq2-get statistics in the DDM dashboard (real time)
          – transfer rate in files and GB from each DDM endpoint or from each site/SE
          – https://savannah.cern.ch/support/?121744
      • Specified reports (monthly/yearly)
          – Get the amount of data retrieved with dq2-get, per dataset type, per destination, per domain, per DN
          – For all endpoints, get the number of dq2-get operations, broken down by distinct user
          – For all groupdisk endpoints, give the number of all operations, read, write, local-read, remote-read and distinct users, broken down by application
  • 5. Problem
      • All these use cases need aggregation (count, sum) queries
      • On the production Oracle database, such a query usually takes tens of minutes or hours
      • These queries place a significant I/O workload on Oracle
      • The aggregation metrics can be very dynamic and numerous
      • We want to do the analysis in real time
  • 6. Possible ways
      1. Can we just store the traces in a table (Oracle or NoSQL) and run ad-hoc queries on it whenever we need to?
      2. If not, we may need to pre-compute over the traces, store the resulting indexes or counters, and query those instead
  • 7. NoSQL - Cassandra
      • About Cassandra
          – A distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model
          – Apache open source
      • Some concepts
          – Column based
          – Replication factor (N)
          – Eventual consistency, tunable per operation (a read sees the latest write when R + W > N)
          – Partitioning (order-preserving vs. random)
              • Order-preserving partitioning may cause data imbalance between nodes and needs manual rebalancing
              • Random partitioning balances very well, but loses the ability to do a range query on keys
          – MemTable and SSTable
              • Memory is much faster than disk
              • Sequential I/O is much faster than random I/O
          – Commit log
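The replication and partitioning concepts on this slide can be illustrated with a small sketch. It is not Cassandra code: the function names are hypothetical, and the MD5-based token is only a stand-in for hash-based (random) placement. It checks the R + W > N overlap condition and shows that hashing keys discards their natural order, which is why range queries on keys are lost.

```python
import hashlib


def read_sees_latest_write(n: int, r: int, w: int) -> bool:
    """Return True when any R read replicas must overlap any W write replicas."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def order_preserving_token(key: str) -> str:
    """Order-preserving placement: the token is the key itself, so key ranges stay contiguous."""
    return key


def random_token(key: str) -> int:
    """Hash-based placement: a digest spreads keys evenly but scrambles their order."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


if __name__ == "__main__":
    print(read_sees_latest_write(n=3, r=2, w=2))  # quorum reads and writes overlap
    print(read_sees_latest_write(n=3, r=1, w=1))  # faster, but a read may miss the latest write
    hours = ["2011052015", "2011052016", "2011052017"]
    print(sorted(hours, key=order_preserving_token))  # key order kept
    print(sorted(hours, key=random_token))            # order depends only on the hash
```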
  • 8. Data model in Cassandra
      • Column: (name, value, timestamp)
      • Row: key: {column1, column2, …}
      • Column family
          – Something like a table in a relational database
      • Keyspace
          – Usually one application has one keyspace
      • Example
          Keyspace: DDMTracer
          Column family: t_traces {
              1311196995640667: {
                  'eventType' : 'get',
                  'localSite' : 'CERN-PROD_DATADISK',
                  ...
              }
          }
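As a rough illustration of this data model, the nesting can be mimicked with plain Python dictionaries. This is an in-memory stand-in, not a Cassandra client: the `insert` and `get` helpers are hypothetical, and the rule that the newest timestamp wins is simplified (ties here go to the later call).

```python
import time

# Hypothetical in-memory stand-in for one keyspace:
# {column_family: {row_key: {column_name: (value, timestamp)}}}
keyspace = {"t_traces": {}}


def insert(column_family, row_key, columns, timestamp=None):
    """Write columns into a row; for each column the newest timestamp wins."""
    ts = time.time_ns() // 1000 if timestamp is None else timestamp
    row = keyspace[column_family].setdefault(row_key, {})
    for name, value in columns.items():
        current = row.get(name)
        if current is None or ts >= current[1]:
            row[name] = (value, ts)


def get(column_family, row_key, name):
    """Return the value of one column, or None if it is absent."""
    column = keyspace[column_family].get(row_key, {}).get(name)
    return None if column is None else column[0]


insert("t_traces", 1311196995640667,
       {"eventType": "get", "localSite": "CERN-PROD_DATADISK"}, timestamp=2)
# A write carrying an older timestamp does not overwrite the column.
insert("t_traces", 1311196995640667, {"eventType": "put"}, timestamp=1)
print(get("t_traces", 1311196995640667, "eventType"))  # get
```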
  • 9. Test results - write performance
      • Using multi-mechanize; run time: 10 minutes, ramp-up: 5 s
      • Row-by-row insertion, each row is ~3 KB
      • Tried 2*5, 4*5, 8*5 and 16*5 threads, 1 connection per thread
      • [Plots: Oracle INTR (8*5 threads), Oracle RDTEST1 (16*5 threads), MongoDB (8*5 threads), Cassandra (16*5 threads)]
      • https://svnweb.cern.ch/trac/dq2/wiki/Oracle%20and%20NOSQL%20performance%20study#Writeperformance
  • 10. Test results - query performance
      • Migrated one month's traces (90,578,231 rows / 34 gigabytes) to a test table
      • Query 1
          – Get the total number of traces
      • Query 2
          – For each '%GROUPDISK%' endpoint, get the "Total Traces", "Write Traces" and "Total Users" for the last month

      Query    | Oracle INTR | Oracle RDTEST1 | Oracle RDTEST1 cache | Oracle production ADCR | Cassandra
      Query 1  | 39 seconds  | 30 seconds     | ~1 second            | 1.14 hours             | 2.2 minutes
      Query 2  | 47 seconds  | 30 seconds     | ~3 seconds           | >5 hours               | 28.3 minutes

      • Notes on Oracle
          – Thanks to Luca
          – INTR and RDTEST1 use parallel sequential reading from I/O
              • /*+ parallel (t 16) */
          – In RDTEST1, with the current I/O setup, the speed is ~1.5 GB/sec
          – In RDTEST1 cache, 34 GB was used
      • Notes on Cassandra
          – 9 nodes, default settings
          – Using random partitioning: good for data balance between nodes, bad for range queries on keys
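A back-of-the-envelope check of the figures on this slide: reading 34 GB sequentially at ~1.5 GB/sec gives a floor on full-scan time of roughly 23 seconds, the same order of magnitude as the tens of seconds reported for the parallel-read Oracle instances. The sketch below does that arithmetic. The one-year extrapolation assumes twelve months of similar size, which the slides do not state.

```python
def scan_seconds(data_gb: float, throughput_gb_per_s: float) -> float:
    """Lower bound on the time needed to read data_gb sequentially at the given rate."""
    return data_gb / throughput_gb_per_s


month = scan_seconds(34, 1.5)       # one month of traces, figures from the slide
year = scan_seconds(34 * 12, 1.5)   # assumption: a year is 12 months of similar size
print(f"one month: {month:.1f} s, one year: {year:.0f} s")
```

Even at full sequential speed, scan time grows linearly with the data volume, which is the motivation for pre-computed indexes and counters.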
  • 11. Conclusion
      • For large amounts of data, aggregation usually involves a lot of disk I/O, is very slow, and has a significant impact on Oracle
      • Ad-hoc queries on both Oracle (production) and Cassandra do not satisfy our needs
      • Oracle 11g on RDTEST1 performs well and we look forward to having it in production, but
          – Queries still affect Oracle performance; do we need separate instances?
          – For even larger data (e.g. 1 year), queries would still be slow
      • I tried another approach: exploit the high insertion rate to get faster queries
          – build up many pre-defined indexes (slide 12)
          – use distributed counters (slide 13)
  • 12. Use column family to build index
      • Query test
          – Query: get the count and sum of traces grouped by site and eventType in a specific time period
          – Use a Cassandra CF to build indexes like {'site:eventType:traceID' : filesize}
          – Cassandra data model
              t_index = {
                  '2011052017:remoteSite:eventType': {
                      'CERN-PROD_DATADISK:put_sm:1304514380628696' : 23444,
                      'CERN-PROD_DATADISK:get:1304514380628697' : 32232,
                      'CERN-PROD_GROUPDISK:put_sm:1304514380628696' : 43122,
                      ...
                  },
                  ....
              }
          – Query results

              Oracle (production, ADCR)    | Cassandra (use CF as index)
              48 minutes (query t_traces)  | 10 seconds (query the index)
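The index layout above can be sketched in plain Python to show why the query becomes cheap: each hour is one row, and the grouping fields are encoded in the column names, so a count/sum only walks the rows for the requested hours. This is an in-memory stand-in with hypothetical helper names (`index_trace`, `count_and_sum`), not the author's code or a Cassandra client. The sample values are the ones shown on the slide.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the t_index column family:
# row key "<hour>:remoteSite:eventType" -> {"<site>:<eventType>:<traceID>": filesize}
t_index = defaultdict(dict)


def index_trace(hour, remote_site, event_type, trace_id, filesize):
    """Add one trace to the index row for its hour (done at insertion time)."""
    row_key = f"{hour}:remoteSite:eventType"
    t_index[row_key][f"{remote_site}:{event_type}:{trace_id}"] = filesize


def count_and_sum(hours):
    """Return {(site, eventType): (count, total filesize)} over the given hour rows."""
    totals = defaultdict(lambda: [0, 0])
    for hour in hours:
        row = t_index.get(f"{hour}:remoteSite:eventType", {})
        for column_name, filesize in row.items():
            # Split from the right so only the last two separators are used.
            site, event_type, _trace_id = column_name.rsplit(":", 2)
            totals[(site, event_type)][0] += 1
            totals[(site, event_type)][1] += filesize
    return {key: tuple(value) for key, value in totals.items()}


index_trace("2011052017", "CERN-PROD_DATADISK", "put_sm", 1304514380628696, 23444)
index_trace("2011052017", "CERN-PROD_DATADISK", "get", 1304514380628697, 32232)
index_trace("2011052017", "CERN-PROD_GROUPDISK", "put_sm", 1304514380628696, 43122)
print(count_and_sum(["2011052017"]))
```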
  • 13. Use distributed counters
      • The process
          – Agents read traces from the queue
          – Buffer for N (10) messages
          – Increment the corresponding counters in Cassandra
      • This structure is simple
          – All components are scalable (distributed)
          – Persistence is supported by the MQ server and Cassandra
          – The trace messages do not need to arrive in time order
      • High performance on both write and read
          – Can sustain >10,000 updates per second
          – A query usually takes less than 0.1 second
          – We can use replay to add new counters on historical data quickly
      • [Diagram: DQ2 Tracer Infrastructure; labels: trace message, ActiveMQ, config, agents, increment, Cassandra cluster]
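A minimal sketch of the agent loop described on this slide, under stated assumptions: a `collections.Counter` stands in for the Cassandra counter columns, messages are plain dictionaries rather than ActiveMQ messages, and the `CounterAgent` class and the counter key layout are hypothetical. It buffers N messages, sums them locally, and then applies one increment per counter.

```python
from collections import Counter
import random


class CounterAgent:
    """Buffers trace messages and applies them to a counter store in batches."""

    def __init__(self, store, buffer_size=10):
        self.store = store              # stand-in for Cassandra counter columns
        self.buffer_size = buffer_size
        self.buffer = []

    def on_message(self, trace):
        self.buffer.append(trace)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Sum the buffered traces locally, then send one increment per counter.
        delta = Counter()
        for trace in self.buffer:
            key = (trace["hour"], trace["site"], trace["eventType"])
            delta[key + ("count",)] += 1
            delta[key + ("bytes",)] += trace["filesize"]
        self.store.update(delta)        # Counter.update adds to existing counts
        self.buffer.clear()


def run(traces):
    """Feed traces through one agent and return the resulting counters."""
    store = Counter()
    agent = CounterAgent(store)
    for trace in traces:
        agent.on_message(trace)
    agent.flush()                       # drain the partial last batch
    return store


traces = [{"hour": f"20110520{17 + i % 2}", "site": "CERN-PROD_DATADISK",
           "eventType": "get" if i % 3 else "put_sm", "filesize": 100 + i}
          for i in range(25)]
shuffled = random.sample(traces, k=len(traces))
print(run(traces) == run(shuffled))     # increments commute, so arrival order does not matter
```

Because increments are commutative, the same `run` function can be applied to stored historical traces with a different key layout, which is one way to read the slide's point about replay.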
  • 14. Some monitoring plots from counters
      • [Plot: count of dq2-get by data type, June 2011; legend: user, NTUP, other, AOD, ESD, TAG]
      • [Plot: count of dq2-get by destination site, June 2011; legend: CERN-PROD, ROAMING, TOKYO-LCG2, UKI-SOUTHGRID-OX-HEP, unidentified_BNL, DESY-HH]
      • Ref. Eric's talk
      • Will provide a general API for DDM monitoring
  • 16. Backup - test-bed setup
      • MongoDB (2 nodes)
          – Hardware type: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (2/8), 24098 MB (6142 MB)
          – MongoDB version: 1.8.1 (latest stable)
      • Cassandra (9-node cluster)
          – Hardware type: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (2/8), 24098 MB (28662 MB)
          – Cassandra version: apache-cassandra-0.7.6-2
          – Puppet configuration: https://github.com/ddmlab/cassandra
      • Oracle
          – Hardware type: Intel(R) Xeon(R) CPU L5640 @ 2.27GHz (2/12), 48290 MB (16387 MB)
          – Storage: ASM and 8Gbps dual-ported HBAs. 2 storage arrays, 24 SAS disks in total. NAS on 10GigE also available.
          – Oracle version: 11g
          – DB_name: rdtest1
          – Hardware type: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz (23/8), 24097 M
          – Storage: ASM and 4Gbps dual-ported HBAs. 3 storage arrays, 36 SATA disks in total.
          – Oracle version: 10g
          – DB_name: intr