Jon Tedesco




IC2E 2013, San Francisco, CA, USA
Jon Tedesco, Roman Dudko, Abhishek Sharma, Reza Farivar, Roy Campbell
• Problem
  ◦ System administrators
    ▪ Bottleneck for detecting & responding to failures
    ▪ Communicate state of system quickly
• Monitoring
  ◦ Streaming, real-time data
  ◦ Ganglia
    ▪ Widely used, scalable, and flexible
• Prediction
  ◦ Online prediction algorithms (real-time)
• Visualization Problem
  ◦ Ganglia
    ▪ Static, time-based graphs

• Interactive
  ◦ Responsive and controllable
• Real-time
  ◦ Streaming, real-time, automatic
• Informative
  ◦ Direct attention to potential problems and artifacts
• Intuitive
  ◦ Demand skill, not experience
• Scalable
  ◦ Visualize large clusters without sacrificing usability

• Objectives
  ◦ Streaming data
  ◦ Configurable and interactive
  ◦ Informative
• Use cases
  ◦ Heterogeneous cluster
  ◦ Rack failure
  ◦ Node failure
  ◦ Uneven load distribution

• Architecture
  ◦ Simulator
    ▪ Generates simulated cluster data
    ▪ Streams data to clients
  ◦ Webpage
    ▪ Asynchronous & interactive
• Implementation
  ◦ JavaScript
    ▪ d3.js
    ▪ jQuery
  ◦ Python
  ◦ AJAX

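The architecture above has a simulator streaming data to asynchronous web clients, but the slide does not show the mechanism. One way this could work, sketched here as an assumption and not as the actual Theius code, is to number every update so that a polling AJAX client can ask for everything newer than the last update it saw. The names `UpdateBuffer`, `publish`, and `since` are hypothetical.

```python
import json
from collections import deque


class UpdateBuffer:
    """Keeps recent snapshots so polling clients can fetch what they missed."""

    def __init__(self, capacity=100):
        # (sequence number, snapshot) pairs; the oldest pair is dropped when full.
        self._items = deque(maxlen=capacity)
        self._next_seq = 1

    def publish(self, snapshot):
        """Store a snapshot and return the sequence number assigned to it."""
        seq = self._next_seq
        self._items.append((seq, snapshot))
        self._next_seq += 1
        return seq

    def since(self, last_seq):
        """Return a JSON body with every retained snapshot newer than last_seq."""
        updates = [snap for seq, snap in self._items if seq > last_seq]
        return json.dumps({"latest": self._next_seq - 1, "updates": updates})


buffer = UpdateBuffer(capacity=3)
buffer.publish({"node": "r1-n1", "cpu": 0.42})
buffer.publish({"node": "r1-n2", "cpu": 0.91})

# A client that has already seen update 1 receives only update 2.
response = json.loads(buffer.since(1))
print(response["latest"], len(response["updates"]))
```

A client would remember `latest` from each response and send it back as `last_seq` on the next poll, so a paused client can catch up as long as the updates it missed are still in the buffer.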
• Data
  ◦ Methodology
    ▪ Data types from previous work
    ▪ Heuristic values
  ◦ Examples
    ▪ CPU, memory, context switch rate
    ▪ Log events
    ▪ MapReduce tasks and jobs
    ▪ Failure or event prediction

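As a rough illustration of simulated cluster data built from heuristic values, the sketch below generates one sample per node with CPU, memory, and context switch rate. The field names, the value ranges, and the assumed link between CPU load and context switches are all invented for this example; the slides do not give the values Theius uses.

```python
import random
from dataclasses import dataclass


@dataclass
class NodeSample:
    rack: str
    node: str
    cpu: float               # fraction of CPU in use, 0.0 to 1.0
    memory: float            # fraction of memory in use, 0.0 to 1.0
    context_switches: float  # context switches per second


def clamp01(value):
    """Keep a simulated fraction inside the range 0.0 to 1.0."""
    return min(1.0, max(0.0, value))


def simulate_cluster(racks, nodes_per_rack, rng):
    """Return one NodeSample per node, drawn from hand-picked heuristic ranges."""
    samples = []
    for r in range(racks):
        for n in range(nodes_per_rack):
            cpu = clamp01(rng.gauss(0.4, 0.15))
            memory = clamp01(rng.gauss(0.5, 0.10))
            # Assumption for illustration: busier CPUs switch context more often.
            switches = 1000.0 + 4000.0 * cpu + rng.uniform(-200.0, 200.0)
            samples.append(
                NodeSample(f"rack-{r}", f"rack-{r}-node-{n}", cpu, memory, switches)
            )
    return samples


# Seeding the generator makes a simulated run repeatable.
samples = simulate_cluster(racks=2, nodes_per_rack=3, rng=random.Random(0))
print(len(samples), samples[0].node)
```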
Main Visualization
• Customizable using control panel
• Aggregate view
  ◦ Summarize and drill down
• Draws attention to anomalies

• Switch between main visualizations
• Seamless transitions
  ◦ Uninterrupted data stream

• Hierarchy of nodes, organized by rack
• Color and size configurable
• Scalable using summarization and drill-down
• Identify abnormal rack or nodes

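Summarization and drill-down can be pictured as a two-level aggregation: show one value per rack, then expand a rack into its nodes on demand. The sketch below is a minimal version of that idea under assumed data shapes (plain dictionaries with `rack`, `name`, and a metric key); it is not taken from the Theius source.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_rack(nodes, metric):
    """Collapse per-node readings into one mean value per rack."""
    by_rack = defaultdict(list)
    for node in nodes:
        by_rack[node["rack"]].append(node[metric])
    return {rack: mean(values) for rack, values in by_rack.items()}


def drill_down(nodes, rack):
    """Return the individual nodes behind one rack's summary."""
    return [node for node in nodes if node["rack"] == rack]


nodes = [
    {"rack": "A", "name": "a1", "cpu": 0.25},
    {"rack": "A", "name": "a2", "cpu": 0.75},
    {"rack": "B", "name": "b1", "cpu": 0.5},
]

summary = summarize_by_rack(nodes, "cpu")   # one entry per rack
rack_a = drill_down(nodes, "A")             # the nodes behind rack A
print(summary, [node["name"] for node in rack_a])
```

Drawing only the summary keeps the number of on-screen elements proportional to the number of racks, which is one plausible reading of how the view stays usable on large clusters.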
• Grouped by job
• Color and size configurable
  ◦ Example uses role for color, time remaining for size
• Identify abnormal jobs or tasks

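Making color and size configurable amounts to choosing which metric feeds each visual channel. A minimal sketch of the example on this slide, with role mapped to color and time remaining mapped to size, might look as follows. The palette, the pixel range, and the field names are hypothetical, and the scale clamps out-of-range input as a design choice of this sketch.

```python
def linear_scale(domain, output_range):
    """Return a function mapping domain values linearly onto output_range, clamped."""
    d0, d1 = domain
    r0, r1 = output_range

    def scale(value):
        t = (value - d0) / (d1 - d0)
        t = min(1.0, max(0.0, t))  # clamp so out-of-range readings stay drawable
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical palette: one color per task role.
ROLE_COLORS = {"map": "steelblue", "reduce": "darkorange"}

# Hypothetical sizing: 0 to 600 seconds remaining maps to a 4 to 20 pixel radius.
radius = linear_scale(domain=(0, 600), output_range=(4, 20))

task = {"role": "map", "seconds_remaining": 300}
print(ROLE_COLORS[task["role"]], radius(task["seconds_remaining"]))
```

Swapping the metric behind `radius` or the key behind `ROLE_COLORS` changes what the picture emphasizes without touching the drawing code.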
• Grouped by rack
• Color and size configurable
  ◦ Example uses CPU usage and rack color coding
• Identify abnormal nodes or racks

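The slides rely on the viewer to spot abnormal nodes visually and do not describe an automatic detector. As a swapped-in illustration of what "abnormal" could mean numerically, the sketch below flags readings that lie far from the cluster mean using a z-score. The threshold of 2.0 and the sample values are assumptions.

```python
from statistics import mean, pstdev


def flag_outliers(readings, threshold=2.0):
    """Return names whose value is more than `threshold` standard deviations from the mean."""
    values = list(readings.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every node reports the same value, so nothing stands out
    return [name for name, v in readings.items() if abs(v - mu) / sigma > threshold]


# Nine nodes at 20% CPU and one at 95% (made-up values).
cpu = {f"node-{i}": 20.0 for i in range(9)}
cpu["node-9"] = 95.0
print(flag_outliers(cpu))
```

A z-score is sensitive to the outliers it is trying to find, so a median-based measure may behave better on small racks; the point here is only to show the kind of signal a color scale makes visible.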
• Identify trends with nodes and racks
• Color, size, and plots configurable
• Identify correlations between metrics

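A correlation between two metrics, which this view lets a user judge by eye, can also be expressed as a Pearson coefficient. The sketch below computes it from scratch for CPU usage against context switch rate; the sample values are made up to be perfectly linear.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


cpu_usage = [10.0, 20.0, 30.0, 40.0]
context_switches = [1500.0, 2500.0, 3500.0, 4500.0]  # made-up sample values
r = pearson(cpu_usage, context_switches)
print(round(r, 3))
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate none.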
• Detailed data for individual node
• Traditional visualizations for single node

Controls
• Configure metrics for visualizations
• Pause and resume data stream
• Legend for main visualization

Aggregate Data
• Aggregate data for the cluster
  ◦ Log events stream
  ◦ Global node data
  ◦ Summarization data

History Controls
• Snapshots of historical data
  ◦ See main visualization and sidebar data at certain time
• Visualize metric across time

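Historical snapshots and a metric-across-time view both need a store of past states indexed by time. A minimal sketch, assuming snapshots arrive in time order and that only a bounded number are kept, is shown below. The class and method names are hypothetical and the slides do not describe how Theius stores history.

```python
from bisect import bisect_right
from collections import deque


class SnapshotHistory:
    """Bounded history of (timestamp, snapshot) pairs, appended in time order."""

    def __init__(self, capacity=1000):
        self._times = deque(maxlen=capacity)
        self._snapshots = deque(maxlen=capacity)

    def record(self, timestamp, snapshot):
        """Append a snapshot; the oldest one is dropped once capacity is reached."""
        self._times.append(timestamp)
        self._snapshots.append(snapshot)

    def at(self, timestamp):
        """Return the latest snapshot taken at or before `timestamp`, or None."""
        index = bisect_right(list(self._times), timestamp)
        return self._snapshots[index - 1] if index else None

    def series(self, metric):
        """Return (timestamp, value) pairs for one metric across the history."""
        return [(t, s[metric]) for t, s in zip(self._times, self._snapshots)]


history = SnapshotHistory(capacity=3)
for t, cpu in [(0, 0.2), (5, 0.6), (10, 0.9)]:
    history.record(t, {"cpu": cpu})

print(history.at(7))          # the state the view would show when rewound to t=7
print(history.series("cpu"))  # data for a metric-across-time plot
```

Pausing then corresponds to rendering `at(t)` for a fixed `t` while `record` keeps running in the background, so resuming does not lose data.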
• Scalable
  ◦ Drill-down and summarization
  ◦ Efficient web-based framework
• Intuitive, informative
  ◦ Topological visualization
  ◦ Draw attention to abnormalities
• Interactive, real-time
  ◦ Designed for streaming data
  ◦ Configurable visualization
  ◦ Pause, rewind, resume

• Experimental Setup
  ◦ Compare Theius to Ganglia
  ◦ 5 graduate students at UIUC
    ▪ No prior experience with Ganglia or Theius
  ◦ 4 comparative tasks
    ▪ Both Ganglia & Theius
  ◦ 6 scenarios for trends and correlations
    ▪ Theius only
  ◦ Timings & subjective feedback

• Tasks
  ◦ Scenario 1
    ▪ CPU usage in single node
  ◦ Scenario 2
    ▪ Node with highest CPU
  ◦ Scenario 3
    ▪ High memory usage nodes
  ◦ Scenario 4
    ▪ Aggregate cluster use

[Chart: time in seconds (axis 0 to 60) per scenario, for Theius and Ganglia]

• Task 1
  ◦ Identify abnormal rack in heterogeneous cluster (2.2 s)
• Task 2
  ◦ Identify rack with abnormal CPU usage (6.2 s)
• Task 3
  ◦ Identify machine that logged the last fatal error (10.0 s)
• Task 4
  ◦ Identify machine with high CPU, memory usage, or context switch rate (67.4 s)
• Task 5
  ◦ Identify rack with high CPU, memory usage, or context switch rate (1.2 s)
• Task 6
  ◦ Identify correlation between context switch rate and CPU usage (7.8 s)

• Source Code
  ◦ https://github.com/jtedesco/Theius
• Future Work
  ◦ User study
    ▪ System administrators
    ▪ Larger group
    ▪ Timing as appropriate metric
  ◦ MapReduce-specific visualizations
  ◦ Scalability experiments

Theius: A Streaming Visualization Suite for Hadoop Clusters
