SlideShare ist ein Scribd-Unternehmen logo
1 von 24
Downloaden Sie, um offline zu lesen
Setting up Monitoring System for Alluxio
with Prometheus and Grafana in 10 minutes
Pan Liu @ Tencent
2021.06.24
• Pan Liu
• Tencent
• Presto on Alluxio
Contents
• Part 1. How Alluxio Metrics System
Works
• Part 2. How to Implement a custom
Alluxio Sink
• Part 3. How to Set Up Monitoring of
Alluxio in 10 mins
Part 1. How Alluxio Metrics System Works
Alluxio Metrics System
• Framework
• Types of metrics
• Name metrics
• Flow of metrics from worker to master
Alluxio Metrics System
• Framework
Source1
Metrics
system
Sink1
Source2 Sink2
… …
Alluxio Metrics System
• Two Types of Metrics:
• Cluster Metrics : Aggegated from Workers and Clients
• Process Metrics: Collected by Each Alluxio Process
Master
Cluster Metrics
Process Metrics
Worker
Worker
Worker
Client
Client
Alluxio Metrics System
• Metrics Name:
• Master: Master.[metricName].[tag1].[tag2]...
• Non-Master: [processType].[metricName].[tag1].[tag2]...[hostName]
Master.GetFileInfoOps
Worker.OpenExistingFile.User:user.UFS:hdfs:%2F%2F9%2E135%2E88%2E92:9000%2F.UFS_TYPE:hdfs.worker_host
Alluxio Metrics System
MetricMaster
BlockMaster
processWorkerMetric
workerHeartbeat
MetricsStore putWorkerMetrics
RpcServer
BlockWorker
BlockMasterSync
BlockMasterClient
heartbeat
RpcClient
MasterProcess WorkerProcess
heartbeat
blockHeartbeat
blockHeartbeat
Part 2. How to Implement a custom Sink of Alluxio
Alluxio Sinks
• Passive VS. Active
• Passive: service PULL
• Active: report periodically PUSH
Active Passive
Alluxio Sinks
Active: Report periodically
Master_host:port/metrics/json
./logs/master.out
Passive: Scratch from server
Active Sink: ConsoleSink
…
…
Passive Sink : PrometheusMetricsServlet
• getHandler() called by master or worker
• Work as service
• Metrics are available only when requested
E.g. MasterProcess
Sink Extension
• Passive or Active ?
• XmlSink as an example of Active type: Print metrics in XML format to a specified path.
• Implement Sink interface
• Construct a XmlReporter to report metrics in XML format
• Config to enable XmlSink
conf/metrics.properties
Part 3. How to Set Up Monitoring for alluxio in 10
minutes
Alluxio Web UI Monitoring
Web UI metrics page
Convenient but doesn’t work well sometimes…
Alluxio Monitoring with Prometheus and Grafana
How it works ?
1. Prometheus scrapes metrics from Alluxio servers and transforms to time series data
2. Grafana server get metrics using the PromQL
3. Grafana web UI displays metrics in dashboards
What we need to do ?
1. Install and start Prometheus and Grafana Server
2. Add Alluxio Jobs to Prometheus
3. Download and import Grafana dashboard
4. Modify variables of dashboard template
Alluxio Monitoring with Prometheus and Grafana
prometheus.yml Version:
- Prometheus Version: 2.22.2
- Grafana Version: 7.5.6
- Alluxio Version: 2.5.0-3
targets/local/masters/master_01.yml
targets/local/workers/worker_01.yml
Services:
- Prometheus Server @ master:9090
- Grafana Server @ worker:3000
- Alluxio Master @ master
- Alluxio Worker @ worker
Alluxio Monitoring with Prometheus and Grafana
Alluxio Monitoring with Prometheus and Grafana
• Alluxio IO Key Metrics
• Read Local & Read Remote & Alive workers …
• Storage
• Space Used & UFS Space Used …
• Workers Blocks
• Cached Blocks & Evicted Blocks …
• Logical Operations
• Mount Operations & File Pinned …
• Alluxio Metadata Operations
• Block Hearbeat Cost & Get status Cost …
• AsyncCache Blocks & Operations
• AsyncCacheSuccessedBlocks & AysncCacheFailedBlocks …
• Master JVM Memory
• Master Heap Memory & Total Memory …
Alluxio Monitoring with Prometheus and Grafana
• Alluxio docs:
https://docs.alluxio.io/os/user/edge/en/operation/Metrics-
System.html#grafana-web-ui-with-prometheus
• Grafana dashboards:
https://grafana.com/grafana/dashboards/13467
Thanks!
panyliu@tencent.com

Weitere ähnliche Inhalte

Was ist angesagt?

Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio, Inc.
 
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...Alluxio, Inc.
 
Open Source Memory Speed Virtual Distributed Storage
Open Source Memory Speed Virtual Distributed StorageOpen Source Memory Speed Virtual Distributed Storage
Open Source Memory Speed Virtual Distributed StorageAlluxio, Inc.
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio, Inc.
 
Accessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAccessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAlluxio, Inc.
 
Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio, Inc.
 
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio, Inc.
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Alluxio, Inc.
 
Alluxio (formerly Tachyon): The Journey thus far and the Road Ahead
Alluxio (formerly Tachyon): The Journey thus far and the Road AheadAlluxio (formerly Tachyon): The Journey thus far and the Road Ahead
Alluxio (formerly Tachyon): The Journey thus far and the Road AheadAlluxio, Inc.
 
Alluxio: Unify Data at Memory Speed; 2016-11-18
Alluxio: Unify Data at Memory Speed; 2016-11-18Alluxio: Unify Data at Memory Speed; 2016-11-18
Alluxio: Unify Data at Memory Speed; 2016-11-18Alluxio, Inc.
 
Getting Started with Alluxio + Spark + S3
Getting Started with Alluxio + Spark + S3Getting Started with Alluxio + Spark + S3
Getting Started with Alluxio + Spark + S3Alluxio, Inc.
 
Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016Alluxio, Inc.
 
Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016Jiří Šimša
 
What’s new in Alluxio 2: from seamless operations to structured data management
What’s new in Alluxio 2: from seamless operations to structured data managementWhat’s new in Alluxio 2: from seamless operations to structured data management
What’s new in Alluxio 2: from seamless operations to structured data managementAlluxio, Inc.
 
CoreOS fest 2016 Summary - DevOps BP 2016 June
CoreOS fest 2016 Summary - DevOps BP 2016 JuneCoreOS fest 2016 Summary - DevOps BP 2016 June
CoreOS fest 2016 Summary - DevOps BP 2016 JuneZsolt Molnar
 
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...Alluxio, Inc.
 
Enabling Scientific Workflows on FermiCloud using OpenNebula
Enabling Scientific Workflows on FermiCloud using OpenNebulaEnabling Scientific Workflows on FermiCloud using OpenNebula
Enabling Scientific Workflows on FermiCloud using OpenNebulaNETWAYS
 
Puppet Camp Charlotte 2015: Use Puppet to Manage your NetApp Storage Infrastr...
Puppet Camp Charlotte 2015: Use Puppet to Manage your NetApp Storage Infrastr...Puppet Camp Charlotte 2015: Use Puppet to Manage your NetApp Storage Infrastr...
Puppet Camp Charlotte 2015: Use Puppet to Manage your NetApp Storage Infrastr...Puppet
 

Was ist angesagt? (20)

Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
Alluxio: The missing piece of on-demand clusters at Alluxio Meetup 2016
 
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
Alluxio (Formerly Tachyon): Unify Data At Memory Speed at Global Big Data Con...
 
Open Source Memory Speed Virtual Distributed Storage
Open Source Memory Speed Virtual Distributed StorageOpen Source Memory Speed Virtual Distributed Storage
Open Source Memory Speed Virtual Distributed Storage
 
Spark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri SimsaSpark Summit EU talk by Jiri Simsa
Spark Summit EU talk by Jiri Simsa
 
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
Alluxio: Unify Data at Memory Speed at Strata and Hadoop World San Jose 2017
 
Accessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified NamespaceAccessing Data Anywhere with Unified Namespace
Accessing Data Anywhere with Unified Namespace
 
Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016Alluxio Presentation at AMPLab Summer Retreat 2016
Alluxio Presentation at AMPLab Summer Retreat 2016
 
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016Alluxio Use Cases at Strata+Hadoop World Beijing 2016
Alluxio Use Cases at Strata+Hadoop World Beijing 2016
 
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
Effective Spark with Alluxio at Strata+Hadoop World San Jose 2017
 
Alluxio (formerly Tachyon): The Journey thus far and the Road Ahead
Alluxio (formerly Tachyon): The Journey thus far and the Road AheadAlluxio (formerly Tachyon): The Journey thus far and the Road Ahead
Alluxio (formerly Tachyon): The Journey thus far and the Road Ahead
 
Alluxio: Unify Data at Memory Speed; 2016-11-18
Alluxio: Unify Data at Memory Speed; 2016-11-18Alluxio: Unify Data at Memory Speed; 2016-11-18
Alluxio: Unify Data at Memory Speed; 2016-11-18
 
Getting Started with Alluxio + Spark + S3
Getting Started with Alluxio + Spark + S3Getting Started with Alluxio + Spark + S3
Getting Started with Alluxio + Spark + S3
 
Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016
 
Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016Alluxio Presentation at Strata San Jose 2016
Alluxio Presentation at Strata San Jose 2016
 
What’s new in Alluxio 2: from seamless operations to structured data management
What’s new in Alluxio 2: from seamless operations to structured data managementWhat’s new in Alluxio 2: from seamless operations to structured data management
What’s new in Alluxio 2: from seamless operations to structured data management
 
CoreOS fest 2016 Summary - DevOps BP 2016 June
CoreOS fest 2016 Summary - DevOps BP 2016 JuneCoreOS fest 2016 Summary - DevOps BP 2016 June
CoreOS fest 2016 Summary - DevOps BP 2016 June
 
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
CNCF Member Webinar: Improving Data Locality for Analytics Jobs on Kubernetes...
 
Enabling Scientific Workflows on FermiCloud using OpenNebula
Enabling Scientific Workflows on FermiCloud using OpenNebulaEnabling Scientific Workflows on FermiCloud using OpenNebula
Enabling Scientific Workflows on FermiCloud using OpenNebula
 
Puppet Camp Charlotte 2015: Use Puppet to Manage your NetApp Storage Infrastr...
Puppet Camp Charlotte 2015: Use Puppet to Manage your NetApp Storage Infrastr...Puppet Camp Charlotte 2015: Use Puppet to Manage your NetApp Storage Infrastr...
Puppet Camp Charlotte 2015: Use Puppet to Manage your NetApp Storage Infrastr...
 
OpenStack Heat
OpenStack HeatOpenStack Heat
OpenStack Heat
 

Ähnlich wie Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 minutes

Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataGetInData
 
Azure Functions in Action #CodePaLOUsa
Azure Functions in Action #CodePaLOUsaAzure Functions in Action #CodePaLOUsa
Azure Functions in Action #CodePaLOUsaBaskar rao Dsn
 
Monitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorMonitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorLili Cosic
 
Hands-on monitoring with Prometheus
Hands-on monitoring with PrometheusHands-on monitoring with Prometheus
Hands-on monitoring with PrometheusBrice Fernandes
 
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubMuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubAlfonso Martino
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET Journal
 
Chef Analytics Webinar
Chef Analytics WebinarChef Analytics Webinar
Chef Analytics WebinarJames Casey
 
2019 hashiconf seattle_consul_ioc
2019 hashiconf seattle_consul_ioc2019 hashiconf seattle_consul_ioc
2019 hashiconf seattle_consul_iocPierre Souchay
 
Advanced Orchestration & Automation
Advanced Orchestration & AutomationAdvanced Orchestration & Automation
Advanced Orchestration & AutomationLuc Raeskin
 
Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a...
Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a...Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a...
Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a...Tony Erwin
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)Lucas Jellema
 
Play with azure functions
Play with azure functionsPlay with azure functions
Play with azure functionsBaskar rao Dsn
 
PyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPiyush Kumar
 
Leveraging Analytics for DevOps
Leveraging Analytics for DevOpsLeveraging Analytics for DevOps
Leveraging Analytics for DevOpsMichael Floyd
 
Itsummit2015 blizzard
Itsummit2015 blizzardItsummit2015 blizzard
Itsummit2015 blizzardkevin_donovan
 
qTest <> TestProject Integration Webinar
qTest <> TestProject Integration WebinarqTest <> TestProject Integration Webinar
qTest <> TestProject Integration WebinarKevin Dunne
 

Ähnlich wie Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 minutes (20)

Monitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInDataMonitoring in Big Data Platform - Albert Lewandowski, GetInData
Monitoring in Big Data Platform - Albert Lewandowski, GetInData
 
Azure Functions in Action #CodePaLOUsa
Azure Functions in Action #CodePaLOUsaAzure Functions in Action #CodePaLOUsa
Azure Functions in Action #CodePaLOUsa
 
Monitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operatorMonitoring kubernetes with prometheus-operator
Monitoring kubernetes with prometheus-operator
 
Prometheus and Grafana
Prometheus and GrafanaPrometheus and Grafana
Prometheus and Grafana
 
Prometheus workshop
Prometheus workshopPrometheus workshop
Prometheus workshop
 
Hands-on monitoring with Prometheus
Hands-on monitoring with PrometheusHands-on monitoring with Prometheus
Hands-on monitoring with Prometheus
 
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHubMuleSoft Meetup Roma - Processi di Automazione su CloudHub
MuleSoft Meetup Roma - Processi di Automazione su CloudHub
 
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
IRJET- Real Time Monitoring of Servers with Prometheus and Grafana for High A...
 
Chef Analytics Webinar
Chef Analytics WebinarChef Analytics Webinar
Chef Analytics Webinar
 
2019 hashiconf seattle_consul_ioc
2019 hashiconf seattle_consul_ioc2019 hashiconf seattle_consul_ioc
2019 hashiconf seattle_consul_ioc
 
Advanced Orchestration & Automation
Advanced Orchestration & AutomationAdvanced Orchestration & Automation
Advanced Orchestration & Automation
 
Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a...
Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a...Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a...
Monitoring Node.js Microservices on CloudFoundry with Open Source Tools and a...
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)
 
Sprint 71
Sprint 71Sprint 71
Sprint 71
 
Play with azure functions
Play with azure functionsPlay with azure functions
Play with azure functions
 
PyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPyCon India 2012: Celery Talk
PyCon India 2012: Celery Talk
 
Leveraging Analytics for DevOps
Leveraging Analytics for DevOpsLeveraging Analytics for DevOps
Leveraging Analytics for DevOps
 
New relic in action at trainline
New relic in action at trainlineNew relic in action at trainline
New relic in action at trainline
 
Itsummit2015 blizzard
Itsummit2015 blizzardItsummit2015 blizzard
Itsummit2015 blizzard
 
qTest <> TestProject Integration Webinar
qTest <> TestProject Integration WebinarqTest <> TestProject Integration Webinar
qTest <> TestProject Integration Webinar
 

Mehr von Alluxio, Inc.

AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAlluxio, Inc.
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAlluxio, Inc.
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAlluxio, Inc.
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...Alluxio, Inc.
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio, Inc.
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioAlluxio, Inc.
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
 

Mehr von Alluxio, Inc. (20)

AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-Cloud
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Optimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with AlluxioOptimizing Data Access for Analytics And AI with Alluxio
Optimizing Data Access for Analytics And AI with Alluxio
 
Speed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio CachingSpeed Up Presto at Uber with Alluxio Caching
Speed Up Presto at Uber with Alluxio Caching
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLBig Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/ML
 
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...
 
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...Alluxio Monthly Webinar | Five Disruptive Trends that Every  Data & AI Leader...
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...
 
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionData Infra Meetup | FIFO Queues are All You Need for Cache Eviction
Data Infra Meetup | FIFO Queues are All You Need for Cache Eviction
 
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeData Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio Edge
 
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudData Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the Cloud
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Data Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage EvolutionData Infra Meetup | Uber's Data Storage Evolution
Data Infra Meetup | Uber's Data Storage Evolution
 
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...
 
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...
 
AI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI EraAI Infra Day | The AI Infra in the Generative AI Era
AI Infra Day | The AI Infra in the Generative AI Era
 

Kürzlich hochgeladen

The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionWave PLM
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Soroosh Khodami
 
KLARNA - Language Models and Knowledge Graphs: A Systems Approach
KLARNA -  Language Models and Knowledge Graphs: A Systems ApproachKLARNA -  Language Models and Knowledge Graphs: A Systems Approach
KLARNA - Language Models and Knowledge Graphs: A Systems ApproachNeo4j
 
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesNeo4j
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdfkalichargn70th171
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1KnowledgeSeed
 
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfImplementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfVictor Lopez
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Gáspár Nagy
 
IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024vaibhav130304
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfMehmet Akar
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...rajkumar669520
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfmbmh111980
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignNeo4j
 
A Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data MigrationA Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data MigrationHelp Desk Migration
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfDeskTrack
 
Crafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationCrafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationWave PLM
 
SQL Injection Introduction and Prevention
SQL Injection Introduction and PreventionSQL Injection Introduction and Prevention
SQL Injection Introduction and PreventionMohammed Fazuluddin
 
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfStrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfsteffenkarlsson2
 

Kürzlich hochgeladen (20)

The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion Production
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024
 
KLARNA - Language Models and Knowledge Graphs: A Systems Approach
KLARNA -  Language Models and Knowledge Graphs: A Systems ApproachKLARNA -  Language Models and Knowledge Graphs: A Systems Approach
KLARNA - Language Models and Knowledge Graphs: A Systems Approach
 
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf
 
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
A Python-based approach to data loading in TM1 - Using Airflow as an ETL for TM1
 
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfImplementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdf
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by Design
 
A Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data MigrationA Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data Migration
 
5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdf
 
Crafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM IntegrationCrafting the Perfect Measurement Sheet with PLM Integration
Crafting the Perfect Measurement Sheet with PLM Integration
 
SQL Injection Introduction and Prevention
SQL Injection Introduction and PreventionSQL Injection Introduction and Prevention
SQL Injection Introduction and Prevention
 
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfStrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
 

Setting up monitoring system for Alluxio with Prometheus and Grafana in 10 minutes

  • 1. Setting up Monitoring System for Alluxio with Prometheus and Grafana in 10 minutes Pan Liu @ Tencent 2021.06.24
  • 2. • Pan Liu • Tencent • Presto on Alluxio
  • 3. Contents • Part 1. How Alluxio Metrics System Works • Part 2. How to Implement a custom Alluxio Sink • Part 3. How to Set Up Monitoring of Alluxio in 10 mins
  • 4. Part 1. How Alluxio Metrics System Works
  • 5. Alluxio Metrics System • Framework • Types of metrics • Name metrics • Flow of metrics from worker to master
  • 6. Alluxio Metrics System • Framework Source1 Metrics system Sink1 Source2 Sink2 … …
  • 7. Alluxio Metrics System • Two Types of Metrics: • Cluster Metrics : Aggegated from Workers and Clients • Process Metrics: Collected by Each Alluxio Process Master Cluster Metrics Process Metrics Worker Worker Worker Client Client
  • 8. Alluxio Metrics System • Metrics Name: • Master: Master.[metricName].[tag1].[tag2]... • Non-Master: [processType].[metricName].[tag1].[tag2]...[hostName] Master.GetFileInfoOps Worker.OpenExistingFile.User:user.UFS:hdfs:%2F%2F9%2E135%2E88%2E92:9000%2F.UFS_TYPE:hdfs.worker_host
  • 9. Alluxio Metrics System MetricMaster BlockMaster processWorkerMetric workerHeartbeat MetricsStore putWorkerMetrics RpcServer BlockWorker BlockMasterSync BlockMasterClient heartbeat RpcClient MasterProcess WorkerProcess heartbeat blockHeartbeat blockHeartbeat
  • 10. Part 2. How to Implement a custom Sink of Alluxio
  • 11. Alluxio Sinks • Passive VS. Active • Passive: service PULL • Active: report periodically PUSH Active Passive
  • 12. Alluxio Sinks Active: Report periodically Master_host:port/metrics/json ./logs/master.out Passive: Scratch from server
  • 14. Passive Sink : PrometheusMetricsServlet • getHandler() called by master or worker • Work as service • Metrics are available only when requested E.g. MasterProcess
  • 15. Sink Extension • Passive or Active ? • XmlSink as an example of Active type: Print metrics in XML format to a specified path. • Implement Sink interface • Construct a XmlReporter to report metrics in XML format • Config to enable XmlSink conf/metrics.properties
  • 16. Part 3. How to Set Up Monitoring for alluxio in 10 minutes
  • 17. Alluxio Web UI Monitoring Web UI metrics page Convenient but doesn’t work well sometimes…
  • 18. Alluxio Monitoring with Prometheus and Grafana How it works ? 1. Prometheus scrapes metrics from Alluxio servers and transforms to time series data 2. Grafana server get metrics using the PromQL 3. Grafana web UI displays metrics in dashboards What we need to do ? 1. Install and start Prometheus and Grafana Server 2. Add Alluxio Jobs to Prometheus 3. Download and import Grafana dashboard 4. Modify variables of dashboard template
  • 19. Alluxio Monitoring with Prometheus and Grafana prometheus.yml Version: - Prometheus Version: 2.22.2 - Grafana Version: 7.5.6 - Alluxio Version: 2.5.0-3 targets/local/masters/master_01.yml targets/local/workers/worker_01.yml Services: - Prometheus Server @ master:9090 - Grafana Server @ worker:3000 - Alluxio Master @ master - Alluxio Worker @ worker
  • 20.
  • 21. Alluxio Monitoring with Prometheus and Grafana
  • 22. Alluxio Monitoring with Prometheus and Grafana • Alluxio IO Key Metrics • Read Local & Read Remote & Alive workers … • Storage • Space Used & UFS Space Used … • Workers Blocks • Cached Blocks & Evicted Blocks … • Logical Operations • Mount Operations & File Pinned … • Alluxio Metadata Operations • Block Hearbeat Cost & Get status Cost … • AsyncCache Blocks & Operations • AsyncCacheSuccessedBlocks & AysncCacheFailedBlocks … • Master JVM Memory • Master Heap Memory & Total Memory …
  • 23. Alluxio Monitoring with Prometheus and Grafana • Alluxio docs: https://docs.alluxio.io/os/user/edge/en/operation/Metrics- System.html#grafana-web-ui-with-prometheus • Grafana dashboards: https://grafana.com/grafana/dashboards/13467