Statistics 101 for System Administrators
EuroPython 2014, 22nd July - Berlin
Roberto Polli - roberto.polli@babel.it
Babel Srl P.zza S. Benedetto da Norcia, 33
00040, Pomezia (RM) - www.babel.it
22 July 2014
Who? What? Why?
• Using (and learning) elements of statistics with Python.
• Roberto Polli - Community Manager @ Babel.it. Loves writing in C, Java and Python. Red Hat Certified Engineer and Virtualization Administrator.
• Babel – Proud sponsor of this talk ;) Delivers large mail infrastructures based on Open Source software for Italian ISPs and public administrations (PA). Contributes to various FLOSS projects.
Agenda
• A latency issue: what happened?
• Correlation in 30”
• Combining data
• Plotting time
• modules: scipy, matplotlib
A Latency Issue
• Episodic network latency issues
• Log traces: message size, #peers, retransmissions
• Do we need to scale? Was it a peak problem?
Find a rapid answer with python!
Basic statistics
NumPy (the array library SciPy builds on) provides basic statistics, like
from numpy import mean  # x̄
from numpy import std   # σX (population standard deviation)
T = {'ts':    (1, 2, 3),           # ... more samples
     'late':  (0.12, 6.31, 0.43),  # ...
     'peers': (2313, 2313, 2312),  # ...
     }                             # ... more fields
print([[k, max(X), min(X), mean(X), std(X)]
       for k, X in T.items()])
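A note on σX, as a small sketch: NumPy's `std` divides by N (population standard deviation) unless you pass `ddof=1`, while the standard-library `statistics` module offers both `pstdev` (population) and `stdev` (sample). The latency values below are made up for illustration.

```python
import statistics

import numpy as np

# Hypothetical latency samples, in seconds (illustration only)
late = [0.12, 6.31, 0.43, 0.51, 0.38]

# Population standard deviation: sqrt(sum((x - mean)**2) / N)
pop_np = np.std(late)
pop_stdlib = statistics.pstdev(late)

# Sample standard deviation: sqrt(sum((x - mean)**2) / (N - 1))
sample_np = np.std(late, ddof=1)
sample_stdlib = statistics.stdev(late)

print("mean", np.mean(late), statistics.mean(late))
print("population std", pop_np, pop_stdlib)
print("sample std", sample_np, sample_stdlib)
```

With few samples the difference is noticeable: here the sample std is sqrt(5/4) times the population std.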
Distributions
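The data distribution, aka δX, shows how often each value (event) occurs.
# The fastest way to get a distribution is a histogram
from matplotlib import pyplot as plt
freq, bins, _ = plt.hist(T['late'])
# plt.hist returns the counts per bin, the bin edges and the drawn patches;
# len(bins) == len(freq) + 1, so zip pairs each count with its bin's left edge
distribution = list(zip(bins, freq))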
[Figure: A ping rtt distribution. Histogram titled "Ping RTT distribution"; x axis: rtt in ms (158.0 to 162.0), y axis: count (0 to 4)]
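If you only need the numbers and not the picture, `numpy.histogram` computes counts and bin edges without drawing anything (like `plt.hist`, it defaults to 10 equal-width bins). A minimal sketch, with made-up RTT values and an arbitrary choice of 4 bins:

```python
import numpy as np

# Hypothetical ping RTT samples in ms (illustration only)
rtt = [158.3, 159.2, 159.4, 160.1, 160.2, 160.4, 161.0, 161.8]

# counts per bin and the bin edges: there is one more edge than there are bins
freq, bins = np.histogram(rtt, bins=4)

# pair each count with its bin interval (the last bin also includes its right edge)
distribution = [(bins[i], bins[i + 1], int(freq[i])) for i in range(len(freq))]
for left, right, count in distribution:
    print("{:.1f}-{:.1f} ms: {}".format(left, right, count))
```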
Correlation I
Are two data series X, Y related?
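Given $\Delta x_i = x_i - \bar{x}$ (and likewise $\Delta y_i = y_i - \bar{y}$), Mr. Pearson answered with this formula:

$$\rho(X, Y) = \frac{\sum_i \Delta x_i \, \Delta y_i}{\sqrt{\sum_i \Delta x_i^2 \, \sum_i \Delta y_i^2}} \in [-1, +1] \qquad (1)$$

ρ measures whether the values of X and Y ‘move’ together along a straight line: close to +1 when they rise together, close to −1 when one falls as the other rises, close to 0 when there is no linear relation.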
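To see formula (1) at work, here is a plain-Python sketch checked against SciPy's `pearsonr`. The helper name `pearson_rho` and the sample values are made up for illustration.

```python
import math

from scipy.stats import pearsonr


def pearson_rho(xs, ys):
    """Pearson's rho, written directly from formula (1) (hypothetical helper)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    dx = [x - x_bar for x in xs]  # Δx_i
    dy = [y - y_bar for y in ys]  # Δy_i
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


# Hypothetical samples (illustration only)
size = [10, 20, 30, 40, 50]
late = [0.11, 0.19, 0.33, 0.38, 0.52]

rho = pearson_rho(size, late)
scipy_rho, _ = pearsonr(size, late)
print(rho, scipy_rho)  # the two values should agree
```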
You must (scatter) plot
ρ doesn’t find non-linear correlation!
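A quick sketch of the pitfall: y = x² on a range symmetric around zero is a perfect, deterministic relation, yet ρ comes out as zero because the points do not lie on a straight line. A scatter plot would show the parabola immediately. The data is made up for illustration.

```python
from scipy.stats import pearsonr

# A perfect but non-linear relation: y is fully determined by x
x = list(range(-10, 11))
y = [v * v for v in x]

rho, p = pearsonr(x, y)
print("rho = {:.3f}, p = {:.3f}".format(rho, p))  # rho is (numerically) zero
```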
Probability Indicator
SciPy provides a correlation function, pearsonr, returning two values:
• the ρ correlation coefficient ∈ [−1, +1]
• the p-value: the probability that uncorrelated systems would produce datasets at least as correlated as these
from random import randint
from scipy.stats import pearsonr  # our beloved ρ
a, b = range(0, 100), range(0, 400, 4)
c, d = [randint(0, 100) for x in a], [randint(0, 100) for x in a]
correlation, probability = pearsonr(a, b)  # ρ = 1.000, p = 0.000
correlation, probability = pearsonr(c, d)  # e.g. ρ = −0.041, p = 0.683 (random data: varies per run)
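The p-value is easiest to read with a simulation. This sketch (the seed, sample size and number of trials are arbitrary choices) correlates many pairs of independent random series: roughly 5% of them still get p < 0.05, which is worth remembering when testing many pairs at once.

```python
import numpy as np
from scipy.stats import pearsonr

# Generate many pairs of independent (truly uncorrelated) series
rng = np.random.default_rng(42)  # fixed seed for a repeatable run
trials, n = 1000, 100

false_alarms = 0
for _ in range(trials):
    c, d = rng.normal(size=n), rng.normal(size=n)
    _, p = pearsonr(c, d)
    if p < 0.05:
        false_alarms += 1

# Expect a rate close to 0.05: about 1 in 20 unrelated pairs "looks" correlated
print("fraction with p < 0.05:", false_alarms / trials)
```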
Combinations
itertools is a gold mine of useful tools.
from itertools import combinations
# returns an iterator over all possible combinations
# of the items, taken N at a time
items = "heart spades clubs diamonds".split()
combinations(items, 2)
# And now all possible pairs of
# dataset fields!
combinations(T, 2)
Combining the 4 suits, 2 at a time:
♥♠ ♥♣ ♥♦ ♠♣ ♠♦ ♣♦
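A runnable sketch of the suit example: `combinations` yields tuples in input order without repeats, and n items give n(n−1)/2 pairs, here 4·3/2 = 6. With 20 log fields that is already 190 correlations (and scatter plots) to look at.

```python
from itertools import combinations
from math import comb

items = "heart spades clubs diamonds".split()

# combinations() is lazy: wrap it in list() to see every pair
pairs = list(combinations(items, 2))
print(pairs)

# n items give n * (n - 1) / 2 pairs, so the work grows quadratically
print(len(pairs), comb(len(items), 2))  # -> 6 6
```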
Netfishing correlation I
from itertools import combinations
from scipy.stats import pearsonr
# Now we have all the ingredients for
# net-fishing relations between our data!
for (k1, v1), (k2, v2) in combinations(T.items(), 2):
    # Look for correlations between every pair of datasets!
    corr, prob = pearsonr(v1, v2)
    if abs(corr) > .6:  # strong positive or negative linear relation
        print("Series", k1, k2, "can be correlated", corr)
    elif prob < 0.05:
        print("Series", k1, k2, "probability lower than 5%", prob)
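When the dataset has many fields it can help to collect the hits instead of printing them, strongest first. A sketch where the function name `netfish`, its default thresholds and the synthetic data are illustrative assumptions:

```python
from itertools import combinations

import numpy as np
from scipy.stats import pearsonr


def netfish(T, min_corr=0.6, alpha=0.05):
    """Hypothetical helper: (k1, k2, corr, prob) for every pair worth a closer look."""
    hits = []
    for (k1, v1), (k2, v2) in combinations(T.items(), 2):
        corr, prob = pearsonr(v1, v2)
        if abs(corr) > min_corr or prob < alpha:
            hits.append((k1, k2, corr, prob))
    # strongest linear relations first
    return sorted(hits, key=lambda hit: abs(hit[2]), reverse=True)


# Synthetic data (illustration only): 'late' grows with 'size', 'peers' is noise
rng = np.random.default_rng(0)
size = rng.uniform(1, 100, 200)
T = {
    "size": size,
    "late": 0.01 * size + rng.normal(0, 0.1, 200),
    "peers": rng.integers(2300, 2320, 200),
}
hits = netfish(T)
for k1, k2, corr, prob in hits:
    print(k1, k2, "corr={:.3f} p={:.3g}".format(corr, prob))
```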
Netfishing correlation II
Now plot all combinations: there’s more than meets the eye!
from itertools import combinations
from matplotlib import pyplot as plt
from scipy.stats import pearsonr
# Plot everything, and put the statistics in the plots!
for (k1, v1), (k2, v2) in combinations(T.items(), 2):
    corr, prob = pearsonr(v1, v2)
    plt.scatter(v1, v2)
    # 3 digit precision in the title
    plt.title("R={:0.3f} P={:0.3f}".format(corr, prob))
    plt.xlabel(k1); plt.ylabel(k2)
    # save and close the plot
    plt.savefig("{}_{}.png".format(k1, k2)); plt.close()
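On a server without a display, the plotting loop is easier to run with matplotlib's non-interactive `Agg` backend and the explicit figure/axes API, which keeps each plot independent. A sketch under those assumptions, with made-up data written to a temporary directory:

```python
import os
import tempfile
from itertools import combinations

import matplotlib
matplotlib.use("Agg")  # no display needed: render straight to files
from matplotlib import pyplot as plt
from scipy.stats import pearsonr

# Hypothetical series (illustration only)
T = {"size": [1, 2, 3, 4, 5], "late": [0.2, 0.4, 0.5, 0.9, 1.0], "peers": [5, 3, 4, 1, 2]}
outdir = tempfile.mkdtemp()

for (k1, v1), (k2, v2) in combinations(T.items(), 2):
    corr, prob = pearsonr(v1, v2)
    fig, ax = plt.subplots()  # one explicit figure per pair
    ax.scatter(v1, v2)
    ax.set_title("R={:0.3f} P={:0.3f}".format(corr, prob))
    ax.set_xlabel(k1)
    ax.set_ylabel(k2)
    fig.savefig(os.path.join(outdir, "{}_{}.png".format(k1, k2)))
    plt.close(fig)  # free the memory held by the figure

print(sorted(os.listdir(outdir)))
```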
Plotting Correlation
Color is the 3rd dimension
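from itertools import combinations
from matplotlib import pyplot as plt
colors = "rgb"  # one color per time slice: add more for more slices
labels = "morning afternoon night".split()
for (k1, v1), (k2, v2) in combinations(T.items(), 2):
    # split each time-ordered series into len(colors) consecutive slices;
    # ceiling division so that the slices cover every sample
    size = -(-len(v1) // len(colors))
    for n, (color, label) in enumerate(zip(colors, labels)):
        part = slice(n * size, (n + 1) * size)
        plt.scatter(v1[part], v2[part], color=color, label=label)
    plt.legend()
    # set title, save plot & co
    plt.savefig("{}_{}.png".format(k1, k2)); plt.close()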
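Instead of cutting the series into fixed slices, a variant is to let a colormap encode time continuously by passing the timestamps as `c=`. This is a different technique from the slices above; the data, the "viridis" colormap and the file name are arbitrary choices in this sketch.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to file, no display needed
from matplotlib import pyplot as plt

# Hypothetical time-ordered samples (illustration only)
ts = list(range(24))  # e.g. hour of the day
size = [10 + 3 * h for h in ts]
late = [0.1 + (0.5 if 9 <= h <= 17 else 0.0) + 0.01 * h for h in ts]

fig, ax = plt.subplots()
# c= maps each point's timestamp to a color of the colormap
points = ax.scatter(size, late, c=ts, cmap="viridis")
fig.colorbar(points, ax=ax, label="ts")
ax.set_xlabel("size")
ax.set_ylabel("late")
path = os.path.join(tempfile.mkdtemp(), "size_late_by_ts.png")
fig.savefig(path)
plt.close(fig)
```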
Example Correlation
Latency Solution
• Latency wasn’t related to packet size or system throughput
• Errors were not related to packet size
• Discovered the system's actual throughput
Wrap Up
• Use statistics: it’s easy
• Don’t use a low ρ to rule out relations (it only rules out a linear one)
• Plot, Plot, Plot
• Keep collecting results
That’s all folks!
Thank you for your attention!
