Paulo Canas Rodrigues
Research Director
CAST (Centre for Applied Statistics and Data Analytics) University of Tampere
The role of Statistics in the Internet of Things
Mindtrek 2016
MINDTREK
October 18, 2016
CAST – Vision and Mission
• With the growing volume of (big and unstructured) data collected every day
in many disciplines, appropriate quantitative methodological tools are needed
both to evaluate research hypotheses in a timely manner and to reduce
the risks in decision making
• MISSION: To promote the understanding and good practice of statistics and
data analytics within UTA/Tampere3 and on the global scene
• VISION: To become recognized as a strong partner to researchers and to
industry in statistics and data analytics, locally and globally, especially in
applied sciences
• MEMBERS: a wide range of expertise, from methodological statistics to
biostatistics, machine learning, data mining, data visualization, time series,
computational statistics, etc.
CAST – R&D services
• CAST provides a diverse range of expert quantitative R&D services, from
project planning through data analysis to the reporting of results. These
services range from tailored, complex statistical research to traditional
survey research, for both the private and public sectors.
• For companies: operation and maintenance management visualization,
performance prediction and problem case diagnostics, customer and
product content insights, sales predictions, marketing recommendations,
operation evaluation and optimization, data analytics expertise in
developing digital services
• For public organizations: expertise in the quantitative methodology of
research and development projects, quantitative analytics for operation
evaluation and rationalization, data analytics expertise in developing digital
public services
• A range of methods is available for graphical visualization, data mining,
analysis, modeling and prediction, including but not limited to:
• big data analytics
• Internet of Things
• machine learning
• time series modeling, analysis and prediction
• Bayesian modeling
• data and text mining
• interactive analytics
• survey design and analysis
• training
• More details at http://www.uta.fi/cast/
How much is data driving decisions today?
• A recent survey asked senior business stakeholders across Europe about
their attitudes towards data and analytics. The findings of the survey were
summarized in the Business Grammar Research Report.
• Here’s a glimpse of some of the key findings:
• 96% use data and analytics to inform business decisions today
• 59% of European business leaders consider data and analytics
savviness to be one of the two most important skills for new employees
• Data and analytics skills are now considered more important than
industry experience or a second language
Unstructured data
June 8, 2011 (Joe McKendrick): Unstructured
data: the elephant in the Big Data room
• [… Many organizations are becoming overwhelmed with the
volumes of unstructured information -- audio, video, graphics, social
media messages -- that falls outside the purview of their "traditional"
databases. Organizations that do get their arms around this data will
gain significant competitive edge…]
• [… 91% (in the survey) say unstructured information already lives in
their organizations, but many aren’t sure what to do about it…]
• But what to do with all this data?
• How to transform data into information
for decision making?
Statistics: A world of possibilities
Marie Davidian (North Carolina State University) and Thomas Louis (Johns Hopkins Bloomberg School of Public Health):
"Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances."
IoT – A time series challenge
• The world is becoming more and more instrumented,
interconnected and intelligent, resulting in mountains of newly
generated data.
• With storage costs coming down significantly, companies now
want to leverage this instrument-generated data (including
meter, temperature and all types of sensor data over time) for
conducting analysis.
• Among all types of big data, sensor data is the most widespread; because each
reading is recorded over time, it is referred to as time-series data.
• So many records, so little time!
Source: http://www.ibmbigdatahub.com/blog/internet-things-time-series-data-challenge/
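To make "so many records" concrete, here is a minimal Python sketch, not taken from the source, of one common way to tame a high-frequency sensor stream: averaging time-stamped readings within each minute. The function name `downsample_per_minute` and the temperature readings are hypothetical; a production pipeline would more likely rely on a dedicated time-series library.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def downsample_per_minute(readings):
    """Average time-stamped sensor readings within each calendar minute.

    readings: iterable of (datetime, float) pairs, in any order.
    Returns a time-sorted list of (minute_start, mean_value) pairs.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Truncate the timestamp to the start of its minute.
        minute = timestamp.replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return [(minute, mean(values)) for minute, values in sorted(buckets.items())]


# Hypothetical temperature sensor sampled every 10 seconds for 3 minutes.
start = datetime(2016, 10, 18, 12, 0, 0)
readings = [(start + timedelta(seconds=10 * i), 20.0 + 0.1 * i) for i in range(18)]

for minute, avg in downsample_per_minute(readings):
    print(minute.strftime("%H:%M"), round(avg, 2))
```

Here 18 raw readings shrink to 3 rows; the trade-off is that anything happening within a minute (such as a short spike) is averaged away.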
Smartphone sensor data – How to identify breakpoints?
Source: http://beautifuldata.net/tag/sensor-data/
Data: smartphone accelerometer data that Datarella provided in its Data Fiction competition.
The dataset shows the acceleration along the three axes of the smartphone:
x – sideways acceleration of the device
y – forward and backward acceleration of the device
z – acceleration up and down
So, for example, taking the smartphone out of your pocket and reading a
tweet can look like this:
• y acceleration – the smartphone had been in the pocket top down and is now taken out of the pocket
• z and y acceleration – turning the smartphone so that it is horizontal
• x acceleration – moving the smartphone from the left to the middle of your body
• z acceleration – lifting the smartphone so you can read the fine print of the tweet
This is the sensor data for one user on one day:
Let’s zoom in to the period between 12:32 and 13:00:
• In the beginning, the smartphone seems to lie flat on a horizontal surface: the sensor reads a value
of around 9.8 m/s² in the positive z direction, which means that gravity affects only the z axis and not
the x and y axes.
• But then things change, and after a few movements (our change points) the last observation has the
smartphone in a position where the x axis reads around -9.6 m/s², i.e. the smartphone is being
held in landscape orientation pointing to the right.
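The reasoning above can be sketched in a few lines of Python. It assumes readings in m/s² and a standard gravity of about 9.81 m/s²; the function `gravity_axis` and its `tolerance` of 1.5 m/s² are hypothetical choices for illustration, not part of the Datarella data or the original analysis. The idea: when the phone is at rest, the accelerometer measures essentially only gravity, so the axis with the largest absolute reading is the one closest to vertical.

```python
import math

STANDARD_GRAVITY = 9.81  # m/s^2, approximate


def gravity_axis(x, y, z, tolerance=1.5):
    """Estimate how a resting phone is oriented from one accelerometer reading.

    Returns (signed_axis, tilt_deg): the axis carrying most of gravity, e.g. "+z",
    and the angle in degrees between the device z axis and the vertical.
    Returns None when the magnitude is far from g, i.e. the phone is likely moving.
    """
    magnitude = math.sqrt(x * x + y * y + z * z)
    if abs(magnitude - STANDARD_GRAVITY) > tolerance:
        return None
    # The axis with the largest absolute reading is the one closest to vertical.
    axis, value = max(zip("xyz", (x, y, z)), key=lambda pair: abs(pair[1]))
    signed_axis = ("+" if value > 0 else "-") + axis
    # Clamp to [-1, 1] to guard acos against rounding error.
    cos_tilt = max(-1.0, min(1.0, z / magnitude))
    tilt_deg = math.degrees(math.acos(cos_tilt))
    return signed_axis, tilt_deg


print(gravity_axis(0.1, 0.2, 9.8))   # lying flat: gravity on +z, small tilt
print(gravity_axis(-9.6, 0.3, 1.0))  # landscape: gravity on -x, z axis near horizontal
print(gravity_axis(6.0, 4.0, 15.0))  # magnitude far from g: None
```

The magnitude check matters because during a movement the sensor also picks up the phone's own acceleration, so a single reading no longer reflects orientation alone.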
• This quick analysis of the acceleration in the x direction gives us 4 change points, where the acceleration
suddenly changes.
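The source does not say which change-point method produced these 4 change points. As one standard option, the sketch below uses binary segmentation for changes in the mean: it splits a segment at the index that most reduces the sum of squared errors, keeps the split only if that reduction exceeds a `penalty`, and then recurses on both halves. The function names, the `penalty` and `min_size` values, and the noise-free readings are assumptions for illustration.

```python
def sse(values):
    """Sum of squared deviations from the segment mean (the segment's cost)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def binary_segmentation(series, penalty, min_size=2):
    """Detect changes in the mean of `series` by binary segmentation.

    A segment is split at the index that most reduces the total squared error,
    and the split is kept only if that reduction exceeds `penalty`. Returns the
    sorted indices i at which a new segment starts (series[i] is its first value).
    """
    change_points = []

    def split(start, end):
        if end - start < 2 * min_size:
            return  # too short to hold two segments of at least min_size
        base_cost = sse(series[start:end])
        best_gain, best_idx = 0.0, None
        for i in range(start + min_size, end - min_size + 1):
            gain = base_cost - sse(series[start:i]) - sse(series[i:end])
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is not None and best_gain > penalty:
            change_points.append(best_idx)
            split(start, best_idx)  # look for further changes on each side
            split(best_idx, end)

    split(0, len(series))
    return sorted(change_points)


# Hypothetical, noise-free x-axis readings (m/s^2) with two shifts in level.
x_accel = [0.2] * 10 + [4.0] * 10 + [-9.6] * 10
print(binary_segmentation(x_accel, penalty=1.0))  # [10, 20]
```

Larger penalties give fewer change points; on real, noisy sensor data the penalty has to be tuned to the noise level. R's `changepoint` package offers binary segmentation, among other methods, through `cpt.mean`.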
Anomaly Detection with Wikipedia Page View Data
Source: https://www.r-bloggers.com/anomaly-detection-with-wikipedia-page-view-data/
• We choose an interesting Wikipedia page and download 90 days of PageView statistics
• A first plot shows the page-view pattern (for the USA Wikipedia page)
• Now, let’s look for anomalies using the R package AnomalyDetection
• In our case, the algorithm has discovered 4 anomalies.
• The first, on October 30, 2014, is an exceptionally high value overall
• The second is a very high Sunday
• The third is a high value overall
• The fourth is a high Saturday (normally, this day is also quite weak).
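Twitter's AnomalyDetection package is based on Seasonal Hybrid ESD (S-H-ESD). The sketch below is a deliberately simpler stand-in, not that algorithm: it removes the weekly pattern by subtracting each weekday's median and then flags days whose residual is far from the rest, measured with the median absolute deviation (MAD). It is meant to show why a Saturday that is merely "high" can still be anomalous. The function name, the 3.5 threshold and the synthetic page-view counts are hypothetical.

```python
from statistics import median


def weekly_robust_anomalies(views, threshold=3.5):
    """Flag days whose value is unusual for their day of the week.

    views: daily counts (at least 7); views[i] and views[i + 7] share a weekday.
    Returns the indices of anomalous days (both unusually high and low).
    """
    # Seasonal component: median of all days sharing the same weekday.
    weekday_median = [median(views[d::7]) for d in range(7)]
    residuals = [v - weekday_median[i % 7] for i, v in enumerate(views)]
    # Robust spread of the residuals: median absolute deviation (MAD).
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        return []  # no spread to compare against
    scale = 1.4826 * mad  # rescales MAD to match the std. dev. for normal data
    return [i for i, r in enumerate(residuals) if abs(r - center) / scale > threshold]


# Synthetic 13 weeks of page views: weekdays around 1000, weekend days
# (positions 5 and 6 of each week) around 600, plus a small repeating wiggle.
views = [(600 if i % 7 in (5, 6) else 1000) + (i % 3) * 10 for i in range(91)]
views[30] += 3000  # an exceptionally high day overall
views[47] = 1100   # a weekend day as busy as a normal weekday
print(weekly_robust_anomalies(views))  # [30, 47]
```

Because the weekday medians absorb the weekly cycle, day 47 is flagged at 1,100 views even though that value would be unremarkable on a weekday.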
Concluding remarks
• The Internet of Things has brought a new way of looking at
the world and, with it, the collection of mountains of newly
generated data.
• With decreasing costs of data storage, companies are
now looking at strategies to analyze and to take proper
advantage of the deluge of data they have been storing.
• How to do that? Statistics can help!