Intelligent Data Processing for
the Internet of Things
Payam Barnaghi
Institute for Communication Systems (ICS)
University of Surrey
Guildford, United Kingdom
International “IoT 360 Summer School”
October 2015 – Rome, Italy
Wireless Sensor (and Actuator) Networks
[Figure: WSN architecture with the labels: sink node, gateway, core network (e.g. Internet), gateway, end-user, computer services]
- The networks typically run on low-power devices
- Devices consist of one or more sensors (or actuators), possibly of different types
[Figure labels: Operating systems? Services? Protocols? In-node data processing; data aggregation/fusion; inference/processing of IoT data]
Key characteristics of IoT devices
−Often inexpensive sensors (actuators) equipped with a radio
transceiver for various applications, typically low data rate ~
10-250 kbps.
−Deployed in large numbers
−The sensors should coordinate to perform the desired task.
−The acquired information (periodic or event-based) is
reported back to the information processing centre (or, in some
cases, in-network processing is required)
−Solutions are often application-dependent.
Beyond conventional sensors
− Human as a sensor (citizen sensors)
− e.g. tweeting real world data and/or events
− Software sensors
− e.g. Software agents/services generating/representing
data
Road block, A3
Suggest a different route
P. Barnaghi et al., "Digital Technology Adoption in the Smart Built Environment", IET Sector Technical Briefing, The Institution
of Engineering and Technology (IET), I. Borthwick (editor), March 2015.
Internet of Things: The story so far
P. Barnaghi, A. Sheth, “The Internet of Things: The story so far”, IEEE IoT Newsletter, September 2014.
IoT data – challenges
− Multi-modal, distributed and heterogeneous
− Noisy and incomplete
− Time and location dependent
− Dynamic and varies in quality
− Crowdsourced data can be unreliable
− Requires (near-) real-time analysis
− Privacy and security are important issues
− Data can be biased- we need to know our data!
P. Barnaghi, A. Sheth, C. Henson, "From data to actionable knowledge: Big Data Challenges in the Web of Things,"
IEEE Intelligent Systems, vol. 28, no. 6, Dec 2013.
Data is not what we want or is it?
What we need are insights
and actionable-information.
Diffusion of innovation
image source: Wikipedia
IoT
Problem #1
Data: We seem to have lots of it…
Real World Data: it is always difficult to get
(silos, format, privacy, business interests or lack
of interest!...)
Problem #2
Data: interoperability and metadata
frameworks…
Real World Data: there are solutions for service-based
(RESTful) access and meta-data/semantic
representation frameworks (e.g. W3C SSN,
HyperCat,…), but none of them is widely
adopted yet.
Problem #3
Data: quality, reliability…
Real World Data: data can be noisy, crowdsourced
data can be inaccurate or contradictory, and there can be
delays in accessing/processing the data…
Problem #4
Data: having too much data and using analytics
tools alone won’t solve the problem…
Real World Data: in addition to the HPC issues,
we need new methods/solutions that can
provide real-time analysis of dynamic, variable
quality and multi-modal streams…
Problem #5
Data: abstraction, discovering the associations…
Real World Data: co-occurrence vs. causation;
we need hypotheses, background knowledge,…
After all data is not what we are really after…
We need more linked open data…
(near) real-time
linked open data
Streams
Sometimes it’s even better if we have:
(near) real-time
linked open data
Streams
+
meta-data (semantic annotations)
+
Adaptable and scalable analytics tools
+
Sufficient background knowledge
or even better than that if we have:
IoT Data
“Each single data item could be important.”
“Relying merely on data from sources that are
unevenly distributed, without considering
background information or social context, can
lead to imbalanced interpretations and
decisions.”
?
Current focus on Big Data
− Emphasis on power of data and data mining solutions
− Technology solutions to handle large volumes of data; e.g.
Hadoop, NoSQL, Graph Databases, …
− Trying to find patterns and trends from large volumes of
data…
Myths About Big Data
− Big Data is only about massive data volume
− Big Data means Hadoop
− Big Data means unstructured data
− If we have enough data we can draw conclusions (enough
here often means massive amounts)
− NoSQL means No SQL
− It is about increasing computational power and taking more
data and running data mining algorithms.
Some of the items are adapted from: Brian Gentile, http://mashable.com/2012/06/19/big-data-myths/
What happens if we only focus on data
− Number of burgers consumed per day.
− Number of cats outside.
− Number of people checking their Facebook account.
− What insight would you draw?
It is also important to note what type of
problems we expect to solve.
Back to the future
Source: LA Times, http://documents.latimes.com/la-2013/
A smart City example
Future cities: A view from 1998
Source: http://robertluisrabello.com/denial/traffic-in-la/#gallery[default]/0/
Source: wikipedia
Back to the Future: 2013
Common problems
Guildford, Surrey
Data alone is not enough
− Domain knowledge
− Machine interpretable meta-data
− Delivery, sharing and representation services
− Query, discovery, aggregation services
− Publish, subscribe, notification, and access interfaces/services
− More open solutions for innovation and citizen participation
− Efficient feedback and control mechanisms
− Social network and social system analysis
− In cities, interaction with people and social systems is
key.
Some of the technical challenges
− Discovery: finding appropriate device and data sources
− Access: Availability and (open) access to data resources and
data
− Search: querying for data
− Integration: dealing with heterogeneous devices, networks
and data
− Large-scale data mining, adaptable learning and efficient
computing and processing
− Interpretation: translating data to knowledge that can be used
by people and applications
− Scalability: dealing with large numbers of devices and a myriad
of data and the computational complexity of interpreting the
data.
Accessing IoT data
− Publish/Subscribe (long-term/short-term)
− Ad-hoc query
− The typical types of data request for sensory data:
− Query based on
− ID (resource/service) – for known resources
− Location
− Type
− Time – requests for fresh data or historical data;
− One of the above + a range [+ Unit of Measurement]
− Type/Location/Time + A combination of Quality of Information attributes
− An entity of interest (a feature of an entity of interest)
− Complex Data Types (e.g. pollution data could be a combination of different types)
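The request types listed above can be sketched as a small filter over in-memory observations. This is only an illustration: the names `Observation`, `Query`, `matches` and `run_query` are hypothetical and do not belong to any IoT platform API, and Quality of Information attributes and complex data types are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Observation:
    sensor_id: str
    sensor_type: str   # e.g. "temperature"
    location: str      # e.g. a place name or a grid-cell label
    timestamp: float   # seconds since some epoch
    value: float
    unit: str          # unit of measurement


@dataclass(frozen=True)
class Query:
    # None means "do not filter on this attribute".
    sensor_id: Optional[str] = None
    sensor_type: Optional[str] = None
    location: Optional[str] = None
    time_range: Optional[Tuple[float, float]] = None   # inclusive (start, end)
    value_range: Optional[Tuple[float, float]] = None  # inclusive (low, high)
    unit: Optional[str] = None


def matches(query: Query, obs: Observation) -> bool:
    """Return True if the observation satisfies every constraint that is set."""
    exact = (
        (query.sensor_id, obs.sensor_id),
        (query.sensor_type, obs.sensor_type),
        (query.location, obs.location),
        (query.unit, obs.unit),
    )
    if any(wanted is not None and wanted != actual for wanted, actual in exact):
        return False
    if query.time_range is not None:
        start, end = query.time_range
        if not start <= obs.timestamp <= end:
            return False
    if query.value_range is not None:
        low, high = query.value_range
        if not low <= obs.value <= high:
            return False
    return True


def run_query(query: Query, observations: List[Observation]) -> List[Observation]:
    """Ad-hoc query: return the observations that match."""
    return [o for o in observations if matches(query, o)]
```

A request such as "type + range + unit" then becomes `Query(sensor_type="temperature", value_range=(30.0, 40.0), unit="Cel")`.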
IoT Data in the Cloud
Image courtesy: http://images.mathrubhumi.com
http://www.anacostiaws.org/userfiles/image/Blog-Photos/river2.jpg
IoT Data Processing
[Figure: five WSNs and two groups of network-enabled devices, with the labels: network services/storage and processing units; data collections and processing within the networks; data/service access at application level; data discovery; service/resource discovery]
In-network processing
− Mobile Ad-hoc Networks can be seen as a set of nodes that deliver bits
from one end to the other;
− WSNs, on the other hand, are expected to provide information, not
necessarily original bits
− Gives additional options
− e.g., manipulate or process the data in the network
− Main example: aggregation
− Applying aggregation functions to obtain, e.g., an average value of measurement
data
− Typical functions: minimum, maximum, average, sum, …
− Functions not amenable to this: median
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 3, Wiley, 2005.
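A minimal sketch of why minimum, maximum, average and sum aggregate well in the network while the median does not: each node can forward a fixed-size partial summary that merges exactly with the summaries of its children. The class name `Partial` and the helper `aggregate` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from statistics import median
from typing import Iterable


@dataclass(frozen=True)
class Partial:
    """Fixed-size summary that a node forwards instead of raw readings."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(value: float) -> "Partial":
        return Partial(1, value, value, value)

    def merge(self, other: "Partial") -> "Partial":
        # Merging two summaries gives the same result as summarising all raw values.
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count


def aggregate(readings: Iterable[float]) -> Partial:
    """Summarise the readings of one subtree (at least one reading required)."""
    parts = [Partial.of(v) for v in readings]
    result = parts[0]
    for part in parts[1:]:
        result = result.merge(part)
    return result


# The median has no such fixed-size summary: combining per-subtree medians
# does not, in general, give the median of all readings.
left, right = [1.0, 2.0, 100.0], [3.0, 4.0, 5.0]
median_of_medians = median([median(left), median(right)])
true_median = median(left + right)
```

In this example `median_of_medians` and `true_median` differ, whereas `aggregate(left).merge(aggregate(right))` equals `aggregate(left + right)`.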
In-network processing
− Depending on application, more sophisticated processing of
data can take place within the network
− Example edge detection: locally exchange raw data with neighboring
nodes, compute edges, only communicate edge description to far
away data sinks.
− Example tracking/angle detection of signal source: Conceive of sensor
nodes as a distributed microphone array, use it to compute the angle
of a single source, only communicate this angle, not all the raw data.
− Exploit temporal and spatial correlation
− Observed signals might vary only slowly in time; so no need to
transmit all data at full rate all the time.
− Signals of neighboring nodes are often quite similar; only try to
transmit differences (details a bit complicated, see later).
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 3, Wiley, 2005.
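One simple way to exploit temporal correlation is to transmit a reading only when it differs from the last transmitted value by more than a threshold. The sketch below illustrates that idea under this assumption; the function name `send_on_delta` and the threshold policy are illustrative choices, not a scheme taken from the source.

```python
from typing import Iterable, Iterator, Tuple


def send_on_delta(samples: Iterable[float], threshold: float) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) only for samples worth transmitting.

    The first sample is always sent; later samples are sent when they differ
    from the last *sent* value by more than `threshold`.
    """
    last_sent = None
    for index, value in enumerate(samples):
        if last_sent is None or abs(value - last_sent) > threshold:
            last_sent = value
            yield index, value


readings = [20.0, 20.1, 20.2, 21.0, 21.1, 19.5]
transmitted = list(send_on_delta(readings, threshold=0.5))
```

With a slowly varying signal most samples are suppressed, at the cost of an error of at most `threshold` at the receiver.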
Data-centric networking
− In typical networks (including ad-hoc networks), network
transactions are addressed to the identities of specific nodes
− A “node-centric” or “address-centric” networking paradigm
− In redundantly deployed sensor networks, the specific source of
an event, alarm, etc. might not be important
− Redundancy: e.g., several nodes can observe the same area
− Thus: focus networking transactions on the data directly
instead of their senders and transmitters → data-centric
networking
− This idea is reinforced especially by the fact that we might have multiple
sources providing information and observations from the same or
similar areas.
− Principal design change
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 3, Wiley, 2005.
Data Aggregation
− Computing a smaller representation of a number of data
items (or messages) that is extracted from all the individual
data items.
− For example computing min/max or mean of sensor data.
− More advanced aggregation solutions could use approximation
techniques to transform high-dimensionality data to lower-
dimensionality abstractions/representations.
− The aggregated data can be smaller in size and represent
patterns/abstractions; so in multi-hop networks, nodes can
receive data from other nodes and aggregate them before
forwarding them to a sink or gateway.
− Or the aggregation can happen on a sink/gateway node.
Aggregation example
− Reduce number of transmitted bits/packets by applying an aggregation
function in the network
[Figure: sensor network tree annotated with the values 1, 1, 3, 1, 1, 6, 1, 1, 1, 1, 1, 1]
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 3, Wiley, 2005.
Efficacy of an aggregation mechanism
− Accuracy: difference between the resulting value or representation and
the original data
− Some solutions can be lossless or lossy depending on the applied
techniques.
− Completeness: the percentage of all the data items that are included in
the computation of the aggregated data.
− Latency: delay time to compute and report the aggregated data
− Computation foot-print; complexity;
− Overhead: the main advantage of the aggregation is reducing the size of
the data representation;
− Aggregation functions can trade-off between accuracy, latency and overhead;
− Aggregation should happen close to the source.
Publish/Subscribe
− Achieved by publish/subscribe paradigm
− Idea: Entities can publish data under certain names
− Entities can subscribe to updates of such named data
− Conceptually: Implemented by a software bus
− The software bus stores subscriptions and published data; names are used as filters;
subscribers are notified when the values of named data change
[Figure: software bus connecting Publisher 1 and Publisher 2 with Subscriber 1, Subscriber 2 and Subscriber 3]
− Variations
− Topic-based P/S: inflexible
− Content-based P/S: uses general predicates over named data
Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 12, Wiley, 2005.
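The software bus described above can be sketched as a small in-memory class. This is an illustration only: the class `SoftwareBus` and its method names are hypothetical, delivery is synchronous, and there is no networking or persistence.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Callback = Callable[[str, Any], None]
Predicate = Callable[[str, Any], bool]


class SoftwareBus:
    """Stores subscriptions and the last published value of each named datum."""

    def __init__(self) -> None:
        self._topic_subs: Dict[str, List[Callback]] = defaultdict(list)
        self._content_subs: List[Tuple[Predicate, Callback]] = []
        self._last_value: Dict[str, Any] = {}

    def subscribe_topic(self, name: str, callback: Callback) -> None:
        # Topic-based: the name itself is the only filter.
        self._topic_subs[name].append(callback)

    def subscribe_content(self, predicate: Predicate, callback: Callback) -> None:
        # Content-based: an arbitrary predicate over (name, value).
        self._content_subs.append((predicate, callback))

    def publish(self, name: str, value: Any) -> None:
        # Subscribers are notified only when the value of the named data changes.
        if name in self._last_value and self._last_value[name] == value:
            return
        self._last_value[name] = value
        for callback in self._topic_subs.get(name, []):
            callback(name, value)
        for predicate, callback in self._content_subs:
            if predicate(name, value):
                callback(name, value)
```

A topic subscriber names the data it wants exactly, while a content subscriber can express a condition such as "any temperature above 30"; the latter is more flexible but requires evaluating every predicate on each publication.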
MQTT Pub/Sub Protocol
− MQ Telemetry Transport (MQTT) is a lightweight broker-based
publish/subscribe messaging protocol.
− MQTT is designed to be open, simple, lightweight and easy to implement.
− These characteristics make MQTT ideal for use in constrained
environments, for example in IoT.
−Where the network is expensive, has low bandwidth or is
unreliable
−When run on an embedded device with limited processor or
memory resources;
− A small transport overhead (the fixed-length header is just 2 bytes), and
protocol exchanges minimised to reduce network traffic
− MQTT was developed by Andy Stanford-Clark of IBM, and Arlen Nipper
of Cirrus Link Solutions.
Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
MQTT
− It supports publish/subscribe message pattern to provide one-to-many message
distribution and decoupling of applications
− A messaging transport that is agnostic to the content of the payload
− The use of TCP/IP to provide basic network connectivity
− Three qualities of service for message delivery:
− "At most once", where messages are delivered according to the best efforts of
the underlying TCP/IP network. Message loss or duplication can occur.
− This level could be used, for example, with ambient sensor data where it
does not matter if an individual reading is lost as the next one will be
published soon after.
− "At least once", where messages are assured to arrive but duplicates may
occur.
− "Exactly once", where messages are assured to arrive exactly once. This level
could be used, for example, with billing systems where duplicate or lost
messages could lead to incorrect charges being applied.
Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
MQTT Message Format
− The message header for each MQTT command message contains a fixed header.
− Some messages also require a variable header and a payload.
− The format for each part of the message header:
Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
− DUP: Duplicate delivery
− QoS: Quality of Service
− RETAIN: RETAIN flag
− This flag is only used on PUBLISH messages. When a client sends a PUBLISH
to a server, if the Retain flag is set (1), the server should hold on to the message
after it has been delivered to the current subscribers.
− This allows new subscribers to instantly receive data with the retained, or
Last Known Good, value.
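The fixed header can be sketched in a few lines. The layout used here follows the MQTT V3.1 specification cited above: in the first byte, bits 7-4 hold the message type, bit 3 is DUP, bits 2-1 are the QoS level and bit 0 is RETAIN; the following one to four bytes carry the remaining length, seven bits per byte with the top bit as a continuation flag. The function names are hypothetical and the sketch builds only the header, not a complete message.

```python
PUBLISH = 3  # message type code of PUBLISH in MQTT V3.1
MAX_REMAINING_LENGTH = 268_435_455  # largest value that fits in four length bytes


def encode_fixed_header(msg_type: int, dup: bool, qos: int, retain: bool,
                        remaining_length: int) -> bytes:
    """Build the fixed header: one flags byte plus the remaining-length bytes."""
    if not 0 <= msg_type <= 15:
        raise ValueError("message type must fit in 4 bits")
    if qos not in (0, 1, 2):
        raise ValueError("QoS must be 0, 1 or 2")
    if not 0 <= remaining_length <= MAX_REMAINING_LENGTH:
        raise ValueError("remaining length out of range")
    first = (msg_type << 4) | (int(dup) << 3) | (qos << 1) | int(retain)
    out = bytearray([first])
    while True:
        digit = remaining_length % 128
        remaining_length //= 128
        if remaining_length > 0:
            digit |= 0x80  # continuation bit: more length bytes follow
        out.append(digit)
        if remaining_length == 0:
            break
    return bytes(out)


def decode_flags(first_byte: int) -> dict:
    """Split the first header byte back into its four fields."""
    return {
        "type": first_byte >> 4,
        "dup": bool(first_byte & 0x08),
        "qos": (first_byte >> 1) & 0x03,
        "retain": bool(first_byte & 0x01),
    }
```

For a PUBLISH with QoS 1, RETAIN set and 10 remaining bytes the sketch yields the two bytes 0x33 0x0A, which matches the 2-byte minimum header size mentioned earlier.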
Sensor Data as time-series data
− The sensor data (or IoT data in general) can be seen as time-
series data.
− A sensor stream refers to a source that provides sensor data
over time.
− The data can be sampled/collected at a rate (which can also be
variable) and is sent as a series of values.
− Over time, there will be a large number of data items
collected.
− Using time-series processing techniques can help to reduce
the size of the data that is communicated;
−Let’s remember, communication can consume more
energy than computation;
Sensor Data as time-series data
− Different representation methods that have been introduced for time-series data can
be applied.
− The goal is to reduce the dimensionality (and size) of the data, to find
patterns, detect anomalies, to query similar data;
− Dimensionality reduction techniques transform a data series with n items
to a representation with w items where w < n.
− These functions are often lossy, in contrast to solutions like normal
compression that preserve all the data.
− One of these techniques is called Symbolic Aggregate approXimation
(SAX).
− SAX was originally proposed for symbolic representation of time-series
data; it can also be used for symbolic representation of time-series sensor
measurements.
− The computational footprint of SAX is low, so it can also be used as an
in-network processing technique.
In-network processing
Using Symbolic Aggregate Approximation (SAX)
SAX Pattern (blue) with word length of 20 and a vocabulary of 10 symbols
over the original sensor time-series data (green)
Source: P. Barnaghi, F. Ganz, C. Henson, A. Sheth, "Computing Perception from Sensor Data",
in Proc. of the IEEE Sensors 2012, Oct. 2012.
fggfffhfffffgjhghfff
jfhiggfffhfffffgjhgi
fggfffhfffffgjhghfff
Symbolic Aggregate Approximation (SAX)
− SAX transforms time-series data into symbolic string representations.
− Symbolic Aggregate approXimation was proposed by Jessica Lin et al. at the
University of California, Riverside;
− http://www.cs.ucr.edu/~eamonn/SAX.htm .
− It extends the Piecewise Aggregate Approximation (PAA) approach with a symbolic
representation.
− The SAX algorithm is interesting for in-network processing in WSNs because of its
simplicity and low computational complexity.
− SAX provides reasonable sensitivity and selectivity in representing the data.
− The use of a symbolic representation makes it possible to use several other
algorithms and techniques to process/utilise SAX representations such as hashing,
pattern matching, suffix trees etc.
Processing Steps in SAX
− SAX transforms a time-series X of length n into a string of
arbitrary length w, where typically w << n, using an alphabet A
of size a > 2.
− The SAX algorithm has two main steps:
−Transforming the original time-series into a PAA
representation
−Converting the PAA intermediate representation into a
string.
− The string representations can be used for pattern matching,
distance measurements, outlier detection, etc.
Piecewise Aggregate Approximation
− In PAA, to reduce the time series from n dimensions to w
dimensions, the data is divided into w equal sized “frames.”
− The mean value of the data falling within a frame is calculated
and a vector of these values becomes the data-reduced
representation.
− Before applying PAA, each time series needs to be
normalised to achieve a mean of zero and a standard
deviation of one.
−The reason is to avoid comparing time series with
different offsets and amplitudes;
Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the
8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
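The two preparation steps, normalisation and PAA, can be sketched as follows. This is a minimal sketch under two assumptions that the source does not state: the standard deviation is the population one (division by n), and n is a multiple of w so that all frames have the same size. The function names are illustrative.

```python
import math
from typing import List, Sequence


def z_normalise(series: Sequence[float]) -> List[float]:
    """Rescale the series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        raise ValueError("a constant series cannot be z-normalised")
    return [(x - mean) / std for x in series]


def paa(series: Sequence[float], w: int) -> List[float]:
    """Reduce n values to w values: the mean of each of w equal-sized frames."""
    n = len(series)
    if w <= 0 or n % w != 0:
        raise ValueError("this sketch requires n to be a positive multiple of w")
    frame = n // w
    return [sum(series[i * frame:(i + 1) * frame]) / frame for i in range(w)]
```

Usage: `paa(z_normalise(values), w)` returns the data-reduced representation with w items.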
SAX – normalisation before PAA
Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1
Mean (μ): μ = (2+3+4.5+7.6+4+2+2+2+3+1)/10 = 3.11
Standard deviation (σ), squared deviations from the mean:
(2-3.11)² = 1.2321
(3-3.11)² = 0.0121
(4.5-3.11)² = 1.9321
(7.6-3.11)² = 20.1601
(4-3.11)² = 0.7921
(2-3.11)² = 1.2321
(2-3.11)² = 1.2321
(2-3.11)² = 1.2321
(3-3.11)² = 0.0121
(1-3.11)² = 4.4521
Sum: 1.2321 + 0.0121 + 1.9321 + 20.1601 + 0.7921 + 1.2321 + 1.2321 + 1.2321 +
0.0121 + 4.4521 = 32.2890
σ = √(32.2890/10) = 1.7969
Normalisation
Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1
Normalised: zi = (ci – μ)/σ
σ = 1.7969
μ = 3.11
z1 = (2-3.11)/1.7969 = -0.618
z2 = (3-3.11)/1.7969 = -0.061
z3 = (4.5-3.11)/1.7969 = 0.774
z4 = (7.6-3.11)/1.7969 = 2.499
z5 = (4-3.11)/1.7969 = 0.495
z6 = (2-3.11)/1.7969 = -0.618
z7 = (2-3.11)/1.7969 = -0.618
z8 = (2-3.11)/1.7969 = -0.618
z9 = (3-3.11)/1.7969 = -0.061
z10 = (1-3.11)/1.7969 = -1.174
Normalised Timeseries (z): -0.618, -0.061, 0.774, 2.499, 0.495,
-0.618, -0.618, -0.618, -0.061, -1.174
PAA calculation
Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1
Normalised Timeseries (z): -0.618, -0.061, 0.774, 2.499, 0.495, -0.618,
-0.618, -0.618, -0.061, -1.174
PAA (w=5): -0.339, 1.636, -0.061, -0.618, -0.618
(each PAA value is the mean of two consecutive z values, computed before rounding)
PAA to SAX Conversion
− Conversion of the PAA representation of a time-series into SAX is based
on producing symbols that correspond to the time-series features with
equal probability.
− The SAX developers have observed that time-series which are normalised
(zero mean and standard deviation of 1) approximately follow a Normal
distribution (Gaussian distribution).
− The SAX method introduces breakpoints that divide the PAA
representation into equiprobable sections and assigns a symbol of the alphabet to each section.
− For defining the breakpoints, the Normal inverse cumulative distribution function
is used.
Breakpoints in SAX
− “Breakpoints: breakpoints are a sorted list of numbers B = β1,…, βa-1 such
that the area under a N(0,1) Gaussian curve from βi to βi+1 = 1/a”.
Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the
8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
Alphabet representation in SAX
− Let’s assume a four-symbol alphabet: a, b, c, d
− As shown in the table in the previous slide, the cut lines for
this alphabet (also shown as the thin red lines on the plot
below) will be { -0.67, 0, 0.67 }
Source: JMOTIF Time series mining, http://code.google.com/p/jmotif/wiki/SAX
SAX Representation
Timeseries (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1
Normalised Timeseries (z): -0.618, -0.061, 0.774, 2.499, 0.495,
-0.618, -0.618, -0.618, -0.061, -1.174
PAA (w=5): -0.339, 1.636, -0.061, -0.618, -0.618
Cut off ranges: {-0.67, 0, 0.67}
Alphabet: a, b, c, d
Mapping: below -0.67 is a; from -0.67 up to 0 is b; from 0 up to 0.67 is c; 0.67 and above is d
SAX representation: bdbbb
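The conversion from PAA values to symbols can be sketched with the Python standard library alone. The breakpoints come from the inverse cumulative distribution function of N(0,1), available as `statistics.NormalDist().inv_cdf` (Python 3.8 or later). The function names `sax_breakpoints` and `to_sax` are hypothetical, and the convention that a value equal to a breakpoint takes the higher symbol is an assumption of this sketch.

```python
import string
from bisect import bisect_right
from statistics import NormalDist
from typing import List, Sequence


def sax_breakpoints(a: int) -> List[float]:
    """Return the a-1 cut points that split N(0,1) into a equiprobable regions."""
    if not 2 <= a <= 26:
        raise ValueError("this sketch supports alphabet sizes from 2 to 26")
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [standard_normal.inv_cdf(i / a) for i in range(1, a)]


def to_sax(paa_values: Sequence[float], a: int) -> str:
    """Map each PAA value to the letter of the region it falls into."""
    cuts = sax_breakpoints(a)
    # bisect_right counts the breakpoints that are <= the value,
    # which is exactly the index of the symbol ('a' is index 0).
    return "".join(string.ascii_lowercase[bisect_right(cuts, v)] for v in paa_values)
```

For a = 4, `sax_breakpoints(4)` gives approximately -0.67, 0 and 0.67, the cut lines used in the alphabet example.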
Features of the SAX technique
−SAX divides time-series data into equal segments
and then creates a string representation for each
segment.
−The SAX patterns create the lower-level
abstractions that are used to create the higher-level
interpretation of the underlying data.
−The string representation of the SAX mechanism
makes it possible to compare the patterns using a specific type
of string similarity function.
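The slide does not name the similarity function. One candidate, assumed here, is the MINDIST measure defined in the Lin et al. paper cited earlier: symbols that are equal or adjacent in the alphabet contribute zero, other pairs contribute the gap between the breakpoints that separate them, and the result is scaled by sqrt(n/w). The sketch below follows that definition; the function name and argument order are illustrative.

```python
import math
from statistics import NormalDist


def mindist(word1: str, word2: str, a: int, n: int) -> float:
    """MINDIST between two SAX words of equal length w.

    a is the alphabet size used to create the words and n is the length of
    the original time series (before PAA).
    """
    if len(word1) != len(word2):
        raise ValueError("SAX words must have the same length")
    w = len(word1)
    cuts = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    total = 0.0
    for x, y in zip(word1, word2):
        low, high = sorted((ord(x) - ord("a"), ord(y) - ord("a")))
        if high - low > 1:
            # Distance between the nearest edges of the two symbol regions.
            total += (cuts[high - 1] - cuts[low]) ** 2
    return math.sqrt(n / w) * math.sqrt(total)
```

Because adjacent symbols count as distance zero, MINDIST is a lower bound on the Euclidean distance between the original normalised series rather than an exact distance.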
Adaptive learning
Slides: Daniel Puschmann, Frieder Ganz (now with Adobe)
Creating Patterns-
Adaptive sensor SAX
F. Ganz, P. Barnaghi, F. Carrez, "Information Abstraction for Heterogeneous Real World Internet Data", IEEE Sensors Journal, 2013.
Data abstraction
F. Ganz, P. Barnaghi, F. Carrez, "Information Abstraction for Heterogeneous Real World Internet Data", IEEE Sensors Journal, 2013.
From patterns/events to a situation ontology
F. Ganz, P. Barnaghi, F. Carrez, "Automated Semantic Knowledge Acquisition from Sensor Data", IEEE
Systems Journal, 2014.
Learning ontology from sensory data
KAT- Knowledge Acquisition Toolkit
http://kat.ee.surrey.ac.uk/
F. Ganz, D. Puschmann, P. Barnaghi, F. Carrez, "A Practical Evaluation of Information Processing and Abstraction
Techniques for the Internet of Things", IEEE Internet of Things Journal, 2015.
https://github.com/UniSurreyIoT/KAT
Website: http://kat.ee.surrey.ac.uk
Social data analysis: A case study
Slides: Pramod Anantharam, Kno.e.sis Centre, Wright State
University
P. Anantharam, P. Barnaghi, K. Thirunarayan, A.P. Sheth, "Extracting City Traffic Events from Social
Streams", ACM Trans. on Intelligent Systems and Technology, 2015.
Social data analysis: A case study
− Are people talking about city infrastructure on Twitter?
− Can we extract city infrastructure related events from
Twitter?
− How can we leverage event and location knowledge bases for
event extraction?
− How well can we extract city events?
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
Do people talk about city
infrastructure on Twitter?
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
Some challenges in extracting events from
Tweets
− No well accepted definition of “events related to a city”.
− Tweets are short (140 characters) and their informal nature
makes them hard to analyze
− Entity, location, time, and type of the event
− Multiple reports of the same event and sparse reports of some
events (biased sample)
− Numbers don’t necessarily indicate intensity
− Validation of the solution is hard due to the open domain
nature of the problem
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
Event extraction techniques
− N-grams + Regression
− Text analysis to extract uni- and bi-grams (event markers)
− Feature selection to select best possible event markers
− Apply regression to predict P(Y|X) where Y is the target (rainfall) and X is
the input (event marker).
− Clustering
− Create event clusters incrementally over time
− Identify clusters of interest based on their relevance (manual inspection)
− Granularity remains at the tweet/cluster level (tweets are assigned to
clusters of interest)
− Sequence Labeling (CRFs)
− Text analysis to extract features such as named entities, POS
tagging
− Each event indicator is modeled as a mixture of event types that are latent
variables
− Each type corresponds to a distribution over named entities n (labels
assigned to event types by manual inspection) and other features
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
City Event Extraction
− Event extraction should be open domain (no a priori event
types) with event metadata.
− Incorporate background knowledge related to city events,
e.g., 511.org hierarchy, SCRIBE ontology, city location
names.
− Assess the intensity of an event using content and network
cues.
− Robust to noise, informal nature, and variability of data.
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
[Figure: City Event Extraction solution architecture. Components listed: tweets from a city, POS tagging, hybrid NER + event term extraction, geohashing, temporal estimation, impact assessment, event aggregation. Background knowledge: OSM locations, SCRIBE ontology, 511.org hierarchy. Panel labels: City Infrastructure, City Event Annotation, City Event Extraction.]
P. Anantharam, P. Barnaghi, K. Thirunarayan, A.P. Sheth, "Extracting City Traffic Events from Social
Streams", ACM Trans. on Intelligent Systems and Technology, 2015.
City Event Extraction - Key Insights
a) Space: events reported within a grid (gi ∈ G, where G is the set
of all grids in a city) at a certain time are most likely
reporting the same event
b) Time: events reported within a time ∆t in a grid gi are most
likely to be reporting the same event
c) Theme: events with similar entities within a grid gi and time
∆t are most likely reporting the same event
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
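The three insights can be read as a grouping key: annotations that share a grid cell, a time window and a theme are treated as reports of the same event. The sketch below illustrates only that reading and is not the authors' aggregation algorithm. It assumes fixed, non-overlapping windows of length `delta_t` and an exact match on an event-type label as a stand-in for "similar entities"; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Annotation = Tuple[str, float, str]   # (grid cell label, timestamp in seconds, event type)
EventKey = Tuple[str, int, str]       # (grid cell label, window index, event type)


def aggregate_events(annotations: Iterable[Annotation], delta_t: float) -> Dict[EventKey, int]:
    """Count the reports that fall into the same cell, time window and theme."""
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    counts: Dict[EventKey, int] = defaultdict(int)
    for cell, timestamp, event_type in annotations:
        window = int(timestamp // delta_t)
        counts[(cell, window, event_type)] += 1
    return dict(counts)
```

The report count per key is only a rough cue: as noted earlier in the deck, numbers do not necessarily indicate intensity.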
CRF formalisation – for annotation
A General CRF Model
City Event Extraction – Geohashing
[Figure: a geohash grid cell of about 0.6 miles × 0.38 miles containing the point (37.74933, -122.4106711).
Cell bounds: Min-lat 37.7490234375, Max-lat 37.7545166015625, Min-long -122.420654296875, Max-long -122.40966796875.]
Hierarchical spatial structure of geohash for
representing locations with variable precision.
Here the location string is 5H34
[Figure: nested grids of cell symbols (0–9, B–L), each level subdividing one cell of the level above]
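A minimal sketch of geohash encoding, assuming the commonly used Geohash scheme: longitude and latitude intervals are halved alternately, starting with longitude, and every five bits become one character of a base-32 alphabet. That alphabet differs from the illustrative symbols in the figure (such as 5H34). The function name `geohash_cell` is hypothetical.

```python
from typing import Tuple

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet (no a, i, l, o)


def geohash_cell(lat: float, lon: float, precision: int) -> Tuple[str, Tuple[float, float, float, float]]:
    """Return the geohash string and its cell as (min_lat, max_lat, min_lon, max_lon)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars), (lat_lo, lat_hi, lon_lo, lon_hi)
```

Each extra character refines the cell, so a shared prefix means two locations fall in the same coarser cell. Calling `geohash_cell(37.74933, -122.4106711, 6)` returns cell bounds equal to the corner coordinates shown in the figure, which suggests the figure depicts a 6-character cell.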
Implementation
− City Event Annotation
− Automated creation of training data
− Annotation task (our CRF model vs. baseline CRF model)
− City Event Extraction
− Use aggregation algorithm for event extraction
− Extracted events AND ground truth
− Dataset (Aug – Nov 2013) ~ 8 GB of data on disk
− Over 8 million tweets
− Over 162 million sensor data points
− 311 active events and 170 scheduled events
Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
Extracted Events and Ground Truth
Automated creation of training data
Evaluation over 500 randomly chosen tweets from around 8,000 annotated
tweets
Conclusions
− The ultimate goal is to transform raw data into
insights and actionable information and/or create
effective representations for machines and also
human users.
− This requires data from multiple sources, (near-) real
time analytics and visualisation and/or semantic
representations.
− We need solutions such as stream processing, pattern recognition,
multi-view and deep learning.
− Applications in areas such as intelligent urban development and
combining physical-cyber-social data for developing smart city services,
and healthcare services (e.g. elderly-care, remote patient monitoring).
Acknowledgements
− Some parts of the content are adapted from:
− Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor
Networks, chapters 3 and 12, Wiley, 2005.
− Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic
representation of time series, with implications for streaming algorithms. In
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining
and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
− JMOTIF Time series mining, http://code.google.com/p/jmotif/wiki/SAX
Q&A
− Thank you.
http://personal.ee.surrey.ac.uk/Personal/P.Barnaghi/
@pbarnaghi
p.barnaghi@surrey.ac.uk

Weitere ähnliche Inhalte

Was ist angesagt?

Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things Dynamic Semantics for the Internet of Things
Dynamic Semantics for the Internet of Things PayamBarnaghi
 
The impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesThe impact of Big Data on next generation of smart cities
The impact of Big Data on next generation of smart citiesPayamBarnaghi
 
Physical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPhysical-Cyber-Social Data Analytics & Smart City Applications
Physical-Cyber-Social Data Analytics & Smart City ApplicationsPayamBarnaghi
 
Dynamic Data Analytics for the Internet of Things: Challenges and Opportunities
Dynamic Data Analytics for the Internet of Things: Challenges and OpportunitiesDynamic Data Analytics for the Internet of Things: Challenges and Opportunities
Dynamic Data Analytics for the Internet of Things: Challenges and OpportunitiesPayamBarnaghi
 
Internet of Things and Data Analytics for Smart Cities and eHealth
Intelligent Data Processing for the Internet of Things

  • 1. Intelligent Data Processing for the Internet of Things. Payam Barnaghi, Institute for Communication Systems (ICS), University of Surrey, Guildford, United Kingdom. International “IoT 360 Summer School”, October 2015 – Rome, Italy
  • 2. Wireless Sensor (and Actuator) Networks [Figure labels: sink node; gateway; core network (e.g. Internet); gateway; end-user; computer services.] - The networks typically run low-power devices. - Each node consists of one or more sensors, possibly of different types (or actuators). Open questions: Operating systems? Services? Protocols? Processing stages: in-node data processing; data aggregation/fusion; inference/processing of IoT data.
  • 3. Key characteristics of IoT devices − Often inexpensive sensors (actuators) equipped with a radio transceiver for various applications, typically with a low data rate of ~10-250 kbps. − Deployed in large numbers. − The sensors should coordinate to perform the desired task. − The acquired information (periodic or event-based) is reported back to the information processing centre (or, in some cases, in-network processing is required). − Solutions are often application-dependent.
  • 4. Beyond conventional sensors − Human as a sensor (citizen sensors) − e.g. tweeting real-world data and/or events − Software sensors − e.g. software agents/services generating/representing data [Figure labels: “Road block, A3”; “Suggest a different route”.]
  • 5. P. Barnaghi et al., "Digital Technology Adoption in the Smart Built Environment", IET Sector Technical Briefing, The Institution of Engineering and Technology (IET), I. Borthwick (editor), March 2015.
  • 6. Internet of Things: The story so far P. Barnaghi, A. Sheth, “The Internet of Things: The story so far”, IEEE IoT Newsletter, September 2014.
  • 7. IoT data- challenges − Multi-modal, distributed and heterogeneous − Noisy and incomplete − Time and location dependent − Dynamic and varies in quality − Crowdsourced data can be unreliable − Requires (near-) real-time analysis − Privacy and security are important issues − Data can be biased- we need to know our data! P. Barnaghi, A. Sheth, C. Henson, "From data to actionable knowledge: Big Data Challenges in the Web of Things," IEEE Intelligent Systems, vol. 28, issue 6, Dec 2013.
  • 8. Data is not what we want or is it?
  • 9. What we need are insights and actionable-information.
  • 10. Diffusion of innovation [Figure: diffusion-of-innovation curve with IoT marked on it; image source: Wikipedia]
  • 11. Problem #1 Data: We seem to have lots of it… Real World Data: it is always difficult to get (silos, format, privacy, business interests or lack of interest!...)
  • 12. Problem #2 Data: interoperability and metadata frameworks… Real World Data: there are solutions for service based (RESTful) access and meta-data/semantic representation frameworks (e.g. W3C SSN, HyperCat,…), but none of them is widely adopted yet.
  • 13. Problem #3 Data: quality, reliability… Real World Data: data can be noisy, crowdsourced data can be inaccurate or contradictory, and there can be delays in accessing/processing the data…
  • 14. Problem #4 Data: having too much data and using analytics tools alone won’t solve the problem… Real World Data: in addition to the HPC issues, we need new methods/solutions that can provide real-time analysis of dynamic, variable quality and multi-modal streams…
  • 15. Problem #5 Data: abstraction, discovering the associations… Real World Data: co-occurrence vs. causation; we need hypothesis, background knowledge,… After all data is not what we are really after…
  • 16. We need more linked open data…
  • 17. Sometimes it’s even better if we have: (near) real-time linked open data streams
  • 18. Or even better than that, if we have: (near) real-time linked open data streams + meta-data (semantic annotations) + adaptable and scalable analytics tools + sufficient background knowledge
  • 20. “Each single data item could be important.” “Relying merely on data from sources that are unevenly distributed, without considering background information or social context, can lead to imbalanced interpretations and decisions.” ?
  • 21. Current focus on Big Data − Emphasis on power of data and data mining solutions − Technology solutions to handle large volumes of data; e.g. Hadoop, NoSQL, Graph Databases, … − Trying to find patterns and trends from large volumes of data…
  • 22. Myths About Big Data − Big Data is only about massive data volume − Big Data means Hadoop − Big Data means unstructured data − If we have enough data we can draw conclusions (enough here often means massive amounts) − NoSQL means No SQL − It is about increasing computational power and taking more data and running data mining algorithms. Some of the items are adapted from: Brian Gentile, http://mashable.com/2012/06/19/big-data-myths/
  • 23. What happens if we only focus on data − Number of burgers consumed per day. − Number of cats outside. − Number of people checking their Facebook account. − What insight would you draw?
  • 24. It is also important to note what type of problems we expect to solve.
  • 25. Back to the future
  • 26. A smart city example. Future cities: a view from 1998. Source: LA Times, http://documents.latimes.com/la-2013/
  • 29. Data alone is not enough − Domain knowledge − Machine interpretable meta-data − Delivery, sharing and representation services − Query, discovery, aggregation services − Publish, subscribe, notification, and access interfaces/services − More open solutions for innovation and citizen participation − Efficient feedback and control mechanisms − Social network and social system analysis − In cities, interactions with people and social systems are the key.
  • 30. Some of the technical challenges − Discovery: finding appropriate device and data sources − Access: Availability and (open) access to data resources and data − Search: querying for data − Integration: dealing with heterogeneous devices, networks and data − Large-scale data mining, adaptable learning and efficient computing and processing − Interpretation: translating data to knowledge that can be used by people and applications − Scalability: dealing with large numbers of devices and a myriad of data and the computational complexity of interpreting the data.
  • 31. Accessing IoT data − Publish/Subscribe (long-term/short-term) − Ad-hoc query − The typical types of data request for sensory data: − Query based on − ID (resource/service) – for known resources − Location − Type − Time – requests for fresh data or historical data; − One of the above + a range [+ Unit of Measurement] − Type/Location/Time + A combination of Quality of Information attributes − An entity of interest (a feature of an entity of interest) − Complex Data Types (e.g. pollution data could be a combination of different types)
  • 32. IoT Data in the Cloud. Image courtesy: http://images.mathrubhumi.com, http://www.anacostiaws.org/userfiles/image/Blog-Photos/river2.jpg
  • 33. IoT Data Processing [Figure: several WSNs and network-enabled devices connected to network services/storage and processing units. Labels: data collection and processing within the networks; data/service access at application level; data discovery; service/resource discovery]
  • 34. In-network processing − Mobile ad-hoc networks can be seen as a set of nodes that deliver bits from one end to the other; − WSNs, on the other hand, are expected to provide information, not necessarily the original bits − This gives additional options − e.g., manipulate or process the data in the network − Main example: aggregation − Applying aggregation functions to obtain, for example, an average value of the measurement data − Typical functions: minimum, maximum, average, sum, … − Functions that are not amenable to this: median Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 3, Wiley, 2005.
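The aggregation idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Partial` class and `aggregate` helper are hypothetical names, not part of any WSN framework. Each node forwards a fixed-size partial state (count, sum, min, max) that a parent can merge with the partials of its other children. The median has no such fixed-size mergeable state, which is why the slide lists it as not amenable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Fixed-size partial aggregate a node forwards instead of raw readings."""
    count: int
    total: float
    low: float
    high: float

    @staticmethod
    def of(reading: float) -> "Partial":
        # A single reading is the smallest possible partial aggregate.
        return Partial(1, reading, reading, reading)

    def merge(self, other: "Partial") -> "Partial":
        # Merging is associative, so it can happen at any hop in the tree.
        return Partial(self.count + other.count,
                       self.total + other.total,
                       min(self.low, other.low),
                       max(self.high, other.high))

    @property
    def mean(self) -> float:
        return self.total / self.count


def aggregate(readings):
    """Fold a non-empty list of readings into one partial aggregate."""
    parts = [Partial.of(r) for r in readings]
    result = parts[0]
    for part in parts[1:]:
        result = result.merge(part)
    return result


# Two subtrees aggregate locally; their parent merges the two partials.
left = aggregate([21.0, 22.5, 20.5])
right = aggregate([25.0, 19.0])
root = left.merge(right)
```

Only one `Partial` travels over each link, regardless of how many readings it summarises.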
  • 35. In-network processing − Depending on application, more sophisticated processing of data can take place within the network − Example edge detection: locally exchange raw data with neighboring nodes, compute edges, only communicate edge description to far away data sinks. − Example tracking/angle detection of signal source: Conceive of sensor nodes as a distributed microphone array, use it to compute the angle of a single source, only communicate this angle, not all the raw data. − Exploit temporal and spatial correlation − Observed signals might vary only slowly in time; so no need to transmit all data at full rate all the time. − Signals of neighboring nodes are often quite similar; only try to transmit differences (details a bit complicated, see later). Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 3, Wiley, 2005.
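One simple way to exploit temporal correlation is a send-on-delta rule: a node transmits a reading only when it differs from the last transmitted value by more than a threshold. The sketch below is a hedged illustration of that idea, not a technique named in the slides; the function name `send_on_delta` and the threshold value are assumptions.

```python
def send_on_delta(samples, threshold):
    """Yield (index, value) only when the value has moved by more than
    `threshold` since the last transmitted value."""
    last_sent = None
    for index, value in enumerate(samples):
        if last_sent is None or abs(value - last_sent) > threshold:
            last_sent = value
            yield index, value


# A slowly varying signal: most samples are suppressed.
readings = [20.0, 20.1, 20.1, 20.2, 21.0, 21.1, 21.0, 19.5]
sent = list(send_on_delta(readings, threshold=0.5))
```

The receiver can hold the last received value until the next update arrives, so the error it sees stays within the threshold.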
  • 36. Data-centric networking − In typical networks (including ad-hoc networks), network transactions are addressed to the identities of specific nodes − A “node-centric” or “address-centric” networking paradigm − In a redundantly deployed sensor network, the specific source of an event, alarm, etc. might not be important − Redundancy: e.g., several nodes can observe the same area − Thus: focus networking transactions on the data directly instead of their senders and transmitters → data-centric networking − This idea is reinforced especially by the fact that we might have multiple sources providing information and observations from the same or similar areas. − Principal design change Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 3, Wiley, 2005.
  • 37. Data Aggregation − Computing a smaller representation of a number of data items (or messages) that is extracted from all the individual data items. − For example computing min/max or mean of sensor data. − More advanced aggregation solutions could use approximation techniques to transform high-dimensionality data to lower-dimensionality abstractions/representations. − The aggregated data can be smaller in size and can represent patterns/abstractions; so in multi-hop networks, nodes can receive data from other nodes and aggregate them before forwarding them to a sink or gateway. − Or the aggregation can happen on a sink/gateway node.
  • 38. Aggregation example − Reduce the number of transmitted bits/packets by applying an aggregation function in the network [Figure: two copies of a routing tree labelled with per-link packet counts: 1, 1, 3, 1, 1, 6 and 1, 1, 1, 1, 1, 1] Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 3, Wiley, 2005.
  • 39. Efficacy of an aggregation mechanism − Accuracy: the difference between the resulting value or representation and the original data − Solutions can be lossless or lossy, depending on the applied techniques. − Completeness: the percentage of all the data items that are included in the computation of the aggregated data. − Latency: the delay in computing and reporting the aggregated data − Computation footprint; complexity; − Overhead: the main advantage of aggregation is reducing the size of the data representation; − Aggregation functions can trade off accuracy, latency and overhead; − Aggregation should happen close to the source.
  • 40. Publish/Subscribe − Data-centric networking can be achieved with the publish/subscribe paradigm − Idea: Entities can publish data under certain names − Entities can subscribe to updates of such named data − Conceptually: implemented by a software bus − The software bus stores subscriptions and published data; names are used as filters; subscribers are notified when the values of named data change [Figure: Publishers 1-2 and Subscribers 1-3 attached to a software bus] − Variations − Topic-based P/S – inflexible − Content-based P/S – use general predicates over named data Source: Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapter 12, Wiley, 2005.
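The software-bus concept can be illustrated with a small in-memory sketch. It is a toy under stated assumptions: the class name `SoftwareBus`, its methods and the topic name are hypothetical and do not correspond to any particular middleware. Subscribing by name alone corresponds to topic-based P/S; passing a predicate over the value approximates content-based P/S.

```python
from collections import defaultdict


class SoftwareBus:
    """Toy software bus: stores subscriptions and the last published values."""

    def __init__(self):
        self._subs = defaultdict(list)  # name -> list of (predicate, callback)
        self._last = {}                 # name -> last published value

    def subscribe(self, name, callback, predicate=lambda value: True):
        # The name acts as the filter; the optional predicate narrows it further.
        self._subs[name].append((predicate, callback))

    def publish(self, name, value):
        # Subscribers are notified only when the value of the named data changes.
        if name in self._last and self._last[name] == value:
            return
        self._last[name] = value
        for predicate, callback in self._subs[name]:
            if predicate(value):
                callback(name, value)


bus = SoftwareBus()
received = []
bus.subscribe("room1/temperature",
              lambda name, value: received.append(("all", value)))
bus.subscribe("room1/temperature",
              lambda name, value: received.append(("hot", value)),
              predicate=lambda value: value > 30)
for reading in (22, 22, 35):
    bus.publish("room1/temperature", reading)
```

Publishers and subscribers never reference each other directly, which is the decoupling the slide describes.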
  • 41. MQTT Pub/Sub Protocol − MQ Telemetry Transport (MQTT) is a lightweight broker-based publish/subscribe messaging protocol. − MQTT is designed to be open, simple, lightweight and easy to implement. − These characteristics make MQTT ideal for use in constrained environments, for example in IoT. −Where the network is expensive, has low bandwidth or is unreliable −When run on an embedded device with limited processor or memory resources; − A small transport overhead (the fixed-length header is just 2 bytes), and protocol exchanges minimised to reduce network traffic − MQTT was developed by Andy Stanford-Clark of IBM, and Arlen Nipper of Cirrus Link Solutions. Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
  • 42. MQTT − It supports the publish/subscribe message pattern to provide one-to-many message distribution and decoupling of applications − A messaging transport that is agnostic to the content of the payload − The use of TCP/IP to provide basic network connectivity − Three qualities of service for message delivery: − "At most once", where messages are delivered according to the best efforts of the underlying TCP/IP network. Message loss or duplication can occur. − This level could be used, for example, with ambient sensor data where it does not matter if an individual reading is lost as the next one will be published soon after. − "At least once", where messages are assured to arrive but duplicates may occur. − "Exactly once", where messages are assured to arrive exactly once. This level could be used, for example, with billing systems where duplicate or lost messages could lead to incorrect charges being applied. Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
  • 43. MQTT Message Format − The message header for each MQTT command message contains a fixed header. − Some messages also require a variable header and a payload. − The format for each part of the message header [figure of the header layout not reproduced]: − DUP: Duplicate delivery − QoS: Quality of Service − RETAIN: RETAIN flag − This flag is only used on PUBLISH messages. When a client sends a PUBLISH to a server, if the Retain flag is set (1), the server should hold on to the message after it has been delivered to the current subscribers. − This allows new subscribers to instantly receive data with the retained, or Last Known Good, value. Source: MQTT V3.1 Protocol Specification, IBM, http://public.dhe.ibm.com/software/dw/webservices/ws-mqtt/mqtt-v3r1.html
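To make the fixed header concrete, here is a minimal Python sketch of how its bytes are packed according to the MQTT V3.1 specification: the first byte holds the message type (bits 7-4), DUP (bit 3), QoS (bits 2-1) and RETAIN (bit 0), followed by the remaining length encoded in one to four bytes. The helper names `encode_fixed_header` and `decode_flags` are hypothetical; this is not the API of any MQTT client library.

```python
PUBLISH = 3  # message type value of PUBLISH in MQTT V3.1


def encode_fixed_header(msg_type, dup, qos, retain, remaining_length):
    """Pack the MQTT fixed header: one flags byte plus the remaining length."""
    if not 0 <= qos <= 2:
        raise ValueError("QoS must be 0, 1 or 2")
    if not 0 <= remaining_length <= 268_435_455:
        raise ValueError("remaining length does not fit in four bytes")
    first = (msg_type << 4) | (int(dup) << 3) | (qos << 1) | int(retain)
    out = bytearray([first])
    # Remaining length: 7 data bits per byte; the top bit is set while
    # more length bytes follow.
    while True:
        digit = remaining_length % 128
        remaining_length //= 128
        if remaining_length > 0:
            digit |= 0x80
        out.append(digit)
        if remaining_length == 0:
            break
    return bytes(out)


def decode_flags(first_byte):
    """Unpack the message type and flags from the first header byte."""
    return {
        "type": first_byte >> 4,
        "dup": bool(first_byte & 0x08),
        "qos": (first_byte >> 1) & 0x03,
        "retain": bool(first_byte & 0x01),
    }


# A retained PUBLISH at QoS 1 with 10 bytes following the fixed header.
header = encode_fixed_header(PUBLISH, dup=False, qos=1, retain=True,
                             remaining_length=10)
```

For messages shorter than 128 bytes the header is exactly two bytes, which is the small transport overhead mentioned on the MQTT slides.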
  • 44. Sensor Data as time-series data − The sensor data (or IoT data in general) can be seen as time-series data. − A sensor stream refers to a source that provides sensor data over time. − The data can be sampled/collected at a rate (which can also be variable) and is sent as a series of values. − Over time, there will be a large number of data items collected. − Using time-series processing techniques can help to reduce the size of the data that is communicated; − Let’s remember, communication can consume more energy than computation;
  • 45. Sensor Data as time-series data − Different representation methods that have been introduced for time-series data can be applied. − The goal is to reduce the dimensionality (and size) of the data, to find patterns, to detect anomalies and to query similar data; − Dimensionality reduction techniques transform a data series with n items to a representation with w items where w < n. − These functions are often lossy, in contrast to solutions like normal compression that preserve all the data. − One of these techniques is called Symbolic Aggregate Approximation (SAX). − SAX was originally proposed for symbolic representation of time-series data; it can also be used for symbolic representation of time-series sensor measurements. − The computational footprint of SAX is low; so it can also be used as an in-network processing technique.
  • 46. In-network processing using Symbolic Aggregate Approximation (SAX) [Figure: SAX pattern (blue) with a word length of 20 and a vocabulary of 10 symbols over the original sensor time-series data (green); example SAX words: fggfffhfffffgjhghfff, jfhiggfffhfffffgjhgi, fggfffhfffffgjhghfff] Source: P. Barnaghi, F. Ganz, C. Henson, A. Sheth, "Computing Perception from Sensor Data", in Proc. of the IEEE Sensors 2012, Oct. 2012.
  • 47. Symbolic Aggregate Approximation (SAX) − SAX transforms time-series data into symbolic string representations. − Symbolic Aggregate approXimation was proposed by Jessica Lin et al. at the University of California, Riverside; − http://www.cs.ucr.edu/~eamonn/SAX.htm . − It extends the Piecewise Aggregate Approximation (PAA) approach with a symbolic representation. − The SAX algorithm is interesting for in-network processing in WSNs because of its simplicity and low computational complexity. − SAX provides reasonable sensitivity and selectivity in representing the data. − The use of a symbolic representation makes it possible to use several other algorithms and techniques to process/utilise SAX representations, such as hashing, pattern matching, suffix trees etc.
  • 48. Processing Steps in SAX − SAX transforms a time series X of length n into a string of arbitrary length w (typically w ≪ n), using an alphabet A of size a > 2. − The SAX algorithm has two main steps: − Transforming the original time series into a PAA representation − Converting the intermediate PAA representation into a string. − The string representations can be used for pattern matching, distance measurements, outlier detection, etc.
  • 49. Piecewise Aggregate Approximation − In PAA, to reduce the time series from n dimensions to w dimensions, the data is divided into w equal sized “frames.” − The mean value of the data falling within a frame is calculated and a vector of these values becomes the data-reduced representation. − Before applying PAA, each time series needs to be normalised to have a mean of zero and a standard deviation of one. − The reason is to avoid comparing time series with different offsets and amplitudes; Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
  • 50. SAX- normalisation before PAA. Time series (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1. Mean: μ = (2+3+4.5+7.6+4+2+2+2+3+1)/10 = 3.11. Squared deviations: (2-3.11)² = 1.2321; (3-3.11)² = 0.0121; (4.5-3.11)² = 1.9321; (7.6-3.11)² = 20.1601; (4-3.11)² = 0.7921; (2-3.11)² = 1.2321; (2-3.11)² = 1.2321; (2-3.11)² = 1.2321; (3-3.11)² = 0.0121; (1-3.11)² = 4.4521. Sum: 1.2321 + 0.0121 + 1.9321 + 20.1601 + 0.7921 + 1.2321 + 1.2321 + 1.2321 + 0.0121 + 4.4521 = 32.2890. Standard deviation: σ = √(32.2890/10) ≈ 1.7969.
  • 51. Normalisation. Time series (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1. Normalised values: zi = (ci - μ)/σ, with μ = 3.11 and σ ≈ 1.7969. z1 = (2-3.11)/1.7969 = -0.618; z2 = (3-3.11)/1.7969 = -0.061; z3 = (4.5-3.11)/1.7969 = 0.774; z4 = (7.6-3.11)/1.7969 = 2.499; z5 = (4-3.11)/1.7969 = 0.495; z6 = (2-3.11)/1.7969 = -0.618; z7 = (2-3.11)/1.7969 = -0.618; z8 = (2-3.11)/1.7969 = -0.618; z9 = (3-3.11)/1.7969 = -0.061; z10 = (1-3.11)/1.7969 = -1.174. Normalised time series (z): -0.618, -0.061, 0.774, 2.499, 0.495, -0.618, -0.618, -0.618, -0.061, -1.174
  • 52. PAA calculation. Time series (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1. Normalised time series (z): -0.618, -0.061, 0.774, 2.499, 0.495, -0.618, -0.618, -0.618, -0.061, -1.174. PAA (w=5, the mean of each frame of two consecutive values): -0.339, 1.636, -0.061, -0.618, -0.618
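The normalisation and PAA steps can be checked with a short script. This is a minimal sketch, assuming the population standard deviation (division by n, as in the worked example) and a series length that is a multiple of w; the function names `znormalise` and `paa` are illustrative and not taken from a SAX library.

```python
import math


def znormalise(series):
    """Return the z-normalised series together with its mean and std."""
    n = len(series)
    mu = sum(series) / n
    # Population standard deviation (divide by n), as in the worked example.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in series) / n)
    if sigma == 0:
        raise ValueError("a constant series cannot be z-normalised")
    return [(x - mu) / sigma for x in series], mu, sigma


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal frames."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes n is a multiple of w")
    size = n // w
    return [sum(series[i:i + size]) / size for i in range(0, n, size)]


c = [2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1]  # series from the worked example
z, mu, sigma = znormalise(c)
frames = paa(z, w=5)
```

Printing `mu`, `sigma` and `frames` gives the quantities worked through by hand on the three slides above.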
  • 53. PAA to SAX Conversion − Conversion of the PAA representation of a time series into SAX is based on producing symbols that correspond to the time-series features with equal probability. − The SAX developers have observed that time series which are normalised (zero mean and standard deviation of 1) tend to follow a Normal (Gaussian) distribution. − The SAX method introduces breakpoints that divide the range of PAA values into equiprobable regions and assigns a symbol of the alphabet to each region. − The breakpoints are defined using the inverse cumulative distribution function of the Normal distribution.
  • 54. Breakpoints in SAX − “Breakpoints: breakpoints are a sorted list of numbers B = β1, …, βa−1 such that the area under a N(0,1) Gaussian curve from βi to βi+1 = 1/a”. Source: Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11.
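Breakpoints with this property can be computed from the inverse cumulative distribution function of the standard Normal distribution, which Python's standard library exposes as `statistics.NormalDist().inv_cdf` (Python 3.8+). The helper name `sax_breakpoints` is an assumption made for this sketch.

```python
from statistics import NormalDist


def sax_breakpoints(a):
    """Return the a-1 breakpoints that split N(0,1) into a equiprobable regions."""
    if a < 2:
        raise ValueError("the alphabet needs at least two symbols")
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return [standard_normal.inv_cdf(i / a) for i in range(1, a)]


cuts = sax_breakpoints(4)  # approximately [-0.67, 0.0, 0.67]
```

Each region between consecutive breakpoints (with −∞ and +∞ as the outer limits) then has probability 1/a under N(0,1).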
  • 55. Alphabet representation in SAX − Let’s assume that we have a 4-symbol alphabet: a, b, c, d − As shown in the table in the previous slide, the cut lines for this alphabet (also shown as the thin red lines on the plot below) will be { -0.67, 0, 0.67 } Source: JMOTIF Time series mining, http://code.google.com/p/jmotif/wiki/SAX
  • 56. SAX Representation. Time series (c): 2, 3, 4.5, 7.6, 4, 2, 2, 2, 3, 1. Normalised time series (z): -0.618, -0.061, 0.774, 2.499, 0.495, -0.618, -0.618, -0.618, -0.061, -1.174. PAA (w=5): -0.339, 1.636, -0.061, -0.618, -0.618. Cut-off values: {-0.67, 0, 0.67}. Alphabet: a, b, c, d. SAX representation: bdbbb
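The last step, mapping PAA values to symbols, amounts to locating each value among the sorted breakpoints. A minimal sketch follows; the function name `to_sax` and the input values are illustrative assumptions, chosen so that every symbol of the 4-letter alphabet appears once. A value equal to a breakpoint is assigned to the upper region here.

```python
from bisect import bisect_right
from string import ascii_lowercase


def to_sax(paa_values, breakpoints):
    """Map each PAA value to the letter of the region it falls into."""
    if len(breakpoints) + 1 > len(ascii_lowercase):
        raise ValueError("this sketch supports at most 26 symbols")
    # bisect_right counts how many breakpoints are <= value, which is the
    # index of the region and therefore of the symbol.
    return "".join(ascii_lowercase[bisect_right(breakpoints, value)]
                   for value in paa_values)


cuts = [-0.67, 0.0, 0.67]  # breakpoints for a 4-symbol alphabet
word = to_sax([-1.2, -0.3, 0.1, 0.9], cuts)  # illustrative PAA values
```

With these inputs `word` is "abcd": values below -0.67 map to a, values from -0.67 up to 0 to b, from 0 up to 0.67 to c, and the rest to d.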
  • 57. Features of the SAX technique − SAX divides time-series data into equal segments and then creates a symbol for each segment; together the symbols form a string representation. − The SAX patterns create the lower-level abstractions that are used to create the higher-level interpretation of the underlying data. − The string representation of the SAX mechanism makes it possible to compare the patterns using a specific type of string similarity function.
  • 58. Adaptive learning Slides: Daniel Puschmann, Frieder Ganz (now with Adobe)
  • 59. Creating Patterns- Adaptive sensor SAX. F. Ganz, P. Barnaghi, F. Carrez, "Information Abstraction for Heterogeneous Real World Internet Data", IEEE Sensors Journal, 2013.
  • 60. Data abstraction. F. Ganz, P. Barnaghi, F. Carrez, "Information Abstraction for Heterogeneous Real World Internet Data", IEEE Sensors Journal, 2013.
  • 61. From patterns/events to a situation ontology. F. Ganz, P. Barnaghi, F. Carrez, "Automated Semantic Knowledge Acquisition from Sensor Data", IEEE Systems Journal, 2014.
  • 62. Learning ontology from sensory data
  • 63. KAT- Knowledge Acquisition Toolkit http://kat.ee.surrey.ac.uk/ F. Ganz, D. Puschmann, P. Barnaghi, F. Carrez, "A Practical Evaluation of Information Processing and Abstraction Techniques for the Internet of Things", IEEE Internet of Things Journal, 2015.
  • 65. Social data analysis: A case study Slides: Pramod Anantharam, Kno.e.sis Centre, Wright State University P. Anantharam, P. Barnaghi, K. Thirunarayan, A.P. Sheth, "Extracting City Traffic Events from Social Streams", ACM Trans. on Intelligent Systems and Technology, 2015.
  • 66. Social data analysis: A case study − Are people talking about city infrastructure on Twitter? − Can we extract city infrastructure related events from Twitter? − How can we leverage event and location knowledge bases for event extraction? − How well can we extract city events? Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 67. Do people talk about city infrastructure on Twitter? Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 68. Some challenges in extracting events from Tweets − No well accepted definition of “events related to a city”. − Tweets are short (140 characters) and their informal nature makes them hard to analyze − Entity, location, time, and type of the event − Multiple reports of the same event and sparse reports of some events (biased sample) − Numbers don’t necessarily indicate intensity − Validation of the solution is hard due to the open domain nature of the problem Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 69. Event extraction techniques − N-grams + Regression − Text analysis to extract uni- and bi-grams (event markers) − Feature selection to select best possible event markers − Apply regression to predict P(Y|X) where Y is the target (rainfall) and X is the input (event marker). − Clustering − Create event clusters incrementally over time − Identify clusters of interest based on their relevance (manual inspection) − Granularity remains at the tweet/cluster level (tweets are assigned to clusters of interest) − Sequence Labeling (CRFs) − Text analysis to extract features such as named entities, POS tagging − Each event indicator is modeled as a mixture of event types that are latent variables − Each type corresponds to a distribution over named entities n (labels assigned to event types by manual inspection) and other features Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 70. City Event Extraction − Event extraction should be open domain (no a priori event types) with event metadata. − Incorporate background knowledge related to city related events e.g., 511.org hierarchy, SCRIBE ontology, city location names. − Assess the intensity of an event using content and network cues. − Robust to noise, informal nature, and variability of data. Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 71. City Event Extraction Solution Architecture [Figure: tweets from a city about city infrastructure pass through two stages. City Event Annotation: POS tagging; hybrid NER + event term extraction, using OSM locations, the SCRIBE ontology and the 511.org hierarchy. City Event Extraction: geohashing; temporal estimation; impact assessment; event aggregation.] P. Anantharam, P. Barnaghi, K. Thirunarayan, A.P. Sheth, "Extracting City Traffic Events from Social Streams", ACM Trans. on Intelligent Systems and Technology, 2015.
  • 72. City Event Extraction - Key Insights a) Space: events reported within a grid (gi ∈ G, where G is the set of all grids in a city) at a certain time are most likely reporting the same event b) Time: events reported within a time ∆t in a grid gi are most likely to be reporting the same event c) Theme: events with similar entities within a grid gi and time ∆t are most likely reporting the same event Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 73. CRF formalisation – for annotation [Figure: a general CRF model]
  • 74. City Event Extraction – Geohashing [Figure: hierarchical spatial structure of geohash for representing locations with variable precision; here the location string is 5H34. The example cell measures about 0.6 miles by 0.38 miles, with max-lat 37.7545166015625, min-lat 37.7490234375, min-long -122.420654296875 and max-long -122.40966796875, and contains the point 37.74933, -122.4106711.]
  • 75. Implementation − City Event Annotation − Automated creation of training data − Annotation task (our CRF model vs. baseline CRF model) − City Event Extraction − Use aggregation algorithm for event extraction − Extracted events AND ground truth − Dataset (Aug – Nov 2013) ~ 8 GB of data on disk − Over 8 million tweets − Over 162 million sensor data points − 311 active events and 170 scheduled events Source: Pramod Anantharam, Kno.e.sis Centre, Wright State University.
  • 76. Extracted Events and Ground Truth
  • 77. Automated creation of training data. Evaluation over 500 randomly chosen tweets from around 8,000 annotated tweets
  • 78. Conclusions − The ultimate goal is to transform raw data into insights and actionable information and/or create effective representations for machines and also human users. − This requires data from multiple sources, (near-) real time analytics and visualisation and/or semantic representations. − We need solutions such as stream processing, pattern recognition, multi-view and deep learning. − Applications in areas such as intelligent urban development and combining physical-cyber-social data for developing smart city services, and healthcare services (e.g. elderly-care, remote patient monitoring).
  • 79. Acknowledgements − Some parts of the content are adapted from: − Holger Karl, Andreas Willig, Protocols and Architectures for Wireless Sensor Networks, chapters 3 and 12, Wiley, 2005. − Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery (DMKD '03). ACM, New York, NY, USA, 2-11. − JMOTIF Time series mining, http://code.google.com/p/jmotif/wiki/SAX

Editor's notes

  1. Publishing and uploading the data to Cloud platforms is not enough. The IoT data needs to be pre-processed and processed; the aim is to extract meaningful knowledge from the data and to use this knowledge and information for control and monitoring and/or for decision making (automated or human controlled).
  2. Classification is another method, but it is not feasible for the open-domain problem we are interested in. Sequence labeling with CRFs (Conditional Random Fields): considers deeper features such as named entities and POS tags; captures the context in which entities are mentioned within a tweet; allows natural incorporation of domain knowledge.
  3. Intensity vs. density => tweets; page-rank is analogous to the importance of events. Importance of an event based on location + time. Contrasting density. Noise. Flood: people affected vs. people conversing.
  4. Distance computed using the formula: dlon = lon2 - lon1; dlat = lat2 - lat1; a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2; c = 2 * atan2(sqrt(a), sqrt(1-a)); d = R * c (where R is the radius of the Earth). Found the box for the tweet! Corners: (37.7545166015625, -122.420654296875), (37.7545166015625, -122.40966796875), (37.7490234375, -122.40966796875), (37.7490234375, -122.420654296875)
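The distance formula in this note is the haversine formula, and it translates directly into Python. The sketch below assumes coordinates in degrees and a mean Earth radius of 3958.8 miles; the function name `haversine` and the constant are choices made for this illustration. Applied to the corners of the box above, it gives the side lengths of the geohash cell.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_MILES):
    """Great-circle distance between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius * c


# Sides of the box found for the tweet (corner coordinates from the note).
height = haversine(37.7490234375, -122.420654296875,
                   37.7545166015625, -122.420654296875)
width = haversine(37.7545166015625, -122.420654296875,
                  37.7545166015625, -122.40966796875)
```

Under these assumptions the cell comes out at roughly 0.38 miles north to south and 0.6 miles east to west.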