Discovering Things and Things’ data/services
Payam Barnaghi
Centre for Communication Systems Research (CCSR)
Faculty of Engineering and Physical Sciences
University of Surrey
Guildford, United Kingdom
Internet of Things
− RFID-oriented, WSAN-oriented, distributed WANs: communication technologies, energy efficiency, routing, …
− Smart devices/Web-enabled apps/services: initial products, vertical applications, concepts and demos, …
− Physical-Cyber-Social data, linked data, semantics, M2M: more products, more heterogeneity, control and monitoring, …
− Future: Cloud, Big (IoT) Data analytics, interoperability, enhanced cellular/wireless communication for IoT, real-world operational use-cases and commercial services/applications, more standards…
We have lots of things, and large volumes of data and/or services related to things.
But Web search is mainly tuned for text-based data and archival data.
Web search engines are often information locators rather than information discovery tools.
Google Knowledge Graph and Wolfram Alpha are some examples of steps towards information/knowledge discovery.
Thing’s Data
[Figure: each thing's data is described by time, location and type; query formulating builds [#location | #type | time] keys and a Discovery ID, resolved through a Discovery/DHT server to gateways indexed by #location and #type across the core network, with a data repository holding archived data; legend: network connection, logical connection, data]
Query
− Typical types of query for sensory data:
  − Query based on:
    − Location
    − Type
    − Time (freshness of data/historical data)
  − One of the above + value range [+ unit of measurement]
  − Type/Location/Time + a combination of Quality of Information (QoI) attributes
  − An entity of interest (or a feature of an entity of interest)
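A minimal sketch of how such a query might be represented and evaluated against individual readings, assuming each reading carries a named location, a type string, a Unix timestamp, a value, a unit and a dictionary of QoI attributes. `Reading`, `Query` and `matches` are hypothetical names; entity-of-interest queries are not modelled here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reading:
    # Hypothetical record for one sensory observation
    sensor_id: str
    location: str            # named location, e.g. "room-12" (assumption)
    obs_type: str            # observed type, e.g. "temperature"
    time: float              # Unix timestamp in seconds
    value: float
    unit: str
    qoi: dict = field(default_factory=dict)   # e.g. {"accuracy": 0.95}


@dataclass
class Query:
    # Every constraint is optional; None means "any"
    location: Optional[str] = None
    obs_type: Optional[str] = None
    max_age: Optional[float] = None              # freshness: maximum age in seconds
    value_range: Optional[tuple] = None          # inclusive (low, high)
    unit: Optional[str] = None
    min_qoi: dict = field(default_factory=dict)  # minimum value per QoI attribute


def matches(q: Query, r: Reading, now: float) -> bool:
    """Return True if reading r satisfies every constraint set in query q."""
    if q.location is not None and r.location != q.location:
        return False
    if q.obs_type is not None and r.obs_type != q.obs_type:
        return False
    if q.max_age is not None and now - r.time > q.max_age:
        return False
    if q.value_range is not None:
        low, high = q.value_range
        if not low <= r.value <= high:
            return False
    if q.unit is not None and r.unit != q.unit:
        return False
    # A missing QoI attribute counts as failing the threshold
    for attr, minimum in q.min_qoi.items():
        if r.qoi.get(attr, float("-inf")) < minimum:
            return False
    return True


readings = [
    Reading("s1", "room-12", "temperature", 1000.0, 21.5, "Cel", {"accuracy": 0.95}),
    Reading("s2", "room-12", "temperature", 400.0, 22.0, "Cel", {"accuracy": 0.99}),
    Reading("s3", "room-12", "humidity", 1000.0, 40.0, "%", {}),
]
# Type + location + freshness + value range + unit + one QoI attribute
q = Query(location="room-12", obs_type="temperature", max_age=300,
          value_range=(18.0, 25.0), unit="Cel", min_qoi={"accuracy": 0.9})
hits = [r.sensor_id for r in readings if matches(q, r, now=1100.0)]
print(hits)  # ['s1']: s2 is too old, s3 has the wrong type
```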
Types of queries
− Exact query
  − Q(target, metadata): both target and metadata are known
  − Target: type, location, time
  − Metadata: QoI/unit attributes
− Proximate query
  − Q(target, metadata)
  − e.g. approximate location (location range)
  − QoI range
− Range query
  − Q(target, metadata)
  − e.g. time range
− Queries can be ad hoc, or they can be based on publish/subscribe (pub/sub)
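The three query types can be contrasted with a small in-memory sketch, assuming each reading has a type, planar (x, y) coordinates, a timestamp and a value. The ad hoc functions are evaluated once over stored readings; the pub/sub variant registers a predicate that is checked as new readings arrive. All names are hypothetical, and QoI ranges are left out for brevity.

```python
import math
from dataclasses import dataclass


@dataclass
class Reading:
    obs_type: str
    x: float        # location as planar coordinates (assumption)
    y: float
    time: float     # Unix timestamp in seconds
    value: float


def exact_query(data, obs_type, x, y, time):
    # Target fully known: type, exact location and exact time
    return [r for r in data
            if r.obs_type == obs_type and (r.x, r.y) == (x, y) and r.time == time]


def proximate_query(data, obs_type, x, y, radius):
    # Approximate location: anything of this type within `radius` of (x, y)
    return [r for r in data
            if r.obs_type == obs_type and math.hypot(r.x - x, r.y - y) <= radius]


def time_range_query(data, obs_type, t_start, t_end):
    # Range query over time, inclusive at both ends
    return [r for r in data if r.obs_type == obs_type and t_start <= r.time <= t_end]


class PubSub:
    """Pub/sub form: a predicate is registered once and checked on every new reading."""

    def __init__(self):
        self._subs = []  # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        self._subs.append((predicate, callback))

    def publish(self, reading):
        for predicate, callback in self._subs:
            if predicate(reading):
                callback(reading)


data = [
    Reading("temperature", 0.0, 0.0, 10.0, 21.0),
    Reading("temperature", 3.0, 4.0, 20.0, 23.0),
    Reading("humidity", 0.0, 0.0, 10.0, 40.0),
]
exact_hits = exact_query(data, "temperature", 0.0, 0.0, 10.0)    # 1 reading
near_hits = proximate_query(data, "temperature", 0.0, 0.0, 5.0)  # 2 readings (distance 0 and 5)
range_hits = time_range_query(data, "temperature", 15.0, 25.0)   # 1 reading (time 20)

alerts = []
bus = PubSub()
bus.subscribe(lambda r: r.obs_type == "temperature" and r.value > 22.0, alerts.append)
for r in data:
    bus.publish(r)  # only the 23.0 reading is delivered to `alerts`
```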
Hashing and Indexing
− One method is that each node (gateway?) contains its own index and search mechanism
− Large decentralised data/index structure
  − Using a distributed hash table (DHT)
  − Hashing the key(s) and querying the network to find the node that contains the key
  − In conventional information-centric networking (ICN), the key space is often one-dimensional
  − In M2M/IoT we need a multi-dimensional hash/key space
− Proposal: hash type and location
  − But then the key challenge is deciding where to look for data:
    − Split the space
    − Duplicate the query
  − How to split the space:
    − Location data
    − Type
    − Hierarchical index (hash)
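A minimal sketch of the "hash type and location" proposal with a hierarchical location index, assuming locations are slash-separated paths (e.g. `uk/guildford/campus`) and that DHT nodes sit on a consistent-hashing ring keyed with SHA-1. Each data item is registered under every prefix of its location, so a query at any level of the hierarchy needs a single lookup; a query with no location has to be duplicated to every node. `Ring` and its methods are illustrative assumptions, not a specific DHT such as Chord.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # 32-bit ring position from SHA-1 (any uniform hash would do)
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")


class Ring:
    """Toy DHT: each key is stored on the first node clockwise from its hash."""

    def __init__(self, nodes):
        self.points = sorted((ring_position(n), n) for n in nodes)
        self.store = {n: {} for n in nodes}  # node -> {key: set of data IDs}

    def owner(self, key: str) -> str:
        pos = ring_position(key)
        i = bisect.bisect_left(self.points, (pos, ""))
        return self.points[i % len(self.points)][1]  # wrap around the ring

    def publish(self, data_id: str, obs_type: str, location: str) -> None:
        # Hierarchical index: register the item under every location prefix
        parts = location.split("/")
        for depth in range(1, len(parts) + 1):
            key = f"{obs_type}|{'/'.join(parts[:depth])}"
            self.store[self.owner(key)].setdefault(key, set()).add(data_id)

    def lookup(self, obs_type: str, location_prefix: str) -> set:
        # Type and location known: a single node is contacted
        key = f"{obs_type}|{location_prefix}"
        return set(self.store[self.owner(key)].get(key, set()))

    def lookup_any_location(self, obs_type: str) -> set:
        # Location unknown: the query is duplicated to every node
        found = set()
        for node_keys in self.store.values():
            for key, ids in node_keys.items():
                if key.startswith(obs_type + "|"):
                    found |= ids
        return found


ring = Ring(["gw-a", "gw-b", "gw-c"])
ring.publish("s1", "temperature", "uk/guildford/campus")
ring.publish("s2", "temperature", "uk/london")
ring.publish("s3", "humidity", "uk/guildford/campus")

print(sorted(ring.lookup("temperature", "uk")))            # ['s1', 's2']
print(sorted(ring.lookup("temperature", "uk/guildford")))  # ['s1']
print(sorted(ring.lookup_any_location("humidity")))        # ['s3']
```

Registering every prefix multiplies index entries (and update traffic) by the depth of the location hierarchy; that is the price paid for splitting the space so that coarse-grained queries do not have to be duplicated.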
How to index, search and discover dynamic, multi-modal and large-scale (streaming) data?
Common Data Models
− (semantic) models (W3C SSN, HyperCat, …)
− SensorML, OGC/SWE models
− Several other ontologies/Semantic models
SSN Ontology
Ontology Link: http://www.w3.org/2005/Incubator/ssn/ssnx/ssn
M. Compton et al., “The SSN Ontology of the W3C Semantic Sensor Network Incubator Group”, Journal of Web Semantics, 2012.
Stream annotation
Sefki Kolozali, Maria Bermudez-Edo, Daniel Puschmann, Frieder Ganz, Payam Barnaghi, “A Knowledge-based Approach for Real-Time IoT Data Stream Annotation and Processing”, IEEE iThings 2014.
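As a rough illustration of stream annotation, each incoming reading can be turned into a handful of subject-predicate-object triples that a discovery service can later query by meaning rather than by source. This sketch uses plain Python tuples instead of an RDF library, SSN-style terms (`ssn:Observation`, `ssn:observedBy`, `ssn:observedProperty`, `ssn:featureOfInterest`) whose exact modelling should be checked against the ontology link above, and hypothetical `ex:` terms for value and time; a wildcard triple-pattern match stands in for a real SPARQL query.

```python
def annotate(obs_id, sensor, prop, feature, value, timestamp):
    """Turn one raw reading into SSN-style triples (the ex: terms are hypothetical)."""
    s = f"ex:{obs_id}"
    return [
        (s, "rdf:type", "ssn:Observation"),
        (s, "ssn:observedBy", f"ex:{sensor}"),
        (s, "ssn:observedProperty", f"ex:{prop}"),
        (s, "ssn:featureOfInterest", f"ex:{feature}"),
        (s, "ex:value", str(value)),
        (s, "ex:time", timestamp),
    ]


def match(graph, s=None, p=None, o=None):
    """Triple-pattern match; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


graph = []
graph += annotate("obs1", "sensor1", "Temperature", "Room12", 21.5, "2014-09-01T10:00:00Z")
graph += annotate("obs2", "sensor2", "Humidity", "Room12", 40.0, "2014-09-01T10:00:00Z")

# Which observations measured temperature, and on which feature of interest?
temperature_obs = [t[0] for t in match(graph, p="ssn:observedProperty", o="ex:Temperature")]
features = [t[2] for t in match(graph, s=temperature_obs[0], p="ssn:featureOfInterest")]
print(temperature_obs, features)  # ['ex:obs1'] ['ex:Room12']
```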
Data Discovery
− Mechanisms that enable clients to access IoT data without needing to know the actual source of the information
− Index the available data, which is:
  − Heterogeneous
  − Distributed
  − Large-scale
  − Dynamic
− Update the indices
− Process user queries
− Search and discover the IoT data
Data Discovery Challenges
− Indexing each individual data point is computationally expensive, and maintaining these indices across the network is problematic
− Dynamicity, mobility and unreliability of the data attributes require the indices to be updated frequently, which in turn adds considerable traffic to the network
− Searching the attribute space at the Discovery Server (DS) level could be computationally expensive
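One way to soften the first two challenges, sketched here under stated assumptions, is to index at gateway granularity: each gateway reports only the set of (type, location) attributes it currently serves, and sends an update only when that set changes, so repeated readings of the same kind generate no index traffic. `DiscoveryServer` and `Gateway` are hypothetical classes, and a gateway's current attribute set is assumed to be whatever appears in its latest batch of readings.

```python
from collections import defaultdict


class DiscoveryServer:
    """Indexes gateways by (type, location) summaries instead of individual data points."""

    def __init__(self):
        self.index = defaultdict(set)  # (type, location) -> gateway IDs
        self.summary = {}              # gateway ID -> attribute set last reported
        self.update_messages = 0       # index updates received over the network

    def report(self, gateway_id, attributes):
        self.update_messages += 1
        old = self.summary.get(gateway_id, frozenset())
        for key in old - attributes:   # attributes the gateway no longer offers
            self.index[key].discard(gateway_id)
        for key in attributes:
            self.index[key].add(gateway_id)
        self.summary[gateway_id] = attributes

    def query(self, obs_type, location):
        return sorted(self.index.get((obs_type, location), set()))


class Gateway:
    def __init__(self, gateway_id, ds):
        self.gateway_id = gateway_id
        self.ds = ds
        self.last_sent = None

    def on_batch(self, readings):
        # readings: (type, location, value); the values themselves are not indexed
        attributes = frozenset((t, loc) for t, loc, _ in readings)
        if attributes != self.last_sent:  # only changes generate index traffic
            self.ds.report(self.gateway_id, attributes)
            self.last_sent = attributes


ds = DiscoveryServer()
gw = Gateway("gw1", ds)
batches = [
    [("temperature", "room12", 21.0), ("temperature", "room12", 21.2)],
    [("temperature", "room12", 21.4), ("temperature", "room12", 21.3)],
    [("temperature", "room12", 21.5), ("humidity", "room12", 40.0)],
]
for batch in batches:
    gw.on_batch(batch)

print(ds.update_messages)              # 2 index updates for 6 data points
print(ds.query("humidity", "room12"))  # ['gw1']
```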
Data discovery in IoT: A schematic view
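[Figure: in the application/user domain, a query on time, location and type passes through query pre-processing; the resulting query attributes [#location | #Time | #type] go to the Discovery Server (DS) in the network/back-end domain, which resolves them to gateways in the device/sensor domain indexed by #location and #type, alongside an Information Repository (IR) holding archived data; the design is distributed/scalable]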
Metadata (semantics) plays a key role. But:
- Current solutions are often centralised
- They use logical reasoning and graph processing
- Scalability, especially with a large set of updates, is a key challenge
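Query pre-processing could, for example, normalise a user's query to the same granularity the index uses, so that the resulting [#location | #time | #type] attributes line up with the keys stored for the gateways. A minimal sketch, assuming a 0.01-degree location grid, 15-minute UTC time buckets and a small hypothetical synonym table for types; none of these choices come from the original design.

```python
import math
from datetime import datetime, timezone

# Hypothetical vocabulary mapping user-facing terms to indexed types
TYPE_SYNONYMS = {
    "temp": "temperature",
    "temperature": "temperature",
    "rh": "humidity",
    "humidity": "humidity",
}


def location_cell(lat: float, lon: float, cell_deg: float = 0.01) -> str:
    # Snap coordinates to a grid cell of cell_deg x cell_deg degrees
    return f"{math.floor(lat / cell_deg)}:{math.floor(lon / cell_deg)}"


def time_bucket(ts: datetime, minutes: int = 15) -> str:
    # Floor a timezone-aware timestamp to the start of its UTC bucket
    # (assumes `minutes` divides 60)
    ts = ts.astimezone(timezone.utc)
    start = ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)
    return start.strftime("%Y-%m-%dT%H:%MZ")


def preprocess(term: str, lat: float, lon: float, ts: datetime) -> str:
    """Normalise a raw query into a [#location|#time|#type] attribute key."""
    obs_type = TYPE_SYNONYMS.get(term.strip().lower())
    if obs_type is None:
        raise ValueError(f"unknown type: {term!r}")
    return f"#{location_cell(lat, lon)}|#{time_bucket(ts)}|#{obs_type}"


key = preprocess("Temp", 51.2432, -0.5891,
                 datetime(2014, 9, 1, 10, 37, 12, tzinfo=timezone.utc))
print(key)  # #5124:-59|#2014-09-01T10:30Z|#temperature
```

Because two users asking for "temp" and "temperature" near the same spot within the same quarter-hour produce the same key, the Discovery Server can answer both from one index entry.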
Looking back, looking forward
− Data modelling and semantics are important
− Attribute indexing/selection using the semantics
− How to index/discover the distributed data?
− Data/index distribution
− Effective semantics and efficient use of semantics
− Reasoning and query processing mechanisms
− Data abstraction and pre-processing techniques
Looking back, looking forward
Data/service discovery is a step forward, but the key goal is information extraction and knowledge discovery.
Large-scale data discovery
[Figure: distributed data discovery; queries formulated as [#location | #type | time] keys are given a Discovery ID and resolved through a Discovery/DHT server to gateways indexed by #location and #type across the core network, with a data repository for archived data; legend: network connection, logical connection, data]
Seyed Amir Hoseinitabatabaei, Payam Barnaghi, Chonggang Wang, Rahim Tafazolli,
Lijun Dong, "A Distributed Data Discovery Mechanism for the Internet of Things", 2014.