1. Large-scale data analytics for smart cities
Payam Barnaghi
Institute for Communication Systems (ICS)
University of Surrey
Guildford, United Kingdom
The Cyber-Physical Cloud Computing Workshop, August 2014, Osaka, Japan
2. Things, Data, and lots of it
image courtesy: Smarter Data - I.03_C by Gwen Vanhee
3. Current focus on Big Data
− Emphasis on power of data and data mining
solutions
− Technology solutions to handle large volumes of
data; e.g. Hadoop, NoSQL, Graph Databases, …
− Trying to find patterns and trends from large
volumes of data…
4. Myths About Big Data
− Big Data is only about massive data volume
− Big Data means Hadoop
− Big Data means unstructured data
− If we have enough data we can draw conclusions
(enough here often means massive amounts)
− NoSQL means No SQL
− It is about increasing computational power and
taking more data and running data mining
algorithms.
Some of the items are adapted from: Brian Gentile, http://mashable.com/2012/06/19/big-data-myths/
5. What happens if we only focus on data
− Number of burgers consumed per day.
− Number of cats outside.
− Number of people checking their Facebook
accounts.
− What insight would you draw?
6. Smart City Data
− Data is multi-modal and heterogeneous
− Noisy and incomplete
− Time and location dependent
− Dynamic and varies in quality
− Crowdsourced data can be unreliable
− Requires (near-) real-time analysis
− Privacy and security are important issues
− Data alone may not give a clear picture; we need
contextual information, background knowledge, multi-source
information and, obviously, better data analytics
solutions…
15. Some of the key issues
− Data collection, representation, interoperability
− Indexing, search and selection
− Storage and provision
− Stream analysis, fusion and integration of multi-source,
multi-modal and variable-quality data
− Aggregation, abstraction, pattern extraction and
time/location dependencies
− Adaptive learning models for dynamic data
− Reasoning methods for uncertain and incomplete data
− Privacy, trust, security
− Scalability and flexibility of the solutions
17. Data discovery in the IoT
[Figure: data discovery architecture spanning the Device/Sensor domain, the Network/Back-end domain and the Application/user domain. A query (Time, Location, Type) goes through query pre-processing into query attributes [#location | #type | time]. Other labels: Gateway; Discovery Server (DS), with #location and #type; Information Repository (IR), archived data; Distributed/scalable.]
18. Large-scale data discovery
[Figure: large-scale data discovery. A query (time, location, type) is formulated as [#location | #type | time]. Labels: Query formulating; Discovery ID; Discovery/DHT Server, with #location and #type; Data repository (archived data); Gateway; Core network; Logical Connection; Network Connection; Data.]
Seyed Amir Hoseinitabatabaei, Payam Barnaghi, Chonggang Wang, Rahim Tafazolli,
Lijun Dong, "A Distributed Data Discovery Mechanism for the Internet of Things", 2014.
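The attribute-based lookup on this slide can be illustrated with a minimal sketch. It is not the mechanism from the cited paper: it simply assumes that the #location and #type attributes are hashed into a discovery ID and that the ID selects the server responsible for it. The names `discovery_key`, `responsible_node` and `DiscoveryNetwork`, the server count and the modulo placement are all hypothetical (a real DHT would typically use consistent hashing), and the time attribute is left out.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical number of discovery servers


def discovery_key(location: str, data_type: str) -> str:
    """Hash the (#location, #type) attributes into a fixed-length discovery ID."""
    raw = f"{location.strip().lower()}|{data_type.strip().lower()}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def responsible_node(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick the server for a discovery ID (simple modulo placement, an assumption)."""
    return int(key, 16) % num_nodes


class DiscoveryNetwork:
    """Toy stand-in for a set of discovery servers, each holding part of the index."""

    def __init__(self, num_nodes: int = NUM_NODES):
        self.num_nodes = num_nodes
        # One index per server: discovery ID -> list of gateway names.
        self.nodes = [defaultdict(list) for _ in range(num_nodes)]

    def register(self, location: str, data_type: str, gateway: str) -> None:
        """Record that a gateway offers data of this type at this location."""
        key = discovery_key(location, data_type)
        index = self.nodes[responsible_node(key, self.num_nodes)]
        if gateway not in index[key]:
            index[key].append(gateway)

    def lookup(self, location: str, data_type: str) -> list:
        """Return the gateways registered for the given attributes."""
        key = discovery_key(location, data_type)
        index = self.nodes[responsible_node(key, self.num_nodes)]
        return list(index.get(key, []))
```

For example, after `net = DiscoveryNetwork()` and `net.register("Guildford", "temperature", "gw-1")`, a call to `net.lookup("guildford", "Temperature")` finds the same entry, because the attributes are normalised before hashing. A query only needs to contact the one server that owns the hashed ID, which is the property that lets this style of index scale out.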
19. Data abstraction
F. Ganz, P. Barnaghi, F. Carrez, "Information Abstraction for Heterogeneous Real World Internet Data", IEEE Sensors Journal, 2013.
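As a generic illustration of what abstracting raw sensor data can look like, the sketch below averages readings over fixed windows and maps each average to a coarse symbol. This is an assumed, simplified technique chosen for illustration, not a reproduction of the method in the cited paper; the function names, breakpoints and labels are hypothetical.

```python
from statistics import mean


def window_means(readings, window):
    """Average consecutive, non-overlapping windows (the last one may be shorter)."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [mean(readings[i:i + window]) for i in range(0, len(readings), window)]


def to_symbols(values, breakpoints, labels):
    """Map each value to a label; breakpoints must be sorted in ascending order."""
    if len(labels) != len(breakpoints) + 1:
        raise ValueError("need exactly one more label than breakpoints")
    symbols = []
    for value in values:
        # Count how many breakpoints the value has reached; that is its interval.
        interval = sum(1 for b in breakpoints if value >= b)
        symbols.append(labels[interval])
    return symbols


# Hypothetical temperature readings in degrees Celsius.
temperatures = [14.0, 15.0, 21.0, 23.0, 30.0, 32.0]
averaged = window_means(temperatures, 2)
abstracted = to_symbols(averaged, [18.0, 27.0], ["cold", "mild", "hot"])
```

Here six raw readings are reduced to three symbols, which are cheaper to store and easier to reason over than the original numbers, at the cost of detail inside each window.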
22. Social media analysis (collaboration with Kno.e.sis)
Tweets from a city
City Infrastructure
https://osf.io/b4q2t/
P. Anantharam, P. Barnaghi, K. Thirunarayan, A. Sheth, "Extracting city events from social streams", under review, 2014.
24. Equilibrium in a transient and non-uniform world
[Figure: equilibrium diagram with points labelled A, B, C and D.]
Image source for equilibrium diagram: John D. Hey, The University of York.
25. Data analytics framework
[Figure: data analytics framework. Labels: Data, Domain Knowledge, Social systems, Interactions, Open Interfaces, Ambient Intelligence, Quality and Trust, Privacy and Security, Open Data.]
26. 101 Smart City Use-case Scenarios
http://www.ict-citypulse.eu/page/content/smart-city-use-cases-and-requirements
27. In Conclusion
− Smart cities are complex social systems, and no technological or
data-analytics-driven solution alone can solve their problems.
− Combining data from physical, cyber and social sources can give more
complete, complementary data and contribute to better analysis and
insights.
− Intelligent processing methods should be adaptable and able to handle
dynamic, multi-modal, heterogeneous, noisy and incomplete data.
− Effective visualisation and interaction methods are also key to developing
successful solutions.
− There are several solutions for different parts of a data analytics framework in
smart cities. An integrated approach is more effective: one in which IoT devices,
communication networks, data analytics and learning algorithms and
methods, services, and interaction and visualisation methods (and their
optimisation algorithms) work together.