1. Data Processing in Cyber-Physical Systems
Bob Marcus
Co-Chair NIST Big Data PWG
robert.marcus@et-strategies.com
Caveat: This is a rough first cut and will be revised extensively!
Thursday, June 2, 16
2. Key Points on CPS Data Processing - Initial Thoughts
• Large-scale Cyber-Physical Systems (CPS) ingest physical data, perform analytics, and
generate commands that initiate actuator actions
• Due to the performance requirements, data must be processed in a timely fashion at an
appropriate level in a CPS hierarchical architecture
• The results of lower level data processing can be sent to higher levels for deeper analytics
• Lower level data processing can also be used to filter and/or summarize sensor data to
reduce data processing and storage requirements at higher levels
• A wide range of analytics will be necessary for robust large-scale CPS including descriptive,
predictive, and prescriptive analytics; event processing, machine learning, cognitive
processing, and human analysis
• Data interoperability standards will be necessary for interfacing heterogeneous systems and
data processing nodes. See the slide set on “Standards and Open Source for CPS” at
http://www.slideshare.net/bobmarcus/standards-and-open-source-for-big-data-cloud-and-iot
• Security and Privacy considerations will constrain access to data. See the slide set on
“Security in CPS” at http://www.slideshare.net/bobmarcus/security-in-cyberphysical-systems
3. Outline of Presentation
1. Terminology
2. Data Processing Architecture Alternatives
- IoT to Cloud with no Fog
- IoT to Fog with no Cloud
- IoT to Fog to Cloud (3 layer)
3. Definitions of Different Data Processing Types
- Device Data Processing (e.g. in IoT)
- Data Flow Processing (e.g. in Fog)
- Analytics Data Processing (e.g. in Cloud)
- Multilayer Data Processing (e.g. in IoT, Fog, Cloud)
- Big Data + Cloud Computing based Requirements and Capabilities from ITU
4. References
Key Question: Where should each data processing type be performed?
4. Terminology
Cloud - One or more data centers supporting scalable storage and processing
Things - Devices that interact with the physical world (e.g. sensors, actuators)
Internet of Things - Devices and local cyber components connected to networks
Analytics - Processing of data to understand past events and support future decisions
Big Data - Data that requires parallel or distributed processing for analytics
Middle Layer - Cyber components between devices and Clouds
Cloud Facing Sublayer - Middle Layer components interfacing with Cloud
Device Facing Sublayer - Middle Layer components interfacing with Devices in IoT
5. Terminology continued
・Middle Layer - Cyber components between devices and Cloud Data Centers
IoT Device Gateways - Connect devices to network nodes and/or data centers
Edge/Fog Computing - Distributes some of the resources and services of computation,
communication, control, and storage away from Cloud and closer to devices and gateways
IoT Cloud API Gateways - Connect Cloud with intermediate network nodes and/or device gateways
Hubs - Manage devices, collect device data, process device data, and distribute data and messages
Hub Locations - Hubs can be located in the IoT, Fog and/or Cloud
Hub Roles - The key differentiators among architectures are Hub locations, capabilities, and interactions
Hub Capabilities - Hubs can have different capabilities based on location
Hub Sizes - There may be many small IoT hubs, fewer larger Fog Hubs, and several very large Cloud Hubs
Hub Standardizations - Standardized Hub interfaces and capabilities will enable open architectures
7. Seven Key Categories for IoT Infrastructure
From www.networkcomputing.com/internet-things/10-leaders-internet-things-infrastructure/1612927605
• Security and privacy
• Data analytics and management
• Data integration (sharing of data across a massive number of devices)
• Governance (new rules and processes)
• Data transportation (bandwidth and pipes required to transport data
between devices and compute engines)
• Computing near the data (as large amounts of data get created it is better to
bring computing closer to the data)
• Power (powering 25 billion-plus devices)
8. Alternative Data Processing Architectures
・IoT to Cloud with no Middle Layer
- IoT Gateways transform and transmit data to Cloud
- Majority of data processing in Cloud
- Cloud applications transmit commands and queries to IoT Gateways
・IoT to Middle Layer with no Cloud
- IoT Gateways transform and transmit data to distributed hubs
- Majority of data processing is in Hubs
- Hubs are networked and communicate when necessary
・Three Layer: IoT to Middle Layer to Cloud
- Data flows from IoT Gateways to Middle Layer to Cloud
- Data processing is divided among the three layers
- System management and command decisions are divided among the layers
9. Internet of Things to Cloud
[Diagram labels: Gateways; Clouds; Applications and End-users. Caption: Centralized Processing with Large Data Flows]
10. Areas of Concern in Connecting IoT Directly to the Cloud
From https://www.usenix.org/system/files/conference/hotcloud15/hotcloud15-zhang.pdf
1. Privacy and Security
2. Scalability
3. Modeling: Peripheral devices are physical
4. Latency: The cloud model differs from reality
5. Bandwidth: Upstream traffic dominates
6. Quality of Service (QoS) Guarantees
7. Durability Management
11. Fog Computing with no Cloud
[Diagram labels: Device Facing Nodes; Hub Nodes; Applications and End-users. Caption: Decentralized Processing with Reduced Data Flows]
12. Benefits of Fog Computing from Cisco
From http://www.dataversity.net/the-future-of-cloud-computing-fog-computing-and-the-internet-of-things/
Benefits
• Cost: The bandwidth required for regularly transmitting decentralized data (which could
originate from anywhere in the country or in the world) to centralized locations is
expensive and can create bottlenecks as various enterprise use cases contend for those
same resources. Fog Computing requires significantly less movement of data, which frees
up the network for other uses.
• Expedience: By processing data closer to its source, Fog Computing can significantly
expedite computations and processes—enabling organizations to go from chimeric ‘near
real-time’ processing speeds to true real-time processing. Again, the proliferation of
mobile devices and demands projected for the IoT make time a critical component of
service delivery and customer satisfaction. IoT applications such as vehicle to vehicle
communication require the least amount of latency as possible.
• Security and Governance: The less frequently and the less distance that data has to
travel, the more secure it is. Additionally, there are strict regulatory requirements about
where data is stored and accessed (which vary by industry and country) to which local Fog
Computing at the extremities of the Cloud can innately conform.
13. Drawbacks of Fog Computing from Cisco
From http://www.dataversity.net/the-future-of-cloud-computing-fog-computing-and-the-internet-of-things/
Drawbacks
• Physical locality: There are some who would argue that the whole point of utilizing the
Cloud is to access data and resources from anywhere, regardless of physical location.
Although Fog Computing merely functions as a more selective way of ascertaining which
data becomes centralized and which stays local, some perceive that the limitations of the
latter are disadvantageous in terms of access.
• Security: Security has long been regarded as the Achilles heel of the Cloud, but with a
number of developments in this space within the past several years, issues of security
really amount to a matter of trust. Certain organizations feel more comfortable having
their data in a centralized location rather than in remote, disparate ones—although the former
option can exacerbate Data Governance when considered on a global scale.
• Confusion: There is also the perspective that facilitating Fog Computing merely adds to
the number of Cloud options (public, private, hybrids, cloudlets, etc.) and is needlessly
complicating architecture that is already complex enough. Conceivably, such pundits
would harbor the same opinion about the IoT in general.
14. Middle Layer Between IoT Devices and Clouds
[Diagram labels: Clouds; Cloud Facing Nodes; Internal Nodes; Device Facing Nodes; Applications and End-users. Caption: Centralized Processing as needed with Decentralized Processing to Reduce Data Flows]
15. Three Layer CPS Architecture Data Processing by Bob Marcus
1. Internet of Things - Data is collected from devices attached to the physical
world. Initially the data is processed on local nodes for data conversion, anomaly
detection, real-time responses, distribution to more remote nodes, etc.
2. Fog/Edge Computing - Data flows through network nodes. The data is
processed for data transformation, data filtering, event processing, command
generation to IoT, query responses, data storage, alert generation, distribution to
Cloud/data center, etc.
3. Cloud Computing - Data is ingested and stored in the Cloud. The data is used
for many types of analytics and is made available to applications and end-users. The
results of data processing are used to generate messages, queries, and commands to
other layers, etc.
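A minimal sketch of how work might be split across the three layers, assuming a single sensor stream. The function names, the divisor of 10, the 80.0 alert limit, and the window of 3 are hypothetical values chosen for illustration, not part of any referenced architecture.

```python
from statistics import mean

def device_layer(raw_counts, divisor=10, limit=80.0):
    """IoT layer: convert raw counts to engineering units and flag out-of-range values."""
    readings = [count / divisor for count in raw_counts]
    alerts = [r for r in readings if r > limit]  # candidates for a real-time response
    return readings, alerts

def fog_layer(readings, window=3):
    """Fog layer: summarize each window of readings to reduce upstream data volume."""
    return [mean(readings[i:i + window]) for i in range(0, len(readings), window)]

def cloud_layer(store, summaries):
    """Cloud layer: keep summaries in long-term storage and compute an overall mean."""
    store.extend(summaries)
    return mean(store)

raw = [200, 210, 220, 900, 230, 240]       # hypothetical raw sensor counts
readings, alerts = device_layer(raw)
summaries = fog_layer(readings)
history = []                               # stands in for a scalable data store
overall = cloud_layer(history, summaries)
print(alerts, summaries, overall)
```

Six raw values enter the IoT layer, but only two summaries reach the Cloud, while the out-of-range reading is flagged locally.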
16. Device to Cloud Data Processing
From http://www.dedicatedcomputing.com/Internet-of-Things-Capabilities/Internet-of-Things-Capabilities-%281%29
17. Main Problems faced by the IoT today
1. Where to store the data circulating within the network?
2. How to maintain safe storage of data?
3. How will particular pieces of data sensibly interact?
4. How to maintain continuity and integrity of data?
5. The information entropy issue. (Higher entropy for data codes is more secure)
6. What is more important: fast and scalable protocol or data privacy?
From forklog.net/the-way-iot-and-blockchain-change-the-world/
18. From Fast Data and Enterprise Architecture e-book at
https://drive.google.com/file/d/0B7BBDfRwGErZQm1CV2VHcnlCVWM/view
Example of Response Times Needed for Data Processing
19. Multi-Level Data Processing from IBM
From http://www.slideshare.net/Cisco/building-the-internet-of-things-trusted-security
20. Example: Smart City Architecture with Middle Layer from U of RI
From http://dl.acm.org/citation.cfm?id=2818898&CFID=733103700&CFTOKEN=57270032 (ACM Digital Library)
30. Data Processing Types for Cyber-Physical Systems
Key Question: Where should each processing type be performed?
・Device Data Processing (e.g. in IoT)
- Sensor data processing
- User inputs processing (keyboard, speech, images, video)
- Anomaly detection
- Data logging for sensor data processing
- Data conversion gateways
・Data Flow Processing (e.g. in Fog)
- Message generation and processing
- Query generation and processing
- Event processing
- Data filtering, fusion, transformation, distribution and virtualization
- Distributed data storage for data flow processing
・Analytics (e.g. in Cloud)
- Streaming analytics
- Distributed processing (batch, interactive, streaming)
- Descriptive analytics
- Diagnostic Analytics
- Predictive analytics
- Prescriptive analytics
- Exploratory analytics
- Machine learning
- Scalable long term data storage for analytics
33. IoT Data Processing for Cyber-Physical Systems from Bob Marcus
・Sensor data ingestion, logging, and initial processing
・User inputs processing (keyboard, speech)
・Data transformation
・Data filtering
・Event detection and rapid real-time response
・Anomaly detection
・Data caching for sensor data processing
34. Key Categories for IoT Infrastructure
• Security and privacy
• Data analytics and management
• Data integration (sharing of data across a massive number of devices)
• Governance (new rules and processes)
• Data transportation (bandwidth and pipes required to transport data between devices
and compute engines)
• Computing near the data (as large amounts of data get created it is better to bring
computing closer to the data)
• Power (powering 25 billion-plus devices)
From www.networkcomputing.com/internet-things/10-leaders-internet-things-infrastructure/1612927605
35. Why IoT Data is Different
From http://cloudcomputing.sys-con.com/node/3253081
37. Some Examples of the Many Different Sensors
From http://www.electrical4u.com/sensor-types-of-sensor/
38. Sensor Data Processing
https://en.wikipedia.org/wiki/Sensor https://en.wikipedia.org/wiki/Sensor_grid https://en.wikipedia.org/wiki/Sensor_fusion
https://en.wikipedia.org/wiki/Data_acquisition https://en.wikipedia.org/wiki/Digital_image_processing
From https://en.wikipedia.org/wiki/Sensor_node
“Sensors are hardware devices that produce a measurable response to a change in a physical
condition like temperature or pressure. Sensors measure physical data of the parameter to be
monitored. The continual analog signal produced by the sensors is digitized by an analog-to-digital
converter and sent to controllers for further processing. A sensor node should be small in size,
consume extremely low energy, operate in high volumetric densities, be autonomous and operate
unattended, and be adaptive to the environment.”
Example: From http://lifesciences.ieee.org/publications/newsletter/june-2013/343-embedded-computing-frameworks-for-body-sensor-networks
Signal Processing in a Node Environment (SPINE)
40. Anomaly Detection
“.. anomaly detection (or outlier detection) is the identification of items, events or
observations which do not conform to an expected pattern or other items in a dataset.”
From https://en.wikipedia.org/wiki/Anomaly_detection
Example: From http://archive.his.se/english/research/infofusion/research/scenarios-and-projects/gsa/gsa2-visualization/index.html
Anomaly Detection for Situation Awareness
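One simple way to flag observations that "do not conform to an expected pattern" is a z-score test. This is a sketch under stated assumptions: the function name, the threshold of 2.0, and the sample temperatures are hypothetical, and real deployments often need more robust methods.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

temps = [20.1, 20.3, 19.9, 20.0, 20.2, 35.0, 20.1, 19.8]  # hypothetical sensor readings
anomalies = find_anomalies(temps)
print(anomalies)
```

Because the outlier itself inflates the standard deviation, small samples can mask anomalies; a median-based score is a common alternative.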
41. User Input Processing (keyboard, speech, video, image)
https://en.wikipedia.org/wiki/Speech_recognition
https://en.wikipedia.org/wiki/Text_processing
https://en.wikipedia.org/wiki/Video_processing
https://en.wikipedia.org/wiki/Image_processing
Example: From http://www.w3.org/TR/mmi-framework/
Multimodal user interactions from W3C
43. Fog Data Processing for Cyber-Physical Systems
・Streaming analytics
・Storage of localized data for security and privacy
・Data filtering
・Anomaly detection and quick response
・Data fusion
・Data virtualization
・Complex event processing
・Query generation and processing
・Data transformation
・Distributed data storage for data flow processing
44. Data Distribution Service (DDS) for Internet of Things
From http://blog.omg.org/2014/12/omgs-data-distribution-service-the-internet-of-things-fabric.htm
47. OneM2M in Standardization Landscape
From http://www.slideshare.net/motive_alu/alcatel-lucent-motive-team-motivation-2013-onem2m-global-standards
48. Message Generation and Processing
From https://en.wikipedia.org/wiki/Message-oriented_middleware
“Message-oriented middleware (MOM) is software or hardware infrastructure supporting
sending and receiving messages between distributed systems. MOM allows application
modules to be distributed over heterogeneous platforms and reduces the complexity of
developing applications that span multiple operating systems and network protocols.”
Example: From http://www.slideshare.net/BrianPulito/could-iot-be-webrtcs-greatest-source-of-innovation
IoT Message Broker Architecture from IBM
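The publish/subscribe pattern behind many message brokers can be sketched in a few lines. This in-memory `Broker` class and the topic name are hypothetical; it illustrates the decoupling of senders and receivers only, and is not the IBM architecture or any real MOM product.

```python
from collections import defaultdict

class Broker:
    """Toy in-process message broker: publishers and subscribers share only topic names."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callable to receive every message published on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver `message` to all subscribers of `topic`; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)

received = []
broker = Broker()
broker.subscribe("sensors/temperature", received.append)
delivered = broker.publish("sensors/temperature", {"device": "t1", "celsius": 21.5})
undelivered = broker.publish("sensors/humidity", {"device": "h1", "percent": 40})
print(received, delivered, undelivered)
```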
49. Data Filtering
From https://www.techopedia.com/definition/26202/data-filtering
“Data filtering in IT can refer to a wide range of strategies or solutions for refining data sets. This
means the data sets are refined into simply what a user (or set of users) needs, without including
other data that can be repetitive, irrelevant or even sensitive.”
Example: From http://www.ags.gov.ab.ca/publications/wcsb_atlas/a_ch35/ch_35.html
Geological Data Filtering
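For sensor streams, one common way to drop repetitive data is a deadband filter that forwards a reading only when it differs enough from the last forwarded one. The function name and the 0.5 threshold are assumptions for illustration.

```python
def deadband_filter(readings, min_change):
    """Keep a reading only if it differs from the last kept reading by at least `min_change`."""
    kept = []
    for value in readings:
        if not kept or abs(value - kept[-1]) >= min_change:
            kept.append(value)
    return kept

readings = [20.0, 20.1, 20.1, 20.2, 21.0, 21.1, 25.0]  # hypothetical sensor values
filtered = deadband_filter(readings, min_change=0.5)
print(filtered)
```

Seven readings shrink to three, which is the kind of reduction a lower layer can apply before sending data upward.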
50. Data Virtualization
From https://en.wikipedia.org/wiki/Data_virtualization
“Data virtualization is any approach to data management that allows an application to
retrieve and manipulate data without requiring technical details about the data, such as
how it is formatted or where it is physically located”
Example: From http://www.denodo.com/en/data-virtualization/overview
Data Virtualization from Denodo
51. Data Fusion
http://www.hindawi.com/journals/tswj/2013/704504/
From https://en.wikipedia.org/wiki/Data_fusion
“Data fusion is the process of integration of multiple data and knowledge representing the same
real-world object into a consistent, accurate, and useful representation. Data fusion processes are
often categorized as low, intermediate or high, depending on the processing stage at which fusion
takes place”
Example: From http://www.arlut.utexas.edu/sisl/ISD.htm
Threat Detection using Sensor Data Fusion from the University of Texas
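A low-level fusion example: two sensors measure the same quantity, and their readings are combined by inverse-variance weighting so that the more precise sensor counts for more. The function name and the sample values are hypothetical, and the method assumes independent, unbiased measurements with known variances.

```python
def fuse(estimates):
    """Fuse (value, variance) pairs into one estimate using inverse-variance weights."""
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    fused_variance = 1.0 / total  # never larger than the smallest input variance
    return fused_value, fused_variance

# Sensor A reads 20.0 with variance 1.0; sensor B reads 22.0 with variance 4.0.
value, variance = fuse([(20.0, 1.0), (22.0, 4.0)])
print(value, variance)
```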
52. Challenges in Data Integration from NIST
From http://www.cpspwg.org/Portals/3/docs/CPS%20PWG%20Draft%20Framework%20for%20Cyber-Physical%20Systems%20Release%200.8%20September%202015.pdf
• Data fusion that is done at any time from multiple sensor or source types, or use of a single data
stream for diverse purposes
• Data fusion of streaming data and predictive analytics capabilities
• Complex data paths that cross-scale and cross-level connecting architectural layers, dedicated
systems, connected infrastructure, systems of systems, and networks
• Data-driven interactions between dependent and independent CPS
• Privacy-protecting data policies and procedures in light of the ubiquitous nature of IoT
• Data interoperability issues including metadata, identification of type and instance, data quality and
provenance, timing, governance, and privacy and cybersecurity
53. JDL Data Fusion Framework from DoD
From https://s3.amazonaws.com/nist-sgcps/cpspwg/pwgglobal/CPS_PWG_Draft_Framework_for_Cyber-Physical_Systems_Release_0_8_September_2015.pdf
From Section 4.5.2.1 Data Fusion
54. Event Processing
From https://en.wikipedia.org/wiki/Complex_event_processing
“Event processing is a method of tracking and analyzing (processing) streams of information
(data) about things that happen (events), and deriving a conclusion from them. Complex event
processing, or CEP, is event processing that combines data from multiple sources to infer
events or patterns that suggest more complicated circumstances.”
From https://en.wikipedia.org/wiki/Event-driven_architecture
From http://www.fujitsu.com/global/about/resources/news/press-releases/2011/1216-02.html
Complex Event Processing Architecture from Fujitsu
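A toy complex-event rule, combining two event sources to infer a more complicated circumstance: a smoke event and a high-temperature event in the same room within a short window suggest a fire. The event fields, the rule, and the 10-second window are hypothetical; production CEP engines offer far richer pattern languages.

```python
def detect_fire(events, window_seconds=10):
    """Emit a 'fire_suspected' event when smoke and high temperature coincide in one room."""
    smoke = [e for e in events if e["type"] == "smoke"]
    heat = [e for e in events if e["type"] == "high_temp"]
    alerts = []
    for s in smoke:
        for h in heat:
            same_room = s["room"] == h["room"]
            close_in_time = abs(s["time"] - h["time"]) <= window_seconds
            if same_room and close_in_time:
                alerts.append({"type": "fire_suspected",
                               "room": s["room"],
                               "time": max(s["time"], h["time"])})
    return alerts

events = [
    {"type": "smoke", "room": "A", "time": 100},
    {"type": "high_temp", "room": "A", "time": 105},
    {"type": "high_temp", "room": "B", "time": 106},
    {"type": "smoke", "room": "B", "time": 200},
]
alerts = detect_fire(events)
print(alerts)
```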
55. Query Generation and Processing
https://en.wikipedia.org/wiki/Information_retrieval
From http://www.webopedia.com/TERM/Q/query.html
“A query is a request for information from a database”
Example: From http://www.slideshare.net/PayamBarnaghi/semantic-technologies-for-the-internet-of-things-challenges-and-opportunities
Query Processing IoT Framework from University of Surrey
58. Distributed Data Stores
From https://en.wikipedia.org/wiki/Distributed_database
“A distributed database is a database in which storage devices are not all attached
to a common processing unit such as the CPU, and which is controlled by a
distributed database management system”
Example: From https://medium.com/aws-activate-startup-blog/distributed-data-stores-for-mere-mortals-994945c0c2d1
Distributed Data Store Alternatives
60. Linked Sensor Framework
From https://www.deri.ie/sites/default/files/publications/decision-support-using-linked-social-and-sensor-data.pdf
61. Global Sensor Network (GSN) Overview
From http://www.slideshare.net/jpcik/gsn-global-sensor-networks
62. Global Sensor Network (GSN)
From http://www.slideshare.net/jpcik/gsn-global-sensor-networks
63. XGSN High Level Architecture
From http://ceur-ws.org/Vol-1401/paper-04.pdf
65. Cloud Data Processing for Cyber-Physical Systems
・Streaming analytics
・Descriptive analytics
・Diagnostic analytics
・Predictive analytics
・Prescriptive analytics
・Exploratory analytics
・Data fusion
・Data virtualization
・Complex event processing
・Query generation and processing
・Distributed processing (batch, interactive, streaming)
・Machine learning
・Scalable long term data storage for analytics
・Data Transformation
66. Priority Issues for NSF Big Data Hubs
From http://www.nsf.gov/news/news_summ.jsp?cntn_id=136784
67. Levels of Analytics from Mnubo
From http://mnubo.com/wp-content/uploads/2016/01/mnubo-overview-rev3-1.pdf
68. Lambda Architecture for IoT Data Processing
From https://www.talend.com/blog/2015/07/15/hadoop-summit-2015-takeaway-the-lambda-architecture
69. Lambda Architecture combining Batch and Real-Time Data
From https://www.mapr.com/developercentral/lambda-architecture
70. IoT Data Management Framework from EU IERC
From http://tinyurl.com/zs5g8qx
71. IoT Interoperability Research Topics from EU IERC
From www.internet-of-things-research.eu/pdf/Building_the_Hyperconnected_Society_IERC_2015_Cluster_eBook_978-87-93237-98-8_P_Web.pdf
72. IoT Data Life Cycle from EU IERC
From www.internet-of-things-research.eu/pdf/Building_the_Hyperconnected_Society_IERC_2015_Cluster_eBook_978-87-93237-98-8_P_Web.pdf
73. Data Analytics Levels in an IoT Framework from EU IERC
From www.internet-of-things-research.eu/pdf/Building_the_Hyperconnected_Society_IERC_2015_Cluster_eBook_978-87-93237-98-8_P_Web.pdf
74. Intelligent Reasoning over IoT Data from EU IERC
From www.internet-of-things-research.eu/pdf/Building_the_Hyperconnected_Society_IERC_2015_Cluster_eBook_978-87-93237-98-8_P_Web.pdf
75. IoT Big Data Applications from EU IERC
From www.internet-of-things-research.eu/pdf/Building_the_Hyperconnected_Society_IERC_2015_Cluster_eBook_978-87-93237-98-8_P_Web.pdf
76. Internet of Things Cloud from IBM
From www.internet-of-things-research.eu/pdf/Building_the_Hyperconnected_Society_IERC_2015_Cluster_eBook_978-87-93237-98-8_P_Web.pdf
77. Scalable Long Term Data Stores for Analytics
https://en.wikipedia.org/wiki/Cloud_database
“A cloud database is a database that typically runs on a cloud computing platform”
https://en.wikipedia.org/wiki/Cloud_storage
“Cloud storage is a model of data storage in which the digital data is stored in logical
pools, the physical storage spans multiple servers (and often locations), and the physical
environment is typically owned and managed by a hosting company”
Example: From https://databasesincloud.wordpress.com/2011/05/25/talking-sql-to-nosql-data-stores-part-2/
SQL Interfaces to Multiple Cloud Data Stores from Toad
78. Distributed Batch Processing
http://www.slideshare.net/FerranGalReniu/distributed-batch-processing-with-hadoop-30157636
https://en.wikipedia.org/wiki/Bulk_synchronous_parallel
http://www.slideshare.net/EdurekaIN/spark-webinar29-june
From https://en.wikipedia.org/wiki/Batch_processing
“Batch processing is the execution of a series of programs ("jobs") on a computer without manual
intervention. Jobs are set up so they can be run to completion without human interaction. All input
parameters are predefined through scripts, command-line arguments, control files, or job control language.”
From https://en.wikipedia.org/wiki/Distributed_computing
“A distributed system is a software system in which components located on networked computers
communicate and coordinate their actions by passing messages”
Example: From http://opensourceforu.efytimes.com/2011/03/mapreduce-more-power-less-code-hadoop/
Map-Reduce Distributed Batch Flow
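The map, shuffle, and reduce phases can be sketched in a single process. This is only an illustration of the data flow, not Hadoop code: a real framework runs each phase in parallel across many machines. The function names and sensor records are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (key, value) pair for every input record."""
    for sensor_id, value in records:
        yield sensor_id, value

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each group to one result, here the maximum reading."""
    return {key: max(values) for key, values in groups.items()}

records = [("s1", 20.5), ("s2", 18.0), ("s1", 22.1), ("s2", 19.4), ("s1", 21.0)]
result = reduce_phase(shuffle(map_phase(records)))
print(result)
```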
79. Multiple Paradigm Distributed Processing (e.g. Spark)
http://spark.apache.org/
From https://en.wikipedia.org/wiki/Apache_Spark
Apache Spark is an open source cluster computing framework originally developed in the AMPLab at the University of
California, Berkeley and later donated to the Apache Software Foundation. In contrast to Hadoop's two-stage
disk-based MapReduce paradigm, Spark's multi-stage in-memory primitives provide performance up to 100 times
faster for certain applications. By allowing user programs to load data into a cluster's memory and query it
repeatedly, Spark is well-suited to machine learning algorithms.
Example: From http://www.slideshare.net/databricks/unified-big-data-processing-with-apache-spark-qcon-2014
Unified Data Processing with Spark
81. Analytics and Machine Learning by Bob Marcus
[Diagram: Data Sources feed data to Streaming Analytics, Descriptive and Diagnostic Analytics, Predictive Analytics, Prescriptive Analytics, and Exploratory Analytics. Labels: Past, Present, Future, Possibilities, Recommendations. Algorithms come from Machine Learning (using Training Data) or a Manual Program.]
82. Streaming Analytics
From https://www.informatica.com/resources.asset.d3dca81ab5ac013d3d1ff05788de363f.pdf
“Software that can filter, aggregate, enrich, and analyze a high throughput of data from multiple
disparate live data sources and in any data format to identify simple and complex patterns to
visualize business in real-time, detect urgent situations, and automate immediate actions”.
https://en.wikipedia.org/wiki/Data_stream_mining
Example: From https://developer.ibm.com/bluemix/2015/07/29/ibm-streaming-analytics-now-available-bluemix/
Streaming Analytics from IBM
84. Descriptive and Diagnostic Analytics
From http://whatis.techtarget.com/definition/descriptive-analytics
“Descriptive analytics is a preliminary stage of data processing that creates a summary of historical data to
yield useful information and possibly prepare the data for further analysis.”
From http://www.gartner.com/it-glossary/diagnostic-analytics
“Diagnostic Analytics is a form of advanced analytics which examines data or content to answer the question
“Why did it happen?”, and is characterized by techniques such as drill-down, data discovery, data mining and
correlations.”
From http://www.gartner.com/it-glossary/predictive-analytics
Descriptive to Prescriptive Analytics
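Descriptive analytics, in its simplest form, is a summary of historical data. A minimal sketch, assuming a list of numeric readings; the function name and the sample values are hypothetical.

```python
from statistics import mean, median

def describe(values):
    """Summarize historical values with basic descriptive statistics."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

summary = describe([3, 1, 4, 1, 5])  # hypothetical historical readings
print(summary)
```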
85. Predictive Analytics
From https://en.wikipedia.org/wiki/Predictive_analytics
“Predictive analytics encompasses a variety of statistical techniques .. that analyze current
and historical facts to make predictions about future, or otherwise unknown, events.”
From https://en.wikipedia.org/wiki/Prescriptive_analytics
“Predictive analytics answers the question what will happen. This is when historical
performance data is combined with rules, algorithms, and occasionally external data to
determine the probable future outcome of an event or the likelihood of a situation
occurring.”
From http://www.butleranalytics.com/sas-predictive-analytics-2014/
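One of the simplest statistical techniques for prediction is fitting a straight line to historical data and extrapolating. The sketch below uses ordinary least squares in plain Python; the function name and the hourly load figures are hypothetical, and real workloads rarely follow a perfect line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [0, 1, 2, 3, 4]                 # historical time points
load = [10.0, 12.0, 14.0, 16.0, 18.0]   # hypothetical historical measurements
slope, intercept = fit_line(hours, load)
forecast = slope * 5 + intercept        # predicted value for the next hour
print(slope, intercept, forecast)
```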
86. Prescriptive Analytics
From https://en.wikipedia.org/wiki/Prescriptive_analytics
“Prescriptive analytics automatically synthesizes big data, multiple disciplines of
mathematical sciences and computational sciences, and business rules, to make predictions
and then suggests decision options to take advantage of the predictions.”
“... goes beyond predicting future outcomes by also suggesting actions to benefit from the
predictions and showing the implications of each decision option”
87. Exploratory Analytics
From https://en.wikipedia.org/wiki/Data_mining
“Data mining is an interdisciplinary subfield of computer science. It is the computational
process of discovering patterns in large data sets ("big data") involving methods at the
intersection of artificial intelligence, machine learning, statistics, and database systems.”
From https://en.wikipedia.org/wiki/Exploratory_data_analysis
“.. exploratory data analysis (EDA) is an approach to analyzing data sets to summarize
their main characteristics, often with visual methods”. See diagram below.
88. Machine Learning
https://en.wikipedia.org/wiki/List_of_machine_learning_concepts
From https://en.wikipedia.org/wiki/Machine_learning
“Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational
learning theory in artificial intelligence. Machine learning explores the study and construction of algorithms that can learn
from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make
data-driven predictions or decisions, rather than following strictly static program instructions.”
Example: From http://peekaboo-vision.blogspot.com/2013/01/machine-learning-cheat-sheet-for-scikit.html
Choosing a Machine Learning Algorithm
89. Machine Learning Mind Map
From “A Tour of Machine Learning Algorithms” at machinelearningmastery.com
90. Top 10 Machine Learning Algorithms
From http://www.datasciencecentral.com/forum/topics/0-top-machine-learning-algorithms
and https://machinelearningmastery.com/master-machine-learning-algorithms/
Linear Algorithms:
• Algorithm 1: Linear Regression
• Algorithm 2: Logistic Regression
• Algorithm 3: Linear Discriminant Analysis
Nonlinear Algorithms:
• Algorithm 4: Classification and Regression Trees
• Algorithm 5: Naive Bayes
• Algorithm 6: K-Nearest Neighbors
• Algorithm 7: Learning Vector Quantization
• Algorithm 8: Support Vector Machines
Ensemble Algorithms:
• Algorithm 9: Bagged Decision Trees and Random Forest
• Algorithm 10: Boosting and AdaBoost
Bonus #1: Gradient Descent
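As an illustration of one algorithm from the list, here is a minimal K-Nearest Neighbors classifier in plain Python. The function name, the two-feature training points, and the "normal"/"fault" labels are hypothetical; libraries such as scikit-learn provide tuned implementations.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled feature vectors, e.g. (vibration, temperature) pairs.
train = [
    ((1.0, 1.0), "normal"), ((1.2, 0.9), "normal"), ((0.8, 1.1), "normal"),
    ((5.0, 5.0), "fault"), ((5.2, 4.8), "fault"), ((4.9, 5.1), "fault"),
]
label = knn_predict(train, (4.5, 4.7))
print(label)
```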
91. European Union CPS-Related Projects
From http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=428
92. ICT-1 Smart Cyber-Physical Systems
From http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=428
93. Deep Learning
From http://deeplearning.net/
Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of
moving Machine Learning closer to one of its original goals: Artificial Intelligence. This website is intended to host a
variety of resources and pointers to information about Deep Learning. In these pages you will find
• a reading list,
• links to software,
• datasets,
• a list of deep learning research groups and labs,
• a list of announcements for deep learning related jobs (job listings),
• as well as tutorials and cool demos.
• announcements and news about deep learning
From http://www.slideshare.net/hustwj/cikm-keynotenov2014
94. TensorFlow from Google
From https://www.tensorflow.org
TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph
represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors)
communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in
a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers
working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting
machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of
other domains as well.
From http://download.tensorflow.org/paper/whitepaper2015.pdf
Examples of TensorFlow Operations at Data Flow Graph Nodes
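The data flow graph idea described on this slide can be sketched in a few lines of plain Python. This is an illustration only and does not use the TensorFlow API: the `Node` class, the `constant` helper, and the operation functions are hypothetical names, and nested Python lists stand in for tensors.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A graph node: one mathematical operation plus its incoming edges."""
    op: Callable[..., list]
    inputs: List["Node"] = field(default_factory=list)

    def evaluate(self) -> list:
        # Arrays flow along the edges: evaluate the inputs, then apply the op.
        return self.op(*(node.evaluate() for node in self.inputs))


def constant(value: list) -> Node:
    """A source node that emits a fixed array."""
    return Node(op=lambda: value)


def matvec(w: list, x: list) -> list:
    """Matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def add(a: list, b: list) -> list:
    """Element-wise addition."""
    return [p + q for p, q in zip(a, b)]


def relu(v: list) -> list:
    """Element-wise rectified linear unit."""
    return [max(0.0, e) for e in v]


# Build the graph for y = relu(W x + b); nothing is computed yet.
w = constant([[1.0, -2.0], [0.5, 0.5]])
x = constant([3.0, 1.0])
b = constant([-2.0, 1.0])
y = Node(relu, [Node(add, [Node(matvec, [w, x]), b])])

# Running the graph pulls values through the edges.
result = y.evaluate()
print(result)
```

Separating graph construction from evaluation is what lets a real framework decide later where each node runs (CPU, GPU, or mobile device).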
95. Viv from the Developers of SIRI
From http://viv.ai
97. Multilayer Architecture from Cisco
From http://tf.nist.gov/seminars/WSTS/PDFs/1-0_Cisco_FBonomi_ConnectedVehicles.pdf
98. Multilayer Distributed Data Management Architecture from Cisco
From http://tf.nist.gov/seminars/WSTS/PDFs/1-0_Cisco_FBonomi_ConnectedVehicles.pdf
99. Multilayer Content Distribution Architecture from Cisco
From http://tf.nist.gov/seminars/WSTS/PDFs/1-0_Cisco_FBonomi_ConnectedVehicles.pdf
100. Big Data – Cloud Computing based
Requirements and Capabilities from ITU-T
See https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-Y.3600-201511-I!!PDF-E&type=items
101. ITU Big Data in a Cloud Context
102. ITU Cloud Computing Data Capabilities
1. Data collection capabilities
2. Data pre-processing capabilities
3. Data storage capabilities
4. Data analytics capabilities
5. Data visualization capabilities
6. Data management capabilities
7. Data security and protection capabilities
103. ITU Cloud Data Collection Capabilities
Data source intelligent recognition - offers the capabilities to locate the
data sources and detect the types of data being collected
Data adaptation - offers the capabilities to transform and organize the data
being collected with targeted structure and attributes (numbering, location,
ownership, etc.)
Data integration - offers the capabilities to integrate data from different data
sources (different data types) using metadata or ontology
Data brokerage - offers the capabilities to provide a brokerage service for
searching data.
104. ITU Cloud Data Pre-processing Capabilities
Data extraction - offers the capabilities to extract information from the semi-
structured data or unstructured data
Data transmission - offers the capabilities to transport datasets (static data
and real-time data) from data sources, or from one location to another, while
preserving integrity and consistency
Data de-noising - offers the capabilities to eliminate noise information from a
mixture of signal data and noise data
Data aggregation - offers the capabilities to aggregate data that come from
different sources into the same data model or data format.
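Two of the pre-processing capabilities above, de-noising and aggregation, can be sketched as follows. This is a minimal illustration, not the ITU-T definition: the moving-average filter is just one simple de-noising technique chosen for the example, and the function names, record fields, and sensor names are hypothetical.

```python
from statistics import mean


def denoise(samples, window=3):
    """Trailing moving average: one simple way to suppress noise."""
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        smoothed.append(mean(samples[start:i + 1]))
    return smoothed


def aggregate(readings_by_source):
    """Summarize readings from several sources in one common record format."""
    return [
        {
            "source": source,
            "count": len(values),
            "mean": mean(values),
            "max": max(values),
        }
        for source, values in readings_by_source.items()
    ]


# A single spike (26.0) in an otherwise flat temperature signal.
noisy = [20.0, 20.0, 26.0, 20.0, 20.0]
smooth = denoise(noisy)

# Records from both sources share the same fields after aggregation.
records = aggregate({"sensor-a": smooth, "sensor-b": [18.0, 19.0, 20.0]})
for record in records:
    print(record)
```

Running this kind of filtering and summarizing at a lower level (for example in the Fog layer) is one way to reduce what must be transmitted to and stored at higher levels.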
105. ITU Cloud Data Storage Capabilities
Data storing - offers the capabilities to store different types and formats of
data with elastic storage capacity
Data registration - offers the capabilities to create, update and delete the
metadata with corresponding changes in data storage
Data access - offers the capabilities to access data through multiple
interfaces, such as web service interfaces, file system interfaces, database
interfaces and so on
Data indexing - offers the capabilities to create and update indexes for
datasets
Data duplication and backup - offers the capabilities to duplicate and
make backups of datasets.
106. ITU Cloud Data Analytics Capabilities
Data preparation - offers the capabilities to transform data into a
form that can be analyzed. This capability includes exploring, changing,
and shaping the raw data;
Data analysis - offers the capabilities of investigation, inspection, and
modelling of data in order to discover useful information;
Workflow automation - offers the automation of processes, in whole
or part, during which data or functions are passed from one step to
another for action, according to a set of procedural rules;
Analysis algorithm adaptation - offers the capabilities to apply
algorithms of classification, regression, clustering, association rules, ranking
etc. to process the datasets according to cloud service customer (CSC) demands;
Distributed processing - offers the capabilities to distribute the
processing tasks to a cluster of computing nodes;
Data application - offers the capabilities to support applications or
application plug-ins to use the analysis results of datasets.
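The workflow automation capability above, in which data are passed from one step to the next according to a fixed set of rules, can be sketched as a simple step pipeline. This is an illustrative assumption of how such a workflow might look: `run_workflow`, the three step functions, and the alert threshold of 50.0 are all hypothetical.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Pass data through each step in order and record which steps ran."""
    log = []
    for name, func in steps:
        data = func(data)
        log.append(name)
    return data, log


def prepare(raw):
    """Data preparation: drop missing values and convert strings to floats."""
    return [float(v) for v in raw if v not in ("", None)]


def analyze(values):
    """Data analysis: compute a small summary of the prepared values."""
    return {"n": len(values), "mean": sum(values) / len(values)}


def apply_result(summary):
    """Data application: turn the analysis result into an alert flag."""
    return {**summary, "alert": summary["mean"] > 50.0}


steps = [
    ("data preparation", prepare),
    ("data analysis", analyze),
    ("data application", apply_result),
]
result, log = run_workflow(steps, ["40", "", "70", None, "55"])
print(result)
print(log)
```

In a distributed setting each step could be dispatched to a different computing node; here everything runs in one process to keep the sketch short.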
107. ITU Cloud Data Visualization Capabilities
Data visualization - offers the capabilities to create, configure, deliver and
customize the visual representation of data analysis results
Data reporting - offers the capabilities to make reports of summary, key
elements and analysis results of datasets.
108. ITU Cloud Data Management Capabilities
Data provenance - offers the capabilities to manage information pertaining to any
source of data including the party or parties involved in generation and introduction
processes for data
Data preservation - offers the capabilities to manage the series of activities
necessary to ensure continued access to data according to relevant policy
Data ownership - offers the capabilities to manage property rights over data
possession and disposition as the data status changes (e.g. after data
integration)
Process monitoring - offers information related to data processing.
Note – This capability can include information such as success of the job and task,
running time, resource utilization, etc.
Metadata management - offers the capabilities of creating, defining,
attributing, controlling and updating metadata information.
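The process monitoring capability above mentions job success and running time. A minimal sketch of collecting that information is shown below; the `JobReport` record and the `monitor` helper are hypothetical names, and resource utilization is left out to keep the example dependent on the standard library only.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class JobReport:
    """Monitoring information for one data processing job."""
    name: str
    succeeded: bool
    running_time_s: float
    error: Optional[str] = None


def monitor(name: str, job: Callable[..., Any], *args: Any) -> JobReport:
    """Run a job and report whether it succeeded and how long it took."""
    start = time.perf_counter()
    try:
        job(*args)
    except Exception as exc:
        # A failed job still produces a report, with the error recorded.
        return JobReport(name, False, time.perf_counter() - start, repr(exc))
    return JobReport(name, True, time.perf_counter() - start)


ok = monitor("sum readings", sum, [1, 2, 3])
failed = monitor("divide", lambda: 1 / 0)
print(ok)
print(failed)
```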
109. ITU Cloud Data Security and Protection Capabilities
Access control - offers the capabilities to manage the rights of parties to control
or influence the information related to them
Policy control - offers the capabilities to control policies of data protection and
security
Data security - offers the capabilities to apply the storage, network and service
related security mechanisms, including administrative, operational and maintenance
issues.
110. ITU Cloud Service Partner (CSN) Data Provider Roles (e.g. IoT)
1. Generate data activity (e.g. IoT) involves gathering data
from several kinds of sources. The data can be generated in a variety
of types such as structured data, semi-structured data, and
unstructured data.
2. Publish data (e.g. IoT Gateways) is the process of
registering metadata of data to the CSN. It provides metadata for
brokerage data activity. Note – metadata is delivered to the Big Data
Infrastructure Provider through brokerage data with the data
catalogue which includes data access methods, data use policy, etc.
3. Brokerage Data (e.g. Fog) activities include: providing a
data registry to the Cloud Data Provider for publishing its
data sources; finding online data sources and registering their
metadata; providing a catalogue to the Cloud Big Data
Infrastructure Provider for searching for appropriate data.
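The publish and brokerage activities above can be sketched as a small in-memory registry. This is a hypothetical illustration, not an ITU-T interface: `CatalogueEntry`, `DataBroker`, the field names, and the example URIs are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CatalogueEntry:
    """Metadata a data provider registers for one data source."""
    source_id: str
    data_type: str       # "structured", "semi-structured" or "unstructured"
    access_method: str   # how a consumer would fetch the data
    use_policy: str      # conditions on using the data


class DataBroker:
    """Brokerage role: holds the registry and answers catalogue searches."""

    def __init__(self) -> None:
        self._registry: Dict[str, CatalogueEntry] = {}

    def publish(self, entry: CatalogueEntry) -> None:
        # Publishing registers (or updates) the metadata of a data source.
        self._registry[entry.source_id] = entry

    def search(self, data_type: str) -> List[CatalogueEntry]:
        # Infrastructure providers search the catalogue for suitable data.
        return [e for e in self._registry.values() if e.data_type == data_type]


broker = DataBroker()
broker.publish(CatalogueEntry(
    "gateway-1/temperature", "structured",
    "mqtt://broker.example/temperature", "internal use only"))
broker.publish(CatalogueEntry(
    "gateway-2/camera", "unstructured",
    "https://gateway.example/camera", "privacy restricted"))

matches = broker.search("structured")
print([entry.source_id for entry in matches])
```

Only metadata passes through the broker in this sketch; the data themselves stay at the source until a consumer uses the listed access method.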
112. References (Non-commercial)
Pitfalls for direct IoT to Cloud interfaces
https://www.usenix.org/system/files/conference/hotcloud15/hotcloud15-zhang.pdf
Principles for IoT Clouds and IoT Cloud Systems from Distributed Systems Group at TU Wien
http://www.infosys.tuwien.ac.at/research/viecom/papers/Truong2015Principles.pdf
http://www.slideshare.net/linhsolar/principles-for-engineering-elastic-iot-cloud-systems
Cloud Computing vs Edge Computing
http://wikibon.com/the-vital-role-of-edge-computing-in-the-internet-of-things/
http://www.thoughtsoncloud.com/2015/07/7-reasons-edge-computing-is-critical-to-iot/
Benefits and Drawbacks of Fog Computing
http://www.dataversity.net/the-future-of-cloud-computing-fog-computing-and-the-internet-of-things/
Driving Forces for the Internet of Things
http://venturebeat.com/2014/02/04/9-factors-creating-a-perfect-storm-driving-the-internet-of-things-to-14-4-trillion-in-10-years/
Descriptive, Predictive, and Prescriptive Analytics Explained
https://halobi.com/2014/10/descriptive-predictive-and-prescriptive-analytics-explained/
NASA Big Data Analysis and Analytics with Supercomputer
http://www.nas.nasa.gov/assets/pdf/papers/NAS_Technical_Report_NAS-2014-02.pdf
Michael Koster’s Slideshare Presentations including Smart Objects API
www.slideshare.net/michaeljohnkoster
113. References (Non-commercial) continued
Global Sensor Network Middleware
http://lsirpeople.epfl.ch/salehi/papers/GSN-MDM07.pdf
ITU-T Big data – cloud computing based requirements and capabilities
https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-Y.3600-201511-I!!PDF-E&type=items
Real Time Data Services for Cyber-Physical Systems from U of Va
http://www.alice.virginia.edu/~son/publications/cpsw08.pdf
Book on “Signal and Data Processing Techniques for Industrial Cyber-Physical System”
http://users.ics.forth.gr/~tsakalid/PAPERS/2015-book.pdf
Messaging Protocols for Internet of Things
http://electronicdesign.com/iot/understanding-protocols-behind-internet-things
Downloadable Book on Mining Massive Data Sets including Stream Mining
http://infolab.stanford.edu/~ullman/mmds/book.pdf
Survey of Cyber-Physical Solutions
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.397.4496&rep=rep1&type=pdf
Towards a Framework for Securing Cyber-Physical Systems from China
http://www.sersc.org/journals/IJSIA/vol9_no3_2015/4.pdf
IoT and Big Data from ZDNet
http://www.zdnet.com/topic/the-power-of-iot-and-big-data/
114. References (Vendor)
IoT from Cloud to Fog Computing
http://blogs.cisco.com/perspectives/iot-from-cloud-to-fog-computing
IoT Event Stream Processing from SAS
http://resources.idgenterprise.com/original/AST-0147175_Understanding_Data_Streams_in_IoT.pdf
Fog Computing Overview
http://ubiquity.acm.org/article.cfm?id=2822875
Analytic hardware for IoT to Cloud data processing
http://cloudcomputing.sys-con.com/node/3268761
115. Wally’s Perspective on IoT and Cloud Interfaces
From http://dilbert.com/strip/2015-12-20