Predictive Maintenance
with Sensors in Utilities
Tina Zhang
Agenda
 Sensors in IOT era
 Predictive Maintenance
 Predictive Maintenance with sensor data in Utilities industry
 Architecture for real time distributed sensor data collection, analysis,
visualization, and storage system
 Modeling imprecise sensor readings
Sensors in IOT era
 Sensors
Sensors are a bridge between the physical world and the internet. They will
play an ever-increasing role in just about every field imaginable, powering
the “Internet of Things”.
 Potential Uses of Sensor Data
 Sensors can be used to monitor machines, infrastructure, and the environment, such as
ventilation equipment, bridges, energy meters, airplane engines, temperature, and
humidity.
 One use of this data is for predictive maintenance, to repair or replace the items
before they break.
3 classes of Maintenance
 Corrective maintenance (CM), also called reactive maintenance, is simply
fixing things after they break down.
 Preventive maintenance (PM) is about replacing or replenishing consumables
at scheduled intervals.
 Predictive maintenance (PdM), or condition-based maintenance, focuses on
detecting failures before they occur.
PdM incorporates inspections of the system at predetermined intervals to
determine system condition.
Depending on the outcome of each inspection, either preventive maintenance or
no maintenance activity is performed.
Fault Detection Method in Predictive Maintenance
 PdM employs many fault or defect detection methods which compare current
sensor or inspection data with some reference data.
 If the reference data are the outcome of a representation of the real system,
the fault detection method is called model-based.
Mainly, two distinct kinds of models are used: analytical models and
machine learning models.
Analytical models are limited to representing linear characteristics, whereas
modern machine learning techniques such as
neural networks, Bayesian (belief) networks, and support vector machines
can capture nonlinearities and complex interdependencies. Even
a relatively "simple" machine learning tool such as a decision tree can allow
for nonlinearities.
Machine Learning in Predictive Maintenance
 Data Mining and Machine Learning
allow systematic classifying of
patterns contained in data sets.
 Patterns of data, “attributes”,
containing information about
condition of physical assets can be
represented by “instances” with an
associated failure mode, or “class”.
 Predictions can be made based on
patterns in real time data.
Decision tree model example
 Here is an instance of building a decision tree model where the strategy is to
either perform maintenance or not based on outcome from several
independent measurements (variables).
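As a rough illustration of how such a tree could be built, the sketch below grows a small tree by choosing, at each node, the measurement and threshold that minimize Gini impurity. The two measurements (temperature and vibration), the sample rows, and the labels are invented for illustration. Gini impurity is one common splitting criterion and is not necessarily the one behind the slide's example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature index, threshold) with the lowest weighted Gini."""
    best, best_score = None, float("inf")
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; leaves are class labels, nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    feature, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical training data: (temperature, vibration) -> decision.
rows = [(60, 0.2), (65, 0.3), (61, 0.9), (95, 0.2), (98, 0.8), (62, 0.25)]
labels = ["ok", "ok", "maintain", "maintain", "maintain", "ok"]
tree = build_tree(rows, labels)
```

Calling `predict(tree, (63, 0.95))` then walks the learned thresholds to decide whether maintenance should be performed for a new pair of measurements.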
Naïve Bayes example
Predictive Maintenance in Utility
industry
 By analyzing the patterns of circumstances surrounding past equipment
failures and power outages and by accessing multiple data sources including
sensors in real time, utility companies can predict and prevent future
failures.
 Predictive Maintenance allows utility companies to not only prepare for
known consumption peaks, such as those caused by extreme weather
conditions, but also react quickly to unexpected problems when the warning
signs appear.
 Utility companies can spot the problem early on:
 When some of a sensor's values are abnormal;
 When the number of abnormal values exceeds a given threshold;
 Or when the values of a given sensor differ significantly from the values
of its neighbors.
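The three warning signs above can be sketched as simple checks. This is a minimal sketch under stated assumptions: the normal range, the abnormal-count limit, and the neighbor gap are hypothetical parameters that an operator would have to choose, and comparing against the mean of the neighbors is only one possible way to measure "significantly different".

```python
from statistics import mean


def abnormal_values(values, low, high):
    """Readings that fall outside the assumed normal range [low, high]."""
    return [v for v in values if v < low or v > high]


def too_many_abnormal(values, low, high, max_abnormal):
    """True when the number of abnormal readings exceeds the given threshold."""
    return len(abnormal_values(values, low, high)) > max_abnormal


def differs_from_neighbors(value, neighbor_values, max_gap):
    """True when a reading is further than max_gap from the neighbors' mean."""
    return abs(value - mean(neighbor_values)) > max_gap
```

For example, `too_many_abnormal([50, 52, 120, 130, 51], low=40, high=100, max_abnormal=1)` flags the sensor because two of its readings are out of range.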
Big and fast sensor data requires a
different architecture
 Due to rapid advances in sensor technologies, the
number of sensors and the amount of sensor data
have been increasing at a remarkable rate.
 The scalability, availability, and speed
requirements for sensor data collection, storage, and
analysis therefore call for new technologies
that can efficiently distribute data
over many servers and dynamically add new
attributes to data records.
Architecture for a real time distributed sensor
data collection, analysis, visualization, and
storage system
 The new architecture must be able to scale to support a large number of
sensors and big data sizes.
 It must be able to automatically gather and analyze large numbers of sensor
measurements over long periods of time, and to apply statistics and
machine learning to run computationally complex data analysis
algorithms with many influencing factors.
 Open source big data frameworks can be utilized for large-scale sensor data
analysis requirements.
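[Architecture diagram: data sources (socket, shared files, web service) feed Kafka; Spark (Spark Streaming, Spark SQL, and MLlib) processes the stream; data is stored in HDFS and HBase and queried through Hive; analysis results go back to Kafka and on to the web UI and the user.]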
An example use case
 Display all the transformers located in Houston, Texas on the map, and
when a transformer icon is clicked, display an info window with the following
details for that transformer: Transformer ID, Age, Designed Capacity, exact
location, and the current Load reading.
 If a transformer is of Type “Pole-Top”, with Rating 230 and Age > 20, its
load exceeds its designed capacity by more than 10 kVA, and the
air temperature at the transformer's location is above 100 degrees,
we highlight the transformer icon in red.
 When the user clicks on a specific transformer, we populate the details for that
transformer, including its Load reading. Both the transformer icon color and
the transformer Load reading (shown in red or green) update
continuously, every second, in real time.
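The highlighting rule in this use case can be written down as a single predicate. The sketch below is illustrative only: the `Transformer` class and its field names are hypothetical, the temperature unit is left as given in the slide ("degrees"), and showing non-flagged icons in green is an assumption based on the red/green coloring mentioned for the Load reading.

```python
from dataclasses import dataclass


@dataclass
class Transformer:
    """Hypothetical record; field names are not taken from the original system."""
    transformer_id: str
    kind: str                     # e.g. "Pole-Top"
    rating: int
    age_years: float
    designed_capacity_kva: float
    load_kva: float


def icon_color(t: Transformer, air_temperature: float) -> str:
    """Return "red" when every condition of the use-case rule holds."""
    overloaded = t.load_kva - t.designed_capacity_kva > 10  # more than 10 kVA over
    at_risk = (
        t.kind == "Pole-Top"
        and t.rating == 230
        and t.age_years > 20
        and overloaded
        and air_temperature > 100
    )
    return "red" if at_risk else "green"  # green for the rest is an assumption
```

In a streaming job, a function like this would be re-evaluated on every new load reading so that the icon color can follow the data.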
Why Spark?
 Spark presents a new distributed memory abstraction, called resilient
distributed datasets (RDDs), which provides a data structure for in-memory
computations on large clusters.
 RDDs are fault tolerant: if a task fails, for example because of a hardware
failure or erroneous user code, the lost data can be recovered and
reconstructed automatically by the remaining tasks.
 Spark has a high-level Java API for working with distributed data, similar to
Hadoop, and presents an in-memory processing solution.
 We run Spark on Hortonworks HDP 2.2 in YARN mode, and have also made Spark
1.3.1 work on HDP 2.2 (default Spark version: 1.2).
Spark Streaming
 Spark Streaming is an extension of the core Spark API that enables
high-throughput, fault-tolerant stream processing of live data streams.
 It offers an additional abstraction called discretized streams, or
DStreams. DStreams are a continuous sequence of RDDs representing a
stream of data.
 DStreams can be created from live incoming data or by transforming other
DStreams.
 Spark receives data, divides it into batches, then replicates the batches for
fault tolerance and persists them in memory where they are available for
mathematical operations.
 Spark 1.3 offers Streaming K-means Clustering and Streaming Linear
Regression
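The idea of dividing a live stream into batches can be pictured with a few lines of plain Python. This is a conceptual sketch, not Spark code: `micro_batches` is a hypothetical helper, events are assumed to be `(timestamp_seconds, value)` pairs, and replication and in-memory persistence are left out.

```python
from collections import defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Each returned list plays the role of one small batch in the stream.
    Intervals that received no events are skipped in this sketch.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_seconds)].append(value)
    return [batches[index] for index in sorted(batches)]
```

With a one-second batch interval, readings that arrive at 0.2 s and 0.9 s land in the same batch, while a reading at 1.1 s starts the next one.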
Spark SQL
 Spark SQL is Spark's module for working with structured data.
 The foundation of Spark SQL is a type of RDD, called SchemaRDD (pre-V1.3) or
DataFrame (V1.3), an object similar to a table in a relational database.
 Spark SQL can run queries against mixed types of data
Spark piece in detail:
Sensor Data Storage – HBase
 NoSQL databases provide efficient alternatives for storing large amounts of sensor data. In
this example, we will use HBase, a NoSQL key/value store that runs on top of HDFS.
 Unlike Hive, HBase operations run in real-time on its database rather than batch-based
MapReduce jobs.
 Each key/value pair in HBase is defined as a cell, and each key consists of row-key, column
family, column, and time-stamp. A row in HBase is a grouping of key/value mappings
identified by the row-key.
In our case, we’ll store the anomalous sensor data in a table “abnormal_load” in the format:
key, Transformer_ID, Timestamp, Load, Overload, Location, Air_Temperature
 We can query our HBase table by creating an external Hive table, linking the HBase table to
the Hive table, and then running HiveQL:
SELECT Transformer_ID, Timestamp, Overload
FROM spark_poc.abnormal_load
WHERE Overload > 20 AND Air_Temperature > 105
ORDER BY Timestamp DESC;
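To make the HBase cell layout described above more concrete, the sketch below models a table as a dictionary whose keys are (row-key, column family, column, timestamp) tuples. It is a toy in-memory model, not the HBase client API; the column family name "d", the row key, and the helper names are assumptions made for illustration.

```python
def put_row(table, row_key, family, columns, timestamp):
    """Write one cell per column; each cell has its own full key."""
    for column, value in columns.items():
        table[(row_key, family, column, timestamp)] = value


def get_row(table, row_key):
    """Group the cells of one row, keeping the newest version per column."""
    latest = {}
    for (key, family, column, timestamp), value in table.items():
        if key != row_key:
            continue
        name = f"{family}:{column}"
        if name not in latest or timestamp > latest[name][0]:
            latest[name] = (timestamp, value)
    return {name: value for name, (_, value) in latest.items()}


# Hypothetical row for the abnormal_load table.
table = {}
put_row(table, "T1-0001", "d", {"Load": 120.0, "Overload": 25.0}, timestamp=1)
put_row(table, "T1-0001", "d", {"Load": 130.0}, timestamp=2)
```

Reading the row back with `get_row(table, "T1-0001")` returns the newest Load value together with the Overload value written earlier, which mirrors how a row is a grouping of key/value mappings under one row-key.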
Why send all source data to Kafka
In the diagrams on the next 2 slides:
 The first shows what happens without Kafka.
Since each source needs a connection to each target, the pipeline is difficult to
maintain and can cause many programming and security issues.
 The second diagram uses Kafka, so all sources send data to Kafka.
We only need to develop one interface/program to get all the different data into
Kafka. Each kind of data is one topic.
On the consumer side, a consumer only deals with Kafka. When we add a
new source or a new consumer, it does not affect any existing source or target
at all. This makes the pipeline easy to maintain, clean, secure, and scalable.
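A quick count shows why the direct approach is harder to maintain. The sketch below assumes the worst case in which every source feeds every target; the function names are hypothetical.

```python
def direct_connections(sources, targets):
    """Every source connects to every target: sources * targets links."""
    return sources * targets


def connections_via_broker(sources, targets):
    """Each source and each target holds one link to the broker."""
    return sources + targets
```

With 5 sources and 4 targets this gives 20 direct connections versus 9 through a central broker, and adding a sixth source adds 4 connections in the first case but only 1 in the second.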
[Diagram: Data pipelines without Kafka. Every source connects directly to every target.]
[Diagram: Data pipelines with Kafka. Sources send data to Kafka, and the targets (HBase, Hive, HDFS, DB) read from Kafka.]
Why write analysis result data stream to
Kafka before publishing it to web UI
 If we instead sent the data stream (the analysis results) to a queue on the web
server and then used a web socket to push it to the browser, the queue would be very tedious to
maintain.
 Kafka comes in handy as a distributed, persistent message queue that supports
multiple concurrent writers, as well as multiple groups of readers that
maintain their own offsets within the queue (which Kafka calls a ‘topic’).
This enables us to build applications that consume data from a topic at their
own pace without disrupting access from other groups of readers.
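The per-group offsets are what let readers move at their own pace. The following toy model keeps one append-only log and one offset per reader group. It is an in-memory illustration only and does not use or imitate the Kafka client API; the `Topic` class and its method names are hypothetical.

```python
class Topic:
    """Append-only message log with an independent read offset per group."""

    def __init__(self):
        self._log = []       # messages stay in the log after being read
        self._offsets = {}   # group name -> index of the next message to read

    def publish(self, message):
        """Append a message; any number of writers may call this."""
        self._log.append(message)

    def poll(self, group, max_messages=10):
        """Return the next messages for this group and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._log[start:start + max_messages]
        self._offsets[group] = start + len(batch)
        return batch
```

A slow dashboard group and a fast archiving group can both call `poll` on the same topic, and neither one changes what the other will read next.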
Sensor Data Analysis
 To analyze data on the aforementioned architecture we use distributed
machine-learning algorithms in Apache Mahout and MLlib by Apache Spark.
 MLlib is a Spark component: a fast and flexible iterative computing
framework for implementing machine-learning algorithms, including
classification, clustering, linear regression, collaborative filtering, and
decomposition, on large-scale data held in memory.
 We use the k-means algorithm to cluster sensor data and find anomalies.
k-means is a very popular unsupervised learning algorithm that assigns
objects to groups. All of the objects to be grouped need to be
represented as numerical features. The technique iteratively assigns points to
clusters, using distance as the similarity measure, until there is no change in which
point belongs to which cluster.
 We also use Spark’s Streaming K-means.
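The clustering loop described above can be sketched in plain Python. This is a single-machine illustration, not the MLlib implementation: it seeds the centroids with the first k points for determinism (real implementations use better initialization), and the rule of flagging points that lie further than `max_distance` from every centroid is an assumed way of turning clusters into anomalies.

```python
import math


def kmeans(points, k, max_iter=100):
    """Assign points to k clusters until the assignment stops changing."""
    centroids = [tuple(p) for p in points[:k]]  # simplistic, deterministic seeding
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment


def anomalies(points, centroids, max_distance):
    """Points further than max_distance from every centroid."""
    return [
        p for p in points
        if min(math.dist(p, c) for c in centroids) > max_distance
    ]
```

Here every reading is a tuple of numerical features, for example `(load,)` or `(load, air_temperature)`, matching the requirement that objects be represented numerically.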
Modeling imprecise sensor readings
 Sensor readings are inherently imprecise because of the noise introduced by
the equipment itself.
 Two main approaches have emerged for modeling uncertain data series:
 In the first, a Probability Density Function (PDF) over the uncertain values is
estimated by using some a priori knowledge.
 In the second, the uncertain data distribution is summarized by repeated
measurements (i.e., samples).
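The two approaches can be contrasted in a short sketch. It assumes, purely for illustration, that the uncertainty is Gaussian: in the first function the standard deviation comes from prior knowledge (for example a sensor datasheet), and in the second it is estimated from repeated samples. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def pdf_from_prior(reading, sensor_sigma):
    """Approach 1: a PDF centred on the reading, with an a priori known sigma."""
    return NormalDist(mu=reading, sigma=sensor_sigma)


def pdf_from_samples(samples):
    """Approach 2: summarize repeated measurements of the same quantity."""
    return NormalDist(mu=mean(samples), sigma=stdev(samples))


def probability_above(distribution, threshold):
    """Probability that the true value exceeds the threshold."""
    return 1.0 - distribution.cdf(threshold)
```

Either model lets a rule such as "load exceeds capacity" be evaluated as a probability rather than as a yes/no test on a single noisy reading.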
Dynamic probabilistic models over the
sensor readings
 The KEN technique builds and maintains dynamic probabilistic models over the
sensor readings, taking into account the spatio-temporal correlations that exist
in the sensor readings.
 These models organize the sensor nodes in non-overlapping groups, and are
shared by the sensor nodes and the sink.
 The expected values of the probabilistic models are the values that are
recorded by the sink. If the sensors observe that these values are more than εVT
away from the sensed values, then a model update is triggered.
 The PAQ and SAF methods employ linear regression and autoregressive
models, respectively, for modeling the measurements produced by the nodes,
with SAF leading to a more accurate model than PAQ.
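The update trigger shared by these model-based schemes can be illustrated with a deliberately simplified sketch. It is not the KEN, PAQ, or SAF algorithm: the "model" here just predicts the last transmitted value, there is a single sensor with no spatial correlation, and `epsilon` stands in for the error bound mentioned above.

```python
def readings_to_transmit(sensed, initial_expected, epsilon):
    """Return the (index, value) pairs a sensor would have to send to the sink.

    The sink keeps using the expected value until the sensor observes that
    it is more than epsilon away from what was actually sensed.
    """
    expected = initial_expected
    sent = []
    for index, value in enumerate(sensed):
        if abs(value - expected) > epsilon:
            sent.append((index, value))  # model update is triggered
            expected = value             # sensor and sink now agree again
    return sent
```

With readings `[20.0, 20.3, 20.4, 21.5, 21.6, 20.0]`, an initial expectation of 20.0, and `epsilon=1.0`, only two of the six readings need to be transmitted.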

Weitere ähnliche Inhalte

Was ist angesagt?

Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and AnalyticsVMware Tanzu
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsGanesan Narayanasamy
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyNati Shalom
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...Big Data Spain
 
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsightsUse cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsightsGord Sissons
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...Dataconomy Media
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data PlatformVikas Manoria
 
GITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP PresentationGITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP PresentationPedro Pereira
 
Big Data Scotland 2017
Big Data Scotland 2017Big Data Scotland 2017
Big Data Scotland 2017Ray Bugg
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practiceVivek Murugesan
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiDatabricks
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...Denodo
 
How IOT & Big Data will shape up Future Economies?
How IOT & Big Data will shape up Future Economies?How IOT & Big Data will shape up Future Economies?
How IOT & Big Data will shape up Future Economies?Srinath Perera
 
Next-Gen ML/AI Platform
Next-Gen ML/AI PlatformNext-Gen ML/AI Platform
Next-Gen ML/AI PlatformJosh Yeh
 

Was ist angesagt? (20)

Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and Analytics
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
HOW TO APPLY BIG DATA ANALYTICS AND MACHINE LEARNING TO REAL TIME PROCESSING ...
 
Infochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey TheoremInfochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey Theorem
 
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsightsUse cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
Use cases for Hadoop and Big Data Analytics - InfoSphere BigInsights
 
Ibm big data
Ibm big dataIbm big data
Ibm big data
 
Operational Analytics
Operational AnalyticsOperational Analytics
Operational Analytics
 
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr..."Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
"Empower Developers with HPE Machine Learning and Augmented Intelligence", Dr...
 
Overview - IBM Big Data Platform
Overview - IBM Big Data PlatformOverview - IBM Big Data Platform
Overview - IBM Big Data Platform
 
GITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP PresentationGITEX Big Data Conference 2014 – SAP Presentation
GITEX Big Data Conference 2014 – SAP Presentation
 
Big Data Application Architectures - IoT
Big Data Application Architectures - IoTBig Data Application Architectures - IoT
Big Data Application Architectures - IoT
 
Big Data Scotland 2017
Big Data Scotland 2017Big Data Scotland 2017
Big Data Scotland 2017
 
Importance of Big Data Analytics
Importance of Big Data AnalyticsImportance of Big Data Analytics
Importance of Big Data Analytics
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practice
 
Threat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique BrezinskiThreat Detection and Response at Scale with Dominique Brezinski
Threat Detection and Response at Scale with Dominique Brezinski
 
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
How to Swiftly Operationalize the Data Lake for Advanced Analytics Using a Lo...
 
How IOT & Big Data will shape up Future Economies?
How IOT & Big Data will shape up Future Economies?How IOT & Big Data will shape up Future Economies?
How IOT & Big Data will shape up Future Economies?
 
Next-Gen ML/AI Platform
Next-Gen ML/AI PlatformNext-Gen ML/AI Platform
Next-Gen ML/AI Platform
 

Andere mochten auch

Predictive maintenance
Predictive maintenancePredictive maintenance
Predictive maintenanceJames Shearer
 
[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...PAPIs.io
 
What is predictive maintenance?
What is predictive maintenance?What is predictive maintenance?
What is predictive maintenance?Danko Nikolic
 
Predictive Maintenance with R
Predictive Maintenance with RPredictive Maintenance with R
Predictive Maintenance with Reoda GmbH
 
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionThe Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionSenturus
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive Maintenancefljungbe
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive MaintenanceSaama
 
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...Sentient Science
 
Predictive Maintenance for Oil and Gas
Predictive Maintenance for Oil and Gas Predictive Maintenance for Oil and Gas
Predictive Maintenance for Oil and Gas Helen Fisher
 
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
BA Summit 2014  Predictive maintenance: Met big data het lek dichtenBA Summit 2014  Predictive maintenance: Met big data het lek dichten
BA Summit 2014 Predictive maintenance: Met big data het lek dichtenDaniel Westzaan
 
Reliability centred maintenance
Reliability centred maintenanceReliability centred maintenance
Reliability centred maintenanceSHIVAJI CHOUDHURY
 
Essential Elements of Data Center Facility Operations
Essential Elements of Data Center Facility OperationsEssential Elements of Data Center Facility Operations
Essential Elements of Data Center Facility OperationsSchneider Electric
 
DATA FORUM MICROPOLE 2015 - Atelier Talend
 DATA FORUM MICROPOLE 2015 - Atelier Talend DATA FORUM MICROPOLE 2015 - Atelier Talend
DATA FORUM MICROPOLE 2015 - Atelier TalendMicropole Group
 
XMPLR Data Analytics in Power Generation
XMPLR Data Analytics in  Power GenerationXMPLR Data Analytics in  Power Generation
XMPLR Data Analytics in Power GenerationScott Affelt
 
Application fields of R in classical industrial analytics
Application fields of R in classical industrial analyticsApplication fields of R in classical industrial analytics
Application fields of R in classical industrial analyticseoda GmbH
 

Andere mochten auch (20)

Predictive maintenance
Predictive maintenancePredictive maintenance
Predictive maintenance
 
[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...
 
What is predictive maintenance?
What is predictive maintenance?What is predictive maintenance?
What is predictive maintenance?
 
Predictive Maintenance with R
Predictive Maintenance with RPredictive Maintenance with R
Predictive Maintenance with R
 
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionThe Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive Maintenance
 
Predictive Maintenance
Predictive MaintenancePredictive Maintenance
Predictive Maintenance
 
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
Using the Industrial Internet to Move From Planned Maintenance to Predictive ...
 
Predictive Maintenance for Oil and Gas
Predictive Maintenance for Oil and Gas Predictive Maintenance for Oil and Gas
Predictive Maintenance for Oil and Gas
 
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
BA Summit 2014  Predictive maintenance: Met big data het lek dichtenBA Summit 2014  Predictive maintenance: Met big data het lek dichten
BA Summit 2014 Predictive maintenance: Met big data het lek dichten
 
Reliability centred maintenance
Reliability centred maintenanceReliability centred maintenance
Reliability centred maintenance
 
Reliability centered maintenance
Reliability centered maintenanceReliability centered maintenance
Reliability centered maintenance
 
Essential Elements of Data Center Facility Operations
Essential Elements of Data Center Facility OperationsEssential Elements of Data Center Facility Operations
Essential Elements of Data Center Facility Operations
 
Digital POV-Chemical Industries
Digital POV-Chemical IndustriesDigital POV-Chemical Industries
Digital POV-Chemical Industries
 
DATA FORUM MICROPOLE 2015 - Atelier Talend
 DATA FORUM MICROPOLE 2015 - Atelier Talend DATA FORUM MICROPOLE 2015 - Atelier Talend
DATA FORUM MICROPOLE 2015 - Atelier Talend
 
Business Insight and Predictive Analysis
Business Insight and Predictive AnalysisBusiness Insight and Predictive Analysis
Business Insight and Predictive Analysis
 
XMPLR Data Analytics in Power Generation
XMPLR Data Analytics in  Power GenerationXMPLR Data Analytics in  Power Generation
XMPLR Data Analytics in Power Generation
 
Predictive maintenance
Predictive maintenancePredictive maintenance
Predictive maintenance
 
Predictive Analysis
Predictive AnalysisPredictive Analysis
Predictive Analysis
 
Application fields of R in classical industrial analytics
Application fields of R in classical industrial analyticsApplication fields of R in classical industrial analytics
Application fields of R in classical industrial analytics
 

Ähnlich wie Predictive maintenance withsensors_in_utilities_

CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingPalani Kumar
 
Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeSumant Tambe
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy snehal parikh
 
Enabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesEnabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesVasu S
 
Real time data-pipeline from inception to production
Real time data-pipeline from inception to productionReal time data-pipeline from inception to production
Real time data-pipeline from inception to productionShreya Mukhopadhyay
 
Massive sacalabilitty with InterSystems IRIS Data Platform
Massive sacalabilitty with InterSystems IRIS Data PlatformMassive sacalabilitty with InterSystems IRIS Data Platform
Massive sacalabilitty with InterSystems IRIS Data PlatformRobert Bira
 
Comparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkComparison among rdbms, hadoop and spark
Comparison among rdbms, hadoop and sparkAgnihotriGhosh2
 
Data Analysis In The Cloud
Data Analysis In The CloudData Analysis In The Cloud
Data Analysis In The CloudMonica Carter
 
A General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
A General Purpose Extensible Scanning Query Architecture for Ad Hoc AnalyticsA General Purpose Extensible Scanning Query Architecture for Ad Hoc Analytics
A General Purpose Extensible Scanning Query Architecture for Ad Hoc AnalyticsFlurry, Inc.
 
Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfanlab Datalake - Big Data Management Platform -
Aucfanlab Datalake - Big Data Management Platform -Aucfan
 
Dataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayDataservices - Processing Big Data The Microservice Way
Dataservices - Processing Big Data The Microservice WayJosef Adersberger
 
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfSchema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdf
Schema-based multi-tenant architecture using Quarkus & Hibernate-ORM.pdfseo18
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
Experimenting With Big Data
Experimenting With Big DataExperimenting With Big Data
Experimenting With Big DataNick Boucart
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
Final Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaFinal Report_798 Project_Nithin_Sharmila
Final Report_798 Project_Nithin_SharmilaNithin Kakkireni
 

Ähnlich wie Predictive maintenance withsensors_in_utilities_ (20)

CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_Computing
 
Reactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/SubscribeReactive Stream Processing for Data-centric Publish/Subscribe
Reactive Stream Processing for Data-centric Publish/Subscribe
 
Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy Hadoop Integration with Microstrategy
Hadoop Integration with Microstrategy
 
Enabling SQL Access to Data Lakes
Enabling SQL Access to Data LakesEnabling SQL Access to Data Lakes
Enabling SQL Access to Data Lakes
 
Real time data-pipeline from inception to production
Real time data-pipeline from inception to productionReal time data-pipeline from inception to production
Real time data-pipeline from inception to production
 
Predictive Maintenance with Sensors in Utilities

  • 1. Predictive Maintenance with Sensors in Utilities, by Tina Zhang
  • 2. Agenda
      • Sensors in the IoT era
      • Predictive maintenance
      • Predictive maintenance with sensor data in the utilities industry
      • Architecture for a real-time distributed sensor data collection, analysis, visualization, and storage system
      • Modeling imprecise sensor readings
  • 3. Sensors in the IoT era
      • Sensors: Sensors are a bridge between the physical world and the internet. They will play an ever-increasing role in just about every field imaginable, and they power the "Internet of Things".
      • Potential uses of sensor data:
          • Sensors can be used to monitor machines, infrastructure, and the environment, such as ventilation equipment, bridges, energy meters, airplane engines, temperature, and humidity.
          • One use of this data is predictive maintenance: repairing or replacing items before they break.
  • 4. Three classes of maintenance
      • Corrective maintenance (CM), also called reactive maintenance, is simply fixing things after they break down.
      • Preventive maintenance (PM) is about replacing or replenishing consumables at scheduled intervals.
      • Predictive maintenance (PdM), or condition-based maintenance, focuses on detecting failures before they occur. PdM incorporates inspections of the system at predetermined intervals to determine system condition. Depending on the outcome of each inspection, either a preventive maintenance activity or no maintenance activity is performed.
  • 5. Fault detection methods in predictive maintenance
      • PdM employs many fault or defect detection methods, which compare current sensor or inspection data with some reference data.
      • If the reference data are the outcome of a representation of the real system, the fault detection method is called model-based. Two distinct kinds of models are mainly used: analytical models and machine learning models. Analytical models are limited to representing linear characteristics, whereas modern machine learning techniques based on artificial intelligence, such as neural networks, Bayesian (belief) networks, or support vector machines, are capable of including nonlinearities and complex interdependencies. Even a relatively "simple" machine learning tool such as a decision tree can allow for nonlinearities.
  • 6. Machine learning in predictive maintenance
      • Data mining and machine learning allow systematic classification of the patterns contained in data sets.
      • Patterns of data ("attributes") containing information about the condition of physical assets can be represented by "instances" with an associated failure mode, or "class".
      • Predictions can be made based on patterns in real-time data.
  • 7. Decision tree model example
      • Here is an instance of building a decision tree model where the strategy is to either perform maintenance or not, based on the outcome of several independent measurements (variables).
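The transcript does not preserve the tree from this slide, so the following is only a minimal sketch of what a maintain-or-not decision tree over independent measurements could look like. The variable names (`vibration_mm_s`, `oil_temp_c`, `hours_since_service`) and every threshold are hypothetical and are not taken from the deck.

```python
def maintenance_decision(vibration_mm_s, oil_temp_c, hours_since_service):
    """Walk a small hand-written decision tree.

    All variables and thresholds are hypothetical illustrations;
    a real tree would be learned from labeled failure data.
    """
    # Root split: strong vibration alone is enough to act.
    if vibration_mm_s > 7.0:
        return "perform maintenance"
    # Second split: high oil temperature only matters for units
    # that have not been serviced recently.
    if oil_temp_c > 90.0:
        if hours_since_service > 2000:
            return "perform maintenance"
        return "no maintenance"
    return "no maintenance"


print(maintenance_decision(8.2, 70.0, 100))
print(maintenance_decision(3.0, 95.0, 2500))
print(maintenance_decision(3.0, 95.0, 500))
```

Because each branch tests one variable against a threshold, the combined decision boundary is piecewise and nonlinear, which is the property the earlier slide attributes to decision trees.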
  • 9. Predictive maintenance in the utility industry
      • By analyzing the patterns of circumstances surrounding past equipment failures and power outages, and by accessing multiple data sources (including sensors) in real time, utility companies can predict and prevent future failures.
      • Predictive maintenance allows utility companies not only to prepare for known consumption peaks, such as those caused by extreme weather conditions, but also to react quickly to unexpected problems when the warning signs appear.
      • Utility companies can spot a problem early on:
          • when some of the values of a sensor are not normal;
          • when the number of abnormal values exceeds a given threshold;
          • or when the values of a given sensor are significantly different from the values of its neighbors.
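The three warning signs listed on this slide can be sketched as simple checks. This is an illustration under stated assumptions, not the deck's implementation: the normal range `[low, high]`, the count threshold `max_abnormal`, the `tolerance`, and the choice of the median as the neighbor summary are all hypothetical.

```python
from statistics import median


def abnormal_readings(values, low, high):
    """Sign 1: readings that fall outside the assumed normal range [low, high]."""
    return [v for v in values if v < low or v > high]


def too_many_abnormal(values, low, high, max_abnormal):
    """Sign 2: True when the number of abnormal readings exceeds the threshold."""
    return len(abnormal_readings(values, low, high)) > max_abnormal


def differs_from_neighbors(value, neighbor_values, tolerance):
    """Sign 3: True when a reading is far from the median of its neighbors."""
    return abs(value - median(neighbor_values)) > tolerance


loads = [48.0, 51.5, 97.2, 50.1, 99.8, 49.3]
print(abnormal_readings(loads, 40.0, 60.0))
print(too_many_abnormal(loads, 40.0, 60.0, 1))
print(differs_from_neighbors(97.2, [50.0, 51.0, 49.5], 10.0))
```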
  • 10. Big and fast sensor data requires a different architecture
      • Due to rapid advances in sensor technologies, the number of sensors and the amount of sensor data have been increasing at incredible rates.
      • The scalability, availability, and speed requirements for sensor data collection, storage, and analysis therefore call for new technologies that can efficiently distribute data over many servers and dynamically add new attributes to data records.
  • 11. Architecture for a real-time distributed sensor data collection, analysis, visualization, and storage system
      • The new architecture must be able to scale to support a large number of sensors and large data sizes.
      • It must be able to automatically gather and analyze a large number of sensor measurements over long periods of time, and to use statistics and machine learning to execute computationally complex data analysis algorithms with many influencing factors.
      • Open source big data frameworks can be used to meet large-scale sensor data analysis requirements.
  • 12. [Architecture diagram. Component labels: Data Source (Socket, Shared Files, Web Service), Kafka, Spark Streaming & Spark SQL & MLlib, HDFS, HBase, Hive, Analysis results, Kafka, Web UI, User]
  • 13. An example use case
      • Display all the transformers located in the city of Houston, Texas on the map. When a transformer icon is clicked, display the following details for that transformer in an info window: Transformer ID, Age, Designed Capacity, exact location, and the current Load reading.
      • If a transformer is of Type "Pole-Top", with Rating 230 and Age > 20, its load exceeds its designed capacity by more than 10 kVA, and the air temperature at its location is above 100 degrees, we highlight the transformer icon in red.
      • When the user clicks on a specific transformer, we populate the details for that transformer, including its Load reading. Both the transformer icon color and the transformer Load reading (shown in red or green) update continuously, every second, in real time.
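The highlighting rule in this use case can be written as a single predicate. The sketch below assumes a hypothetical `Transformer` record and field names, and it assumes green as the icon color when the rule does not fire (the slide only states red for the alert case and mentions red or green for the load reading).

```python
from dataclasses import dataclass


@dataclass
class Transformer:
    # Field names are hypothetical; the slide only names the attributes.
    transformer_id: str
    type: str
    rating: int
    age_years: int
    designed_capacity_kva: float


def icon_color(t, load_kva, air_temp_degrees):
    """Return "red" when every condition from the use case holds, else "green"."""
    overloaded = load_kva - t.designed_capacity_kva > 10  # more than 10 kVA over
    at_risk = (
        t.type == "Pole-Top"
        and t.rating == 230
        and t.age_years > 20
        and overloaded
        and air_temp_degrees > 100
    )
    return "red" if at_risk else "green"


t1 = Transformer("T-1", "Pole-Top", 230, 25, 100.0)
print(icon_color(t1, 115.0, 102.0))
print(icon_color(t1, 105.0, 102.0))
```

In a streaming job, a predicate like this would be evaluated for each incoming load reading so that the map can refresh every second.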
  • 14. Why Spark?
      • Spark presents a distributed memory abstraction called resilient distributed datasets (RDDs), which provides a data structure for in-memory computations on large clusters.
      • RDDs are fault tolerant: if a given task fails, for example because of a hardware failure or erroneous user code, lost data can be recovered and reconstructed automatically on the remaining tasks.
      • Spark has a high-level Java API for working with distributed data, similar to Hadoop, and offers an in-memory processing solution.
      • We run Spark on Hortonworks HDP 2.2 in YARN mode, and have also made Spark 1.3.1 work on HDP 2.2 (default Spark version: 1.2).
  • 15. Spark Streaming
      • Spark Streaming is an extension of the core Spark API that enables high-throughput, fault-tolerant processing of live data streams.
      • It offers an additional abstraction called discretized streams, or DStreams. A DStream is a continuous sequence of RDDs representing a stream of data.
      • DStreams can be created from live incoming data or by transforming other DStreams.
      • Spark receives data, divides it into batches, then replicates the batches for fault tolerance and persists them in memory, where they are available for mathematical operations.
      • Spark 1.3 offers Streaming K-means Clustering and Streaming Linear Regression.
  • 16. Spark SQL
      • Spark SQL is Spark's module for working with structured data.
      • The foundation of Spark SQL is a type of RDD called SchemaRDD (before V1.3) or DataFrame (V1.3), an object similar to a table in a relational database.
      • Spark SQL can run queries against mixed types of data.
  • 17. Sensor Data Storage – HBase
      • NoSQL databases provide efficient alternatives for storing large amounts of sensor data. In this example, we use HBase, a NoSQL key/value store that runs on top of HDFS.
      • Unlike Hive, HBase operations run in real time on its database rather than as batch-based MapReduce jobs.
      • Each key/value pair in HBase is defined as a cell, and each key consists of row-key, column family, column, and time-stamp. A row in HBase is a grouping of key/value mappings identified by the row-key. In our case, we store the anomalous sensor data in a table "abnormal_load" in the format: key, Transformer_ID, Timestamp, Load, Overload, Location, Air_Temperature.
      • We can query our HBase table by creating an external Hive table, linking the HBase table to the Hive table, and then running HiveQL:
          select Transformer_ID, Timestamp, Overload from spark_poc.abnormal_load where Overload > 20 and Air_Temperature > 105 order by Timestamp DESC;
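The cell model described on this slide (a value addressed by row-key, column family, column, and time-stamp, with a row being the group of cells that share a row-key) can be mimicked with a plain dictionary. This is an in-memory illustration only, not the HBase client API; the column family name `d` and the row-key format are hypothetical.

```python
# Each cell is addressed by (row_key, column_family, column, timestamp).
cells = {}


def put(row_key, family, column, timestamp, value):
    """Store one cell; older versions of the same column are kept."""
    cells[(row_key, family, column, timestamp)] = value


def get_row(row_key):
    """Group the cells of one row and return the newest value per column."""
    latest = {}
    for (rk, family, column, ts), value in cells.items():
        if rk != row_key:
            continue
        name = f"{family}:{column}"
        if name not in latest or ts > latest[name][0]:
            latest[name] = (ts, value)
    return {name: value for name, (ts, value) in latest.items()}


put("T-1#1001", "d", "Transformer_ID", 1001, "T-1")
put("T-1#1001", "d", "Load", 1001, 118.0)
put("T-1#1001", "d", "Load", 1002, 121.5)  # newer version of the same column
put("T-2#1001", "d", "Load", 1001, 95.0)
print(get_row("T-1#1001"))
```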
  • 18. Why send all source data to Kafka
      • In the diagrams on the next two slides:
          • The first shows what happens without Kafka. Since each source needs a connection to each target, the setup is difficult to maintain and can cause many programming and security issues.
          • The second diagram uses Kafka, so all sources send data to Kafka. We only need to develop one interface/program to get all the different data into Kafka. Each kind of data is one topic, and on the consumer side, a consumer only deals with Kafka. Adding a new source or a new consumer does not affect any existing source or target. The result is easy to maintain, clean, secure, and scalable.
  • 20. Data Pipelines with Kafka [diagram; labels: Sources, Kafka, Targets, HBase, Hive, HDFS, DB]
  • 21. Why write the analysis result data stream to Kafka before publishing it to the web UI
      • If we send the data stream (analysis results) to a queue on the web server and then use a web socket to push it to the browser, the queue is very tedious to maintain.
      • Kafka comes in handy as a distributed, persistent message queue that supports multiple concurrent writers, as well as multiple groups of readers that maintain their own offsets within the queue (which Kafka calls a 'topic'). This enables us to build applications that consume data from a topic at their own pace without disrupting access from other groups of readers.
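The point about reader groups keeping their own offsets can be illustrated with a toy in-memory log. This sketch is not the Kafka client API and ignores partitions, persistence, and distribution; the class name `Topic` and the group names are hypothetical.

```python
class Topic:
    """Toy append-only log with one read offset per consumer group."""

    def __init__(self):
        self.log = []      # messages in arrival order
        self.offsets = {}  # consumer group name -> next index to read

    def append(self, message):
        self.log.append(message)

    def poll(self, group, max_messages=10):
        """Return the next unread messages for this group and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_messages]
        self.offsets[group] = start + len(batch)
        return batch


results = Topic()
for reading in ["T-1:red", "T-2:green", "T-3:red"]:
    results.append(reading)

first_for_ui = results.poll("web-ui", max_messages=2)   # the UI reads slowly
all_for_archiver = results.poll("archiver")              # another group reads everything
rest_for_ui = results.poll("web-ui")                     # the UI resumes where it stopped
print(first_for_ui, all_for_archiver, rest_for_ui)
```

Reading never removes messages from the log, which is why one group consuming at its own pace does not disturb another.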
  • 22. Sensor Data Analysis
      • To analyze data on the aforementioned architecture, we use distributed machine-learning algorithms in Apache Mahout and in MLlib from Apache Spark.
      • MLlib is a Spark component: a fast and flexible iterative computing framework for implementing machine-learning algorithms, including classification, clustering, linear regression, collaborative filtering, and decomposition. It aims to create and analyze large-scale data hosted in memory.
      • We use the k-means algorithm to cluster sensor data and find the anomalies. K-means is a very popular unsupervised learning algorithm that aims to assign objects to groups. All of the objects to be grouped need to be represented as numerical features. The technique iteratively assigns points to clusters, using distance as the similarity measure, until there is no change in which point belongs to which cluster.
      • We also use Spark's Streaming K-means.
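The k-means loop described above, followed by a distance-based anomaly check, can be sketched in plain Python on one-dimensional load readings. This is not the MLlib or Mahout implementation; the initial centers, the sample data, and the `max_distance` cutoff used to flag anomalies are assumptions made for illustration.

```python
def kmeans_1d(points, centers, iterations=20):
    """Assign each point to its nearest center, then recompute centers,
    stopping once the centers (and hence the assignments) no longer change."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def anomalies(points, centers, max_distance):
    """Flag points that are far from every cluster center."""
    return [p for p in points if min(abs(p - c) for c in centers) > max_distance]


loads = [50, 52, 51, 49, 80, 82, 81, 120]
centers = kmeans_1d(loads, [50, 80])
print(centers)
print(anomalies(loads, centers, max_distance=15))
```

Note that an extreme reading also pulls its cluster center toward itself, so the cutoff has to be chosen with that effect in mind.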
  • 23. Modeling imprecise sensor readings
      • Sensor readings are inherently imprecise because of the noise introduced by the equipment itself.
      • Two main approaches have emerged for modeling uncertain data series:
          • In the first, a Probability Density Function (PDF) over the uncertain values is estimated by using some a priori knowledge.
          • In the second, the uncertain data distribution is summarized by repeated measurements (i.e., samples).
  • 24. Dynamic probabilistic models over the sensor readings
      • The KEN technique builds and maintains dynamic probabilistic models over the sensor readings, taking into account the spatio-temporal correlations that exist in the sensor readings.
      • These models organize the sensor nodes in non-overlapping groups, and are shared by the sensor nodes and the sink.
      • The expected values of the probabilistic models are the values that are recorded by the sink. If the sensors observe that these values are more than εVT away from the sensed values, then a model update is triggered.
      • The PAQ and SAF methods employ linear regression and autoregressive models, respectively, for modeling the measurements produced by the nodes, with SAF leading to a more accurate model than PAQ.
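The update rule on this slide (the sink records the model's expected value, and a sensor triggers an update only when that value drifts too far from what it actually senses) can be sketched with a deliberately trivial shared model. This is not KEN itself: the assumption here is a model that simply predicts the last transmitted value, with no spatio-temporal correlations, and `epsilon` stands in for the slide's threshold.

```python
def readings_to_transmit(sensed, epsilon):
    """Return the (time, value) pairs a sensor would send to the sink.

    Assumed model: sensor and sink both expect the last transmitted value.
    A reading is sent only when it differs from that expectation by more
    than epsilon, which then updates the shared model.
    """
    transmitted = []
    expected = None  # no model yet, so the first reading is always sent
    for t, value in enumerate(sensed):
        if expected is None or abs(value - expected) > epsilon:
            transmitted.append((t, value))
            expected = value
    return transmitted


sensed = [20.0, 20.2, 20.4, 21.5, 21.6, 19.9]
sent = readings_to_transmit(sensed, epsilon=0.5)
print(sent)
```

Here three of six readings are transmitted, and the sink's recorded value is never more than `epsilon` away from the true reading.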