Big Data technologies enable us to build the digital brain of smart systems. I will illustrate with examples how we build a digital brain by collecting data from a large number of sensors and using the brain to find value in that data. We build a Data Lake using cutting-edge technology from Pivotal and use it to store large amounts of sensor and other data. Then we can find patterns in that data by applying the Data Science methodology, using sophisticated machine learning and statistical algorithms customized to run on big data within the Data Lake. Armed with these patterns, the system can detect anomalies and respond in an appropriate manner. Data Science combined with sensors and actuators can make a system smart!
We have the Internet of Things – today it is a dumb collection of very sophisticated machines, each with thousands of sensors and actuators
2010 accident on BP offshore platform in the Gulf of Mexico
Drilling rigs cost between $350,000 and $1,000,000 per day
Non-productive Time (NPT) is the measurement most watched
Worst case scenario is a Macondo type blow-out = $40B liability
We need better safety protocols … better regulation … and a smart system!
Sensors:
GE blowout preventers (BOP) collect information like ram position, system health and maintenance, etc.
Jet engine
Fitbit
We are all carrying a large number of sensors on us right now! There’s a lot of intelligence (analytics) built into these sensors/machines
SENSORS: We now collect huge amounts of data about activities of humans and machines
We apply predictive analytics to that data to build decision support tools – like using regression models to facilitate scenario analysis
… and action is still taken based on human orders
Create a list of customers likely to churn and reach out to them
Optimize the pricing and discounting plan for the next year and modify it on the fly
NUDGE: Figure out the drivers that affect human behavior like saving and encourage that
Understand what is causing shutdowns and investigate
Exception: the internet has automated systems that react to events of interest
The sensors and the analytics are not connected!
We have humans look at the data from sensors and then make decisions – full of delays and room for error
WE CAN DO MUCH BETTER!
Once connected, a smart system takes action in response to an event of interest
Brain:
Input – signal from eyes
Analysis – compute trajectory of ball
Action – swing bat to connect with ball
Shown – a gamma ray detector in a drill stem, part of MWD (Measurement While Drilling)
Smart System = Data lake for storing sensor data + data science for building and operationalizing models + actuators for taking action
Sensors collect data and send it to a Data Lake
The Digital Brain:
We create the brain by building models that extract patterns
The brain is then activated and can detect deviations from these patterns
The system can initiate action on its own as well as provide alerts and predictive intelligence to humans
Actuators:
-Connect to the control system and send action messages
-Shut down the system if a blowout is predicted
-Send an alert to the humans in charge if something is anomalous but dangerous
-Predictive maintenance: Flag a component for maintenance when required
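The actuator rules above can be sketched as a small decision function. This is only an illustration: the `Assessment` fields, the `Action` names and all three thresholds are hypothetical and do not come from any real control system.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    FLAG_MAINTENANCE = "flag_maintenance"
    ALERT_HUMAN = "alert_human"
    SHUT_DOWN = "shut_down"


@dataclass
class Assessment:
    """Hypothetical output of the digital brain for one machine."""
    blowout_probability: float  # model estimate between 0 and 1
    anomaly_score: float        # e.g. distance from the learned pattern
    wear_fraction: float        # estimated fraction of component life used


def choose_action(a: Assessment,
                  shutdown_at: float = 0.9,
                  anomaly_at: float = 3.0,
                  maintenance_at: float = 0.8) -> Action:
    """Return the most severe action whose (illustrative) threshold is crossed."""
    if a.blowout_probability >= shutdown_at:
        return Action.SHUT_DOWN
    if a.anomaly_score >= anomaly_at:
        return Action.ALERT_HUMAN
    if a.wear_fraction >= maintenance_at:
        return Action.FLAG_MAINTENANCE
    return Action.NONE
```

In a real deployment the returned action would be turned into a message for the control system; here it is just a value, which keeps the rule easy to test.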
THE DIGITAL BRAIN CANNOT BE IN ONE MACHINE – IT NEEDS INFORMATION FROM THE NETWORK OF MACHINES
A Parallel Storage system (or Data Lake) where all the sensor information is collected (at Pivotal we have developed Pivotal HD + HAWQ based on Hadoop)
Capability to build models while keeping the data in parallel and in place
EXTRACTING PATTERNS: Clustering algorithms – we have used the k-means clustering function available in MADlib, graph-based clustering, clustering of time-series data in frequency space etc.
FINDING ANOMALIES: distance from centroid, change in cluster assignment etc.
LIVING MODELS: Models have to learn and update continuously
The ability to send appropriate signals to the actuator control systems
Low latency scoring
API that can connect to Business Intelligence tools and Apps
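To make the pattern-extraction and anomaly-finding steps above concrete, here is a minimal single-machine sketch. It uses plain NumPy as a stand-in for the in-database k-means in MADlib, and the two-cluster synthetic data and the "largest training distance" threshold are assumptions made for illustration only.

```python
import numpy as np


def distances(points, centroids):
    """Euclidean distance from every point to every centroid."""
    return np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return distances(points, centroids).argmin(axis=1)


def anomaly_scores(points, centroids):
    """Distance from each point to its nearest centroid."""
    return distances(points, centroids).min(axis=1)


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(points, centroids)


# Synthetic "normal" sensor readings: two operating modes (made-up data).
rng = np.random.default_rng(42)
normal = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
                    rng.normal(10.0, 0.5, size=(100, 2))])

centroids, labels = kmeans(normal, k=2)

# Illustrative rule: anything farther from its centroid than the worst
# training point counts as anomalous.
threshold = anomaly_scores(normal, centroids).max()
new_readings = np.array([[0.2, -0.1], [5.0, 5.0]])
is_anomaly = anomaly_scores(new_readings, centroids) > threshold
```

A change in cluster assignment can be checked the same way, by comparing `assign(...)` for a machine's earlier and later readings. A living model would rerun or incrementally update the centroids as new data arrives.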
Refer to the debate in the AI community between Douglas Hofstadter (UMich, Indiana University) on one hand and Peter Norvig (Google) and Stuart Russell (Berkeley) on the other
We want to step out of that debate and combine humans and machines into a smart system (a la Arnab Gupta of Opera @ Strata 2011 -> man + machine)
We are not taking the humans out of the loop but empowering them
Tiers:
Ingestion: Ability to bring data from multiple data sources across all timelines with varying QoS
Distillation: Ability to take the data stored in the storage tier and convert it to structured data for easier analysis by downstream applications
Processing: Ability to run analytical algorithms and user queries with varying QoS (real-time, interactive, batch) to generate structured data for easier analysis by downstream applications
Insights: Ability to analyze all the data with varying QoS (real-time, interactive, batch) to generate insights for business decision making
Action: Ability to integrate the insights with the business decision making systems
Unified Data Management: Ability to manage the data lifecycle, access policy definition, and master data management and reference data management services
Unified Operations: Ability to monitor, configure and manage the whole Data Lake from a single operations environment
Processing Tier – PHD (Hive, HBase, Pig and MapReduce)
Distillation Tier – Pivotal Data Dispatch, Pivotal Analytics, ETL Partner Products
Informational
Ability to get information in a dashboard
Integration with business intelligence tools: Tableau, MicroStrategy, BusinessObjects, Pentaho.
Alerting
Ability to alert the decision maker
-Integration with the alert systems
-Dashboard, alarms, emails, pagers, phones etc.
Automation
- Ability to integrate with business decision making systems
- Integration with the applications to take automated actions
- MessageMQ, RabbitMQ, Spring, & other technologies.
Store Everything
Analyze Anything
Build Next Generation
Smart meters measure power twice an hour – it’s a measure of all activity!
We can Fourier transform 10 weeks of data from 100,000 meters in 5 seconds flat
… and take action: once an anomaly is detected we can detect theft, prevent blackouts, and much, much more
Batch process – Fourier transform the time-series data (a data-parallel algorithm) and apply k-means clustering (not explicitly parallel – use MADlib); identify and label outliers
Real-time process – detect changes and outliers and set off the suitable alarm
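A rough sketch of the batch step on one machine, assuming NumPy and made-up load curves. To keep it short, a simpler median-distance rule stands in for the k-means step, and the 5-sigma cutoff and the synthetic data are illustrative assumptions, not the settings used on the real meters.

```python
import numpy as np

READINGS_PER_DAY = 48  # one reading every 30 minutes
DAYS = 70              # 10 weeks


def synthetic_meters(n_meters, seed=0):
    """Made-up load curves: a daily cycle plus noise, for illustration only."""
    rng = np.random.default_rng(seed)
    t = np.arange(READINGS_PER_DAY * DAYS)
    daily = np.sin(2 * np.pi * t / READINGS_PER_DAY)
    noise = 0.05 * rng.standard_normal((n_meters, t.size))
    return 1.0 + 0.5 * daily + noise


def spectrum(readings):
    """Magnitude spectrum of each meter, mean removed, scaled to unit length."""
    centred = readings - readings.mean(axis=1, keepdims=True)
    mag = np.abs(np.fft.rfft(centred, axis=1))
    norm = np.linalg.norm(mag, axis=1, keepdims=True)
    return mag / np.where(norm == 0, 1.0, norm)


def label_outliers(features, n_sigmas=5.0):
    """Flag meters whose spectrum is unusually far from the median spectrum."""
    centre = np.median(features, axis=0)
    dist = np.linalg.norm(features - centre, axis=1)
    mad = np.median(np.abs(dist - np.median(dist)))
    cutoff = np.median(dist) + n_sigmas * 1.4826 * mad
    return dist > cutoff, dist


meters = synthetic_meters(200)
# Meter 7 gets a flat load with no daily cycle (a made-up anomaly).
flat_noise = np.random.default_rng(1).standard_normal(meters.shape[1])
meters[7] = 1.0 + 0.05 * flat_noise

features = spectrum(meters)
is_outlier, dist = label_outliers(features)
```

For a normal meter the strongest frequency bin is the one-cycle-per-day bin (index `DAYS`, since the record spans 70 days). The flat meter has no such peak, so its spectrum sits far from the rest. The real-time step could reuse the stored median spectrum and cutoff to score new readings as they arrive.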
2010 accident on BP offshore platform in the Gulf of Mexico
Drilling rigs cost between $350,000 and $1,000,000 per day
Non-productive Time (NPT) is the measurement most watched
Worst case scenario is a Macondo type blow-out = $20B liability
We need better safety protocols … better regulation … and a smart system!
An ecosystem of smart machines, much like a natural ecosystem, will be self-healing and self-sustaining
That’s the true realization of the potential of the Internet Of Things
This is a movement we can all make happen