Charles Cai has more than two decades of experience and a track record of delivering global transformational programmes, from vision and evangelism to end-to-end execution, in global investment banks and energy trading companies. As Chief Front Office Technical Architect and Head of Data Science, he excels at designing and building innovative, large-scale Big Data systems for high-volume, low-latency trading, global Energy Trading & Risk Management, and advanced temporal and geospatial predictive analytics. He is also a frequent speaker at Google Campus, Big Data Innovation Summit, Cloud World Forum, Data Science London, QCon London, MoD CIO Symposium and other events, promoting knowledge and best-practice sharing with audiences ranging from developers and data scientists to CXO-level senior executives from both IT and business backgrounds. He has in-depth knowledge of and experience with the Scala, Python, C# / F#, C++, Node.js, Java, R and Haskell programming languages across Mobile, Desktop, Hadoop/Spark, Cloud IoT/MCU, Blockchain etc., and holds TOGAF9, EMC-DS, AWS CNE4 and other certifications.
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI
1. Making Sense of IoT Data
w/ Big Data + Data Science
Charles Cai
- The views expressed here are my own and not those of my employer
2. Making Sense of IoT w/ Big Data + Data Science
• IoT = Big Data
• In this talk we discuss the latest developments in Big Data, Machine Learning and Data Science, and the latest IoT use cases in healthcare (new drug trials), geospatial mapping, disaster relief, retail, insurance etc., covering the life cycle of IoT data analytics: capturing, storing, cleansing, analysing, predicting and maintaining…
• There will be 30~50 billion Internet-connected devices in 5 years
• How IoT can drive innovation in various industries
• IoT = Big Data: how the open source big data eco-system supports IoT-driven business cases
3. Making Sense of IoT Data with Big Data + Data Science
Big Data Week Conference 2015
Charles Cai
Big Data + Data Science
Leading Oil and Gas Trading Company
• Innovating with Disruptive Technologies
Data Center Operation System
Data Operation System 2.0
Data Science Maturity Model D - I - K - W
Crowdsourcing
MOOC / OSH / OSS
Data Science Maturity Model
Big Data DevOps / Data Scientist Shortage
Operating BDA: Microservices
Graph Database / Graph Computing
Open Source Hardware / Software
Data – Information – Knowledge – Wisdom
The Power of Crowdsourcing
4. Intro
• Bio
• #FO #FICC: Investment Banking Front Office: FX/Commodities
• #ETRM: Energy Trading & Risk Management
• #entrepreneur #innovator #disruptor
• Voted as one of the UK’s Top 50 Data Leaders & Influencers
• Twitter: @caidong
• #big-data #IoT #data-science #MOOC #Mobile #Cloud #UX
• LinkedIn: http://uk.linkedin.com/in/charlescai/en
5. Where are we with Big Data Analytics?
By Thomas Davenport – Harvard Business Review
6. BI vs DS: from Descriptive to Prescriptive – Ironside
The Ironside Group Quantifiable ROI
7. Use Case: Parkinson’s Disease New Drug Trial
• There is no cure for Parkinson’s disease
• A new medicine trial is an extremely slow process: daily doses x8!
• Traditional feedback from patients is infrequent
• Wireless-enabled wearable device + IMU sensors
• Classification of wearer activities
• sitting, standing, walking, running, sleeping…
• Detect patterns of Parkinson’s disease symptoms
• predicting deterioration / improvement speed
• new trial medicine effectiveness
• Sensor data: 10 Hz sampling = 1 GB / day / patient
IoT = Big Data
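The data-volume figure on this slide can be unpacked with a little arithmetic. The sketch below is only a back-of-the-envelope check: it takes the 10 Hz and 1 GB / day / patient numbers from the slide, assumes a decimal gigabyte (10^9 bytes), and uses hypothetical function names. The cohort size and trial length in the last call are illustrative placeholders, not figures from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB = 10**9  # assumption: decimal gigabyte


def samples_per_day(sampling_hz: float) -> int:
    """Number of sensor samples one device produces in a day."""
    return int(sampling_hz * SECONDS_PER_DAY)


def implied_bytes_per_sample(bytes_per_day: float, sampling_hz: float) -> float:
    """Average payload per sample implied by a daily volume."""
    return bytes_per_day / samples_per_day(sampling_hz)


def trial_volume_gb(gb_per_patient_day: float, n_patients: int, n_days: int) -> float:
    """Total raw volume for a cohort over a trial period."""
    return gb_per_patient_day * n_patients * n_days


daily_samples = samples_per_day(10)                  # slide figure: 10 Hz
per_sample = implied_bytes_per_sample(1 * GB, 10)    # slide figure: 1 GB / day
total_gb = trial_volume_gb(1, n_patients=100, n_days=30)  # hypothetical cohort

print(daily_samples, round(per_sample), total_gb)
```

Under these assumptions each patient produces 864,000 samples a day, so the slide's figure implies a little over 1 KB per sample. The slide does not say what each sample contains, so that number is best read as an average that includes every channel and any storage overhead.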
10. Open Source Data Science Toolbox
Open Source Big Data / Data Science Platform
[Architecture diagram; recoverable labels:]
• Hadoop / Mesos: Distributed Storage + Scalable Computation
• COTS Apps (Excel, Tableau, Qlik...)
• Statistical Time Series Analysis
• Wider Big Data Analytics eco-systems
• Shell/APIs: HDFS, Hive, Spark, HBase, Sqoop, JDBC/ODBC
• Languages: Julia, Python, R, Scala
- Developed on:
- Operated by:
• NLTK: Natural Language
• Distributed Time Series / Geospatial / Graph Databases
• GIT Repo
• Data Products
• WebSocket
• Drag + Drop (CZML/GeoJSON)
• Web Browser (collaboration)
• Export to CSV/Excel
• Geospatial data, Time Series data, Public Data, Market data, Real-time Streaming, Open Gov Data
• JDBC via Phoenix
• HDFS
• Hive/Pig w/ Geospatial
12. Key Sub-systems in Modern Big Data Analytics Stack
Data Analytics
Streaming
Graph Computing
Machine Learning
…
13. Data Science Maturity Map – where we are, where we can go
Data – Information – Knowledge – Wisdom / Intelligence
“Note: The current version focuses mainly on data / machine learning. A new version for cross-industry use cases, with more coverage of IoT, containers, data flow etc., is being developed (ETA Dec 2015 / Jan 2016).
Please follow Twitter: @caidong to receive the latest version soon.”
14. From Classic to Modern Architecture
Full Text Search → Natural Language Processing
CCTV / Voice → Computer Vision + Q&A, Deep Learning (CNN/RNN)
RDBMS / DW → KV + GraphDB + BD DW
Business Intelligence → Big Data, Machine Learning
n-tier architecture → Lightweight Container + Microservices + API Harvesting
Keyword Search, Faceted Search → Semantic Search, Named Entity Extraction, Q&A, N-Grams, Geospatial Search
Tables, Primary Keys, Foreign Keys → Node / Vertex, Label, Edge / Relationship, Properties
Colours, Shapes → Complex Shapes, Textiles, Accessories, Context
What happened? What’s happening? → Predictive Analytics, Prescriptive Analysis, “Make the trend!”
Database, App Server, Web Front → Cloud, Distributed and Fault Tolerant, “Data Centre as One Computer”
Unstructured
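The relational and graph vocabulary on this slide (tables, primary keys and foreign keys versus nodes, labels, edges and properties) can be illustrated with a small sketch. This is not how any particular graph database does it; it is a minimal mapping under the assumption that each row becomes a node, the table name becomes its label, and each foreign key becomes an edge. The `Node`, `Edge` and `rows_to_graph` names and the sample trade row are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str                      # derived from table name + primary key
    label: str                        # the source table name
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    relationship: str
    properties: dict = field(default_factory=dict)


def rows_to_graph(rows, table, pk, fks):
    """Map relational rows to nodes and edges.

    fks maps a foreign-key column to (referenced_table, relationship_name).
    """
    nodes, edges = [], []
    for row in rows:
        node_id = f"{table}:{row[pk]}"
        # Non-key columns become node properties.
        props = {k: v for k, v in row.items() if k != pk and k not in fks}
        nodes.append(Node(node_id, table, props))
        # Each non-null foreign key becomes an edge to the referenced row.
        for col, (ref_table, rel) in fks.items():
            if row.get(col) is not None:
                edges.append(Edge(node_id, f"{ref_table}:{row[col]}", rel))
    return nodes, edges


# Hypothetical sample data for illustration only.
trades = [{"trade_id": 1, "trader_id": 7, "commodity": "Brent", "volume": 1000}]
nodes, edges = rows_to_graph(
    trades, "Trade", "trade_id", {"trader_id": ("Trader", "BOOKED_BY")}
)
print(nodes[0])
print(edges[0])
```

The point of the mapping is that joins, which a relational engine computes at query time from foreign keys, are stored explicitly as edges in the graph model.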
15. Working with HR Training team
• VTA Training Sessions
• Big Data Bootcamp
• Lunch and Learn KT Sessions
Big Data technology is evolving so fast… here are the Hadoop-related topics:
Big data ELT with Apache Sqoop
BI vs Data Science
Data Scientist Career Path
MOOC and Machine Learning
Machine Learning with Apache Spark
Map Reduce 101
Big Data Security: Kerberos/Knox/Sentry
Deep Learning and Use Cases
Time Series and Geospatial Big Data Analytics with Impala
HBase: Distributed Key-value BigTable
Distributed Time Series DB: OpenTSDB
Machine Learning with Hadoop and R
Advanced Machine
Bayesian Network
16. Big Data / Data Science Learning Resource: free e-Books
Data Jujitsu: The Art of Turning Data into Product
Data Mining Algorithms In R
A Programmer's Guide to Data Mining
Data Mining and Analysis: Fundamental Concepts and Algorithms
Mining of Massive Datasets
The School of Data Handbook
Theory and Applications for Advanced Text Mining
An Introduction to Data Science
19. Big Data / Data Science Virtual Machines + Containers
20. Big Data / Data Science Certifications: EMC, Cloudera, …
CCP: Data Scientist:
- elite level
- real-world designing and developing
- production-ready data science solution
- peer-evaluated for accuracy, scalability, and robustness
EMC Data Science Associate:
- Data Analytics Lifecycle
- Analyzing / exploring data w/ R
- Statistics modelling, theory and advanced methods
- Advanced technology & tools
- Operationalizing