3. Introduction
Data, its sources and its types have grown explosively.
What can be done with that data in a variety of areas makes this field exciting.
Traditionally, business intelligence and reporting were done around business records and market data.
Data has become a gold mine for companies.
•Amazon, Facebook, Google
As with all new things, the potential for good is matched by the potential for misuse and disruption.
4. Sources of Big Data
Business data: records like consumer, financial, sales, marketing,
production, transportation, patient, payer, provider etc. that
reside on spreadsheets and databases.
Social Networks: Twitter, Facebook, YouTube, blogs, other social
platforms. Information consumers provide about themselves
and others.
Machine-generated data: logs, sensors, automated devices,
audio, video, mobile phones, power grid, surveillance signals,
etc. This is commonly referred to as the Internet of Things (IoT).
5. Exponential Growth of Data
Source: https://www.promptcloud.com/blog/want-to-ensure-business-growth-via-big-data-augment-enterprise-data-with-web-data/
6. Types of Big Data
Structured:
•Records in a relational database (schema)
•Formatted files
Semi-structured:
•Spreadsheets
•XML (Extensible Markup Language)
•JSON (JavaScript Object Notation)
Unstructured:
•Text messages, video, audio, email
•Web pages, social media posts, GPS data
•PDFs, presentations
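The three categories can be illustrated with a minimal Python sketch. All records below are hypothetical and invented for illustration; the point is only that structured data has fixed columns, semi-structured data carries its own field names, and unstructured data has no declared fields at all.

```python
import csv
import io
import json

# Structured: fixed columns known in advance (hypothetical visit records).
csv_text = "patient_id,visit_date,charge\n101,2018-03-01,250.00\n102,2018-03-02,90.50\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_charge = sum(float(r["charge"]) for r in rows)

# Semi-structured: self-describing keys; fields and nesting can vary per record.
json_text = '{"patient_id": 101, "allergies": ["penicillin"], "contact": {"phone": "555-0100"}}'
record = json.loads(json_text)

# Unstructured: free text; any meaning has to be extracted by analysis.
note = "Patient 101 reports mild wheezing after exercise; advised to continue inhaler."
mentions_wheezing = "wheezing" in note.lower()

print(total_charge, record["allergies"], mentions_wheezing)
```

Querying the structured rows is a simple column lookup, while the free-text note needs text analytics before it can answer the same kind of question.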
7. Before Big Data
Primarily Enterprise Data
Pre-1990s Reporting and Analytics
• Ad hoc data extracts, with duplication of effort and data.
• New interfaces or feeds had to be built and processed each time,
resulting in a hodgepodge of data.
• No standards; disorganized and flat-file-based; changes impacted other users.
• MIS, Decision Support Systems and Executive Information Systems.
1990s - Data Warehouses for Business Intelligence
• Tried to bring order and definition.
• Centralized data (in theory but often multiple data warehouses).
• Often used by area (departmental data marts).
8. Big Data – Data Lake
A data lake is the central repository for the enterprise.
Term coined by Pentaho CTO James Dixon in 2010.
Traditional hardware and architecture are unsuitable.
•Built on commodity hardware.
•Use of NoSQL databases.
•Massively parallel processing.
•Schema on read.
•Metadata is stored.
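"Schema on read" means raw records are stored as they arrive and a schema is applied only when someone queries them. The sketch below is one possible illustration, not a data lake implementation; the device names, fields, and the `read_with_schema` helper are hypothetical.

```python
import json

# Raw events land in the lake untouched; shapes differ by source (hypothetical data).
raw_events = [
    '{"device": "inhaler-7", "ts": "2018-05-01T08:00:00", "puffs": "2"}',
    '{"device": "inhaler-7", "ts": "2018-05-01T20:15:00", "puffs": "1", "gps": [37.77, -122.42]}',
    '{"device": "scale-3", "ts": "2018-05-01T07:30:00", "weight_kg": "81.4"}',
]

def read_with_schema(lines, schema):
    """Parse raw lines at read time, keeping records that contain every
    schema field and casting each field to the type the schema asks for."""
    for line in lines:
        rec = json.loads(line)
        if all(field in rec for field in schema):
            yield {field: cast(rec[field]) for field, cast in schema.items()}

# The schema is chosen by the reader, not enforced when the data was written.
inhaler_schema = {"device": str, "ts": str, "puffs": int}
inhaler_rows = list(read_with_schema(raw_events, inhaler_schema))
total_puffs = sum(row["puffs"] for row in inhaler_rows)
print(len(inhaler_rows), total_puffs)
```

A different reader could apply a different schema (for example, one that selects `weight_kg`) to the same raw events without reloading or restructuring them.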
9. The Four Vs of Big Data
Volume:
• Massive acceleration in the last couple of decades.
• 900,000,000 unique visits to YouTube every month.
Velocity:
• Streaming data.
• 300 hours of video uploaded to YouTube every minute.
Variety:
• Structured, semi-structured and unstructured data.
Veracity:
• Confidence in the data drops.
• Inconsistency, ambiguity, collection methodology...
Adventurous thought leaders have added more Vs ☺
Statistics Source: https://www.statisticbrain.com/
10. Healthcare Data Lake Example
[Diagram: data sources feeding a healthcare data lake. Labels: Clinical, Payer, EHR, Rx/Pharmacy, Other, Call, 3rd Party, Claims, Provider, Logs & Notes]
License attributions below
11. Ecosystem of Big Data in Healthcare
Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4981575/
12. Why Use Big Data in Healthcare?
Machines (software) are very good at seeing patterns and classifying data.
Machine learning from image libraries and patient records works extremely well.
Sensors and connectivity to these various sources of data can be used in designing medical devices.
A study found that by 2020, 40% of IoT-related technology will be healthcare-related.
License attributions below
13. Big Data Analytics
Descriptive – what has happened (looking back; could be done before Big Data)
Predictive – what may happen
Prescriptive – what actions to take
Advanced analytic techniques using data science and AI
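The three kinds of analytics can be contrasted on one small data set. The following is a sketch under stated assumptions: the weekly admission counts and the patients-per-nurse staffing rule are invented for illustration, and a straight-line fit stands in for whatever predictive model would be used in practice.

```python
import math

# Hypothetical weekly admission counts; not taken from the slides.
admissions = [100, 110, 120, 130]
weeks = list(range(len(admissions)))

# Descriptive: summarize what already happened.
mean_admissions = sum(admissions) / len(admissions)

# Predictive: fit a least-squares line and extrapolate one week ahead.
x_mean = sum(weeks) / len(weeks)
slope = (sum((x - x_mean) * (y - mean_admissions) for x, y in zip(weeks, admissions))
         / sum((x - x_mean) ** 2 for x in weeks))
intercept = mean_admissions - slope * x_mean
forecast = intercept + slope * len(admissions)

# Prescriptive: turn the forecast into an action under an assumed staffing rule.
PATIENTS_PER_NURSE = 8  # assumption for illustration only
nurses_needed = math.ceil(forecast / PATIENTS_PER_NURSE)

print(mean_admissions, forecast, nurses_needed)
```

Descriptive analytics stops at the average, predictive analytics produces the forecast, and prescriptive analytics converts that forecast into a recommended staffing level.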
14. Big Data Analytics Techniques
Machine Learning and Data Mining: Teach computers to identify patterns
and relationships in data that a user doesn’t know to ask about.
Regression Analysis: Identifies how changing an independent variable
influences another, dependent, variable.
Text Analytics: Combines computational linguistics, statistics, and Machine
Learning to generate insights from unstructured text, including online content.
Social Network Analytics: Analyze the relationships (rather than the content
shared) within a social network.
Multimedia Analytics: Generates insights from multimedia data.
Sentiment Analysis: Scores opinions expressed in text to evaluate them as
positive or negative.
Monte-Carlo Simulation: Runs many randomized trials of a model to estimate the range and likelihood of possible outcomes.
Source: http://www.dataversity.net/advanced-analytics-101-beyond-business-intelligence/
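As one illustration of the techniques listed above, a Monte-Carlo simulation can be sketched in a few lines of Python. The demand model here (20 units, each admitting 0 to 5 patients per day) and the `prob_over_capacity` function are hypothetical; the sketch only shows the general pattern of repeating a random trial many times and counting outcomes.

```python
import random

def prob_over_capacity(capacity, n_trials=10_000, seed=0):
    """Estimate the chance that simulated daily admissions exceed capacity."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    over = 0
    for _ in range(n_trials):
        # Hypothetical demand model: 20 independent units, 0-5 admissions each.
        admissions = sum(rng.randint(0, 5) for _ in range(20))
        if admissions > capacity:
            over += 1
    return over / n_trials

estimate = prob_over_capacity(capacity=60)
print(f"Estimated probability of exceeding 60 beds: {estimate:.3f}")
```

More trials tighten the estimate at the cost of run time; a real study would also replace the toy demand model with one fitted to historical data.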
15. Machine Learning and AI
With the advances in computing power and techniques, we can teach software (a.k.a. machines) to learn from the data.
Training data
Algorithms
Supervised and unsupervised learning
Neural networks, which are loosely modeled on the way our brain and nervous system work.
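The difference between supervised and unsupervised learning can be sketched with two tiny algorithms. This is a toy illustration using hypothetical glucose readings: a 1-nearest-neighbor classifier stands in for supervised learning (it needs labeled training data), and a two-cluster k-means stands in for unsupervised learning (it finds groups without labels).

```python
def nearest_neighbor_predict(train, x):
    """Supervised: return the label of the closest labeled training example."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def two_means(values, iters=10):
    """Unsupervised: split unlabeled 1-D values into two clusters (k-means, k=2).
    Assumes the values contain at least two distinct numbers."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        a = [v for v in values if abs(v - lo) <= abs(v - hi)]
        b = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return sorted(a), sorted(b)

# Hypothetical labeled training data: (glucose reading, label).
train = [(85, "normal"), (92, "normal"), (160, "high"), (175, "high")]
prediction = nearest_neighbor_predict(train, 150)

# The same kind of readings without labels; the algorithm finds the groups itself.
unlabeled = [84, 90, 95, 158, 170, 181]
cluster_a, cluster_b = two_means(unlabeled)
print(prediction, cluster_a, cluster_b)
```

The supervised model can name its answer ("high") because it was trained on labels; the unsupervised one only reports that two groups exist and leaves their interpretation to the analyst.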
16. Applications in Healthcare and Medical Devices
Shift Management:
•A hospital in France, part of Assistance Publique-Hôpitaux de Paris.
•Applied “time series analysis” techniques and machine learning to 10 years’ worth of
hospital admissions records to find the algorithms that most accurately
predicted future admissions trends.
•Used to set staffing levels 15 days out.
Electronic Health Records:
•Kaiser Permanente’s HealthConnect integrated system.
•Improved outcomes in cardiovascular disease.
•Savings of $1B from reduced visits and tests.
17. Applications in Healthcare and Medical Devices
Prevention and Care Coordination - Blue Shield of California:
•Integrated system between doctors, hospitals and health plans.
Asthma and COPD - Propeller:
•Sensor attached to inhaler and synced with phone app.
•GPS-enabled tracker that sends reminders and notifications, checks weather and
pollen count, and provides an asthma forecast for the day.
18. Applications in Healthcare and Medical Devices
Bioelectric Medicine:
•Interpreting neurological signals is the ultimate big data problem.
•By stimulating specific nerves, neural stimulation/neuromodulation may be able
to treat or ease a variety of diseases and conditions.
President Obama’s Cancer Moonshot Program:
•Aims to make 10 years’ worth of progress toward a cancer cure in 5 years.
•Big Data and analytics underpin this effort.
•Studying tumor samples in biobanks linked to patient treatment records.
•Discovering unexpected benefits, like a treatment for certain lung
cancers using an antidepressant called desipramine.
19. Applications in Healthcare and Medical Devices
Diabetes Management - Medtronic & IBM Watson on Sugar.IQ:
•Detects patterns of behavior and predicts diabetic events hours before they
happen.
•Ingested data from health insurance records, 10,000 anonymous electronic
patient medical records and population data in an attempt to develop
real-time personalized care.
Genomic Medicine:
•Big Data analytics uncovers hidden patterns, unknown correlations, and other
insights by examining large-scale, varied data sets.
•Also has an impact on clinical trials.
21. Pneumonia
CheXNet, an algorithm tested on 420 X-rays, outperformed four
radiologists in both sensitivity (identifying positives correctly) and
specificity (identifying negatives correctly).
The training data contained 112,120 chest X-ray images labeled
with 14 different possible diagnoses.
Within a month of training, it was ahead of doctors in all 14.
Its developers also created a heat map of the chest X-rays, a tool that could
greatly assist human radiologists.
Source: https://spectrum.ieee.org/static/ai-vs-doctors
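Sensitivity and specificity, the two measures named above, are simple ratios over a confusion matrix. The counts in this sketch are hypothetical and are not the CheXNet results; they only show how the two numbers are computed.

```python
def sensitivity(tp, fn):
    """Share of actual positives that were flagged (true positive rate)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of actual negatives that were cleared (true negative rate)."""
    return tn / (tn + fp)

# Hypothetical counts: 100 X-rays with pneumonia, 100 without.
tp, fn = 80, 20   # pneumonia cases caught vs. missed
tn, fp = 90, 10   # healthy cases cleared vs. falsely flagged

print(sensitivity(tp, fn), specificity(tn, fp))
```

A reader that flags everything scores perfect sensitivity and zero specificity, which is why the comparison with radiologists reports both.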
22. Heart Attacks and Strokes
Researchers at the University of Nottingham in the UK scanned
patients’ routine medical data and predicted which of them would
have heart attacks or strokes within 10 years.
The neural network model correctly predicted 4,998 of the 7,404 patients who
went on to have a heart attack or stroke.
The AI system correctly identified the condition of 355 more patients
than did the standard model.
Source: https://spectrum.ieee.org/static/ai-vs-doctors
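The figures on this slide can be turned into detection rates with a short calculation. The sketch assumes that the "355 more patients" are counted against the same 7,404 actual cases, which the slide implies but does not state.

```python
actual_cases = 7404       # patients who had a heart attack or stroke (from the slide)
nn_identified = 4998      # cases the neural network predicted (from the slide)
extra_vs_standard = 355   # additional cases vs. the standard model (from the slide)

# Assumption: the standard model was scored on the same 7,404 cases.
standard_identified = nn_identified - extra_vs_standard

nn_sensitivity = nn_identified / actual_cases
standard_sensitivity = standard_identified / actual_cases
print(f"{nn_sensitivity:.1%} vs {standard_sensitivity:.1%}")
```

Under that assumption the neural network caught roughly two thirds of the cases, about five percentage points more than the standard model.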
23. Autism
A team at the University of North Carolina, Chapel Hill, has detected
brain growth changes linked to autism in children as young as 6
months old.
A deep-learning algorithm was able to use that data to predict
whether a child at high-risk of autism would be diagnosed with the
disorder at 24 months.
The algorithm correctly predicted the eventual diagnosis in high-risk
children with 81 percent accuracy and 88 percent sensitivity.
By comparison, behavioral questionnaires yield early autism diagnoses
(at around 12 months old) that are just 50 percent accurate.
Source: https://spectrum.ieee.org/static/ai-vs-doctors
24. The Big Questions to Ponder
Rights to the data
How is the data used?
Privacy
IoT, smart homes and eavesdropping
Governments, law-enforcement agencies, hackers.
What does it mean for professionals?
Jobs threatened.
26. Other Sources
Big Data Analytics Techniques
http://www.dataversity.net/advanced-analytics-101-beyond-business-intelligence/
Shift Management
https://www.forbes.com/sites/bernardmarr/2016/12/13/big-data-in-healthcare-paris-hospitals-predict-admission-rates-using-machine-learning/#2e8818f279a2
Electronic Health Records and Prevention and Care Coordination
https://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/the-big-data-revolution-in-us-health-care
Bioelectric Medicine
http://www.healthcareitnews.com/news/big-data-difference-neuro-sensing-and-stimulation
President Obama’s Cancer Moonshot Program
https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/blue-ribbon-panel
27. Licenses – Photos used under Creative Commons licenses
This Photo by Unknown Author is licensed under CC BY-SA
This Photo by Unknown Author is licensed under CC BY-SA
This Photo by Salvatore P is licensed under Creative Commons Zero
This Photo by Unknown Author is licensed under CC BY-NC-SA
This Photo by https://www.onlinewebfonts.com