Talk presented at Bio-IT 2018 (machine learning track) - explores some approaches to overcoming challenges of using machine learning systems in healthcare applications.
Enhancing Precision Wellness with Knowledge Graphs and Semantic Analytics: Overcoming some key challenges in machine learning for healthcare
1. Enhancing Precision Wellness with
Knowledge Graphs and Semantic Analytics
Or
Overcoming some key challenges in machine
learning for healthcare
Professor James Hendler
Director, Rensselaer Institute for Data Exploration and Analytics
2. Challenges to Machine Learning
• Powerful new analytic techniques are leading
to advances in precision medicine
– However, there are a number of obstacles in bringing
these to the clinician/user level; results must be:
• Personalized
• Actionable
• Collaborative*
• Explainable
* i.e., support collaboration in decision making
4. HEALTH EMPOWERMENT by ANALYTICS, LEARNING & SEMANTICS
HEALTH EMPOWERMENT: Putting the right information into the hands of patients and clinicians when they need it the most
ANALYTICS: Using data for hypothesis formation and testing
LEARNING: Improving knowledge continuously
SEMANTICS: Integrating health & medical knowledge from heterogeneous sources
HEALS is a member of the IBM AI Horizons Network
Rensselaer-IBM HEALS project
5. Problem / Objectives / Technical Challenge
Problem: Relevant knowledge is heterogeneous
Objectives: Personal information must be aligned with disease-related information
• Disease progression & risk factors
• Treatment guidelines
• Nutritional & physical activity habits and preferences
• Other lifestyle constraints
Technical Challenge: Resolving inconsistencies between disparate sources
Problem: Relevant knowledge is dynamic
Objectives: Data sources (e.g., personal context, social media, web pages) are constantly changing and must be reconciled
Technical Challenge: Keeping knowledge current
Problem: Patients have unique needs
Objectives: An individual's information needs depend on their specific micro- to macro-level contexts
Technical Challenge: Deriving personalized insights
Problem: Raw insights are not interpretable or actionable
Objectives: A system must educate on good practices, support planning for improved health, and track and evaluate performance
Technical Challenge: Delivering explainable recommendations
Personalized Information
6. Knowledge Graph for Personal Health
PERSONAL
DISEASE
DIET
ACTIVITIES
Web sources & online forums
Personal health data
AI/ML Technologies
- Data-mining / KDD
- NLP
- Semantic Search
- Semantic Data Integration
7.
8. I'm not losing weight, what else can I do?
I see that you're walking about 5,000 steps/day,
which is below the American Heart Association's
recommendation. However, I can see from your
calendar that you don't have a lot of free time to
focus on physical activity. Maybe we should talk
about your diet, even though I don't have a lot of
information about your eating habits?
System recognizes concept of "weight" that is
tied to diet and activity.
System analyzes both activity and diet data in
the personal KG (sourced from Apple
HealthKit) for the last month:
• Activity data: Compares Ed's activity with
the recommended guidelines (10,000 steps
per day AHA recommendation), and
determines that he is not getting enough
activity from walking (Ed averages only
5,124 steps per day).
• Calendar data: Uses iOS EventKit and finds
Ed has very little free time during the day.
• Diet data: Ed doesn't log his food, so system
needs to ask him about his dietary habits.
Sure.
I see that you usually go to Applebee's for lunch during
the week, what do you usually order?
However, the system recognizes that he spends
lunch time at Applebee's every day (sourcing
location data from the iOS CoreLocation API, matched
with restaurant listings from the Google Maps API).
Conversation Back-End System
9. I usually get a battered fish sandwich and a side of
fries.
Ok, that’s 981 calories and 43g of fat for lunch. Guidelines
recommend that you should be consuming 2086
calories/day so your lunch is 47% of your daily calories.
For more information, click here.
System retrieves nutrition facts about Ed's "average"
lunch from the nutrition KG and populates Ed's
personal KG with the self-reported data about his
diet habits.
System calculates recommended daily caloric intake
for Ed based on:
• Ed's height (5'10"), current weight (200 lb), age (57),
caloric intake, etc. from Ed's personal knowledge
graph
• Guidelines on recommended caloric intake
• System computes recommended caloric intake
from the Mifflin-St. Jeor equation (for BMR) and fat
intake from Ed's personal KG, and compares them to his
data
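The caloric-intake calculation described above can be sketched as follows. This is a minimal sketch, not the HEALS implementation: the unit conversions, the male constant (+5), and the sedentary activity factor of 1.2 are assumptions that the slide does not state, and the function names are hypothetical. Under those assumptions, Ed's numbers happen to come out at the 2086 calories/day and 47% quoted in the conversation.

```python
LB_TO_KG = 0.45359237
IN_TO_CM = 2.54

def mifflin_st_jeor_bmr(weight_kg, height_cm, age_years, male):
    """Basal metabolic rate (kcal/day) from the Mifflin-St. Jeor equation."""
    sex_constant = 5 if male else -161
    return 10 * weight_kg + 6.25 * height_cm - 5 * age_years + sex_constant

def daily_calories(bmr, activity_factor=1.2):
    # 1.2 is the commonly used sedentary multiplier (assumed, not from the slide).
    return bmr * activity_factor

# Ed: 200 lb, 5'10" (70 in), age 57, as given on the slide.
ed_bmr = mifflin_st_jeor_bmr(200 * LB_TO_KG, 70 * IN_TO_CM, 57, male=True)
ed_daily = daily_calories(ed_bmr)
lunch_share = 981 / ed_daily  # 981 kcal lunch from the conversation

print(round(ed_daily), f"{lunch_share:.0%}")  # prints: 2086 47%
```

A different activity factor would shift the daily target, so the match with the slide only suggests, and does not confirm, that a sedentary multiplier was used.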
What is a healthier option?
Next time try a 6-oz top sirloin, baked potato, and
fire-grilled veggies for lunch. This only has 540 calories and
fewer than 22 grams of fat. I also found other restaurants
near you with healthy options. Would you like more
information?
System retrieves the Applebee's menu with nutrition
information from the Nutrition KG and recommends
healthier food substitutes at the same restaurant that
are in similar categories to Ed's usual lunch.
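One way to picture the substitution step is as a filter over menu items in the same category. The sketch below is illustrative only: the `MenuItem` fields, the category labels, the two extra menu entries, and the 21 g fat value (the slide says only "fewer than 22 grams") are invented placeholders, and the real system queries a nutrition knowledge graph rather than a Python list.

```python
from dataclasses import dataclass

@dataclass
class MenuItem:
    name: str
    category: str   # hypothetical category label
    calories: int
    fat_g: float

def healthier_substitutes(usual, menu):
    """Same-category items with fewer calories and less fat, lowest-calorie first."""
    better = [
        item for item in menu
        if item.category == usual.category
        and item.calories < usual.calories
        and item.fat_g < usual.fat_g
    ]
    return sorted(better, key=lambda item: item.calories)

usual = MenuItem("battered fish sandwich with fries", "entree", 981, 43)
menu = [
    usual,
    MenuItem("top sirloin, baked potato, grilled veggies", "entree", 540, 21),
    MenuItem("bacon cheeseburger with fries", "entree", 1250, 78),  # invented
    MenuItem("brownie sundae", "dessert", 400, 18),                 # invented
]

names = [item.name for item in healthier_substitutes(usual, menu)]
print(names)  # prints: ['top sirloin, baked potato, grilled veggies']
```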
Conversation Back-End System
11. I'm not losing weight, what else can I do?
It looks like you have been active so let's look at your
diet.
Can you explain what the graph means?
It looks like you usually have a big dinner and snack at
night. You also don't eat lunch most days. Also, you
tend to have more calories over weekends.
System uses Watson NLU to recognize concept
of "losing weight" that is tied to diet and activity.
Then, system analyzes both activity and diet
data stored in the personal KG (sourced from
Apple HealthKit) for the last month:
• Activity data: Compares Jessica's activity with
the recommended guidelines (10,000 steps
per day AHA recommendation), and
determines that she is getting almost
sufficient activity (9,356 steps per day) from
walking.
• Diet data: Jessica logs food in MyFitnessPal,
which syncs with Apple Health. The system
compares her recommended caloric intake
(1700 kcal) with her average caloric intake.
On weekdays her caloric intake is 2200
kcal, and on weekends she consumes 3000
kcal.
System mined pattern from personal KG: she
has a large gap between breakfast and dinner
and she typically consumes 2/3 of her daily
calories at dinner in addition to snacks at night.
She doesn't eat lunch and also eats more on
the weekends.
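The weekday and weekend figures above can be combined into a single daily average to show how far Jessica is from her recommended intake. This is a small worked sketch, assuming a week of five weekdays and two weekend days and taking the slide's rounded values at face value; the function name is hypothetical.

```python
def weekly_average(weekday_kcal, weekend_kcal):
    """Average daily intake over a 7-day week (5 weekdays, 2 weekend days)."""
    return (5 * weekday_kcal + 2 * weekend_kcal) / 7

recommended = 1700                    # kcal/day, from the slide
average = weekly_average(2200, 3000)  # weekday and weekend intake from the slide
surplus = average - recommended

print(f"average {average:.0f} kcal/day, surplus {surplus:.0f} kcal/day")
# prints: average 2429 kcal/day, surplus 729 kcal/day
```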
Conversation Back-End System
12. Why is this a problem? I usually am not hungry
during the day so I just eat when I get home from
work.
According to WebMD,
The longer the gap between dinner and the
previous meal or snack, the larger the dinner.
People who eat lightly at night end up eating fewer
calories and grams of fat overall than people who
eat big dinners and nighttime snacks. For more,
click here.
Do I really eat a lot of fat at night?
According to AHA guidelines,
You should be consuming 51 grams of fat/day. On
average, you eat 70 grams at night. Here is a
breakdown of your average fat content for breakfast,
lunch, dinner, and snacks. For more, click here.
System extracted explanation from WebMD,
by searching for <late night snacks and big
dinners>.
System calculates recommended fat intake
for Jessica based on:
• Her height, current weight, caloric intake
etc. from Jessica's personal KG
• AHA guidelines on recommended fat
intake
• System computes recommended caloric
intake from the Mifflin-St. Jeor equation
(for BMR) and fat intake from Jessica's
personal KG, and compares them to her data
Conversation Back-End System
15. Hospital ED Readmission example
Identified factors, based on the “Emergency Dept (ED) Log”, “In Patient” and “Out Patient”
datasets, that could improve our ability to predict whether a patient would return to
the ED within 72 hours of discharge.
What this entailed:
• Derived and analyzed dependent variables for 72-hour readmissions
– Examined approximately 15,000 existing variables.
– Developed new variables based on roll-ups and historical data.
• Used models that computed the combinations of sets of these variables
– Identified the best set of predictive variables for the available ED data
• 300 factors identified, reduced to 74 key variables
– Weighted Logistic Regression Analysis Performed
• Reported existing state-of-the-art results for EDs: 73% to 85%
• Result of our two-month study: Accuracy 80.1%
Ryan et al, Big Data, 2015
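The slide names weighted logistic regression but does not say what the weights were. The sketch below assumes class weights that up-weight the rarer "returned within 72 hours" class, fits the model with plain gradient descent, and uses invented toy data with two made-up features. It illustrates the general technique only and is not the study's model or variable set.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_weighted_logistic(X, y, sample_weights, lr=0.1, epochs=2000):
    """Fit logistic regression by gradient descent on a weighted log-loss."""
    n_features = len(X[0])
    coefs = [0.0] * n_features
    bias = 0.0
    total_weight = sum(sample_weights)
    for _ in range(epochs):
        grad_coefs = [0.0] * n_features
        grad_bias = 0.0
        for row, label, weight in zip(X, y, sample_weights):
            p = sigmoid(bias + sum(c * v for c, v in zip(coefs, row)))
            error = weight * (p - label)  # weighted gradient of the log-loss
            for j, v in enumerate(row):
                grad_coefs[j] += error * v
            grad_bias += error
        coefs = [c - lr * g / total_weight for c, g in zip(coefs, grad_coefs)]
        bias -= lr * grad_bias / total_weight
    return coefs, bias

def predict_proba(coefs, bias, row):
    return sigmoid(bias + sum(c * v for c, v in zip(coefs, row)))

# Invented toy data: [prior ED revisits, number of chronic conditions].
X = [[0, 1], [0, 0], [1, 1], [0, 2], [3, 2], [0, 1], [2, 3], [1, 0]]
y = [0, 0, 0, 0, 1, 0, 1, 0]  # 1 = returned within 72 hours

# Class weights inversely proportional to class frequency (an assumption).
n_pos = sum(y)
n_neg = len(y) - n_pos
weights = [len(y) / (2 * n_pos) if label else len(y) / (2 * n_neg) for label in y]

coefs, bias = fit_weighted_logistic(X, y, weights)
low_risk = predict_proba(coefs, bias, [0, 0])
high_risk = predict_proba(coefs, bias, [3, 3])
print(f"low-risk p={low_risk:.2f}, high-risk p={high_risk:.2f}")
```

With revisits being rare, an unweighted fit can score well on accuracy by predicting "no revisit" for nearly everyone; weighting the minority class is one common way to counter that.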
16. Making the Outcomes Actionable
Domain experts (e.g. Hospital Administrators in this case) need to understand the results and can
only take action on certain of them – the overall accuracy was not their key concern.
17. Actionable: Dynamic “cadre” identification
• 47% of revisits caused by 402 patients with multiple visits
• 151 patients cause 29% of all revisits
• Patients with a past ED revisit are more than 3 times as likely to revisit
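Statistics like these describe how concentrated revisits are among a small group of patients. A minimal sketch of that computation, assuming a log with one entry per ED revisit, is shown below; the patient IDs, the log contents, and the `min_revisits` threshold are invented for illustration.

```python
from collections import Counter

def revisit_concentration(revisit_patient_ids, min_revisits=2):
    """Count patients with at least `min_revisits` and their share of all revisits."""
    counts = Counter(revisit_patient_ids)
    frequent = {pid: n for pid, n in counts.items() if n >= min_revisits}
    share = sum(frequent.values()) / len(revisit_patient_ids)
    return len(frequent), share

# Invented log: one entry per ED revisit, keyed by a hypothetical patient ID.
revisits = ["p1", "p2", "p1", "p3", "p4", "p1", "p2", "p5", "p6", "p7"]
n_frequent, share = revisit_concentration(revisits)

print(f"{n_frequent} patients account for {share:.0%} of revisits")
# prints: 2 patients account for 50% of revisits
```

Raising `min_revisits` narrows the group, which is how a smaller set of patients can still account for a large share of revisits.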
18. Infer risk factors dynamically for different “cadres”
Select Disease
Select Risk Factors
Select confounders
Statistical analysis of detected risk factors
A. New, C. Breneman, and K. P. Bennett. Cadre modeling: Simultaneously
discovering subpopulations and predictive models. In 2018 International Joint
Conference on Neural Networks (IJCNN), 2018.
23. Explanation: Going beyond Recognition
Generating Triples with Adversarial Networks for Scene Graph
Construction (Klawonn & Heim, AAAI, 2018)
24. Conclusions
• Machine Learning is a great tool, but using the
results more widely will involve research into a
number of areas
– Individuals vs. cohorts
– “Best” vs. “most useful” ML results
– Supporting collaborative human (and eventually
human/AI) decision making
– Making the results of deep ML explainable
Editor's Notes
Feb 2008 Rehearsal
HEALS is an acronym that stands for Health Empowerment by Analytics, Learning & Semantics. We seek to create health empowerment by putting the right information into the hands of the patients and clinicians who need it the most. We will do this by combining the best that mathematical, cognitive and semantic technologies have to offer in terms of
SEMANTICally integrating health and medical knowledge from heterogeneous sources,
Improving that knowledge continuously through LEARNING
And ANALYZING that data to formulate and test hypotheses
Show how the demos they will see today will demonstrate different aspects of the technical challenges we are addressing…
HEALS 2
characterize disease in the face of changing information
provide context for overwhelming amounts of complex information (personalized population analysis)
leverage structured domain knowledge to help filter/prioritize/hypothesize
Personal background information: Sociodemographic information (e.g., age, gender, family structure, profession), medical history (comorbidities), personality characteristics
Health related parameters: Step counts, heart rate, blood pressure, food intake, liquid intake, activity data, stress data, BMI, weight -- temporal and geo-location data
Social: Cultural influences (inferred from data or by asking explicitly, e.g., food restrictions), social environment, social media use, people they like to spend time with, people they trust/respect
Resources/barriers: Free time or time-constraints, dietary restrictions, resources/restrictions that they have (e.g., do they have a car? how much do they make?), skills that they have (e.g., literacy, languages, education)
Preferences and willingness: Times they prefer to engage in different activities, activities they prefer to engage in, locations/communities they prefer or frequent, what they prefer to eat, willingness to try new things (food, activities, plans, etc.)
Motivations: Concerns that they have, goals
Knowledge and experiences: What they have tried in the past regarding health behaviors (e.g., diets, exercise), what has worked and not worked for them, past experiences, stories they heard about other people's experiences
How can I change behavior X to affect outcome Y?
Collect data and analyze how behavior X affects health outcome Y
Behavior X works well for me, but I need help maintaining it long-term
Predictive vs. standard analytics
make clear we needed to find the top ones
(i.e. out of 15000 we needed to find some good ones – takes about 100 to get State of the Art)
Current risk browser is here. WARNING risk browser is not currently providing accurate numeric results because survey weighting is still in the works.
https://lp01.idea.rpi.edu/shiny/drukej/NHANESRB/