Colleen M. Farrelly discusses the importance of interdisciplinary approaches in data science. She provides examples of data science problems involving health risk modeling, market forecasting, and predicting disease from genetic data. For each problem, multiple disciplines could provide relevant insights, such as sociology, nutrition, genetics, economics, and medicine. Farrelly argues that individuals with broad knowledge across disciplines are well-equipped for careers in data science, because interdisciplinary perspectives help avoid bias and unreasonable assumptions when solving complex real-world problems.
2. About Me
Former MD/PhD student
Background outside of math/stat, including sociology, biochemistry, molecular
biology, psychology, genomics, and epidemiology.
Work history including academic medical studies, government, military,
biotech, and education.
Areas of expertise include topological data analysis, measurement models,
Bayesian designs, geometry in machine learning.
3. Overview
Multidisciplinary approaches often needed to solve data science problems
effectively.
Can draw on many different areas depending on the problem:
Sociology
Industrial psychology
Marketing
Genomics
Finance
Medicine
Individuals with a broad knowledge base are well-equipped for a career in
data science.
5. Problem 1: Health Risk Modeling
Problem: Obesity and related problems are costing a healthcare system a lot
of money. How do we flag patients at risk and try to intervene on patients
who are sick?
What disciplines might be needed? What causes might we consider? Is there
anything to take into account when designing the data mining approach and a
possible trial?
What sorts of expertise might we need on this project and the
implementation of results?
6. Problem 1: Health Risk Modeling
Food deserts
Jobs with little opportunity to be active
Genetic component
Lack of understanding around nutrition
Stress
7. Problem 2: Market Forecasting
Problem: How can we get a better model of future valuation of a company or
sector to find good investment opportunities?
What disciplines might be needed? What outside influences might we need to
account for? How might we set up the analysis?
9. Problem 3: Predicting Disease from Genetic Data
Problem: Given a sequence of genetic data and patient case history
information, provide a short list of differential diagnoses with a high
probability of matching the underlying disease.
What might complicate this analysis? Could the patient have more than one
underlying disease? Do you think the data is structured or unstructured? What
might be some technical challenges? Which disciplines could be helpful on this
project?
10. Problem 3: Predicting Disease from Genetic Data
Epigenetic factors (environment)
Comorbidity
Doctor error
Incorrect spelling or unreadable shorthand
Computational challenges of data storage and analysis requirements
Statistical testing problems when p >> n (far more predictors than samples)
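The last point, p >> n, can be illustrated with a small simulation. The sketch below is not from the talk: the patient and marker counts, the random seed, and the names `pearson`, `outcome`, and `markers` are all assumptions chosen for illustration. It generates an outcome and a large set of "genetic markers" that are pure noise, then tests each marker against the outcome. Even though no marker is truly related to the outcome, a sizeable number look significant at the usual 0.05 level, which is why unadjusted per-marker tests are unreliable when predictors far outnumber samples.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical sizes: far more markers (p) than patients (n).
n_patients, n_markers = 20, 2000

# Outcome and markers are independent noise: there is nothing real to find.
outcome = [random.gauss(0, 1) for _ in range(n_patients)]
markers = [
    [random.gauss(0, 1) for _ in range(n_patients)]
    for _ in range(n_markers)
]

correlations = [pearson(marker, outcome) for marker in markers]

# For n = 20, |r| above roughly 0.444 corresponds to p < 0.05 (two-sided).
critical_r = 0.444
false_positives = sum(1 for r in correlations if abs(r) > critical_r)
strongest = max(abs(r) for r in correlations)

print(f"markers flagged at p < 0.05: {false_positives} of {n_markers}")
print(f"strongest spurious correlation: |r| = {strongest:.2f}")
```

Roughly 5% of the noise markers are expected to pass the threshold by chance alone. In practice this is usually handled with multiple-testing corrections, regularization, or dimension reduction, and domain knowledge (for example, which genes are biologically plausible) helps narrow the candidate set before testing.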
11. Conclusions
Domain knowledge is important in data science.
Interdisciplinary backgrounds or team compositions can help understand a
given project from multiple angles.
This avoids potential bias or unreasonable assumptions.
Technical expertise + domain knowledge creates value in data science
projects.