ROBERTA BALCYTYTE
MACHINE LEARNING
FOR EARLY
DETECTION
OF HEREDITARY
RARE DISEASE.
Prediction of rare disease from
big and severely imbalanced data.
Rare diseases affect a relatively small
number of people (1 per 2,000)
compared to the general population
and specific issues are raised in
relation to their rarity. In particular,
many patients are never diagnosed,
or the diagnosis is delayed or wrong,
resulting in inappropriate
treatment. This research aims to
investigate machine-learning-based
models that can capture early
flags of a rare disease, namely
hereditary angioedema (HAE),
and supplement the medical diagnosis
procedure by making it faster and
more accurate. The rarity of the
condition poses the under-investigated
computational challenge of severe
class imbalance within big data
(1,200 cases vs. 165M controls).
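To make the scale of the imbalance concrete, the short sketch below works through the arithmetic using only the rounded figures quoted above. It treats 165M as the number of controls, as in the sentence above. The "label everyone as a control" baseline is an illustration added here, not something reported in the study.

```python
# Rounded figures quoted in the text: ~1,200 HAE cases vs. ~165M controls.
cases = 1_200
controls = 165_000_000

total = cases + controls
prevalence = cases / total
controls_per_case = controls / cases

# A classifier that labels everyone "control" is right for every control
# and wrong for every case, so its accuracy equals the share of controls.
trivial_accuracy = controls / total

print(f"prevalence:        {prevalence:.6%}")
print(f"controls per case: {controls_per_case:,.0f}")
print(f"trivial accuracy:  {trivial_accuracy:.4%}")
```

Because such a useless baseline already scores close to 100% accuracy, plain accuracy says little here, which is presumably why the results below are reported as AUC, sensitivity and specificity.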
Task:
For the experiments we used a
cleaned data set which contained
165M US patients with ~1,200 HAE
positive cases and ~240 features
on relevant medical events.
Firstly, we experimented with six
classifiers to fit a prediction model:
L2 regularized logistic regression,
SVM with Gaussian kernel, decision
tree, random forest, AdaBoost and
Gaussian Naïve Bayes. All classifiers
were tuned within cross-validation and
trained on randomly under-sampled
controls for 50 iterations.
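A minimal sketch of repeated random under-sampling is given below, using only the Python standard library. The function name, the toy patient ids and the 1:1 control-to-case ratio are assumptions for illustration. The text does not state which ratio or tooling was actually used.

```python
import random


def undersampled_rounds(case_ids, control_ids, n_rounds=50, ratio=1, seed=0):
    """Yield (cases, sampled_controls) for each under-sampling round.

    ratio is the number of controls drawn per case; 1 is an assumption,
    since the text does not state the ratio that was used.
    """
    rng = random.Random(seed)
    n_controls = min(len(control_ids), ratio * len(case_ids))
    for _ in range(n_rounds):
        # Every round keeps all cases and draws a fresh subset of
        # controls without replacement.
        yield list(case_ids), rng.sample(control_ids, n_controls)


# Toy ids standing in for patients (hypothetical, not the real data).
cases = list(range(12))
controls = list(range(1_000, 11_000))

rounds = list(undersampled_rounds(cases, controls, n_rounds=50))
print(len(rounds), len(rounds[0][0]), len(rounds[0][1]))
```

In a setup like this, a classifier would be fitted once per round and its scores averaged across the rounds. That is one plausible reading of the mean and standard deviation reported in the Review section.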
Secondly, we applied an advanced
technique for under-sampling the
majority class from a big data set.
The technique is based on Tomek-links
and is parallelizable.
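For readers unfamiliar with Tomek links: two points of opposite classes form a Tomek link when each is the other's nearest neighbour. Under-sampling then removes the majority-class member of each link. The sketch below is a brute-force illustration on toy 2-D points, with hypothetical function names. It is not the project's implementation and would not scale to 165M records as written.

```python
import math


def nearest_neighbour(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_link_undersample(points, labels, majority_label=0):
    """Return indices kept after dropping majority members of Tomek links."""
    nn = [nearest_neighbour(i, points) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        # Tomek link: mutual nearest neighbours with different labels.
        if nn[j] == i and labels[i] != labels[j]:
            drop.add(i if labels[i] == majority_label else j)
    return [i for i in range(len(points)) if i not in drop]


# Toy data: label 1 = case, label 0 = control (majority in practice).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2), (9.0, 9.0), (9.0, 9.3)]
labels = [1, 0, 0, 0, 1, 1]

kept = tomek_link_undersample(points, labels)
print(kept)  # [0, 2, 3, 4, 5]: the control at index 1 is removed
```

Each point's nearest-neighbour search is independent of the others, so that step can be split across workers. This is one way such a technique could be parallelized, though the text does not say how it was done here.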
The research was conducted in
collaboration with industry partner
‘IMS Health’ and led by UCL Prof.
Delmiro Fernandez-Reyes.
Review:
The project is still in progress but
in the first stage we have found
that Random Forest and AdaBoost
outperformed other classifiers with an
average AUC of 88.9% (std. 0.87%)
and 89.0% (std. 0.81%) respectively.
Furthermore, AdaBoost achieved a
higher sensitivity of 63.97% compared
to Random Forest while sustaining a
relatively high specificity of 93.17%
(Fig. 2).
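As a reminder of what these three metrics measure, the sketch below computes them on a small made-up example. The labels, scores and the 0.5 decision threshold are hypothetical and unrelated to the study's data. AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Share of (case, control) pairs where the case scores higher.

    Ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = case) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is assumed

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={area:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarises ranking quality across all thresholds. That is why two classifiers with nearly equal AUC can still differ in sensitivity.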
During the second stage, the advanced
under-sampling technique improved
the predictive power of the
classifier, but only slightly.
To our knowledge, this research is the
first attempt to apply machine learning
to predicting HAE and is one of the
few studies focusing on rare disease
prediction in general from big and
severely imbalanced data sets.
What makes this project unique?
This project is the first in its
field to apply machine learning
to HAE prediction.
Our aim is to build a predictive
model for a very rare disease from
big and severely imbalanced data.
It will make the diagnosis of such a
disease faster and more accurate.
What are your
plans for the future?
I have been learning and
implementing new techniques
of under-sampling and parallel
computing. After the project is
finished, I would like a data analytics
job within healthcare. The project has
definitely motivated me to continue
working within that industry.
Did you always know this was
the area you wanted to work in?
Originally my background was
in Economics. After graduating
I joined EY for 5 years, doing
projects related to process
analysis, improvement, risk
assessment and organizational
performance monitoring - not
very related to data analytics
or programming!
I decided to convert to data
analytics after my secondment year
in the Enterprise Intelligence & Data
Analytics Centre of Excellence.
It inspired me to pursue new
challenges and become a data
scientist. So I applied to UCL to
do an MSc in Business Analytics.
What has been the
highlight so far?
I started the program from scratch,
with no prior knowledge. I am
proud of my endeavor to become
a data scientist. It’s been a very
steep learning curve!
What advice would you
give your 18-year-old self?
Always keep learning
and exploring.
What is changing in engineering?
This research project has
convinced me that revised, novel
algorithms and new tools are
needed to leverage the treasures
of big data.
What excites you about the
opportunities with data today
and in the future?
I believe that data science will
start evolving in the healthcare
industry more and more rapidly and
very soon. This will lead to better
medical services, saved lives and
an improved quality of life for many,
many people.
I am very excited to be part of
this, developing innovative, more
efficient ways of curing people or
even better, preventing diseases.
Q&A
WE SAT DOWN WITH ROBERTA AND ASKED HER A FEW
QUESTIONS ABOUT HER PROJECT AND WHAT SHE
THINKS THE FUTURE HOLDS FOR HER.
“It’s been
a very
steep
learning
curve!”
Case Study - Roberta
