Medical Engineering Data Analysis Framework for Clinical Decision Support for Pediatric Neuro-Development Disorders
Eitan Lavi
Advisors: Prof. Shmuel Einav, Biomedical Engineering Department, Tel-Aviv University; Prof. Yuval Shahar, Department of Information Systems Engineering, Ben-Gurion University; Dr. Mitchell Schertz, Institute for Child Development, Kupat Holim Meuhedet, Central Region, Herzeliya
Introduction – Clinical Domain
Pediatric Neuro-Developmental Disorders (NDD): a delay in development relative to that expected for a given age level or stage of development. These disorders originate before the age of 18, may be expected to continue indefinitely, and constitute a substantial impairment. Both biological and non-biological factors are involved. Pervasiveness: 1 in 5 children.
NDD – Current Clinical Practice
Diagnosis is performed mainly through an external evaluation of the child. The pediatrician relies only on his or her own (available memory of) past experience, yet the human ability to retrieve prior experience in an unbiased, complete and objective fashion is inadequate. Hence the need for an experience-based decision support system.
Introduction – Case-Based Reasoning
Objectives
Problem Space
Institute for Child Development, Kupat Holim Meuhedet, Central Region. Collaborating physician: Dr. Mitchell Schertz, head of the institute, who has been building a case base since 2000. The case base currently holds 1941 non-active children, 2477 active children, and 8022 cases. Much of the case information is in free-text form, making this also a Textual CBR (TCBR) project.
Building the Data Set
Source tables (n × m = # observations × # attributes): 465 × 3, 1474 × 60, 4582 × 19, 1143 × 69, 437 × 3, 5107 × 2, 1560 × 43, 13133 × 2, 4826 × 2, 5227 × 3, 6861 × 1; merged into a single case base of 8022 × 153.
Preprocessing and Transformations
X (case base): 8022 neuro-IDs × 182 attributes, with each attribute mapped to a type: date, numeric, binary, factor, dirty factor, or free text (textual).
Y (diagnoses): 8022 neuro-IDs × 235 diagnoses, encoded as a binary diagnoses vector for each case i.
Similarity Between Cases
Each case is an attribute vector: C1 = (a11, a12, a13, …, a1i) and C2 = (a21, a22, a23, …, a2i). The overall similarity combines, for each attribute, the distance between the i-th attribute of the two cases, weighted by the clinical weight/relevance of that attribute.
Similarity Metrics
- Date distance = month gap
- Numeric/binary distance = normalized Euclidean
- NA distance = 0.5 if both fields are NA, -0.5 otherwise
- Textual distance = cosine similarity of Latent Semantic Analysis (LSA) derived document vectors
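A minimal sketch of these per-attribute distance functions. The function names and the range normalization are my assumptions; the slide only names the metrics:

```python
def date_distance(d1, d2):
    """Month gap between two (year, month) tuples, per the slide's date metric."""
    return abs((d1[0] - d2[0]) * 12 + (d1[1] - d2[1]))

def numeric_distance(x1, x2, lo, hi):
    """Range-normalized absolute distance in [0, 1]: one common reading of a
    'normalized Euclidean' distance for a single numeric or binary attribute."""
    if hi == lo:
        return 0.0
    return abs(x1 - x2) / (hi - lo)

def na_distance(is_na1, is_na2):
    """0.5 if both fields are NA, -0.5 otherwise (only one field is NA)."""
    return 0.5 if (is_na1 and is_na2) else -0.5
```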
LSA – Advantages
A strictly mathematical approach, inherently independent of language. It can perform cross-linguistic concept searching and example-based categorization, automatically adapts to new and changing terminology, has been shown to be very tolerant of noise, deals effectively with sparse, ambiguous and contradictory data, and does not require text in sentence form.
Latent Semantic Analysis – First 4 Steps (figure: panels a, b, c, d)
Weighted Term-Document Matrix A
Local term weight l_ij: relative frequency of term i in document j. Global term weight g_i: based on the relative frequency of term i within the entire corpus.
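The weighting described here (and in note 5 below, which mentions entropies normalized by log(n)) is consistent with the standard log-entropy scheme. This is a sketch under that assumption; the exact local/global formulas used in the project are not spelled out on the slide:

```python
import numpy as np

def log_entropy_weights(tf):
    """Weight a term-document count matrix (terms x docs) with log-entropy:
    local weight log(1 + tf_ij); global weight 1 + sum_j p_ij*log(p_ij)/log(n),
    where p_ij = tf_ij / gf_i is term i's relative frequency in document j
    and n is the number of documents."""
    tf = np.asarray(tf, dtype=float)
    n_docs = tf.shape[1]
    gf = tf.sum(axis=1, keepdims=True)  # global frequency of each term
    p = np.divide(tf, gf, out=np.zeros_like(tf), where=gf > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    g = 1.0 + plogp.sum(axis=1) / np.log(n_docs)  # entropy-based global weight
    return np.log1p(tf) * g[:, None]              # local weight times global weight
```

A term spread evenly across all documents gets a global weight near 0 (uninformative), while a term concentrated in one document gets weight 1.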
Exploring the Term-Document Matrix
TDM with Weighted Term Frequencies – Singular Value Decomposition (SVD)
The weighted term-document matrix is factored as A = T S Dᵀ:
- T matrix: each row is a term vector in concept space; the columns of T are eigenvectors of AAᵀ and are the axes that define the term space.
- S: the diagonal matrix of singular values; r, the number of non-zero singular values, is a measure of the unique dimensions.
- Dᵀ matrix: each column is a document vector in concept space; the rows of Dᵀ (columns of D) are eigenvectors of AᵀA and are the axes that define the document space.
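The SVD step can be sketched as follows (a sketch, not the project's code; the 10⁻³ truncation threshold comes from note 7 below, and the function name is mine):

```python
import numpy as np

def lsa_document_vectors(A, tol=1e-3):
    """Factor A (terms x docs) as A = T S D^T.  Singular values below `tol`
    are zeroed (the noise-reduction step described in the notes); each
    document is then represented as a row of D*S in concept space."""
    T, s, Dt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    s = np.where(s < tol, 0.0, s)
    return Dt.T * s  # rows: document vectors in the r-dimensional concept space
```

Two identical documents map to identical concept-space vectors, which is what the cosine-similarity step relies on.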
Cosine Similarity Measure
Each document Doc.i is represented by its coordinates (c.1, c.2, …, c.r) in concept space; the similarity of Doc.i and Doc.j is the cosine of the angle between their two vectors.
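A minimal sketch of the cosine measure over two concept-space vectors (the zero-vector guard is my addition):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two document vectors in concept space."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0  # undefined for a zero vector; treat as no similarity
    return float(np.dot(u, v) / (nu * nv))
```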
Similarity Between Cases (recap)
The overall similarity of C1 = (a11, a12, a13, …, a1i) and C2 = (a21, a22, a23, …, a2i) combines the distance between the i-th attribute of the two cases with the clinical weight/relevance of each attribute.
Build Similarity Matrix for Each Attribute
Aggregating Similarity Results for Each Test Case t.i
For each test case, the per-attribute similarity results (N.Cases × N.Attributes) are aggregated over the N.cases stored cases.
Retrieval Vectors
From the N.Attributes × N.Cases similarity results, a retrieval vector over the stored cases is produced by one of four schemes: simple average, weighted average, sorted similarity, or high similarity.
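The averaging schemes above can be sketched as one aggregation function plus a top-K selection (function names and the matrix orientation are my assumptions):

```python
import numpy as np

def aggregate_retrieval_vector(sim_rows, weights=None):
    """Collapse per-attribute similarity rows (test case vs. all stored cases,
    shape n_attributes x n_cases) into one retrieval vector: simple average
    when no weights are given, weighted average otherwise."""
    S = np.asarray(sim_rows, dtype=float)
    if weights is None:
        return S.mean(axis=0)          # simple average over attributes
    w = np.asarray(weights, dtype=float)
    return w @ S / w.sum()             # clinical-weight weighted average

def top_k_cases(retrieval_vector, k):
    """Indices of the k most similar stored cases (sorted-similarity retrieval)."""
    return np.argsort(retrieval_vector)[::-1][:k]
```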
Results
Example of diagnosis prediction scores for a specific {test case, retrieval method, K value} combination. In actuality, 32 such graphs were generated for each of the 350 test cases. The real diagnoses for this test case were: (1) DELAY IN DEVELOPMENTAL MILESTONES, (2) GROSS MOTOR, (3) NORMAL EARLY INTELLIGENCE.
Prediction evaluation matrix for a specific test case and retrieval method. For each test case, prediction vectors were generated using 8 retrieval & prediction methods, for 8 different K values (64 in total per test case).
SAR = 1/3 × (Accuracy + Area under the ROC curve + (1 − Root-mean-squared error)): a score combining performance measures of different characteristics, in an attempt to create a more "robust" measure (cf. Caruana R., ROCAI 2004). RMSE enters as (1 − RMSE) so that all three components reward higher values.
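The combined score is a one-liner (assuming Caruana's orientation, where RMSE is converted to 1 − RMSE so that higher SAR is always better):

```python
def sar(accuracy, auc, rmse):
    """Combined SAR score (cf. Caruana, ROCAI 2004): the mean of accuracy,
    ROC AUC, and (1 - RMSE), so all components are higher-is-better."""
    return (accuracy + auc + (1.0 - rmse)) / 3.0
```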
F measure – Can help to dynamically choose threshold
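A sketch of choosing the decision threshold dynamically by maximizing the balanced F measure over the predicted diagnosis scores (function name and the sweep over observed scores are my assumptions):

```python
def best_f1_threshold(scores, labels):
    """Sweep candidate cutoffs over the predicted scores and return the
    (threshold, F1) pair maximizing the balanced F measure."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted(set(scores)):
        pred = [s >= t for s in scores]
        tp = sum(p and y for p, y in zip(pred, labels))
        fp = sum(p and not y for p, y in zip(pred, labels))
        fn = sum((not p) and y for p, y in zip(pred, labels))
        if tp == 0:
            continue  # F1 undefined/zero with no true positives
        prec = tp / (tp + fp)
        rec = tp / (tp + fn)
        f1 = 2 * prec * rec / (prec + rec)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```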
Conclusions
Future
Future (2)

Thank You
Prof. Shmuel Einav, Prof. Yuval Shahar, Prof. Oded Maimon, Dr. Mitchell Schertz, and the Yitzhak and Chaya Weinstein Research Institute.

Editor's Notes

  1. The NDD specialists often don't follow any preset rules or logical algorithms in making their decisions, and thus the field of Machine Learning is a natural realm from which to approach the classification task at hand.
  2. The required task is more complex than the primary classification types discussed above, and can be termed multi-label, multi-class classification (predict one or more classes from a pool of multiple classes). Another important distinction in the NDD domain is that the NDD specialist is the one producing the function mapping from features to diagnoses, through his diagnosis decisions, which are imperfect, inaccurate and inconsistent [9]. Since NDD is a domain lacking a deep clinical understanding or a clear knowledge structure, the physician hasn't necessarily labeled the cases in the case base with the "correct" classes, nor is it guaranteed that highly similar cases will be given similar diagnoses [2],[9]. We therefore look to incorporate some aspects of the supervised approach (utilizing the outputs of prior cases in predicting an output for a new case), without needing to fully deduce a general function mapping from the input objects to the output space (which would rely completely on the outputs' integrity). Moreover, we also look to incorporate some aspects of the unsupervised approach, primarily the ability to discover patterns and clinically similar groups in the case base without using any prior knowledge of how the NDD specialists chose to label (diagnose) each case. This allows us to find, for each new case, the cases most clinically similar to it. Basically, we use clustering (unsupervised learning) to find the cases clinically similar to a test case, and then multi-label, multi-class classification (supervised learning) only on the retrieved similar cases, using their physician-given labels to make a prediction.
  3. Preprocessing the X matrix. Attribute types:
(a) Empty: fewer than 8 non-NA values; removed.
(b) Date: regex match on more than 90% of the feature column's entries (allowing for non-pattern dates); transformed into 2 new attributes: month, year.
(c) Numeric: coercion to character and back to numeric; if fewer than 16 NAs are produced by this, the feature is termed numeric and kept in its numeric coercion.
(d) Binary: multiple conditions + fuzzy detection; consolidated to a single form of "true" and a single form of "false".
(e) Clean factor: under 25 categories (and no match for the previous feature types); no action.
(f) Dirty factor: no match for the previous types + average string length < 20 characters; the 20 most frequent levels retained, the rest termed "misc levels".
(g) Free text: no match for any of the previous; no action.
Missing data was conformed to NA status.
Preprocessing the Y matrix: originally in .mdb format, where each row was a general ID, one feature (column) gave the respective neuro-ID (two rows could share the same neuro-ID), and for each type of diagnosis a comment or numeric marker was given in a respective feature; converted to a binary diagnosis vector.
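The type-mapping cascade above can be sketched roughly as follows. The thresholds come from this note; the date regex, function names, and binary-level set are illustrative assumptions, not the project's actual code:

```python
import re

def _is_number(v):
    try:
        float(v)
        return True
    except (TypeError, ValueError):
        return False

def classify_attribute(values, date_re=r"\d{1,2}[./-]\d{1,2}[./-]\d{2,4}"):
    """Rough sketch of the cascade: empty -> date -> numeric -> binary ->
    clean factor -> dirty factor -> free text, using the note's thresholds."""
    present = [v for v in values if v not in (None, "", "NA")]
    if len(present) < 8:
        return "empty"
    if sum(bool(re.fullmatch(date_re, str(v))) for v in present) > 0.9 * len(present):
        return "date"
    if sum(1 for v in present if not _is_number(v)) < 16:  # NAs produced by coercion
        return "numeric"
    levels = {str(v).strip().lower() for v in present}
    if levels <= {"true", "false", "yes", "no", "0", "1"}:
        return "binary"
    if len(levels) < 25:
        return "clean factor"
    if sum(len(str(v)) for v in present) / len(present) < 20:
        return "dirty factor"
    return "free text"
```

Note that the absolute thresholds (8 values, 16 coercion failures) were tuned for columns of 8022 cases and are degenerate on tiny inputs.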
  4. In this study, the weights were automatically calculated using a simple algorithmic approach. Other studies have used a domain-specific ontology to give different weights to different terms, according to their clinical significance.
  5. p_ij = relative probability. Each such entropy is further normalized by log(n), n being the length of the corpus (the number of documents). This normalization was originally devised to give equal treatment to corpuses of different sizes, but since in this project all textual attributes contain the same N.cases number of documents, it produces little effect. A possible improvement for future versions is to replace this with a local normalization by the length of the document, so that the entropies being summed are normalized with respect to document length. Entropy is a measure of how dispersed the use of a term is across the corpus. Ultimately, the sum of all p_ij equals 1, but the more evenly they are spread across the corpus, the higher the term's global score.
  6. This is used in the VECTOR SPACE MODEL
  7. Empirical studies show that truncating the lower singular values can enact noise reduction, and thus the algorithm transformed all singular values in S below a certain threshold (set at 10⁻³) to 0.
  8. Attribute clinical weights:
- High similarity: if > 80% of cases have similarity to the test case > 0.8, divide the weight by 2.
- Average input length in a textual attribute > 30 characters: multiply by 2.
- Test case value for the attribute is NA: divide by 3.
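These three adjustment rules can be sketched as a single function (a sketch; the function name and argument layout are my assumptions):

```python
def adjust_attribute_weight(base_w, high_sim_fraction, avg_text_len, test_is_na):
    """Apply the note's automatic weight adjustments to one attribute:
    halve near-uniform (uninformative) attributes, double long-text
    attributes, and down-weight attributes the test case is missing."""
    w = base_w
    if high_sim_fraction > 0.8:   # >80% of cases similar (>0.8) to the test case
        w /= 2
    if avg_text_len > 30:         # average input length in a textual attribute
        w *= 2
    if test_is_na:                # test case value for the attribute is NA
        w /= 3
    return w
```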
  9. In choosing the test cases, however, the distribution of diagnoses in the Y matrix was examined. Inspecting the prior probabilities of diagnoses in the case base shows that several diagnoses occur only once in the entire Y matrix, while others occur in the singles, tens, hundreds and thousands.
  10. 350 case indexes in the final subset. For each test case, a diagnosis probability prediction vector was output for each combinatorial instance of <Retrieval Method (4 types), Reuse & Adapt Method (2 types), K value (8 values)>. That is, 64 diagnosis prediction probability vectors were generated for each test case.
  11. The above 5 graph types were produced for each combination of K, Retrieval Method and Reuse Scheme (i.e., for the 8 × 4 × 2 = 64 distinct combinations). That is, 320 distinct graphs were produced to graphically assess the aggregated results for all test cases.
  12. RMSE = sqrt(1/(P+N) · sum_i (y_i − ŷ_i)²) = root-mean-squared error: an aggregated, normalized sum over all diagnoses i of the individual errors between the predicted and real values of the diagnosis vector. For each diagnosis, the error is 0 if the prediction is correct and 1 if it is wrong. Since the output of RMSE is just a cutoff-independent scalar, this measure cannot be combined with other measures into a parametric curve. Accuracy = P(Ŷ = Y), estimated as (TP + TN)/(P + N): the number of correct predictions divided by the total number of diagnoses predicted, i.e. the probability of the algorithm predicting correctly, the rate of correct predictions attained by the algorithm.
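The two measures in this note can be sketched directly for binary diagnosis vectors (a minimal sketch; function names are mine):

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-squared error over a binary diagnosis vector:
    sqrt(1/(P+N) * sum_i (y_i - yhat_i)^2)."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def accuracy(y_true, y_pred):
    """(TP + TN) / (P + N): the rate of correct diagnosis predictions."""
    return sum(y == p for y, p in zip(y_true, y_pred)) / len(y_true)
```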
  13. F measure = weighted harmonic mean of precision (P) and recall (R) = 1 / (α·(1/P) + (1−α)·(1/R)) (van Rijsbergen, 1979); if α = 1/2, the mean is balanced. Sensitivity = Recall = TP rate = P(Ŷ = + | Y = +), estimated as TP/P = true positive rate: the number of true positives divided by the number of overall positives in the real diagnoses vector from the Y matrix, i.e. the algorithm's probability of predicting correctly which diagnoses the patient does have. Precision = PPV = P(Y = + | Ŷ = +), estimated as TP/(TP+FP) = positive predictive value: the number of true positives divided by the total number of diagnoses predicted by the algorithm as positive, i.e. the probability that a positive "1" prediction is correct.
  14. P value of the AUC ROC: tests the null hypothesis that the area under the curve really equals 0.50. In other words, the P value answers this question: what is the probability of obtaining the observed AUC ROC (or higher) if the diagnosis algorithm were no better than flipping a coin?
  15. Another reason for choosing ML as an approach for developing a CDSS in NDD is the need for future scalability: no need for per-clinic rule modification.