Data Mining in Radiology Reports
Saeed Mehrabi
INFO-I535, Spring 2010
Dr. Patrick W. Jamieson, Dr. Josette Jones

Outline
- Introduction to data and text mining
- Our data set
- Structuring free text
- Results
- Similar works
- Discussion

What Is Data Mining?
Data mining is the extraction of useful patterns from data sources such as databases, text, and the web.
There is a large gap between stored data and knowledge, and the transition does not happen automatically.
Many interesting things you want to find cannot be found with ordinary database queries, for example:
- "Find me people likely to buy my products."
- "Who is likely to respond to my promotion?"

Why Data Mining Now?
- The data is abundant.
- The data is being warehoused.
- Computing power is affordable.
- Competitive pressure is strong.
- Data mining tools have become available.

Text Mining
Text mining applies and adapts data mining techniques to the text domain.
Structured vs. free text:
- Structured text can be stored in a relational database.
- Representing the information contained in free text in a structured format makes information exchange, data mining, and information retrieval more feasible.

Data Set
Our corpus consists of:
- 594,000 de-identified radiology reports
- 36 million words
- 4.3 million sentences
The reports were dictated by the Indiana University Radiology faculty, a group of 40 radiologists, from 1993 to 1998.

Structuring Free Text
Regular expressions were used to detect sentences in the reports.
A regular expression is a concise and flexible way of matching strings of text, such as particular characters or words.
Sentences were annotated to propositions, which are canonical sentences expressing the same concept for similar findings across reports.

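A minimal sketch of regex-based sentence detection in Python. The slides do not give the expression actually used; the boundary rule below (split after ".", "!" or "?" when the next token starts with an uppercase letter or a digit) and the names `SENTENCE_BOUNDARY` and `split_sentences` are assumptions for illustration.

```python
import re

# Hypothetical boundary rule: a sentence ends at ".", "!" or "?" that is
# followed by whitespace and then an uppercase letter or a digit.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def split_sentences(report_text: str) -> list[str]:
    """Split one report into sentences with a single regular expression."""
    # Collapse line breaks and repeated spaces so dictated line wrapping
    # does not interfere with boundary detection.
    text = " ".join(report_text.split())
    if not text:
        return []
    return [s for s in SENTENCE_BOUNDARY.split(text) if s]


report = (
    "No acute intracranial hemorrhage. The ventricles are normal in size.\n"
    "Impression: Normal CT of the head."
)
for sentence in split_sentences(report):
    print(sentence)
```

Because the rule requires whitespace after the period, decimals such as "2.5 cm" are not split, but abbreviations such as "Dr. Jones" would be; a real splitter would need exception handling for cases like that.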
Structuring Free Text (Cont.)
A proposition is a declarative sentence that is either true or false, but not both.
- "Today is a beautiful sunny day." (a proposition)
- "x + 2 = 4" (not a proposition, because its truth value depends on x)
Users can select propositions and map sentences to them.

Corpus Annotation
To annotate each new sentence from the radiology reports, the software first proposes candidate propositions.
Experts review the suggested propositions and correct them as needed before validation.
If no suitable proposition exists in the ontology, the expert can create a new one.

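The slides do not say how the software ranks candidate propositions. As a hedged illustration only, the sketch below ranks existing propositions by token-overlap (Jaccard) similarity to a new sentence; the scoring method, the parameters `k` and `min_score`, and all function names are assumptions, not the project's method.

```python
import re


def tokens(sentence: str) -> set[str]:
    """Lowercase word tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_propositions(sentence, propositions, k=3, min_score=0.2):
    """Return up to k (proposition, score) pairs, best first.

    An empty result means no existing proposition is similar enough,
    so an expert would create a new one.
    """
    query = tokens(sentence)
    scored = [(p, jaccard(query, tokens(p))) for p in propositions]
    scored = [pair for pair in scored if pair[1] >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


ontology = [  # hypothetical propositions
    "No acute intracranial hemorrhage.",
    "The lungs are clear.",
    "There is a right pleural effusion.",
]
print(suggest_propositions("There is no acute intracranial hemorrhage.", ontology))
```

The suggestions are only a starting point; as the slide describes, an expert still reviews and corrects them before validation.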
Results
The ontology of propositions is built in parallel with the experts' annotation of sentences to existing propositions.
So far, 427,433 unique sentences from the corpus have been annotated, representing a total of 2,561,330 sentences, or about 60% of all sentences.

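A quick arithmetic check of these figures in Python. It uses the rounded "4.3 million" sentence total from the Data Set slide, so the coverage percentage is approximate; the second ratio shows why annotating unique sentences covers so much of the corpus.

```python
# Figures from the Results and Data Set slides; the corpus total is the
# rounded "4.3 million", so the coverage figure is approximate.
unique_annotated = 427_433
annotated_total = 2_561_330
corpus_sentences = 4_300_000

coverage = annotated_total / corpus_sentences              # share of all sentences covered
repeats_per_unique = annotated_total / unique_annotated    # average occurrences per unique sentence

print(f"coverage: {coverage:.1%}")                                   # coverage: 59.6%
print(f"occurrences per unique sentence: {repeats_per_unique:.1f}")  # occurrences per unique sentence: 6.0
```

On average each annotated unique sentence appears about six times in the corpus, which is consistent with the reported 60% coverage.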
Results (Cont.)
The propositions are categorized by main finding area, such as brain and skull, general radiology, ...
All propositions are stored in a relational database, together with information such as whether they describe a normal or abnormal finding and the number of sentences mapped to them.
We can find the most frequent (highest-ranked) propositions by sorting them by the number of sentences mapped to them, and we can count the normal and abnormal propositions and sentences in each category.

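A minimal sketch of such a table and the two kinds of queries described above, using Python's built-in `sqlite3`. The table and column names are hypothetical, and the rows are invented toy values, not results from the corpus.

```python
import sqlite3

# Hypothetical schema: one row per proposition with its category,
# a normal/abnormal flag and the number of sentences mapped to it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE proposition (
        id INTEGER PRIMARY KEY,
        text TEXT NOT NULL,
        category TEXT NOT NULL,
        is_abnormal INTEGER NOT NULL,     -- 0 = normal finding, 1 = abnormal
        sentence_count INTEGER NOT NULL   -- sentences mapped to this proposition
    )
""")
conn.executemany(
    "INSERT INTO proposition (text, category, is_abnormal, sentence_count) VALUES (?, ?, ?, ?)",
    [  # toy rows for illustration only
        ("No acute intracranial hemorrhage.", "brain and skull", 0, 120),
        ("There is a subdural hematoma.", "brain and skull", 1, 15),
        ("The lungs are clear.", "general radiology", 0, 300),
        ("There is a right pleural effusion.", "general radiology", 1, 40),
    ],
)

# Most frequent propositions: sort by the number of mapped sentences.
top = conn.execute(
    "SELECT text, sentence_count FROM proposition ORDER BY sentence_count DESC LIMIT 3"
).fetchall()

# Normal/abnormal proposition and sentence counts per category.
per_category = conn.execute("""
    SELECT category,
           SUM(is_abnormal = 0) AS normal_props,
           SUM(is_abnormal = 1) AS abnormal_props,
           SUM(CASE WHEN is_abnormal = 0 THEN sentence_count ELSE 0 END) AS normal_sentences,
           SUM(CASE WHEN is_abnormal = 1 THEN sentence_count ELSE 0 END) AS abnormal_sentences
    FROM proposition
    GROUP BY category
    ORDER BY category
""").fetchall()

print(top)
print(per_category)
```

In SQLite a comparison such as `is_abnormal = 0` evaluates to 0 or 1, so `SUM` over it counts matching rows.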
Similar Works
CLEF (Clinical E-Science Framework)
- Consists of both structured records and free-text documents (clinical narratives, radiology reports, and histopathology reports).
- Semantic annotation of clinical text to assist in the development and evaluation of an information extraction system.

LEXIMER (LEXIcon Mediated Entropy Reduction)

LEXIMER (Cont.)
- Phrase isolation: scanning the report text and separating its content into phrases.
- Noise reduction: decreasing the amount of clinically irrelevant information in the report.
- Signal extraction: pulling out the positive statements and recommendations from the clinically relevant phrases.

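To make the three stages concrete, here is a toy pipeline in Python. It is not LEXIMER's implementation: the lexicons, the punctuation-based phrase splitting, and the negation rule are simplifying assumptions, and all names are hypothetical.

```python
import re

# Illustrative lexicons only; the actual LEXIMER lexicon is not described here.
CLINICAL_TERMS = {"nodule", "mass", "effusion", "fracture", "hemorrhage", "ct", "mri", "follow-up"}
NEGATION_TERMS = {"no", "without", "negative"}
RECOMMENDATION_TERMS = {"recommend", "recommended", "suggest", "suggested", "follow-up"}


def words(phrase: str) -> set[str]:
    """Lowercase word tokens, keeping hyphenated words such as 'follow-up'."""
    return set(re.findall(r"[a-z0-9-]+", phrase.lower()))


def isolate_phrases(report: str) -> list[str]:
    """Phrase isolation: split the report at periods, semicolons, colons and commas."""
    return [p.strip() for p in re.split(r"[.;:,]\s*", report) if p.strip()]


def reduce_noise(phrases: list[str]) -> list[str]:
    """Noise reduction: drop phrases that contain no clinical term."""
    return [p for p in phrases if words(p) & CLINICAL_TERMS]


def extract_signal(phrases: list[str]) -> tuple[list[str], list[str]]:
    """Signal extraction: keep recommendations and non-negated (positive) findings."""
    findings, recommendations = [], []
    for p in phrases:
        w = words(p)
        if w & RECOMMENDATION_TERMS:
            recommendations.append(p)
        elif not (w & NEGATION_TERMS):
            findings.append(p)
    return findings, recommendations


report = (
    "Comparison is made with the prior study. There is a 6 mm nodule in the right upper lobe. "
    "No pleural effusion. Follow-up CT is recommended in 3 months."
)
findings, recs = extract_signal(reduce_noise(isolate_phrases(report)))
print(findings)
print(recs)
```

The sketch also illustrates the limitation raised in the Discussion: once a report is cut into phrases, context that spans a whole sentence is lost.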
NLP Using OLAP for Assessing Recommendations in Radiology Reports
Database: 4,279,179 radiology reports from a single tertiary health care center over a 10-year period (1995-2004).
- The reports cover the most common imaging modalities and include patient demographics.
- LEXIMER, in conjunction with online analytic processing (OLAP), was used to classify reports into those containing recommendations for imaging (IREC) and those without.
- IREC rates were determined for different patient age groups, genders, imaging modalities, indications, diseases, subspecialties, and referring physicians.

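The IREC analysis amounts to computing a rate for each slice of the data along one or more dimensions, which is the kind of aggregation an OLAP cube precomputes. The sketch below does the same aggregation in memory with plain Python; the record fields (`modality`, `age_group`, `has_irec`) and the records are hypothetical, and this is not the implementation used by Dang et al.

```python
from collections import defaultdict


def irec_rates(reports, *dimensions):
    """Share of reports with an imaging recommendation per slice of the data.

    Each report is a dict holding a boolean "has_irec" plus the requested
    dimension fields; the result maps a tuple of dimension values to a rate.
    """
    totals = defaultdict(int)
    with_rec = defaultdict(int)
    for r in reports:
        key = tuple(r[d] for d in dimensions)
        totals[key] += 1
        with_rec[key] += int(r["has_irec"])
    return {key: with_rec[key] / totals[key] for key in totals}


reports = [  # toy records, not data from the study
    {"modality": "CT", "age_group": "40-59", "has_irec": True},
    {"modality": "CT", "age_group": "60+", "has_irec": False},
    {"modality": "CT", "age_group": "60+", "has_irec": True},
    {"modality": "radiography", "age_group": "40-59", "has_irec": False},
]
print(irec_rates(reports, "modality"))               # roll-up by one dimension
print(irec_rates(reports, "modality", "age_group"))  # drill-down by two dimensions
```

Passing more dimension names drills down into finer slices, while passing fewer rolls the counts up.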
Discussion
- The CLEF work covers a very limited number of reports.
- The LEXIMER work does not validate its classification method, and isolated phrases cannot convey the full meaning of a sentence.
- What distinguishes our work from others is the large amount of data mined and the consistent expert validation.

References
- Friedlin, J., Mahoui, M., Jones, J., Kashyap, V., & Jamieson, P. (2010). Knowledge Discovery and Data Mining of Free Text Radiology. Submitted to the Journal of Biomedical Informatics.
- Roberts, A., Gaizauskas, R., Hepple, M., Demetriou, G., Guo, Y., Setzer, A., et al. (2008). Semantic Annotation of Clinical Text: The CLEF Corpus. Retrieved April 20, 2010, from ftp://ftp.dcs.shef.ac.uk/home/robertg/papers/lrec08-clefcorpus.pdf
- Dang, P. A., Kalra, M. K., Blake, M. A., Schultz, T. J., Stout, M., Lemay, P. R., Freshman, D. J., Halpern, E. F., & Dreyer, K. J. (2008). Natural Language Processing Using Online Analytic Processing for Assessing Recommendations in Radiology Reports. Journal of the American College of Radiology, 5(3), 197-204.
- http://www.nuance.com/healthcare/products/radcube-for-radiology.asp
