SlideShare ist ein Scribd-Unternehmen logo
1 von 1
Downloaden Sie, um offline zu lesen
RESEARCH POSTER PRESENTATION DESIGN © 2015
www.PosterPresentations.com
Introduction
Distributed Near Duplicate Detection
●
Integrate medical data from various heterogeneous medical data sources and private
archives using the public APIs.
●
Curate the integrated data into a data warehouse for public access.
●
Store the detected duplicate pairs into a separate data source.
●
Duplicate detection by analyzing the potential data pairs from the original data sources,
using similarity matrices for textual data.
●
Hierarchical meta data attached to the binary medical data to identify, classify, and find
duplicates among the binary raw data.
●
Considers the inconsistencies in representation.
– Usage of acronyms instead of the full form of the attributes.
– Using different measurement units.
●
Data is published to various data sources by the medical data publishers
– through the respective write APIs of the data sources.
●
Connects to the original data sources through their read APIs.
●
Output of consolidated data and duplicate pairs
– stored through the relevant write APIs.
●
Medical data consumers consume the data from the warehouse composed by MediCurator
through its read API.
●
The data warehouse is considered to be free from the duplicates
– False positives and false negatives.
– based on the effectiveness of the similarity matrices and similarity join algorithms used.
References
●
Xiao, C., Wang, W., Lin, X., Yu, J. X., & Wang, G. (2011). Efficient similarity joins for near-
duplicate detection. ACM Transactions on Database Systems (TODS), 36(3), 15.
●
"Kathiravelu, Pradeeban; Galhardas, Helena; Veiga, Luís; ",∂u∂u Multi-Tenanted Framework:
Distributed Near Duplicate Detection for Big Data, On the Move to Meaningful Internet
Systems: OTM 2015 Conferences, 237-256, 2015, Springer International Publishing
●
"Kathiravelu, Pradeeban; Sharma, Ashish;", MEDIator: A Data Sharing Synchronization
Platform for Heterogeneous Medical Image Archives, "Workshop on Connected Health at Big
Data Era (BigCHat'15) , co-located with 21 st ACM SIGKDD Conference on Knowledge
Discovery and Data Mining (KDD 2015)", 2015, ACM.
●
Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle
M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public
Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013,
pp 1045-1057.
●
Hazelcast for a distributed near duplicate detection.
●
Meta Data attached to the binary images in Medical Image Archives
– The Cancer Imaging Archive (TCIA)
●
●
●
●
●
●
●
●
●
●
●
●
Pradeeban Kathiravelu Ashish Sharma
Medical Imaging Data Warehouse Construction
Near Duplicate Detection for
●
Medical data warehouses and image archives are constructed by integrating multiple private
and public data sources.
●
Finding almost identical entries is crucial for warehouse construction.
●
Medical image archives are huge and consist of structured and hierarchical data, which may
be accessed by querying the metadata.
●
Existing solutions tend to be too specific.
– Master Patient Index (MPI) for patient records.
●
Multiple dimensions and attributes
– including medications, clinical, and pathological data
– should be considered for a complete duplicate detection and elimination.
●
MediCurator is a near duplicate detection framework for heterogeneous medical data
sources in constructing data warehouses.
●
MediCurator has been developed to retrieve medical data from
– various data sources, including: MySQL, MongoDB, CSV files, and
– medical image archives such as TCIA
●
MediCurator fits as part of the ETL process.
– Duplicates are detected in-memory.
– Merged data stored into data warehouses hosted in Hadoop Distributed File System
(HDFS).
MediCurator Approach
Design
Implementation
●
A prototype has been implemented.
– Hazelcast as the distributed execution framework.
– Distributed execution of research near duplicate detection algorithms on metadata.
– Speed-up of ten-folds, compared to the existing solutions such as MPI systems.
●
MediCurator functions as an integration middleware
– for data warehouse construction
– with duplicate detection and elimination
– from the raw textual medical data, or the binary data by leveraging the meta data
attached to it.
●
{pkathi2, ashish.sharma} @ emory.edu
Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA.
Acknowledgments
* Google Summer of Code 2015
* NCI U01 [1U01CA187013-01], Resources for development and validation of
Radiomic Analyses & Adaptive Therapy, Fred Prior, Ashish Sharma (UAMS, Emory)

Weitere ähnliche Inhalte

Was ist angesagt?

From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
Databricks
 
Lambda Architecture The Hive
Lambda Architecture The HiveLambda Architecture The Hive
Lambda Architecture The Hive
Altan Khendup
 

Was ist angesagt? (20)

From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data mining
 
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
 
Data cloud lab version v.001.2020
Data cloud lab version v.001.2020Data cloud lab version v.001.2020
Data cloud lab version v.001.2020
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning
 
Role of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly worksRole of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly works
 
New PID developments
New PID developmentsNew PID developments
New PID developments
 
Data mining
Data miningData mining
Data mining
 
Record matching over query results from Web Databases
Record matching over query results from Web DatabasesRecord matching over query results from Web Databases
Record matching over query results from Web Databases
 
EDI Training Module 12: An Introduction to Metadata and Data Repositories
EDI Training Module 12:  An Introduction to Metadata and Data RepositoriesEDI Training Module 12:  An Introduction to Metadata and Data Repositories
EDI Training Module 12: An Introduction to Metadata and Data Repositories
 
9 facts about statice's data anonymization solution
9 facts about statice's data anonymization solution9 facts about statice's data anonymization solution
9 facts about statice's data anonymization solution
 
The Big Metadata
The Big MetadataThe Big Metadata
The Big Metadata
 
Lambda Architecture The Hive
Lambda Architecture The HiveLambda Architecture The Hive
Lambda Architecture The Hive
 
2 Data-mining process
2   Data-mining process2   Data-mining process
2 Data-mining process
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyush
 
Role of Data Accessibility During Pandemic
Role of Data Accessibility During PandemicRole of Data Accessibility During Pandemic
Role of Data Accessibility During Pandemic
 
ORCID at Crossref LIVE Indonesia
ORCID at Crossref LIVE IndonesiaORCID at Crossref LIVE Indonesia
ORCID at Crossref LIVE Indonesia
 

Andere mochten auch

Andere mochten auch (6)

Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
Software-Defined Simulations for Continuous Development of Cloud and Data Cen...
 
CHIEF: Controller Farm for Clouds of Software-Defined Community Networks
CHIEF: Controller Farm for Clouds of Software-Defined Community NetworksCHIEF: Controller Farm for Clouds of Software-Defined Community Networks
CHIEF: Controller Farm for Clouds of Software-Defined Community Networks
 
Selective Redundancy in Network-as-a-Service: Differentiated QoS in Multi-Ten...
Selective Redundancy in Network-as-a-Service: Differentiated QoS in Multi-Ten...Selective Redundancy in Network-as-a-Service: Differentiated QoS in Multi-Ten...
Selective Redundancy in Network-as-a-Service: Differentiated QoS in Multi-Ten...
 
An Introduction to Google Summer of Code 2015
An Introduction to Google Summer of Code 2015An Introduction to Google Summer of Code 2015
An Introduction to Google Summer of Code 2015
 
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
 
Real-time Data De-duplication using Locality-sensitive Hashing powered by Sto...
Real-time Data De-duplication using Locality-sensitive Hashing powered by Sto...Real-time Data De-duplication using Locality-sensitive Hashing powered by Sto...
Real-time Data De-duplication using Locality-sensitive Hashing powered by Sto...
 

Ähnlich wie Near Duplicate Detection for Medical Imaging Data Warehouse Construction

Understanding the Need of Data Integration in E Healthcare
Understanding the Need of Data Integration in E HealthcareUnderstanding the Need of Data Integration in E Healthcare
Understanding the Need of Data Integration in E Healthcare
ijtsrd
 
The FAIR data movement and 22 Feb 2023.pdf
The FAIR data movement and 22 Feb 2023.pdfThe FAIR data movement and 22 Feb 2023.pdf
The FAIR data movement and 22 Feb 2023.pdf
Alan Morrison
 
Big Data Analytics for Treatment Pathways John Cai
Big Data Analytics for Treatment Pathways John CaiBig Data Analytics for Treatment Pathways John Cai
Big Data Analytics for Treatment Pathways John Cai
John Cai
 
DATA MINING DC Presentation.pptx
DATA MINING DC Presentation.pptxDATA MINING DC Presentation.pptx
DATA MINING DC Presentation.pptx
SaravanaD2
 

Ähnlich wie Near Duplicate Detection for Medical Imaging Data Warehouse Construction (20)

dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09...
 
Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019Clinical Data Models - The Hyve - Bio IT World April 2019
Clinical Data Models - The Hyve - Bio IT World April 2019
 
Data mining and data warehousing
Data mining and data warehousingData mining and data warehousing
Data mining and data warehousing
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
Understanding the Need of Data Integration in E Healthcare
Understanding the Need of Data Integration in E HealthcareUnderstanding the Need of Data Integration in E Healthcare
Understanding the Need of Data Integration in E Healthcare
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
IRJET- Analyse Big Data Electronic Health Records Database using Hadoop Cluster
IRJET- Analyse Big Data Electronic Health Records Database using Hadoop ClusterIRJET- Analyse Big Data Electronic Health Records Database using Hadoop Cluster
IRJET- Analyse Big Data Electronic Health Records Database using Hadoop Cluster
 
CLOUD-BASED DEVELOPMENT OF SMART AND CONNECTED DATA IN HEALTHCARE APPLICATION
CLOUD-BASED DEVELOPMENT OF SMART AND CONNECTED DATA IN HEALTHCARE APPLICATIONCLOUD-BASED DEVELOPMENT OF SMART AND CONNECTED DATA IN HEALTHCARE APPLICATION
CLOUD-BASED DEVELOPMENT OF SMART AND CONNECTED DATA IN HEALTHCARE APPLICATION
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
The FAIR data movement and 22 Feb 2023.pdf
The FAIR data movement and 22 Feb 2023.pdfThe FAIR data movement and 22 Feb 2023.pdf
The FAIR data movement and 22 Feb 2023.pdf
 
Repositories in an Open Data Ecosystem
Repositories in an Open Data EcosystemRepositories in an Open Data Ecosystem
Repositories in an Open Data Ecosystem
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
Big Data Analytics for Treatment Pathways John Cai
Big Data Analytics for Treatment Pathways John CaiBig Data Analytics for Treatment Pathways John Cai
Big Data Analytics for Treatment Pathways John Cai
 
Big Data in Clinical Research
Big Data in Clinical ResearchBig Data in Clinical Research
Big Data in Clinical Research
 
MULTI MODEL DATA MINING APPROACH FOR HEART FAILURE PREDICTION
MULTI MODEL DATA MINING APPROACH FOR HEART FAILURE PREDICTIONMULTI MODEL DATA MINING APPROACH FOR HEART FAILURE PREDICTION
MULTI MODEL DATA MINING APPROACH FOR HEART FAILURE PREDICTION
 
Enterprise Analytics: Serving Big Data Projects for Healthcare
Enterprise Analytics: Serving Big Data Projects for HealthcareEnterprise Analytics: Serving Big Data Projects for Healthcare
Enterprise Analytics: Serving Big Data Projects for Healthcare
 
DATA MINING DC Presentation.pptx
DATA MINING DC Presentation.pptxDATA MINING DC Presentation.pptx
DATA MINING DC Presentation.pptx
 

Mehr von Pradeeban Kathiravelu, Ph.D.

Mehr von Pradeeban Kathiravelu, Ph.D. (20)

Google Summer of Code_2023.pdf
Google Summer of Code_2023.pdfGoogle Summer of Code_2023.pdf
Google Summer of Code_2023.pdf
 
Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022
 
Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022
 
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
 
Google summer of code (GSoC) 2021
Google summer of code (GSoC) 2021Google summer of code (GSoC) 2021
Google summer of code (GSoC) 2021
 
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
 
Google Summer of Code (GSoC) 2020 for mentors
Google Summer of Code (GSoC) 2020 for mentorsGoogle Summer of Code (GSoC) 2020 for mentors
Google Summer of Code (GSoC) 2020 for mentors
 
Google Summer of Code (GSoC) 2020
Google Summer of Code (GSoC) 2020Google Summer of Code (GSoC) 2020
Google Summer of Code (GSoC) 2020
 
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data SourcesData Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
 My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos... My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
 
UCL Ph.D. Confirmation 2018
UCL Ph.D. Confirmation 2018UCL Ph.D. Confirmation 2018
UCL Ph.D. Confirmation 2018
 
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
 
Moving bits with a fleet of shared virtual routers
Moving bits with a fleet of shared virtual routersMoving bits with a fleet of shared virtual routers
Moving bits with a fleet of shared virtual routers
 
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
 
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
 
Software-Defined Inter-Cloud Composition of Big Services
Software-Defined Inter-Cloud Composition of Big ServicesSoftware-Defined Inter-Cloud Composition of Big Services
Software-Defined Inter-Cloud Composition of Big Services
 
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
 
Componentizing Big Services in the Internet
Componentizing Big Services in the InternetComponentizing Big Services in the Internet
Componentizing Big Services in the Internet
 

Kürzlich hochgeladen

Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMuzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Service
 
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetdhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Service
 
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetraisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Service
 
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetJalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Service
 
coimbatore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
coimbatore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetcoimbatore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
coimbatore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Service
 
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Service
 
Russian Call Girls in Noida Pallavi 9711199171 High Class Call Girl Near Me
Russian Call Girls in Noida Pallavi 9711199171 High Class Call Girl Near MeRussian Call Girls in Noida Pallavi 9711199171 High Class Call Girl Near Me
Russian Call Girls in Noida Pallavi 9711199171 High Class Call Girl Near Me
mriyagarg453
 
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in LahoreBest Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
Deny Daniel
 
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetNanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Service
 
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetErnakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh
 
Call Girls in Udaipur Girija Udaipur Call Girl ✔ VQRWTO ❤️ 100% offer with...
Call Girls in Udaipur  Girija  Udaipur Call Girl  ✔ VQRWTO ❤️ 100% offer with...Call Girls in Udaipur  Girija  Udaipur Call Girl  ✔ VQRWTO ❤️ 100% offer with...
Call Girls in Udaipur Girija Udaipur Call Girl ✔ VQRWTO ❤️ 100% offer with...
mahaiklolahd
 
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
mahaiklolahd
 

Kürzlich hochgeladen (20)

Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMuzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Muzaffarpur Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
 
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
Russian Call Girls Kota * 8250192130 Service starts from just ₹9999 ✅
 
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur RajasthanJaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
 
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetdhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
dhanbad Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girls Patiala Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Patiala Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Patiala Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Patiala Just Call 8250077686 Top Class Call Girl Service Available
 
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetraisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
raisen Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetJalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Jalna Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
coimbatore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
coimbatore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meetcoimbatore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
coimbatore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅
Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅
Call Girl Gorakhpur * 8250192130 Service starts from just ₹9999 ✅
 
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mathura Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Kochi call girls Mallu escort girls available 7877702510
Kochi call girls Mallu escort girls available 7877702510Kochi call girls Mallu escort girls available 7877702510
Kochi call girls Mallu escort girls available 7877702510
 
Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510Krishnagiri call girls Tamil aunty 7877702510
Krishnagiri call girls Tamil aunty 7877702510
 
Russian Call Girls in Noida Pallavi 9711199171 High Class Call Girl Near Me
Russian Call Girls in Noida Pallavi 9711199171 High Class Call Girl Near MeRussian Call Girls in Noida Pallavi 9711199171 High Class Call Girl Near Me
Russian Call Girls in Noida Pallavi 9711199171 High Class Call Girl Near Me
 
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in LahoreBest Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
Best Lahore Escorts 😮‍💨03250114445 || VIP escorts in Lahore
 
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetNanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Nanded Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
Call Now ☎ 9999965857 !! Call Girls in Hauz Khas Escort Service Delhi N.C.R.
 
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetErnakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ernakulam Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girls in Udaipur Girija Udaipur Call Girl ✔ VQRWTO ❤️ 100% offer with...
Call Girls in Udaipur  Girija  Udaipur Call Girl  ✔ VQRWTO ❤️ 100% offer with...Call Girls in Udaipur  Girija  Udaipur Call Girl  ✔ VQRWTO ❤️ 100% offer with...
Call Girls in Udaipur Girija Udaipur Call Girl ✔ VQRWTO ❤️ 100% offer with...
 
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
Call Girl in Bangalore 9632137771 {LowPrice} ❤️ (Navya) Bangalore Call Girls ...
 

Near Duplicate Detection for Medical Imaging Data Warehouse Construction

  • 1. RESEARCH POSTER PRESENTATION DESIGN © 2015 www.PosterPresentations.com Introduction Distributed Near Duplicate Detection ● Integrate medical data from various heterogeneous medical data sources and private archives using the public APIs. ● Curate the integrated data into a data warehouse for public access. ● Store the detected duplicate pairs into a separate data source. ● Duplicate detection by analyzing the potential data pairs from the original data sources, using similarity matrices for textual data. ● Hierarchical meta data attached to the binary medical data to identify, classify, and find duplicates among the binary raw data. ● Considers the inconsistencies in representation. – Usage of acronyms instead of the full form of the attributes. – Using different measurement units. ● Data is published to various data sources by the medical data publishers – through the respective write APIs of the data sources. ● Connects to the original data sources through their read APIs. ● Output of consolidated data and duplicate pairs – stored through the relevant write APIs. ● Medical data consumers consume the data from the warehouse composed by MediCurator through its read API. ● The data warehouse is considered to be free from the duplicates – False positives and false negatives. – based on the effectiveness of the similarity matrices and similarity join algorithms used. References ● Xiao, C., Wang, W., Lin, X., Yu, J. X., & Wang, G. (2011). Efficient similarity joins for near- duplicate detection. ACM Transactions on Database Systems (TODS), 36(3), 15. ● "Kathiravelu, Pradeeban; Galhardas, Helena; Veiga, Luís; ",∂u∂u Multi-Tenanted Framework: Distributed Near Duplicate Detection for Big Data, On the Move to Meaningful Internet Systems: OTM 2015 Conferences, 237-256, 2015, Springer International Publishing ● "Kathiravelu, Pradeeban; Sharma, Ashish;", MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives, "Workshop on Connected Health at Big Data Era (BigCHat'15) , co-located with 21 st ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2015)", 2015, ACM. ● Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): Maintaining and Operating a Public Information Repository, Journal of Digital Imaging, Volume 26, Number 6, December, 2013, pp 1045-1057. ● Hazelcast for a distributed near duplicate detection. ● Meta Data attached to the binary images in Medical Image Archives – The Cancer Imaging Archive (TCIA) ● ● ● ● ● ● ● ● ● ● ● ● Pradeeban Kathiravelu Ashish Sharma Medical Imaging Data Warehouse Construction Near Duplicate Detection for ● Medical data warehouses and image archives are constructed by integrating multiple private and public data sources. ● Finding almost identical entries is crucial for warehouse construction. ● Medical image archives are huge and consist of structured and hierarchical data, which may be accessed by querying the metadata. ● Existing solutions tend to be too specific. – Master Patient Index (MPI) for patient records. ● Multiple dimensions and attributes – including medications, clinical, and pathological data – should be considered for a complete duplicate detection and elimination. ● MediCurator is a near duplicate detection framework for heterogeneous medical data sources in constructing data warehouses. ● MediCurator has been developed to retrieve medical data from – various data sources, including: MySQL, MongoDB, CSV files, and – medical image archives such as TCIA ● MediCurator fits as part of the ETL process. – Duplicates are detected in-memory. – Merged data stored into data warehouses hosted in Hadoop Distributed File System (HDFS). MediCurator Approach Design Implementation ● A prototype has been implemented. – Hazelcast as the distributed execution framework. – Distributed execution of research near duplicate detection algorithms on metadata. – Speed-up of ten-folds, compared to the existing solutions such as MPI systems. ● MediCurator functions as an integration middleware – for data warehouse construction – with duplicate detection and elimination – from the raw textual medical data, or the binary data by leveraging the meta data attached to it. ● {pkathi2, ashish.sharma} @ emory.edu Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA. Acknowledgments * Google Summer of Code 2015 * NCI U01 [1U01CA187013-01], Resources for development and validation of Radiomic Analyses & Adaptive Therapy, Fred Prior, Ashish Sharma (UAMS, Emory)