Modeling and Mapping Forms over Databases: Empowering Users to Design Databases in Industrial Domains
  Dissertation Proposal, October 07, 2010
  Ritu Khare
MOTIVATION
  Database Design by Non-technical Users
  Why have existing methods not reached the industrial domains?
Database Design by Non-technical Users
  Our inspiration: applications (Google Forms, FormAssembly, Zoho Creator) that allow users to design databases.
  How? Forward engineering of user needs into databases.
  Very popular for online data collection (surveys, event organization, etc.).
  Not used in industrial domains (healthcare, automobile, etc.)!
  [Figure: a clinician designs a form; forward engineering turns it into a user-designed database (Patient, VitalSigns) that is used to collect data.]
Why are existing methods unfit for industrial domains?
  Existing applications:
    No provision to modify or extend an existing database.
    The translation (forward engineering) method is not reported.
    Not tested on non-technical users.
  Features of industrial domains:
    Databases are required to evolve with respect to new user needs.
    Data and database quality is important; quality leads to productivity (Batini and Scannapieco, 2006).
    Users have no background in data modeling and databases.
THE PROPOSAL
  Proposed System and Research Goals
  Opportunity: Forms
  Example: Form to Database Mapping
  Challenges in Mapping
Proposed System and Research Goals
  Proposed system: an application to model and map user needs into an existing database.
  Goals:
    Modeling: a “usable” medium for users to model needs (efficiency, effectiveness, adoption).
    Mapping: the resultant database should be high-quality, i.e., it should satisfy normalization, completeness, compactness, and correctness (Silberschatz et al. 2001; Batini and Scannapieco, 2006; Batini et al. 1992).
Opportunity: Forms
  MODELING: Data-entry forms provide a good communication medium for users to specify their data collection needs (Choobineh et al. 1988; Embley, 1989).
  MAPPING: Important information on databases could be retrieved by analyzing forms (Choobineh and Mannino, 1988).
    Search forms provide a useful way of determining the underlying database (Benslimane, 2007). (Covered in Candidacy Exam.)
    Data-entry forms provide key guidelines for designing a prospective database (Mannino and Choobineh, 1984).
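The Proposed Application: An Example
  [Figure: a clinician with new needs designs a new form (form modeling); form-to-database mapping combines the user-designed form with the existing database (Patient, VitalSigns) to produce the evolved database. The form-to-database mapping step is marked NEW PROBLEM!]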
Uniqueness of “Form to Database” Mapping
  Schema mapping (Rahm and Bernstein 2001):
    The two structures are similar.
    Mapping involves only schema elements (no values).
    Schema/database evolution is not considered when there are unmapped elements.
  Form to database mapping:
    Semiautomatic mapping discovery:
      How to reconcile the differences in structures and semantics?
      How to detect the form (or need) components (including values) that already exist in the database?
    Database evolution:
      How to extend the database based on new elements in the form?
      How to automatically determine functional dependencies and cardinalities from a form?
Proposed Application
1. Form Design Interface
  SIMPLE!
    1. Terminology (intuitive)
    2. Features (form patterns)
  [Figure: a simple form and an advanced form labeled with the form terminology: Title, Supporting Text, Category, Subcategory, Field, Subfield, Format, Unit, Extended Checkbox Option, Condition.]
1. Form Design Interface
  Input: user actions (based on data collection needs)
  Output: form
  Example actions:
    Enter the title “Patient Encounter Form”
    Enter the category “Patient”
    Enter the field “Name”
    Pick a format “textbox”
    Enter the field “Age”
    …
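A minimal sketch of how user actions like those above could be replayed into a form structure. The action names, the nested-dict layout, and `build_form` are hypothetical illustrations, not the system's actual interface; actions are assumed to arrive in a sensible order (a category before its fields, a field before its format).

```python
def build_form(actions):
    """Replay (kind, value) user actions into a nested form description."""
    form = {"title": None, "categories": []}
    for kind, value in actions:
        if kind == "title":
            form["title"] = value
        elif kind == "category":
            form["categories"].append({"label": value, "fields": []})
        elif kind == "field":
            # A field is attached to the most recently entered category.
            form["categories"][-1]["fields"].append({"label": value, "format": None})
        elif kind == "format":
            # A format is attached to the most recently entered field.
            form["categories"][-1]["fields"][-1]["format"] = value
        else:
            raise ValueError(f"unknown action: {kind}")
    return form


# The action sequence from the slide, written as hypothetical (kind, value) pairs.
actions = [
    ("title", "Patient Encounter Form"),
    ("category", "Patient"),
    ("field", "Name"),
    ("format", "textbox"),
    ("field", "Age"),
]
form = build_form(actions)
print(form["title"], [f["label"] for f in form["categories"][0]["fields"]])
```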
Defining High-Quality: Guiding Principles (with respect to a given form)
  Completeness: every form element has a place in the database.
  Correctness: for each correspondence, the form element and the database element refer to the same real-world element (they have matching labels and contexts).
  Compactness: every database element occurs just once.
  Normalization: the database is in 3NF.
A Simple Approach
  1. Loses grouping information.
  2. Loses form values.
  3. Heterogeneous attributes are placed in the same relation.
  The generated database is incomplete and not in 3NF (low-quality)! So we propose a tree representation of the form.
2. Tree Generation
  Definition: Form Tree
  Input: Form
  Output: Form Tree
  Previous works have proposed a similar tree representation for search forms (Dragut et al. 09, Wu et al. 09). The form tree here differs in that it: 1) targets data-entry forms; 2) includes format nodes to improve DB quality; 3) uses different representations for checkboxes and radiobuttons.
Form to Database Mapping
  How to map and merge a form tree into an existing database?
  Main challenges:
    1. Discovering a mapping between two heterogeneous structures.
    2. Merging new elements into the existing database.
  [Figure: 3. Birthing turns the Form Tree into a New Database Graph; the Existing Database is represented as an Existing Database Graph; 4. Classification (MAP) and 5. Extension (MERGE) operate on the two graphs.]
Definition: Database Graph
Definition: Mapping Correspondences
  Direct correspondence
  Indirect correspondence (value collected on form element is stored in database element)
3. Birthing (term adopted from Jagadish et al. 2007)
  Input: Form Tree
  Output: New Database Graph
3. Birthing – Pattern 1 (Textbox)
  Induced functional dependencies:
    Address.id -> line1
    Address.id -> line2
    Patient.id -> Name
    Patient.id -> Age
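The textbox pattern can be illustrated with a small sketch. It assumes a hypothetical form-tree node type in which every category becomes a table keyed by a surrogate `id` and every textbox field becomes a non-key column; nesting Address under Patient is also an assumption made for the example. The `Category` class and `induced_fds` are illustrative names, not the proposal's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """Hypothetical form-tree category with textbox fields and subcategories."""
    label: str
    textboxes: list = field(default_factory=list)
    subcategories: list = field(default_factory=list)


def induced_fds(category):
    """List 'Table.id -> column' dependencies for every textbox field."""
    fds = [f"{category.label}.id -> {name}" for name in category.textboxes]
    for sub in category.subcategories:
        fds.extend(induced_fds(sub))
    return fds


patient = Category(
    "Patient",
    textboxes=["Name", "Age"],
    subcategories=[Category("Address", textboxes=["line1", "line2"])],
)
for fd in induced_fds(patient):
    print(fd)
```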
3. Birthing – Pattern 2: Radiobutton & Pattern 3: Checkbox
  Pattern 2 (radiobutton): radiobutton values are mapped to database values; represents an M:1 relationship between Patient and Insurance.
  Pattern 3 (checkbox): checkbox values are mapped to database columns (yes/no); represents a 1:1 relationship between Patient and Symptoms.
3. Birthing – Pattern 4: Category/Subcategory & Pattern 5: Sibling Categories
  [Figure: both patterns are shown with M:M relationships.]
3. Birthing Patterns Summarized
4. Database Graph Classification
  Classify each node of the new database graph to determine whether it already exists in the existing database, i.e., whether it “maps” or not.
  [Figure: New Database Graph, Existing DB Graph, Existing DB.]
4. Database Graph Classification: Algorithm
  Problem: finding matching nodes between the new database graph (DGn) and the existing database graph (DGe).
  Algorithm:
    For each table node tn in DGn:
      Let te be the label-matching table node in DGe.
      If the two table nodes tn and te “match” (TableMatch algorithm):
        Tag tn, i.e., mark this node as a matching/mapped node.
        Tag all matching column and value nodes (ColumnMatch algorithm).
      Else:
        Rename the table.
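The classification loop can be sketched as follows. This is a simplified illustration under stated assumptions: each graph is reduced to a dict from table label to a set of column labels, label matching is exact string equality, value nodes are ignored, and the renaming scheme (a numeric suffix) is invented for the example. The `table_match` argument stands in for the TableMatch algorithm.

```python
def classify(new_graph, existing_graph, table_match):
    """Tag tables of the new graph as mapped, or rename them on a label clash.

    Returns (mapped, renamed): mapped is label -> matching column labels,
    renamed is old label -> new label.
    """
    mapped, renamed = {}, {}
    for label, cols in new_graph.items():
        existing_cols = existing_graph.get(label)  # exact label match (assumption)
        if existing_cols is None:
            continue  # assumption: no label match means the node stays untagged
        if table_match(cols, existing_cols):
            mapped[label] = cols & existing_cols  # tag the matching columns too
        else:
            # Same label but the tables differ: rename (suffix scheme is hypothetical).
            n = 2
            while f"{label}_{n}" in existing_graph or f"{label}_{n}" in new_graph:
                n += 1
            renamed[label] = f"{label}_{n}"
    return mapped, renamed


def toy_match(cols_a, cols_b):
    """Toy stand-in for TableMatch: at least two shared column labels."""
    return len(cols_a & cols_b) >= 2


existing = {"Patient": {"name", "age"}, "Visit": {"date", "reason", "doctor"}}
new = {"Patient": {"name", "age", "gender"}, "Visit": {"cost"}, "Allergy": {"substance"}}
mapped, renamed = classify(new, existing, toy_match)
print(mapped)
print(renamed)
```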
4. Database Graph Classification: TableMatch Algorithm
  Two table nodes “match” if:
    Their labels match.
    Null-value column ratio (NCR) < tolerance-threshold (efficiency consideration: minimize null-value possibilities during data collection).
  NCR = (number of unmatched columns, as per ColumnMatch, in either table, whichever is higher) / (size of the union set of columns in both tables)
Example: Null-Value Column Ratio (NCR) Calculation
  NCR = 2/5 = 0.4
  If tolerance-threshold = 0.5 (high): 0.4 < 0.5, so the tables map.
  If tolerance-threshold = 0.3 (low): 0.4 is not below 0.3, so the tables do not map.
  When using Form 1, 2 columns will have null values. When using Form 2, 1 column will have null values.
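The TableMatch rule and the calculation above can be sketched in a few lines. As a simplifying assumption, columns are compared by label only (set difference stands in for ColumnMatch), and the column names are made up so that the counts reproduce the 2/5 case.

```python
def null_value_column_ratio(cols_a, cols_b):
    """NCR: larger count of unmatched columns divided by the size of the union."""
    union = cols_a | cols_b
    if not union:
        return 0.0
    unmatched_a = cols_a - cols_b  # columns only in the first table
    unmatched_b = cols_b - cols_a  # columns only in the second table
    return max(len(unmatched_a), len(unmatched_b)) / len(union)


def table_match(label_a, cols_a, label_b, cols_b, tolerance):
    """Two tables match if their labels match and NCR is below the tolerance."""
    return label_a == label_b and null_value_column_ratio(cols_a, cols_b) < tolerance


# Hypothetical column sets: 2 shared, 2 only in the first table, 1 only in the second.
table_1 = {"name", "age", "gender", "phone"}
table_2 = {"name", "age", "email"}

ncr = null_value_column_ratio(table_1, table_2)
print(ncr)
print(table_match("Patient", table_1, "Patient", table_2, tolerance=0.5))
print(table_match("Patient", table_1, "Patient", table_2, tolerance=0.3))
```

A higher tolerance accepts more matches at the cost of more null values in the merged table; a lower tolerance keeps tables apart.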
4. Database Graph Classification: ColumnMatch Algorithm
  Two non-key column nodes “match” if their:
    Labels/names are the same.
    Data types are the same.
    Not-null constraints are the same.
  Two foreign key column nodes “match” if:
    They both point to the same table node, as determined by the TableMatch algorithm.
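A minimal sketch of the ColumnMatch conditions. The `Column` record and the `same_table` callback are hypothetical; the callback stands in for the TableMatch result, and treating a foreign key and a non-key column as non-matching is an assumption the slide does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """Hypothetical column node; `references` is set only for foreign keys."""
    label: str
    data_type: str = ""
    not_null: bool = False
    references: Optional[str] = None


def column_match(a, b, same_table):
    """Apply the non-key rule or the foreign-key rule, depending on the columns."""
    a_is_fk, b_is_fk = a.references is not None, b.references is not None
    if a_is_fk or b_is_fk:
        if not (a_is_fk and b_is_fk):
            return False  # assumption: a foreign key never matches a non-key column
        return same_table(a.references, b.references)
    return (a.label == b.label
            and a.data_type == b.data_type
            and a.not_null == b.not_null)


def same_table(x, y):
    """Toy stand-in for TableMatch: tables match when their labels are equal."""
    return x == y


print(column_match(Column("age", "INT"), Column("age", "INT"), same_table))
print(column_match(Column("age", "INT"), Column("age", "VARCHAR"), same_table))
print(column_match(Column("patient_id", references="Patient"),
                   Column("pid", references="Patient"), same_table))
```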
5. Extension of the Existing Database
  Add unmapped tables, columns, and values.
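A simplified sketch of the extension step, assuming each database is reduced to a dict from table label to a set of column labels. Value nodes and table renaming are left out, and `extend` is a hypothetical name.

```python
def extend(existing, new):
    """Return an evolved schema: existing tables plus unmapped tables and columns."""
    evolved = {table: set(cols) for table, cols in existing.items()}  # copy, do not mutate
    for table, cols in new.items():
        # New tables are added whole; known tables only gain their unmapped columns.
        evolved.setdefault(table, set()).update(cols)
    return evolved


existing = {"Patient": {"name", "age"}}
new = {"Patient": {"name", "age", "gender"}, "Allergy": {"substance"}}
evolved = extend(existing, new)
print(sorted(evolved))
```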
PRELIMINARY EVALUATION
  Usability Experiments
  Mapping Experiments
  Contributions
  Implementation: MySQL, Java, JSP, JavaScript, HTML, CSS, Lucene indexing package, yFiles package
Usability Evaluation – User Study
  Participants and tasks:
    5 nurse professionals: no knowledge of databases, moderate computer users, familiar with paper-based forms.
    2 tasks:
      Build task: replicate a paper-based form on the system.
      Model-and-build task: model and build a given need (stated in natural language) into a form using the system interface.
  Study settings:
    2 rounds (form scale = number of steps to design a form).
    Round 1, small-scale needs: avg. form scale = 17; generated on average 4.2 relations, 5.8 non-key attributes, 1.8 values, and 3.2 foreign key references.
    Round 2, large-scale needs: avg. form scale = 47.4; generated on average 6.2 relations, 13.8 attributes, 10.4 values, and 4.6 foreign key references.
MEASUREMENTS
  Duration Ratio = time (in min) / form scale (number of steps to build the form)
  Assistance Ratio = number of assistances sought / form scale (number of steps to build the form)
  Outliers:
    P3: considered design alternatives (high duration ratio).
    P5: had difficulty with form terminology (needed more assistance).
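The two ratios can be computed as below. The input numbers are hypothetical and are not taken from the study; they only show how the measures normalize time and assistance by form scale.

```python
def duration_ratio(minutes, form_scale):
    """Minutes spent per step needed to build the form."""
    if form_scale <= 0:
        raise ValueError("form scale must be positive")
    return minutes / form_scale


def assistance_ratio(assistances, form_scale):
    """Number of assistance requests per step needed to build the form."""
    if form_scale <= 0:
        raise ValueError("form scale must be positive")
    return assistances / form_scale


# Hypothetical session: 9 minutes and 3 assistance requests on a 20-step form.
print(duration_ratio(9, 20))
print(assistance_ratio(3, 20))
```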
Findings
  Effectiveness: in 19/20 cases, participants finished the tasks with 100% effectiveness. The unsuccessful case was a building error by a participant who skipped a component while building a form.
  Efficiency: duration ranged from 1 to 9 minutes for simple small-scale needs, and from 7 to 19 minutes for advanced large-scale needs. Exception: a participant who considered several design alternatives.
  System adoption:
    Efficiency: consistently improved from round 1 to round 2.
    Confidence: participants were very confident in specifying small-scale needs for both tasks. Confidence improved from round 1 to round 2 for the build task, but not for the model-and-build task.
    Understanding: improved greatly in round 2. Participants started synthesizing their knowledge of form concepts and domain knowledge to consider different design alternatives.
  Comparison with a related work: in Appforge (Yang et al. 2008), users are required to create forms and expressive views and are exposed to the existing schema. In our work, users only create forms, and mapping is handled by the system.
Mapping Experiment Set 1
  Experiments on 5 industrial domains. For each domain, we designed certain forms and used the mapping algorithms to evolve a database.
  Legend: + indicates an extra element; no sign indicates a perfect match.
Analyzing Inaccuracies and System Enhancement
  [Figure: two M:M relationships.]
  Added another layer of interaction to disambiguate the cardinality between 2 entities.
  Result: all the databases were identical to the respective gold standard databases.
  Inference: the mapping algorithms have the ability to generate databases in industrial domains.
Mapping Experiment Set 2
  For each domain, performed mapping experiments with at least 5 different sequences of forms (representing different merging situations).
  Result: all the databases generated from different sequences are identical to each other and to the gold standard databases.
  Inference: the mapping algorithms have the ability to evolve databases in industrial domains in a variety of merging situations.
Current and Predicted Contributions
  Introducing the form-to-database mapping algorithms, driven by data-quality principles.
  Mapping experiments on 5 domains: the system has the potential
    to generate high-quality databases in industrial settings solely based on user-designed forms and user-provided domain knowledge;
    to evolve existing databases in a variety of merging situations.
  Usability study: the system has the potential to be adopted by non-technical users while providing them efficiency and effectiveness in form modeling.
WHAT NEXT?
  Possible Research Experiments
  Other Research Areas/System Refinement
  Plan for Thesis Completion
Possible Research Experiments (in healthcare domain)
  Experiment Scenario 1, Experiment Scenario 2:
    Have multiple clinicians evolve a new database using different forms representing different kinds of information.
    Alter form and database complexity.
    Guided vs. unguided.

Dissertation Proposal Presentation

  • 1. Modeling and Mappingforms over databases:empowering users to DESIGN databases IN INDUSTRIAL DOMAINS Dissertation Proposal October 07 2010 Ritu Khare 1
  • 2. Database Design by Non-technical Users Why existing methods have not reached the industrial domains? MOTIVATION 2
  • 3.
  • 4. Why existing methods are unfit for industrial domains? No provision to modify or extend an existing database Translation(Forward Engineering) Method is not reported. Not tested on non-technical users Databases are required to evolve w.r.t. new user needs Data and Database Quality is important quality leads to productivity. (Batini and Scannapieco, 2006) Users have no background in data modeling and databases 4 Existing Applications Features of Industrial Domains
  • 5. Proposed System and Research Goals Opportunity: Forms Example: Form to Database Mapping Challenges in Mapping THE PROPOSAL 5
  • 6. Proposed System and Research Goals 6 Proposed System: An application to model and map user needs into an existing database Goals: Modeling: “Usable” medium for users to model needs Efficiency, Effectiveness, Adoption Mapping: The resultant database should be high-quality, i.e. should satisfy: (Silberschatz et al. 2001, Batini and Scannapieco, 2006, Batini et al. 1992) Normalization Completeness Compactness Correctness
  • 7. Opportunity: Forms 7 MODELING: Data-entry Forms provide a good communication medium for users to specify their data collection needs. (Choobineh et al. 1988, Embley, 1989) MAPPING: Important information on databases could be retrieved by analyzing forms (Choobineh and Mannino, 1988). Search forms provide a useful way in determining the underlying database(Benslimane, 2007) (Covered in Candidacy Exam) Data-entry forms provide key guidelines in designing a prospective database(Mannino and Choobineh, 1984).
  • 8. The proposed application: An Example Patient VitalSigns design Clinician New Needs New User Designed Form Existing Database Evolved Database Form to Database Mapping 8 Form Modeling NEW PROBLEM!
  • 9. Uniqueness of “Form to Database” Mapping Two structures are similar. Mapping involves only schema elements (no values). Do not consider schema /database evolution when there are unmapped elements. Semiautomatic Mapping Discovery How to reconcile the differences in structures and semantics? How to detect the form(or need) components (including values) which already exist in the database? Database Evolution How to extend database based on new elements in the form? How to automatically determine functional dependencies and cardinalities from a form? 9 Schema Mapping (Rahm and Bernstein 2001) Form to Database Mapping
  • 11. 1. Form Design Interface 11 SIMPLE! 1. Terminology (intuitive) 2. Features(form patterns) Supporting Text Format Title Unit Category Field Subcategory Extended Checkbox option Subfield Condition Simple Form Advanced Form
  • 12. 1. Form Design Interface 12 Input: User actions (based on data collection needs) Output: Form Enter the Title “Patient Encounter Form” Enter the category “Patient” Enter the field “Name” Pick a format “textbox” Enter the field “Age” …
  • 13. Defining High-Quality Guiding Principles(with respect to a given form) 13 Completeness Every form element has a place in database Correctness For each correspondence the form element and the database element refer to the same real-world element (has matching labels and contexts). Compactness Every database element occurs just once. Normalization The database is in 3NF
  • 14. A Simple Approach. 14 Lose grouping information Lose form values 3. Heterogeneous attributes placed in same relation. Generated database is incomplete and not in 3NF (low-quality)! So we propose a tree representation to form.
  • 15. 2. Tree Generation Definition: Form Tree 15 Input: Form Output: Form Tree Previous works have proposed a similar tree representation for search forms.(Dragut et al. 09, Wu et al. 09) 1) data-entry forms. 2) format nodes to improve DB quality. 3) different representation for checkboxes and radiobuttons.
  • 16. Form to Database Mapping 16 Existing Database Form Tree Map and Merge??? Main challenges: 1discovering a mapping between two heterogeneous structures 2. merging new elements into existing database 3.Birthing Form Tree New Database Graph Existing Database Graph Existing Database 4. Classification MAP MERGE 5. Extension
  • 18. Definition: Mapping Correspondences 18 Direct correspondence Indirect Correspondence (Value collected on form element is stored in database element)
  • 19. 3. Birthing(term adopted from Jagadish et al. 2007) 19 Input: Form Tree Output: New Database Graph
  • 20. 3. Birthing – Pattern 1 (Textbox) 20 Induced Functional Dependencies: Address.id -> line1 Address.id -> line2 Patient.id -> Name Patient.id -> Age
  • 21. 3. Birthing – Pattern 2: Radiobutton & Pattern 3: Checkbox 21 M:1 1:1 Checkbox values are mapped to database columns(yes/no) Represent 1:1 relationship between Patient and Symptoms Radiobutton values are mapped to database values Represent M:1 relationship between Patient and Insurance
  • 22. 3. Birthing – Pattern 4: Category/subcat. Pattern 5: Sibling Categories 22 M:M M:M
  • 23. 3. Birthing Patterns Summarized 23
  • 24. 4. Database Graph Classification 24 Classify each node to see if it pre-exists in the existing database or not.i.e. to find whether it “maps” or not. New Database Graph Existing DB Graph Existing DB
  • 25. 4. Database Graph ClassificationAlgorithm 25 Problem: Finding Matching Nodes between new(DGn) and existing database graph(DGe). Algorithm For each table node tnin DGn Let te be the label-matching table node in DGe If two table nodes tnand te “match”(TableMatchalgo) Tag tn i.e., mark this node as a matching/mapped node Tag all matching column and value nodes(ColumnMatchalgo) Else Rename the table
  • 26. 4. Database Graph Classification: TableMatch Algorithm. Two table nodes “match” if (1) their labels match and (2) the null-value column ratio (NCR) < tolerance threshold (an efficiency consideration: minimize the possibility of null values during data collection). NCR = (number of unmatched columns, as per ColumnMatch, in either table, whichever is higher) / (size of the union set of columns in both tables).
  • 27. Example: Null-Value Column Ratio (NCR) Calculation. NCR = 2/5 = 0.4. If the tolerance threshold is 0.5 (high), the tables map, since 0.4 < 0.5; if the tolerance threshold is 0.3 (low), they do not. Once mapped, 2 columns will have null values when using Form 1, and 1 column will have null values when using Form 2.
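
The NCR definition and the 2/5 example can be reproduced with a short sketch. Column matching is simplified here to label equality (the slides use ColumnMatch), the column names are made up, and returning 0.0 for two empty tables is an added guard rather than something the slides define.

```python
def ncr(cols_a: set[str], cols_b: set[str]) -> float:
    """Null-value column ratio: the larger count of unmatched columns in
    either table, divided by the size of the union of both column sets."""
    union = cols_a | cols_b
    if not union:
        return 0.0  # assumption: two empty tables have no null-value risk
    unmatched = max(len(cols_a - cols_b), len(cols_b - cols_a))
    return unmatched / len(union)


def table_match(label_a: str, cols_a: set[str],
                label_b: str, cols_b: set[str],
                threshold: float) -> bool:
    """Two tables match when their labels match and NCR < threshold."""
    return label_a == label_b and ncr(cols_a, cols_b) < threshold


# Hypothetical columns: 2 shared, 1 only in the first table, 2 only in the second.
form1_cols = {"name", "age", "gender"}
form2_cols = {"name", "age", "phone", "email"}

ratio = ncr(form1_cols, form2_cols)
print(ratio)
print(table_match("Patient", form1_cols, "Patient", form2_cols, 0.5))
print(table_match("Patient", form1_cols, "Patient", form2_cols, 0.3))
```

With these sets the union has 5 columns and the larger unmatched count is 2, which gives the slide's ratio of 2/5.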
  • 28. 4. Database Graph Classification: ColumnMatch Algorithm. Two non-key column nodes “match” if their labels/names are the same, their data types are the same, and their NOT NULL constraints are the same. Two foreign-key column nodes “match” if they both point to the same table nodes, as determined by the TableMatch algorithm.
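
A sketch of the two ColumnMatch rules, assuming a hypothetical `Column` record and a `matched_tables` dictionary that maps each new table label to the existing table label it was matched with by TableMatch. Treating a foreign-key column and a non-key column as a non-match is an assumption; the slides do not cover that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    not_null: bool
    references: Optional[str] = None  # target table label if a foreign key


def column_match(new: Column, old: Column,
                 matched_tables: dict[str, str]) -> bool:
    """Apply the foreign-key rule when either column is a foreign key,
    otherwise compare name, data type, and NOT NULL constraint."""
    if new.references is not None or old.references is not None:
        if new.references is None or old.references is None:
            return False  # assumption: a key never matches a non-key column
        return matched_tables.get(new.references) == old.references
    return (new.name, new.dtype, new.not_null) == (old.name, old.dtype, old.not_null)


matched = {"Insurance": "Insurance"}  # output of a prior TableMatch pass
same = column_match(Column("age", "INT", True), Column("age", "INT", True), matched)
diff_type = column_match(Column("age", "INT", True), Column("age", "VARCHAR", True), matched)
fk = column_match(Column("ins_id", "INT", False, "Insurance"),
                  Column("insurance_id", "INT", False, "Insurance"), matched)
print(same, diff_type, fk)
```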
  • 29. 5. Extension of the Existing Database. Add unmapped tables, columns, and values.
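
As a rough illustration of the extension step, the sketch below adds whatever is missing from the existing schema. It assumes schemas are plain dictionaries from table label to column labels and covers only tables and columns; value nodes are left out for brevity. The function name `extend` is hypothetical.

```python
def extend(existing: dict[str, set[str]],
           new: dict[str, set[str]]) -> list[str]:
    """Add tables and columns of `new` that are missing from `existing`
    (modified in place) and return a log of what was added."""
    added = []
    for table, columns in new.items():
        if table not in existing:
            existing[table] = set(columns)
            added.append(f"table {table}")
            continue
        for column in sorted(columns - existing[table]):
            existing[table].add(column)
            added.append(f"column {table}.{column}")
    return added


existing_db = {"Patient": {"id", "name"}}
new_db = {"Patient": {"id", "name", "age"}, "Visit": {"id", "date"}}
log = extend(existing_db, new_db)
print(log)
```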
  • 30. Preliminary Evaluation: Usability Experiments, Mapping Experiments, Contributions. Implementation: MySQL, Java, JSP, JavaScript, HTML, CSS, Lucene indexing package, yFiles package.
  • 31. Usability Evaluation – User Study. Participants and tasks: 5 nurse professionals with no knowledge of databases, moderate computer users, familiar with paper-based forms. 2 tasks: (1) Build task: replicate a paper-based form on the system. (2) Model-and-build task: model and build a given need (stated in natural language) into a form using the system interface. Study settings: 2 rounds (form scale = number of steps to design a form). Round 1, small-scale needs: avg. form scale = 17; generated on average 4.2 relations, 5.8 non-key attributes, 1.8 values, and 3.2 foreign key references. Round 2, large-scale needs: avg. form scale = 47.4; generated on average 6.2 relations, 13.8 attributes, 10.4 values, and 4.6 foreign key references.
  • 32. Measurements. Duration Ratio = time (in min) / form scale (number of steps to build the form). Assistance Ratio = number of assistances sought / form scale (number of steps to build the form). Outliers: P3 considered design alternatives (high duration ratio); P5 had difficulty with form terminology (needed more assistance).
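
The two measurements are simple ratios, sketched below. The input numbers are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
def duration_ratio(minutes: float, form_scale: int) -> float:
    """Minutes spent per step needed to build the form."""
    return minutes / form_scale


def assistance_ratio(assistances: int, form_scale: int) -> float:
    """Assistance requests per step needed to build the form."""
    return assistances / form_scale


# Hypothetical session: a 20-step form built in 10 minutes with 2 requests for help.
d = duration_ratio(10, 20)
a = assistance_ratio(2, 20)
print(d, a)
```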
  • 33. Findings. Effectiveness: in 19/20 cases, participants finished the tasks with 100% effectiveness. The unsuccessful case was a building error by a participant who skipped a component while building a form. Efficiency: duration ranged from 1 to 9 minutes for simple small-scale needs and from 7 to 19 minutes for advanced large-scale needs. Exception: a participant who considered several design alternatives. System adoption: Efficiency consistently improved from round 1 to round 2. Confidence: participants were very confident in specifying small-scale needs for both tasks; confidence improved from round 1 to round 2 for the build task but did not improve for the model-and-build task. Understanding improved greatly in round 2: participants started synthesizing their knowledge of form concepts and domain knowledge to consider different design alternatives. Comparison with a related work: in Appforge (Yang et al. 2008), users are required to create forms and expressive views and are exposed to the existing schema. In our work, users only create forms and the mapping is handled by the system.
  • 35. Analyzing Inaccuracies and System Enhancement. Added another layer of interaction to disambiguate the cardinality between two entities. Result: all the generated databases were identical to their respective gold standard databases. Inference: the mapping algorithms are able to generate databases in industrial domains.
  • 36. Mapping Experiment Set 2. For each domain, performed mapping experiments with at least 5 different sequences of forms (representing different merging situations). Result: all the databases generated from the different sequences are identical to each other and to the gold standard databases. Inference: the mapping algorithms are able to evolve databases in industrial domains in a variety of merging situations.
  • 37. Current and Predicted Contributions. Introducing the form-to-database mapping algorithms, driven by data-quality principles. Mapping experiments on 5 domains: the system has the potential (a) to generate high-quality databases in industrial settings solely based on user-designed forms and user-provided domain knowledge, and (b) to evolve existing databases in a variety of merging situations. Usability study: the system has the potential to be adopted by non-technical users while providing them efficiency and effectiveness in form modeling.
  • 38. What Next? Possible Research Experiments; Other Research Areas/System Refinement; Plan for Thesis Completion.
  • 40. Alter Form and Database Complexity
  • 42. Guided: user is provided with specific needs.
  • 44. Plan for Dissertation Completion. Thank you!

Editor's Notes

  1. Let's start with the motivation of this work.
  2. Using the Form Design Interface, users can design simple as well as advanced forms. To make it usable for non-technical users, we have kept the interface simple in terms of both terminology and design. Terminology means that the terms used are simple and commonplace, and that the features supported are present in various data-entry forms that users might already be familiar with. E.g. terms used are