A Framework for Mapping User-designed Forms to Relational Databases
Dissertation Presentation
November 15, 2011
Ritu Khare
COMMITTEE:
Dr. Yuan An (Chair)
Dr. Jiexun Jason Li
Dr. Il-Yeol Song
Dr. Min Song
Dr. Christopher C. Yang
1
Presentation Order
    1.   Motivation
    2.   Problems
    3.   Solutions
    4.   Evaluation
    5.   Final Remarks



2
1. Motivation




3
General Motivation: Database Usability (Sawyer, 1995)
    Enable users to SEARCH and QUERY databases
      Information retrieval techniques (Liu et al., 2006; Hristidis et al., 2003;
      Catarci, 2000; Jayapandian and Jagadish, 2006)
    Enable users to DESIGN databases (Jagadish et al., 2007)
      Form-based DIY and WYSIWYG paradigms
      FormAssembly, ZohoCreator, GoogleForms

    Databases still remain unusable from the integration point of view
    (Gurses et al., 2009)
4
Precise Motivation: Integration of New Needs




    New needs related to patient’s social habits
      1) Building of new forms
      2) Integration of new form into back-end
5
Research Objective
     To develop a mechanism to automatically map
     and integrate a user-designed form into
     an existing structured database.
      Assume that a user-designed form has
       already been acquired.
      Seek a framework that
        merges the semantically matching elements
         between forms and databases.
        creates new database elements corresponding to
         the unmatched form elements.

6
2. Research Problems




7
Problem #1: Form Understanding
    A form template represents the semantic intentions of the designer.
    Automatic extraction of the form semantics
      A machine can only read the syntactic patterns of form elements.
      A certain layout pattern cannot be associated with a semantic intention.
    Existing Work
      Focus on search forms (Benslimane et al., 2007; Kaljuvee et al., 2001)
        Shorter and simpler than the data-entry forms (empirical finding).
      Rules and heuristics (Zhang et al., 2004; He et al., 2007)
        Not likely to keep up with the ever-broadening variety of
        form topologies.
8
Problem #2: Correspondence Discovery
    Detect semantically matching elements between a form and
    an existing database
    Challenges
      Variety of terms to denote the same concepts.
      Variety of concepts denoted by similar terms.
      Identify and eliminate the invalid correspondences.
    Existing Work
      Schema and Ontology Mapping (Madhavan et al., 2001; Euzenat and Shvaiko, 2005;
      Rahm and Bernstein, 2001; An et al., 2005; An et al., 2006)
        Mostly semi-automatic
      Not applicable to form-to-database correspondence discovery
        Heterogeneity between forms and databases
        Correspondences are to be used for evolving the database; the discovery
        process has to take this requirement into consideration.
9
Problem #3: Form Integration
    Problem #3a: Merging
      Merging into an existing database so that the same concept is not
      duplicated and the database remains compact.
      Merging increases the potential of having NULL values, i.e., a less
      optimized database.
      Judicious decisions
    Existing Work
      Form integration (Yang et al., 2008)
        Largely manual
        Exposes the users to the technical details of the underlying data model.
      Database integration (Yang et al., 2003)
        Provides guidelines.




10
Problem #3: Form Integration
    Problem #3b: Birthing
      Extend the database for the unmatched form elements
        How to automatically derive the functional dependencies among the
        form elements?
        How to translate the complex form patterns?
        How to evaluate multiple design alternatives and pick one?
    Existing Work
      Form-based database design
        Several methods (Choobineh et al., 1988; Pavicevic et al., 2006;
        Choobineh and Venkatraman, 1992; Deklarit, 2008) and commercial tools
        (FormAssembly, GoogleForms, ZohoCreator, Wufoo)
        No empirical evaluation of the resultant databases
      Few focus on designing a database with certain desirable properties,
      e.g., expressiveness (Yang et al., 2008; Choobineh et al., 1988;
      Lukovic et al., 2007).
        These properties do not reflect any compliance with the form
        semantics and are inadequate for evaluating the mapping process.
11
Research Questions and System Goals
    Research Questions
      1. Form Understanding
           A model to capture the form semantics
           Extract this model from a given form
      2. Correspondence Discovery
           Determine semantically equivalent elements between form and database
           Incorporate the DB evolution requirement during the discovery process
      3. Form Integration
           Resolve merging conflicts while maintaining the original form semantics
           Given a form pattern, derive a relational database with
           “desirable” properties
    System Goals
      1. To evolve a DB that is high-quality and optimized as per the form
         semantics, i.e., compliant with the principles (Wang and Strong, 1996;
         Ramakrishnan and Gehrke, 2002; Silberschatz et al., 2001; Batini and
         Scannapieco, 2006):
           Completeness: all form elements represented in the database
           Correctness: form semantics retained
           Compactness: equivalent elements merged
           Normalization: 3NF w.r.t. the form’s functional dependencies
           Minimize NULL values in FKs and descriptive attributes
      2. To ensure minimalism in the required user intervention
12
3. Solutions




13
Form Representation: Form Tree
      The form tree accurately captures the designer's intentions, and
       hence the semantic associations among the form elements.
      Inspired by hierarchical modeling of forms in existing works
       (Dragut et al. 2009, Wu et al. 2009)
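
A minimal sketch of how a form tree could be held in memory, assuming a plain parent/child node structure. The class name `FormNode`, the `kind` values, and the sample form are hypothetical illustrations, not the representation used in the dissertation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FormNode:
    """One form element; children are the elements grouped under it."""
    label: str
    kind: str  # hypothetical element types: "category", "textbox", "radio", "option"
    children: List["FormNode"] = field(default_factory=list)

    def add(self, child: "FormNode") -> "FormNode":
        self.children.append(child)
        return child


def edges(node: FormNode):
    """Yield (parent label, child label) pairs in depth-first order."""
    for child in node.children:
        yield (node.label, child.label)
        yield from edges(child)


# Hypothetical example form.
root = FormNode("Patient Form", "category")
hygiene = root.add(FormNode("Oral Hygiene", "radio"))
hygiene.add(FormNode("good", "option"))
hygiene.add(FormNode("poor", "option"))
root.add(FormNode("Medications", "textbox"))

form_edges = list(edges(root))
```

Each parent-child pair in `form_edges` is one semantic association of the kind the form tree is meant to capture.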




14
Framework Outline
    [Figure: three-stage framework pipeline]
    1. Form Understanding and Semantics Extraction (output: Form Tree)
    2. Correspondence Discovery and Validation (output: Form Tree with
       Discovered Correspondences)
    3. Database Design and Evolution (output: Database)
15
Method 1a: Form Tree Generation




16
Method 1a: Form Tree Generation
    1. Tag and Segment Phase
    2. Derive Tree Phase (5 rules)

    The approach leverages the probabilistic nature of form design
    and develops a 2-layered Hidden Markov Model (HMM) based
    artificial designer that has the ability to understand the
    semantics of any arbitrarily designed form.
      T-HMM: Tagging HMM
      S-HMM: Segmentation HMM
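
As an illustration of the tagging layer, the following is a minimal sketch of Viterbi decoding over a sequence of form elements (the evaluation setup lists EM and Viterbi for the HMM-based extraction). The state names, observation symbols, and all probabilities are hypothetical toy values; the trained T-HMM and S-HMM parameters are not given in the slides.

```python
import math

# Hypothetical toy model: two tags and two observable element types.
STATES = ["label", "input"]
START_P = {"label": 0.8, "input": 0.2}
TRANS_P = {
    "label": {"label": 0.3, "input": 0.7},
    "input": {"label": 0.6, "input": 0.4},
}
EMIT_P = {
    "label": {"text": 0.9, "textbox": 0.1},
    "input": {"text": 0.1, "textbox": 0.9},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a non-empty observation list."""
    # best[t][s] = (log-probability of the best path ending in s at step t, previous state)
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


tags = viterbi(["text", "textbox", "text", "textbox"],
               STATES, START_P, TRANS_P, EMIT_P)
```

Log-probabilities are used so that long element sequences do not underflow.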
17
Method 1b: Form Term Annotation
    Refine semantics by annotating terms
      Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), comprising
      360,000 concepts belonging to various semantic categories.

      ConceptID     Description          Semantic Category
      0231832       Respiratory Rate     Observable Entity
      362508001     Both eyes, entire    Body Structure

    Challenge: The same form term can be specified in multiple contexts, i.e.,
    semantic categories. The key is to identify the semantic category for a given term.
    We hypothesize that the term context can be derived from the structure of the form
    tree.




18
Method 1b: Form Term Annotation
    [Figure: annotation pipeline]
    Form Term + Form Tree -> Structure Analyzer -> Classification Model
      -> SNOMED CT semantic category
      -> choose the best-match concept from this category (SNOMED CT search service)
      -> SNOMED CT Concept




19
Method 2: Correspondence Discovery and Validation

    [Figure: Linguistic Matching; Exact Concept Matching]
20
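Method 2: Validation Algorithm (Total Heuristics = 4)
    [Figure: candidate correspondences between form elements and database
    elements, some marked invalid (X)]
      Form elements: Past Medical History, Family Hx, History of Present Illness, Meds
      Database table: History (Id, HPI, Medications, SocialHistory)
      Form element: Oral Hygiene (radio: good, poor)
      Database look-up table: Appetite (Id, Options): 1 Good, 2 Fair, 3 Poor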


21
Method 3: Database Design and Evolution





22
Method 3a: Birthing Algorithm (Total Patterns = 12)
    Principles: High Quality (Complete, Correct, Compact, Normalized) and
    Optimization (minimize NULLs)
    Traverses the form tree in depth-first order
    [Figure: example patterns, with labels M:1 and Tj.ID -> Tj.c]
      Textbox Pattern
      Radiobutton Pattern
      Category/Subcategory Pattern
      Extended RB Pattern
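
The depth-first traversal can be sketched as follows. This is one possible reading under stated assumptions, not the dissertation's algorithm: only three of the 12 patterns are covered, the dictionary-based tree format is hypothetical, and the choices that each category becomes a table, each textbox a column of its parent table, and each radio group a look-up table referenced by a foreign key are assumptions made for illustration.

```python
def birth(node, schema=None, table=None):
    """Depth-first walk that turns a toy form tree into {table: [columns]}."""
    if schema is None:
        schema = {}
    kind, label = node["kind"], node["label"]
    if kind == "category":
        # Assumption: each category becomes its own table with a surrogate key.
        schema[label] = ["id"]
        if table is not None:
            # Assumption: a subcategory references its parent category's table.
            schema[label].append(f"{table}_id")
        for child in node.get("children", []):
            birth(child, schema, label)
    elif kind == "textbox":
        # Assumption: the textbox value is a column determined by the table key.
        schema[table].append(label)
    elif kind == "radio":
        # Assumption: options live in a look-up table; the parent table holds a FK.
        # The option values themselves would be rows, so they add no columns.
        schema[f"{label}_lookup"] = ["id", "option"]
        schema[table].append(f"{label}_id")
    return schema


# Hypothetical form tree; the root must be a category in this sketch.
form = {
    "kind": "category", "label": "visit", "children": [
        {"kind": "textbox", "label": "medications"},
        {"kind": "radio", "label": "appetite", "children": [
            {"kind": "option", "label": "good"},
            {"kind": "option", "label": "fair"},
            {"kind": "option", "label": "poor"},
        ]},
    ],
}

schema = birth(form)
```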


23
Method 3a: Birthing Algorithm
    [Figure: a form tree annotated with the patterns it contains: sibling
    categories pattern, category-subcategory pattern, textbox pattern,
    radiobutton pattern, checkbox pattern]
24
Method 3: Database Design and Evolution






25
Method 3b: Merging Algorithm (Total merging scenarios = 8)
    Each merger involves a trade-off between the compactness and
    optimization (min. NULL values) principles.
    Compactness Factor (CF): A configurable value in (0,1) that indicates
    the weightage given to compactness.
    Null Value Ratio (NVR): A calculated value that indicates the potential of
    having NULL values in a given table.

    [Figure: New DB + Existing DB -> Final DB, with NVR = 2/5 = 0.4]
      Case a: CF = 0.5 (CF > NVR): more compact
      Case b: CF = 0.3 (CF <= NVR): more optimized
26
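
The CF versus NVR comparison from the two cases can be sketched as below. The function name `should_merge` is hypothetical, and NVR is taken as a given input because the slide shows only the example value 2/5 and not how it is calculated.

```python
def should_merge(cf: float, nvr: float) -> bool:
    """Merge (favoring compactness) only when CF exceeds the table's NVR."""
    if not 0.0 < cf < 1.0:
        raise ValueError("CF is a configurable value in the open interval (0, 1)")
    return cf > nvr


nvr = 2 / 5  # example value from the slide

case_a = should_merge(0.5, nvr)  # CF > NVR: merge, more compact
case_b = should_merge(0.3, nvr)  # CF <= NVR: keep separate, more optimized
```

A higher CF therefore tolerates more potential NULL values in exchange for fewer duplicated concepts.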
4. Evaluation




27
System Goals: Principle Compliance & Min. Interventions
    Evaluation Goals:
      A. How well does the system meet the goals?
      B. What is the impact of the framework in accomplishing the goals?
    Implementation: Java, Tomcat, MySQL Server, yFiles, JSP
    [Figure: framework pipeline annotated with experimental settings]
      HMM-based tree extraction: EM & Viterbi, cross-validation
      SNOMED CT term annotation: Naïve Bayes classifier, top-4 classes, SnAPI,
      cross-validation per dataset
      Correspondence discovery: linguistic similarity = Lucene’s default settings
      Validation algorithm
      Birthing algorithm
      Merging algorithm: CF = 0.7
28
Data
    (52 real-world forms from 6 medical institutions)
    Healthcare: Forms are prevalent, and information systems are unusable and inflexible.

        Dataset                                          Avg.     Avg.     SNOMED CT
                                                         Terms    Inputs   Mappability
    1   Walk-in clinic encounter forms (3 forms)         32.33    49.33    75.77%
    2   Nursing patient admission forms (6 forms)        17.17    33       63.98%
    3   Labor & delivery DB data-entry forms (7 forms)   16.14    37.29    58.8%
    4   Adult visit encounter forms (18 forms)           47.83    65.22    56.2%
    5   Family practice forms (13 forms)                 82.61    100.46   59.38%
    6   Child visit encounter forms (5 forms)            53       67.4     62.21%

    Gold Benchmarks
      52 Gold Std Trees (using a DIY interface that captures designers’
      on-the-fly semantic decisions)
      Gold Std Annotations (4235 form terms were manually studied & 2506 (59%)
      had a corresponding concept in SNOMED CT)
      3 pairs of Gold DBs (3 datasets were given to 2 experts. Each expert
      manually derived the 3 databases)
29
Experiment 1: Form Tree Extraction
    97.85% of parent-child semantic associations captured correctly
    An average tree with 135 edges gets generated in 0.08 seconds.




              Dataset1   Dataset2   Dataset3   Dataset4    Dataset5   Dataset6
Total Edges   272        362        461        2606        2674       644
Accuracy      95.22%     97.51%     100%       97.58%      98.46%     96.11%

    Inaccuracies arise from greater hierarchical complexity, i.e.,
    semantic grouping and sub-grouping.
30
Experiment 2: Form Term Annotation
    Precision = (# correct annotations) / (# annotations)
    Recall    = (# correct annotations) / (# relevant (gold) annotations)
    [Figure: results for three approaches]
      Baseline (linguistics)
      Hybrid (linguistics + semantic structure)
      Hybrid++
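
The two measures can be computed as in this minimal sketch. The function name, the dictionary format (form term mapped to a concept identifier), and the sample identifiers are hypothetical placeholders rather than data from the experiments.

```python
def precision_recall(predicted, gold):
    """predicted, gold: dicts mapping a form term to a concept identifier."""
    correct = sum(
        1 for term, concept in predicted.items() if gold.get(term) == concept
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: three system outputs, four gold annotations.
predicted = {"resp rate": "C1", "both eyes": "C2", "hx": "C9"}
gold = {"resp rate": "C1", "both eyes": "C2", "hx": "C3", "meds": "C4"}

precision, recall = precision_recall(predicted, gold)
```

Here two of three system annotations are correct, and two of four gold annotations are recovered.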
31
Exp. 2: Form Term Annotation
    Avg. time (s)/form: 1.28, 1.77, 2.31, 10.29, 8.12, 3.44
    Baseline to Hybrid
      Avg. precision improved by 26%.
      Recall: no specific pattern
    Hybrid to Hybrid++
      Avg. precision improved by 13%
      Avg. recall improved by 17%
    Hybrid++: precision 0.86, recall 0.6
    Enhanced all versions by adding term processing: removal of special
    characters, expansion of clinical acronyms.
      Precision only slightly improved (3-5%)
      Recall majorly improved (25%).
      Final precision = 0.89, recall = 0.76

    Structural knowledge can improve the overall performance.
    Linguistic techniques can only impact the recall.
32
Experiment 3: Form to Database Mapping
    3a. Linguistic-based Discovery
    3b. Concept-based Discovery
    3c. Hybrid Discovery




33
Exp 3: Description of evolved databases
    (35 to 450 tables), (Linguistic-based Discovery)
    [Figure: x = element type, y = # elements]
    Mapping duration per form: a few ms to 200 s.




34
Exp 3: Comparison with Gold Datasets
    [Figures: comparison with Gold 1; comparison with Gold 2]
    74% (avg.) of the system-generated tables "perfectly match" the
    tables in the gold databases.
    Based on the principles of quality and optimization, the mismatches
    can be divided into negative and positive.
    [Figure: positive mismatch and negative mismatch examples, each showing the
    form pattern, the system-generated DB, and a gold DB]


35
Exp. 3: Measuring Principle Compliance
    Principles: Correctness, Completeness, Normalization, Optimization, Compactness.
    An approx. universal set of merging situations
    [Figure: per-database results (DB1 to DB6) for Linguistic, Concept, and
    Hybrid Discovery]
    3a: Linguistic Discovery
      >= 75% compactness in 4 databases.
      Databases 4, 6: >= 20% rejected due to form features
      Datasets 4 and 6
        Format diversity: Gender (textbox, radiobuttons - M, F); DOB (single vs.
        multiple textboxes)
        Section scattering
    3b: Concept-based Discovery
      >= 70% compactness in 3 databases.
      Datasets 5 & 6: >= 33% undetected
    3c: Hybrid Discovery
      >= 80% compactness in 4 databases.


36
Exp 3: Measuring User Interventions
    Metrics: % reduction in no. of screens; avg. screens/form presented to user;
    screen relevance (%) = (# of screens to which user responds) / (# total screens)

             Linguistic-based Discovery    Concept-based Discovery       Hybrid Discovery
    Dataset  % Red.   Avg.     Screen      % Red.   Avg.     Screen      % Red.   Avg.     Screen
             screens  screens  rel. (%)    screens  screens  rel. (%)    screens  screens  rel. (%)
    1        50       4        15.39       77       1        75          52       4        15.38
    2        77       2        42.86       62       3        68.75       75       3        50
    3        69       2        50.00       18       5        46.87       57       4        29.63
    4        55       10       39.79       54       8        45.45       51       13       43.29
    5        76       21       94.18       65       15       73.57       69       27       86.04
    6        62       5        32.14       65       4        42.86       59       8        45




37
Results Summary & Implications
    Exp1: Form tree generation (52 forms)
      Accuracy = 0.98; 0.08s/tree
      • Supervised
      • Intervention 10/tree for cardinality disambiguation
    Exp2: Form term annotation (2500 forms)
      Precision = 0.89, Recall = 0.76; 1 to 11s/form
      Improves precision (43%) and recall (29%) over baseline
      • Sophisticated term techniques
      • SNOMED CT relationships
      • Unsupervised learning
    Exp3: Form to DB Mapping (6 DBs: 35 to 450 tables, few ms to 200s)
      Components: Validation Algorithm, Birthing Algorithm, Merging Algorithm
      Hybrid approach improves scenario identification (19%) and compactness (13%)
      over pure approaches, but performs less well in terms of interventions &
      screen relevance.
      Interventions
        Intervention red. 61%
        Interventions/form: ling.: 10, con.: 8, hyb.: 13
        Avg. screen rel. = 50%
      Principle Compliance
        84.5% identical or superior to gold DBs
        74% compact (hybrid)
      • Tune validation/merging based on form features.
      • Birthing algorithm can be refined as per gold std.
      • Interventions & screen relevance can be improved by enhancing the
        validation algorithm
38
5. Final Remarks




39
Thesis Contributions:
    Mapping user-designed form to relational database. (NEW problem)
    Form Understanding
      New solution: 2-layered HMM that encodes designers' knowledge. First work to
      apply HMMs to form understanding.
      Highly accurate (98%) and efficient (0.08s per form)
    Form Term Annotation (NEW problem!)
      Context-based solution leveraging semantic structure
      Promising (0.89 precision, 0.76 recall) and efficient (1-11s);
      improves over baseline by 43% in precision and 29% in recall
    Correspondence Validation Algorithm
      Heuristic-based solution relying on frequent observations
      Reduces interventions by avg. 61%.
    Birthing Algorithm
      Intertwines quality and optimization principles
      4 medium (<65 tables) & 2 large (<500 tables)-scale DBs
      3 medium-scale DBs intersect with (or are superior to) gold by 84.5%.
    Merging Algorithm
      Balance between compactness & optimization
      Merged >=70% semantically matching elements in 11/18 cases.
    Key Recommendations
      For term annotations, design hybrid approaches leveraging both linguistics
      and structural semantics.
      For improving database quality, design approaches leveraging both linguistic
      and semantic methods for correspondence discovery.
      Birthing algorithm could be further refined in terms of handling radio-button
      groups and extended check-boxes to improve database quality.
      Enhance validation algorithm to further reduce user interventions and improve
      screen relevance
40
Limitations – I
    Techniques
      Form Understanding
        Weak entities, part./card. constraints.
      Form Term Annotations
        Post-coordinated mapping
      Correspondence Discovery
        Concatenated matches
      Merging Algorithm
        Detect/eliminate circular references in database.
    Technique Evaluation
      Compare with other learning models
        SVM, conditional random fields, Bayesian networks, CAR
      Completeness and Correctness of Heuristics
        Tree design rules, Heuristics for validation and merging,
        Birthing Form Patterns, Classification attributes
      Assumptions
        Class conditional independence, Correctness of most linguistic
        matching concept
      Theoretical Validity of Birthing Algorithm
41
Limitations - II
    Study
      Thorough User Studies
        Can users understand/select the right correspondences?
      Domain Expert Annotator
      Large Scale of Databases
        Result Evaluation, Gold DB
      Limited Time
        Implementation
        Experimentation
    Experimental Design
      Map and merge forms from different sources
      Experiments involving both automatic form tree extraction
      and term annotation methods.




42
Future Directions
    Electronic Health Record
      Can Clinicians
        Design Forms, Understand/Identify Correspondences
      Does this framework improve
        Data Quality, Patient Diagnosis
      Legal Perspective
        HIPAA regulations, Proprietary systems
      Customize for Form Categories
        Encounter, Walk-in, Regular Visit, Data-entry
      Use other UMLS terminologies
    General
      Turn into an API
        Amazon SimpleDB
        Google Datastore.
      Leveraging More Form-Related Information
        Past Mappings
        Usage frequency
        Designer’s/User’s Domain Expertise
      Mapping Maintenance and Record Conflict Resolution



43
Related Publications
    Exploiting Semantic Structure for Mapping User-specified Form Terms to
     SNOMED CT Concepts
      Khare R., An Y., Li J., Song I-Y., Hu X. In the proceedings of 2nd International Health
       Informatics Symposium (IHI 2012), Jan 28-30, 2012, Miami, FL, USA.
    Automatically Mapping and Integrating Multiple Data Entry Forms into a
     Database
      An Y., Khare R., Song I-Y., Hu X. In the proceedings of 30th International Conference on
       Conceptual Modeling (ER 2011), Oct 31-Nov 3, 2011, Brussels, Belgium.
    Can Clinicians Create High-Quality Databases? A Study on A Flexible
     Electronic Health Record (fEHR) System
      Khare R., An Y., Song I-Y., Hu X. In the proceedings of 1st International Health Informatics
       Symposium (IHI 2010), Nov 11-12, 2010, Arlington, VA, USA.
    Understanding Deep Web Search Interfaces
      Khare R., An Y., Song I-Y. Special Interest Group in Management of Data (SIGMOD) Record,
       39(1):33-40, 2010.
    An Empirical Study on using Hidden Markov Model for Search Interface
     Segmentation
      Khare R., An Y. In the proceedings of 18th International Conference on Information and
       Knowledge Management (CIKM 2009), Nov 3-5, 2009, Hong Kong.

44
Thank you




45

Dissertation Defense Presentation

  • 1. A Framework for Mapping User-designed Forms to Relational Databases Dissertation Presentation November 15 2011 Ritu Khare COMMITTEE : Dr. Yuan An (Chair) Dr. Jiexun Jason Li Dr. Il-Yeol Song Dr. Min Song Dr. Christopher C. Yang 1
  • 2. Presentation Order 1. Motivation 2. Problems 3. Solutions 4. Evaluation 5. Final Remarks 2
  • 4. General Motivation: Database Usability (Sawyer, 1995)  Enable users to SEARCH and  Enable users to DESIGN QUERY databases databases. (Jagadish et al. 2007)  Information Retrieval  Form-based DIY and WYSIWYG Techniques (Liu et al, 2006, Hristidis paradigms et al., 2003, Catarci, 2000, Jayapandian  FormAssembly, ZohoCreator, and Jagadish, 2006) GoogleForms Databases still remain unusable from the integration point of view (Gurses et al., 2009) 4
  • 5. Precise Motivation: Integration of New Needs New needs related to 1) Building of new forms patient’s social 2) Integration of new form habits into back-end 5
  • 6. Research Objective  To develop a mechanism to automatically map and integrate a user-designed form into existing structured database.  Assume that a user-designed form is already acquired  Seek a framework that  merges the semantically matching elements between forms and databases.  creates new database elements corresponding to the unmatched form elements. 6
  • 8. A form template represents the semantic intentions of the designer Problem #1 : Form Understanding Existing Work  Focus on Search Forms (Benslimane, et al. 2007, Kaljuviee et al., 2001)  shorter and simpler than the data-entry forms. (empirical finding)  Rules and heuristics (Zhang et al. 2004, He et al., 2007)  Automatic Extraction of the form semantics  not likely to circumvent the  Machine can only read the syntactic patterns ever broadening varieties in of form elements. A certain layout pattern form topologies cannot be associated with a semantic intention. 8
  • 9. Problem #2: Correspondence Discovery
      - Detect semantically matching elements between a form and an existing database
      - Challenges: variety of terms to denote the same concepts; variety of concepts denoted by similar terms
      - Identify and eliminate the invalid correspondences.
      Existing Work:
      - Schema and Ontology Mapping (Madhavan et al., 2001, Euzenat and Shvaiko, 2005, Rahm and Bernstein, 2001, An et al. 2005, An et al. 2006): mostly semi-automatic
      - Not applicable to form to database correspondence discovery: heterogeneity between forms and databases; correspondences are to be used for evolving the database, and the discovery process has to take this requirement into consideration.
  • 10. Problem #3: Form Integration. Problem #3a: Merging
      - Merging into an existing database so that the same concept is not duplicated and the database remains compact.
      - Merging increases the potential of having NULL values, i.e., less optimized database.
      - Judicious Decisions
      Existing Work:
      - Form integration (Yang et al., 2008): largely manual; expose the users to the technical details of the underlying data model.
      - Database integration (Yang et al. 2003): provide guidelines.
  • 11. Problem #3: Form Integration. Problem #3b: Birthing
      - Extend the database for the unmatched form elements
      - How to automatically derive the functional dependencies among the form elements?
      - How to translate the complex form patterns?
      - How to evaluate multiple design alternatives & pick one?
      Existing Work: Form-based database design
      - Several methods (Choobineh et al. 1988, Pavicevic et al, 2006, Choobineh and Venkatraman, 1992, Deklarit, 2008) and commercial tools (Form assembly, google forms, zohocreator, wufoo)
      - No empirical evaluation of the resultant databases
      - Few focus on designing a database with certain desirable properties, e.g., expressiveness (Yang et al, 2008, Choobineh et al., 1988, Lukovic, et al 2007).
      - These properties do not reflect any compliance with the form semantics and are inadequate for evaluating the mapping process.
  • 12. Research Questions and System Goals
      Research Questions:
      1. Form Understanding: a model to capture the form semantics; extract this model from a given form
      2. Correspondence Discovery: determine semantically equivalent elements b/w form & database; incorporate DB evolution requirement during discovery process
      3. Form Integration: resolve merging conflicts while maintaining the original form semantics; given a form pattern, derive a relational database with “desirable” properties
      System Goals:
      1. To evolve a DB that is high-quality and optimized as per the form semantics, i.e., compliant to the principles (Wang and Strong, 1996, Ramakrishnan and Gehrke, 2002, Silberschatz, et al., 2001, Batini and Scannapieco, 2006):
         - Completeness: All form elements represented in database
         - Correctness: Form semantics retained
         - Compactness: Equivalent elements merged
         - Normalization: 3NF w.r.t. form’s functional dependencies
         - Minimize NULL values in FKs and Descriptive attributes
      2. To ensure minimalism in the required user intervention
  • 14. Form Representation: Form Tree
      - The form tree accurately captures the designer's intentions, and hence the semantic associations among the form elements.
      - Inspired by hierarchical modeling of forms in existing works (Dragut et al. 2009, Wu et al. 2009)
  • 15. Framework Outline
      1. Form Understanding and Semantics Extraction (output: Form Tree)
      2. Correspondence Discovery and Validation (output: Form Tree with Discovered Correspondences)
      3. Database Design and Evolution (output: Database)
  • 16. Method 1a: Form Tree Generation 16
  • 17. Method 1a: Form Tree Generation
      Phases: 1. Tag and Segment Phase; 2. Derive Tree Phase (5 rules)
      - The approach leverages the probabilistic nature of form design and develops a 2-layered Hidden Markov Model (HMM) based artificial designer that has the ability to understand the semantics of any arbitrarily designed form.
      - T-HMM: Tagging HMM
      - S-HMM: Segmentation HMM
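The tagging step can be illustrated with a small Viterbi decoder. This is a minimal sketch, not the dissertation's T-HMM: the tag set (`label`, `input`), the observation symbols, and every probability below are hypothetical values chosen for illustration, and only a single tagging layer is shown (the segmentation layer is omitted).

```python
import math

# Hypothetical tag set and probabilities, for illustration only.
STATES = ["label", "input"]
START = {"label": 0.8, "input": 0.2}
TRANS = {
    "label": {"label": 0.3, "input": 0.7},
    "input": {"label": 0.6, "input": 0.4},
}
EMIT = {
    "label": {"text": 0.9, "textbox": 0.05, "radio": 0.05},
    "input": {"text": 0.1, "textbox": 0.5, "radio": 0.4},
}


def viterbi(observations):
    """Return the most probable tag sequence for a non-empty list of observed form elements."""
    # best[t][s] = (log-probability of the best path ending in state s at step t, previous state)
    best = [{
        s: (math.log(START[s]) + math.log(EMIT[s][observations[0]]), None)
        for s in STATES
    }]
    for obs in observations[1:]:
        column = {}
        for s in STATES:
            # Pick the predecessor that maximizes the path score into state s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(EMIT[s][obs]), prev)
        best.append(column)

    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


print(viterbi(["text", "textbox", "text", "radio", "radio"]))
# → ['label', 'input', 'label', 'input', 'input']
```

In a trained model the three probability tables would be estimated from tagged forms rather than written by hand.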
  • 18. Method 1b: Form Term Annotation
      - Refine semantics by annotating terms
      - Challenge: Same form term can be specified in multiple contexts, i.e., semantic categories. The key is to identify the semantic category for a given term.
      - We hypothesize that the term context can be derived from the structure of the form tree.
      - Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) comprising 360,000 concepts belonging to various semantic categories.
      ConceptID    Description         Semantic Category
      0231832      Respiratory Rate    Observable Entity
      362508001    Both eyes, entire   Body Structure
  • 19. Method 1b: Form term annotation
      [Figure: annotation pipeline. Components shown: Form Tree, Form Term, Structure Analyzer, Classification Model, Semantic category, SNOMED CT search service, "Choose the best match concept from this category", SNOMED CT Concept]
  • 20. Method 2: Correspondence Discovery and Validation: 1. Linguistic Matching 2. Exact Concept Matching
  • 21. Method 2: Validation Algorithm (Total Heuristics = 4)
      [Figure: candidate correspondences marked with check and cross symbols. Labels shown: Past Medical History, History, History of present Illness, HPI, Medications, Meds, SocialHistory, Family Hx; Oral Hygiene and Appetite with radio options good, fair, poor; a look-up table with columns Id and Options (1 Good, 2 Fair, 3 Poor)]
  • 22. Method 3: Database Design and Evolution 1 2 3 22
  • 23. Method 3a: Birthing Algorithm (Total Patterns = 12)
      - Principles: High Quality (Complete, Correct, Compact, Normalized) and Optimization (minimize NULLs)
      - Traverses the form tree in depth-first order
      [Figure: example patterns: Radiobutton Pattern, Extended RB Pattern, Textbox Pattern, Category/subcategory Pattern; annotated with M:1 and Tj.ID -> Tj.c]
  • 24. Method 3a: Birthing Algorithm
      [Figure: worked example annotated with the sibling categories, category-subcategory, textbox, radiobutton, and checkbox patterns]
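A depth-first walk that turns a form tree into tables can be sketched as follows. This is a hedged illustration only: the slides do not list the 12 patterns, so the three simplified rules below (category to table with a foreign key to its parent, textbox to column, radiobutton to look-up table plus reference column) are assumptions, and `Node` and `birth` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str  # assumed kinds: "category", "textbox", or "radio"
    children: list = field(default_factory=list)


def birth(node, parent_table=None, schema=None):
    """Depth-first traversal; returns {table_name: [column, ...]}.

    Assumes the root of the form tree is a category node.
    """
    if schema is None:
        schema = {}
    if node.kind == "category":
        columns = ["id"]
        if parent_table is not None:
            # Assumed rule: a subcategory table references its enclosing category.
            columns.append(f"{parent_table}_id")
        schema[node.name] = columns
        for child in node.children:
            birth(child, node.name, schema)
    elif node.kind == "textbox":
        # Assumed rule: a textbox becomes a column of the enclosing category's table.
        schema[parent_table].append(node.name)
    elif node.kind == "radio":
        # Assumed rule: options live in a look-up table referenced by the enclosing table.
        schema[f"{node.name}_lookup"] = ["id", "option"]
        schema[parent_table].append(f"{node.name}_id")
    return schema


form = Node("patient", "category", [
    Node("name", "textbox"),
    Node("appetite", "radio"),
    Node("social_history", "category", [Node("smoking", "textbox")]),
])
schema = birth(form)
```

For the sample tree, `schema` holds a `patient` table, an `appetite_lookup` table, and a `social_history` table that references `patient`.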
  • 25. Method 3: Database Design and Evolution 1 2 3 25
  • 26. Method 3b: Merging Algorithm (Total merging scenarios = 8)
      - Each merger involves a trade-off between compactness and optimization (min. NULL values) principles.
      - Compactness Factor (CF): A configurable value in (0,1) that indicates the weightage given to compactness
      - Null Value Ratio (NVR): A calculated value that indicates the potential of having NULL values in a given table.
      Example (New DB merged with Existing DB into Final DB): NVR = 2/5 = 0.4
      - Case a: CF = 0.5 (CF > NVR): More Compact
      - Case b: CF = 0.3 (CF <= NVR): More Optimized
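The CF versus NVR rule on this slide can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names are hypothetical, NVR is assumed to be the fraction of columns in the merged table that could hold NULLs (which matches the slide's 2/5 = 0.4 but is not spelled out there), and "more compact" is read as "perform the merge".

```python
def null_value_ratio(null_prone_columns: int, total_columns: int) -> float:
    """Assumed definition: share of columns in the merged table that may hold NULLs."""
    if total_columns <= 0:
        raise ValueError("total_columns must be positive")
    return null_prone_columns / total_columns


def should_merge(cf: float, nvr: float) -> bool:
    """Merge (favoring compactness) only when CF > NVR; otherwise keep tables apart."""
    if not 0 < cf < 1:
        raise ValueError("CF is a configurable value in the open interval (0, 1)")
    return cf > nvr


nvr = null_value_ratio(2, 5)   # the slide's example: 2/5 = 0.4
case_a = should_merge(0.5, nvr)  # CF > NVR: merge, more compact
case_b = should_merge(0.3, nvr)  # CF <= NVR: do not merge, fewer NULLs
print(nvr, case_a, case_b)
# → 0.4 True False
```

Raising CF makes the system merge more aggressively at the cost of more NULL-prone columns.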
  • 28. System Goals: Principle Compliance & Min. Interventions
      Evaluation Goals: A. How well the system meets the goals? B. Impact of framework in accomplishing the goals?
      Implementation: Java, Tomcat, MySQL Server, yFiles, JSP
      Setup per component:
      - HMM-based tree extraction (output: Form Tree): EM & Viterbi, cross-validation
      - SNOMED CT Term Annotation: Naïve Bayes Classifier, Top-4 classes, SnAPI, cross-validation per dataset
      - Corr. Discovery (output: Form Tree with Discovered Correspondences): Linguistic Similarity = Lucene’s default settings
      - Validation Algorithm
      - Birthing Algorithm
      - Merging Algorithm (output: Database): CF = 0.7
  • 29. Data (52 real world forms from 6 medical institutions)
      Healthcare: Forms are prevalent, and information systems are unusable and inflexible.
      Dataset                                              Avg. Terms   Avg. Inputs   SNOMED CT Mappability
      1 Walk in clinic encounter forms (3 forms)           32.33        49.33         75.77%
      2 Nursing patient admission forms (6 forms)          17.17        33            63.98%
      3 Labor & delivery DB data-entry forms (7 forms)     16.14        37.29         58.8%
      4 Adult visit encounter forms (18 forms)             47.83        65.22         56.2%
      5 Family practice forms (13 forms)                   82.61        100.46        59.38%
      6 Child visit encounter forms (5 forms)              53           67.4          62.21%
      Gold Benchmarks:
      - 52 Gold Std Trees (using a DIY interface that captures designers’ on-the-fly semantic decisions)
      - Gold Std Annotations (4235 form terms were manually studied & 2506 (59%) had corr. concept in SNOMED CT)
      - 3 pairs of Gold DBs (3 datasets were given to 2 experts. Each expert manually derived the 3 databases)
  • 30. Experiment 1: Form Tree Extraction
      - 97.85% of parent-child semantic associations captured correctly
      - An average tree with 135 edges gets generated in 0.08 seconds.
                     Dataset1   Dataset2   Dataset3   Dataset4   Dataset5   Dataset6
      Total Edges    272        362        461        2606       2674       644
      Accuracy       95.22%     97.51%     100%       97.58%     98.46%     96.11%
      Inaccuracies because of more hierarchical complexity, i.e., semantic grouping and sub-grouping.
  • 31. Experiment 2: Form Term Annotation
      - Precision = # correct annotations / # annotations
      - Recall = # correct annotations / # relevant (gold) annotations
      - Methods compared: Baseline (linguistics), Hybrid (linguistics + semantic structure), Hybrid++
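The two measures can be computed directly from a set of predicted annotations and a gold set. A minimal sketch, assuming annotations are stored as term-to-concept mappings; the function name, the sample terms, and the concept identifiers (`C1`, `C2`, ...) are hypothetical placeholders, not SNOMED CT IDs.

```python
def precision_recall(predicted: dict, gold: dict):
    """Precision = correct / predicted annotations; recall = correct / gold annotations."""
    correct = sum(
        1 for term, concept in predicted.items() if gold.get(term) == concept
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: three terms annotated by the system, four in the gold set.
predicted = {"resp rate": "C1", "both eyes": "C2", "appetite": "C9"}
gold = {"resp rate": "C1", "both eyes": "C2", "appetite": "C3", "gender": "C4"}

precision, recall = precision_recall(predicted, gold)
# Two of three predictions are correct; two of four gold annotations are recovered.
```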
  • 32. Exp. 2: Form Term Annotation
      Avg. time (s)/form: 1.28, 1.77, 2.31, 10.29, 8.12, 3.44
      - Baseline to Hybrid: avg. precision improved by 26%; recall shows no specific pattern
      - Hybrid to Hybrid++: avg. precision improved by 13%; avg. recall improved by 17%
      - Hybrid++: precision 0.86, recall 0.6
      - Enhanced all versions by adding term processing (removal of special characters, expansion of clinical acronyms): precision only slightly improved (3-5%); recall majorly improved (25%)
      - Final precision = 0.89, recall = 0.76
      Structural knowledge can improve the overall performance. Linguistic techniques can only impact the recall.
  • 33. Experiment 3: Form to Database Mapping: 3a. Linguistic-based Discovery 3b. Concept-based Discovery 3c. Hybrid Discovery
  • 34. Exp 3: Description of evolved databases (35 to 450 tables), Linguistic-based Discovery. Mapping duration per form: a few ms to 200 s.
      [Chart: x: element type, y: # elements]
  • 35. Exp 3: Comparison with Gold Datasets
      - 74% (avg.) of the system-generated tables "perfectly match" the tables in the gold databases.
      - Based on the principles of quality and optimization, the mismatches could be divided into Negative and Positive.
      [Figure: comparisons with Gold 1 and with Gold 2; a form pattern with its system-generated DB and gold DB, illustrating a positive mismatch and a negative mismatch]
  • 36. Exp. 3: Measuring Principle Compliance
      Principles: Correctness, Completeness, Normalization, Optimization, Compactness. An approx. universal set of merging situations.
      [Chart: Linguistic, Concept, and Hybrid Discovery across DB1 to DB6]
      - 3a: Linguistic Discovery: >= 75% compactness in 4 databases. Databases 4, 6: >= 20% rejected due to form features
      - 3b: Concept-based Discovery: >= 70% compactness in 3 databases. Datasets 5 & 6: >= 33% undetected
      - 3c: Hybrid Discovery: >= 80% compactness in 4 databases
      - Form features in Datasets 4 and 6: Format Diversity, e.g., Gender (textbox, radiobuttons - M, F), DOB (single vs. multiple textboxes); Section Scattering
  • 37. Exp 3: Measuring User Interventions
      Measures: % reduction in no. of screens; avg. screens/form presented to user; screen relevance (%) = (# of screens to which user responds) / (# total screens)
                 Linguistic-based Discovery          Concept-based Discovery             Hybrid Discovery
      Dataset    % Red.   Avg. screens   Rel. (%)    % Red.   Avg. screens   Rel. (%)    % Red.   Avg. screens   Rel. (%)
      1          50       4              15.39       77       1              75          52       4              15.38
      2          77       2              42.86       62       3              68.75       75       3              50
      3          69       2              50.00       18       5              46.87       57       4              29.63
      4          55       10             39.79       54       8              45.45       51       13             43.29
      5          76       21             94.18       65       15             73.57       69       27             86.04
      6          62       5              32.14       65       4              42.86       59       8              45
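Screen relevance as defined on this slide is a simple ratio. A minimal sketch with a hypothetical function name and made-up counts; the counts are not taken from the experiments.

```python
def screen_relevance(responded_screens: int, total_screens: int) -> float:
    """Percentage of presented validation screens to which the user responds."""
    if total_screens <= 0:
        raise ValueError("total_screens must be positive")
    if not 0 <= responded_screens <= total_screens:
        raise ValueError("responded_screens must be between 0 and total_screens")
    return 100.0 * responded_screens / total_screens


# Hypothetical counts: the user responds to 2 of the 13 screens shown.
relevance = round(screen_relevance(2, 13), 2)
print(relevance)
# → 15.38
```

A low value means most presented correspondences were not useful to the user, which is the situation the validation algorithm is meant to reduce.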
  • 38. Results Summary & Implications
      Exp1: Form tree generation (52 forms)
      - Accuracy = 0.98; 0.08s/tree
      - Implications: Supervised; Intervention 10/tree for cardinality disambiguation
      Exp2: Form term annotation (2500 forms)
      - Precision = 0.89, Recall = 0.76; 1 to 11s/form
      - Improve precision (43%) and recall (29%) over baseline
      - Implications: Sophisticated term techniques; SNOMED CT relationships; Unsupervised learning
      Exp3: Form to DB Mapping (6 DBs: 35 to 450 tables, few ms to 200s), covering the Validation, Birthing, and Merging Algorithms
      - Interventions: Intervention red. 61%; Intervention/form: ling.: 10, con.: 8, hyb.: 13; Avg. screen rel. = 50%
      - Principle Compliance: 84.5% identical, or superior to gold DBs; 74% compact (hybrid)
      - Hybrid approach improves scenario identification (19%) and compactness (13%) over pure approaches, but performs less in terms of interventions & screen relevance.
      - Implications: Tune validation/merging based on form features. Birthing algorithm can be refined as per gold std. Interventions & screen relevance can be improved by enhancing validation algorithm.
  • 40. Thesis Contributions: Mapping user-designed form to relational database. (NEW problem)
      Form Understanding
      - New Solution: 2-layered HMM that encodes designers' knowledge. First work to apply HMMs on form understanding
      - Highly accurate (98%) and efficient (0.08s per form)
      Form Term Annotation (NEW Problem!)
      - Context-based solution leveraging semantic structure
      - Promising (0.89 precision, 0.76 recall) and efficient (1-11s); improves over baseline by 43% in precision and 29% in recall
      Correspondence Validation Algorithm
      - Heuristic based solution relying on frequent observations
      - Reduces interventions by avg. 61%.
      Birthing Algorithm
      - Intertwines quality and optimization principles
      - 4 medium (<65 tables) & 2 large (<500 tables)-scale DBs
      - 3 medium-scale DBs intersect (or superior) with gold by 84.5%.
      Merging Algorithm
      - Balance b/w compactness & optimization
      - Merged >= 70% semantically matching elements in 11/18 cases.
      Key Recommendations
      - For term annotations, design hybrid approaches leveraging both linguistics and structural semantics.
      - For improving database quality, design approaches leveraging both linguistic and semantic methods for correspondence discovery.
      - Birthing algorithm could be further refined in terms of handling radio-button groups and extended check-boxes to improve database quality.
      - Enhance validation algorithm to further reduce user interventions and improve screen relevance
  • 41. Limitations - I
      Techniques
      - Form Understanding: Weak entities, part./card. constraints.
      - Form Term Annotations: Post coordinated mapping
      - Correspondence Discovery: Concatenated matches
      - Merging Algorithm: Detect/eliminate circular references in database.
      Technique Evaluation
      - Compare with other learning models: SVM, conditional random fields, Bayesian networks, CAR
      - Completeness and Correctness of Heuristics: Tree design rules, Heuristics for validation and merging, Birthing Form Patterns, Classification attributes
      - Assumptions: Class conditional independence, Correctness of most linguistic matching concept
      - Theoretical Validity of Birthing Algorithm
  • 42. Limitations - II
      Study
      - Thorough User Studies: Can users understand/select the right correspondences?
      - Domain Expert Annotator
      - Large Scale of Databases: Result Evaluation, Gold DB
      - Limited Time: Implementation, Experimentation
      Experimental Design
      - Map and merge forms from different sources
      - Experiments involving both automatic form tree extraction and term annotation methods.
  • 43. Future Directions
      Electronic Health Record
      - Can Clinicians: Design Forms, Understand/Identify Correspondences
      - Does this framework improve: Data Quality, Patient Diagnosis
      - Legal Perspective: HIPAA regulations, Proprietary systems
      - Customize for Form Categories: Encounter, Walk-in, Regular Visit, Data-entry
      - Use other UMLS terminologies
      General
      - Turn into an API: Amazon SimpleDB, Google Datastore.
      - Leveraging More Form-Related Information: Past Mappings, Usage frequency, Designer’s/User’s Domain Expertise
      - Mapping Maintenance and Record Conflict Resolution
  • 44. Related Publications  Exploiting Semantic Structure for Mapping User-specified Form Terms to SNOMED CT Concepts  Khare R., An Y., Li J., Song I-Y., Hu X. In the proceedings of 2nd International Health Informatics Symposium (IHI 2012), Jan 28-30, 2012, Miami, FL, USA.  Automatically Mapping and Integrating Multiple Data Entry Forms into a Database  An Y., Khare R., Song I-Y., Hu X. In the proceedings of 30th International Conference on Conceptual Modeling (ER 2011), Oct 31-Nov 3, 2011, Brussels, Belgium.  Can Clinicians Create High-Quality Databases? A Study on A Flexible Electronic Health Record (fEHR) System  Khare R., An Y., Song I-Y., Hu X., In the proceedings of 1st International Health Informatics Symposium (IHI 2010), Nov 11-12, 2010, Arlington, VA, USA.  Understanding Deep Web Search Interfaces  Khare R., An Y., Song I-Y. Special Interest Group in Management of Data (SIGMOD) Record, 39(1):33-40, 2010.  An Empirical Study on using Hidden Markov Model for Search Interface Segmentation  Khare R., and An Y., In the proceedings of 18th International Conference on Information and Knowledge Management (CIKM 2009), Nov 3-5, 2009, Hong Kong. 44

Editor's Notes

  1. A form is designed for human consumption. Search forms are 10 times shorter (studied on 50 forms from both categories) and simpler (hierarchy and representation of database tables: single vs. multiple). Explain what the problem is and why it is challenging. Syntactic means formatting and sequence. Patterns are infinite, and design is so arbitrary that a certain pattern can't be associated with a certain semantic intention. These approaches rely on rendering engines (Gecko, Trident), which makes them browser-dependent and inefficient.
  2. To link these elements to the corresponding semantically matching elements of the existing hidden database. A form has values and longer terms.
  3. Whether to merge or not to merge: does the element in question become a new column in a new table corresponding to Diagnosis, linked through a foreign key, or do we duplicate this column into the new table and reduce the number of joins?
  4. Make sure everything, i.e., the rest of the presentation, aligns with this. We seek the answers to these research questions through the development of a system that automatically maps a user-designed form to an existing database.
  5. Prepare obvious answers: how is a DOM tree different from a semantic tree? Why do we generate correspondences from the form tree and then transfer them to the new database? So that users are presented correspondences in terms of the form they had designed. DB-DB integration could be done, but here we leverage semantic form properties as well.
  6. (1) The input form is represented as an equivalent semantic form tree using a form understanding algorithm. We adopt a proactive approach to mapping in that we also standardize the form terms using an annotation technique focusing on the healthcare domain. Our solutions to the form understanding and the term annotation algorithms are described in Chapter 9. (2) The generated semantic form tree is then studied with respect to the existing database, and the semantic correspondences between the form tree and the existing database elements are discovered and validated using user interventions and certain validation rules. This part is described in Chapter 10. (3) The form tree with discovered correspondences to the existing database elements is then mapped and merged with the existing database. In particular, the matching elements are merged to the target database elements, the new form elements are transformed into new database elements, and the existing database is extended using the new database elements. The database design and evolution algorithms are described in Chapter 11.
  7. Approach identifies semantic grouping
  8. the widely used medical terminology.
  9. The HMMs are tailored for data-entry forms and are aligned with the forms' hierarchical complexity, thereby providing high extraction accuracy (Khare and An, 2009).
  10. Who designed the forms? Why not other domains – which other domains? Possible. Have some idea. – opportunity to study whether systems can be improved.
  11. Why does recall decrease? When the number of correct predictions decreases on applying the hybrid method. Sometimes the linguistic approach returns a more accurate result.
  12. Total number of screens wherein the user suggested to merge the elements over the total number of screens generated as a result of executing the validation algorithm. Amount of redundancy minimization performed by the algorithm.
  13. Each area indicates the contribution of a form in generating the database elements. The peaks denote the general pattern of forms in a given dataset. Most of the datasets peak at columns, implying the prevalence of textbox fields in the forms. Database 2 peaks at values, implying the prevalence of select and radiobutton fields in the forms. Database 5 peaks at foreign keys, indicating the prevalence of categories and subcategories in the forms. The broad areas represent the presence of longer forms, and the narrower regions represent the presence of shorter, or mergeable, forms. This does not include the form tree generation time, user intervention time, or the execution of database DDL statements. The duration follows no fixed pattern. It depends on multiple factors, including the size of the form and the size of the existing database. Lucene indexing helped in controlling the duration, and it ranges from a few milliseconds to 200 seconds, even for the large-scale databases such as the ones generated from datasets 4 and 5.
  14. We performed a table-level comparison. We manually analyzed the mismatched tables.
  15. At least 50% for all datasets. Huge reduction: many scenarios that could be validated were found. 5 options per screen. Screen relevance: very low. This denotes that most of the correspondences identified using the linguistic matching method adopted by Lucene were not semantically matching and were hence rejected by the user. The screen relevance was particularly high (94%) for dataset 5, which represents the family practice forms. In these forms, the linguistically matching and yet semantically differing terms were not very prevalent. Approved mergers: for dataset 3, out of all the mergeable form elements identified by the validation algorithm, 97.29% were merged to a semantically matching database element.
  16. And did we reach all system goals? Specify again. Clearly. Did we reach the system goals?
  17. Our experience of tagging 52 data-entry forms suggests that the training samples can be constructed quickly and easily, as compared to the construction of an exhaustive set of rules or heuristics. To further test the performance of the mapping framework in a heterogeneous environment,