Project group knowAAN
   Final presentation


 Computer Science Education Group
     University of Paderborn


     October 20th 2011
Overview



    Introduction
    System components & Work flow
    Demonstration
    Development process
    Summary & Outlook
    Time for further detailed questions




                   PG knowAAN                    2
Overview



Overview: First part



    Goals
    Extraction & Storage (of data)
    Exploration (of data)
    System components & Work flow
    Analysis & Visualization (of data)




Goals

    Explore research networks
    Based on: Artifacts (scientific publications) and metadata
    Combination and analysis of data
    Computation of full-text similarities
    Support for the conference management system Ginkgo
    Data visualization
    Recommendations

              (Source: PG knowAAN project description)
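The slides do not say which measure knowAAN used to compute full-text similarities. A common choice is the cosine similarity of term-frequency vectors, so the following is a minimal sketch under that assumption, with a deliberately naive tokenizer. The helper names `term_vector` and `cosine_similarity` are hypothetical and not taken from the project.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-case the text and count its word tokens (letters only, naive)."""
    return Counter(re.findall(r"[a-zäöüß]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing terms, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


doc1 = term_vector("Awareness in research networks")
doc2 = term_vector("Research networks and awareness support")
print(round(cosine_similarity(doc1, doc2), 3))  # 0.671
```

In practice one would likely weight terms (for example with TF-IDF) and remove stop words before comparing; the unweighted version keeps the idea visible.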



Goals


Imagine you are interested in a conference.
You have downloaded the papers from two or three years of proceedings.
  Now you have nearly 100 publications.
       How do you explore them?




   100 publications. Do you know any tools for this?
Extraction & Storage




           First step: Extract data and store it.




Extraction & Storage




Exploration




               Second step: Explore data.




Exploration



Exploring a conference




Exploration




      Which extracted data is available for a publication?
                     → Database schema




Database schema (table: columns)

    publication: id GUID, lucuid VARCHAR(512), title VARCHAR(512), booktitle VARCHAR(512), normtitle VARCHAR(512), date VARCHAR(512), editor VARCHAR(512), journal VARCHAR(512), note VARCHAR(512), pages VARCHAR(512), publisher VARCHAR(512), tech VARCHAR(512), volume VARCHAR(512), number VARCHAR(512), rawstring VARCHAR(4096), xmlfile VARCHAR(512), pdffile VARCHAR(512), topicfile VARCHAR(512), created BIGINT, modified BIGINT
    author: id GUID, text VARCHAR(512), normtext VARCHAR(512), firstname VARCHAR(512), lastname VARCHAR(512), created BIGINT, modified BIGINT
    affiliation: id GUID, text VARCHAR(512), location_id GUID
    address: id GUID, text VARCHAR(512), location_id GUID
    location: id GUID, latitude DOUBLE, longitude DOUBLE, text VARCHAR(512)
    event: id GUID, text VARCHAR(512), filepath VARCHAR(512), predecessor_id GUID, successor_id GUID
    eventseries: id GUID, text VARCHAR(512), filepath VARCHAR(512)
    discipline: id GUID, text VARCHAR(512), parent_id GUID
    keyword: id GUID, text VARCHAR(512)
    concept: id GUID, text VARCHAR(512)
    category: id GUID, text VARCHAR(512)

    pub_aut: publication_id GUID, author_id GUID
    pub_aff: publication_id GUID, affiliation_id GUID
    pub_add: publication_id GUID, address_id GUID
    pub_dis: publication_id GUID, discipline_id GUID
    pub_evt: publication_id GUID, event_id GUID
    pub_key: publication_id GUID, keyword_id GUID, score DOUBLE, source VARCHAR(512)
    pub_con: publication_id GUID, concept_id GUID, score DOUBLE, source VARCHAR(512)
    pub_cat: publication_id GUID, category_id GUID, score DOUBLE, source VARCHAR(512)
    aut_aff: author_id GUID, affiliation_id GUID
    aut_add: author_id GUID, address_id GUID
    evt_evs: event_id GUID, eventseries_id GUID
    citation: publication1_id GUID, publication2_id GUID

    Further objects shown without columns: category_count, concept_count, discipline_count, keyword_count, evt_pub_aut_count, bib_coupling, co_author, co_citation
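To show how the link tables connect the entities, here is a minimal sketch that builds a subset of the schema (publication, author, pub_aut, keyword, pub_key) in an in-memory SQLite database and queries it. SQLite, storing GUIDs as TEXT, the sample rows and the helpers `authors_of` and `keywords_of` are all assumptions for illustration; the slides name neither the database system nor the queries the project used, nor how `score` and `source` are filled.

```python
import sqlite3

# In-memory SQLite stand-in; GUID columns are stored as TEXT (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publication (id TEXT PRIMARY KEY, title VARCHAR(512));
CREATE TABLE author (id TEXT PRIMARY KEY, text VARCHAR(512));
CREATE TABLE pub_aut (publication_id TEXT REFERENCES publication(id),
                      author_id TEXT REFERENCES author(id));
CREATE TABLE keyword (id TEXT PRIMARY KEY, text VARCHAR(512));
CREATE TABLE pub_key (publication_id TEXT REFERENCES publication(id),
                      keyword_id TEXT REFERENCES keyword(id),
                      score DOUBLE, source VARCHAR(512));
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO publication VALUES ('p1', 'Paper A')")
conn.executemany("INSERT INTO author VALUES (?, ?)", [("a1", "Alice"), ("a2", "Bob")])
conn.executemany("INSERT INTO pub_aut VALUES (?, ?)", [("p1", "a1"), ("p1", "a2")])
conn.executemany("INSERT INTO keyword VALUES (?, ?)", [("k1", "awareness"), ("k2", "network")])
conn.executemany("INSERT INTO pub_key VALUES (?, ?, ?, ?)",
                 [("p1", "k1", 0.9, "example"), ("p1", "k2", 0.4, "example")])


def authors_of(publication_id: str) -> list[str]:
    """Author names of one publication, resolved through the pub_aut link table."""
    rows = conn.execute(
        "SELECT a.text FROM author a JOIN pub_aut pa ON pa.author_id = a.id "
        "WHERE pa.publication_id = ? ORDER BY a.text", (publication_id,))
    return [name for (name,) in rows]


def keywords_of(publication_id: str) -> list[tuple[str, float]]:
    """Keywords of one publication with their score, highest score first."""
    rows = conn.execute(
        "SELECT k.text, pk.score FROM keyword k JOIN pub_key pk ON pk.keyword_id = k.id "
        "WHERE pk.publication_id = ? ORDER BY pk.score DESC", (publication_id,))
    return rows.fetchall()


print(authors_of("p1"))   # ['Alice', 'Bob']
print(keywords_of("p1"))  # [('awareness', 0.9), ('network', 0.4)]
```

The same join pattern applies to the other pub_* and aut_* link tables, for example pub_evt to find all publications of an event.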
System components & Work flow




           How is our system structured?
                  → Some examples.




System components & Work flow



Components
    Backend << component >>, with the components
        Clustering, DB, Roundtrip, PDFToText, Recommendation,
        Parscit, TrendDetection, TF-Component, TopicExtraction, xmlBuilder
    Connected components (connector label in parentheses)
        ParscitTrainer (Model)
        FrontendReferenceExtraction (WebServices)
        DocBrowser (WebServices)
        DataBase (JDBC)
        Solr (WebServices)
        FileStorage (FileSystem)
Adding a PDF (sequence diagram)
Participants: DocumentBrowser, RoundTrip, RoundTripExecutor, PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB

    a / 1) .addPDF (DocumentBrowser → RoundTrip)
    a / 2) .writeToFS → Path
    a / 3) .createThread, .submitThread (→ RoundTripExecutor)
    a / 1) returns to the DocumentBrowser

    b / 1) .run (RoundTripExecutor)
    b / 2) .getText (PDFToText) → Text
    b / 3) .ParseFullText (Parscit) → ParscitXML
    b / 4) .extractBodyAndAbstract → BodyAndAbstract
    b / 5) .getLanguage (Languagedetection) → LanguageString
    b / 6) .lemmatize (Lemmatizer) → LemmatizedText
    b / 7) .extractNouns (NounExtraction) → NounsList
    b / 8) .lemmatizeNounslist (Lemmatizer) → LemmatizedNouns
    b / 9) .ReduceToTopNouns → TopNouns
    b / 10) .writeToFiles → Paths
    b / 11) .addTexts (Solr) → Solrid
    b / 12) .addPublication (DB)
    b / 1) run completes
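The asynchronous part of this work flow (a/3 hands the work to a thread, a/1 returns at once) and the top-noun reduction in b/9 can be sketched as follows. This is a minimal sketch under several assumptions: a thread pool stands in for the RoundTripExecutor, steps b/2 to b/8 (PDFToText, Parscit, language detection, lemmatization, noun extraction) are collapsed into one injected callable, "top nouns" is taken to mean the k most frequent nouns, and b/10 to b/12 (files, Solr, DB) are omitted. The Python class `RoundTrip` and the names `add_pdf` and `reduce_to_top_nouns` are hypothetical counterparts, not the project's code.

```python
import tempfile
from collections import Counter
from concurrent.futures import Future, ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def reduce_to_top_nouns(nouns: list[str], k: int) -> list[str]:
    """b/9: keep the k most frequent nouns; ties keep first-seen order."""
    return [noun for noun, _ in Counter(nouns).most_common(k)]


class RoundTrip:
    """Hypothetical sketch: store the PDF, then process it on a worker thread."""

    def __init__(self, storage_dir: Path,
                 extract_nouns: Callable[[Path], list[str]], workers: int = 2):
        self.storage_dir = storage_dir
        # Stand-in for steps b/2 to b/8 (text extraction, parsing, language
        # detection, lemmatization and noun extraction).
        self.extract_nouns = extract_nouns
        # Thread pool standing in for the RoundTripExecutor.
        self.executor = ThreadPoolExecutor(max_workers=workers)

    def add_pdf(self, name: str, data: bytes) -> tuple[Path, Future]:
        path = self.storage_dir / name
        path.write_bytes(data)                          # a/2: write to the file system
        future = self.executor.submit(self._run, path)  # a/3: hand over to a worker
        return path, future                             # a/1: return without waiting

    def _run(self, path: Path) -> list[str]:
        # b/1: runs on the worker thread; b/10 to b/12 (files, Solr, DB) omitted.
        return reduce_to_top_nouns(self.extract_nouns(path), k=3)


with tempfile.TemporaryDirectory() as tmp:
    # Toy extractor: treats the "PDF" bytes as plain text and splits on spaces.
    rt = RoundTrip(Path(tmp), lambda p: p.read_bytes().decode().split())
    path, future = rt.add_pdf("paper.pdf", b"network author network topic author network")
    top = future.result()
    print(top)  # ['network', 'author', 'topic']
    rt.executor.shutdown()
```

Returning a future lets the caller (here standing in for the DocumentBrowser) continue immediately while the slow extraction chain runs in the background.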
System components & Work flow



Work flow




Analysis & Visualization




           Third step: Analyze and visualize data.




Analysis & Visualization



Analysis of authors
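The database schema lists a co_author object, but its definition is not shown. One plausible reading, assumed here, is the number of publications two authors wrote together, derived from the pub_aut link table. The function `co_author_counts` and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_author_counts(pub_aut: list[tuple[str, str]]) -> Counter:
    """Count shared publications per unordered author pair.

    pub_aut holds (publication_id, author_id) rows, as in the pub_aut table.
    """
    authors_by_pub: dict[str, set[str]] = {}
    for pub_id, author_id in pub_aut:
        authors_by_pub.setdefault(pub_id, set()).add(author_id)
    counts: Counter = Counter()
    for authors in authors_by_pub.values():
        # Sorting makes ('alice', 'bob') and ('bob', 'alice') the same key.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


rows = [("p1", "alice"), ("p1", "bob"),
        ("p2", "alice"), ("p2", "bob"), ("p2", "carol"),
        ("p3", "carol")]
print(co_author_counts(rows).most_common(1))  # [(('alice', 'bob'), 2)]
```

The resulting pair counts can serve directly as edge weights of a co-authorship graph.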




Analysis & Visualization



Analysis of scientific publications
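The schema also names co_citation and bib_coupling, two standard citation-based relations between publications. Assuming that a citation row means publication1_id cites publication2_id (the direction is not stated on the slides), they can be sketched as follows: co-citation counts how many papers cite both A and B, while bibliographic coupling counts how many references A and B share. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def _pair_counts(groups) -> Counter:
    """Count every unordered pair that occurs together in one group."""
    counts: Counter = Counter()
    for members in groups:
        for pair in combinations(sorted(members), 2):
            counts[pair] += 1
    return counts


def co_citation(citations: list[tuple[str, str]]) -> Counter:
    """Pairs of cited papers, weighted by how many papers cite both."""
    references_of: dict[str, set[str]] = {}
    for citing, cited in citations:
        references_of.setdefault(citing, set()).add(cited)
    return _pair_counts(references_of.values())


def bibliographic_coupling(citations: list[tuple[str, str]]) -> Counter:
    """Pairs of citing papers, weighted by how many references they share."""
    citers_of: dict[str, set[str]] = {}
    for citing, cited in citations:
        citers_of.setdefault(cited, set()).add(citing)
    return _pair_counts(citers_of.values())


# Assumed reading of a citation row: (publication1_id cites publication2_id).
citations = [("p1", "r1"), ("p1", "r2"), ("p2", "r1"), ("p2", "r2"), ("p3", "r1")]
print(co_citation(citations)[("r1", "r2")])             # 2
print(bibliographic_coupling(citations)[("p1", "p2")])  # 2
```

Both relations reuse the same pair-counting step; they differ only in whether citations are grouped by the citing or by the cited paper.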




Demonstration




                            Now: Demo.
           Image: http://www.flickr.com/photos/plaisanter/5525977163/


Development process



Technologies




                            Jersey



Development process



Methods of agile software development



     FDD, XP, Scrum




Development process



Methods of agile software development




    Weekly meetings
    Sit together (as much as possible)
    Automated build system
    Continuous integration
    Issue tracking


Summary and Outlook



Summary and future work

 Summary
     Integrated processing of scientific papers
     Aggregated visualization of authors, publications and events
     Computation of various analyses over the data
     Cleaning functionality for automatically processed data

 Future work
     Parallelized Clustering
     Additional graphical visualization
     Improve extraction of metadata from PDF files
Summary and Outlook



Thank you for your attention




                           Questions?

 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Final presentation of the project group Knowledge Awareness in Artefact-Actor-Networks (knowAAN)

  • 1. Project group knowAAN: Final presentation. Computer Science Education Group, University of Paderborn, October 20th 2011
  • 2. Overview: Introduction; System components & Work flow; Demonstration; Development process; Summary & Outlook; Time for further questions of detail
  • 3. Overview, first part: Goals; Extraction & Storage (of data); Exploration (of data); System components & Work flow; Analysis & Visualization (of data)
  • 4. Goals: Explore research networks based on artifacts (scientific publications) and metadata; combination and analysis of data; computation of similarities of full texts; support for the conference management system Ginkgo; data visualization; recommendations. (Source: PG knowAAN project description)
  • 5. Goals: Imagine you are interested in a conference. You have downloaded its papers from two or three years, so now you have nearly 100 publications. How do you explore them? Do you know any tools for 100 publications?
  • 6. Extraction & Storage: First step: extract data and store it.
  • 8. Exploration: Second step: explore data.
  • 10. Exploration: Which extracted data is available for a publication? → Database schema
  • 11. Database schema (ER diagram; keys are GUIDs, most text columns are VARCHAR(512)). Entity tables: publication (id, lucuid, title, normtitle, booktitle, date, editor, journal, note, pages, publisher, tech, volume, number, rawstring VARCHAR(4096), xmlfile, pdffile, topicfile, created and modified as BIGINT); author (id, text, normtext, firstname, lastname, created, modified); location (id, latitude and longitude as DOUBLE, text); affiliation; address; discipline (with parent_id); keyword, concept and category (each with id, text, source); event; eventseries. Link tables: pub_aut, pub_aff, aut_aff, pub_add, aut_add, pub_dis, pub_evt, evt_evs, citation (publication1_id, publication2_id), a further link with predecessor_id and successor_id, and pub_key, pub_con, pub_cat, which also store a score DOUBLE. Derived tables: keyword_count, concept_count, category_count, discipline_count, evt_pub_aut_count, co_author, co_citation, bib_coupling.
  • 12. System components & Work flow: How is our system structured? → Some examples.
  • 13. System components & Work flow: Components (UML component diagram): Model, Backend, Frontend, DocBrowser, Roundtrip, PDFToText, Parscit, ParscitTrainer, ReferenceExtraction, xmlBuilder, TopicExtraction, TF-Component, Clustering, TrendDetection, Recommendation, Solr, DB, DataBase, FileStorage, FileSystem; the components are connected through WebServices and JDBC interfaces.
  • 14. Round trip (sequence diagram). Participants: DocumentBrowser, RoundTrip, RoundTripExecutor, PDFToText, Parscit, Languagedetection, Lemmatizer, NounExtraction, Solr, DB. Sequence a: a/1) addPDF; a/2) writeToFS → Path; a/3) createThread, submitThread. Sequence b: b/1) run; b/2) getText → Text; b/3) ParseFullText → ParscitXML; b/4) extractBodyAndAstract → BodyAndAbstract; b/5) getLanguage → LanguageString; b/6) lemmatize → LemmatizedText; b/7) extractNouns → NounsList; b/8) lemmatizeNounslist → LemmatizedNouns; b/9) ReduceToTopNouns → TopNouns; b/10) writeToFiles → Paths; b/11) addTexts → Solrid; b/12) addPublication.
  • 15. System components & Work flow Work flow PG knowAAN 15
  • 16. Analysis & Visualization Analysis & Visualization Third step: Analyze and visualize data. PG knowAAN 16
  • 17. Analysis & Visualization Analysis of authors PG knowAAN 17
  • 18. Analysis & Visualization Analysis of scientific publications PG knowAAN 18
  • 19. Demonstration Demonstration Now: Demo. Image: http://www.flickr.com/photos/plaisanter/5525977163/ PG knowAAN 19
  • 20. Development process Technologies Jersey PG knowAAN 20
  • 21. Development process Methods of agile software development FDD XP Scrum PG knowAAN 21
  • 22. Development process: Methods of agile software development: weekly meetings; sitting together (as much as possible); automated build system; continuous integration; issue tracking
  • 23. Summary and Outlook: Summary: integrated processing of scientific papers; aggregated visualization of authors, publications and events; computation of various analyses over the data; cleaning functionality for automatically processed data. Future work: parallelized clustering; additional graphical visualization; improved extraction of metadata from PDF files.
  • 24. Summary and Outlook: Thank you for your attention. Questions?
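Slide 4 lists the computation of similarities of full texts as a goal, but the slides do not say which measure knowAAN uses. The following minimal sketch assumes a plain bag-of-words model: each text becomes a term-frequency vector and every pair of texts is scored by cosine similarity. The function names and the tokenization are illustrative assumptions, not the project's code.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Lower-case the text, split it into alphabetic tokens and count them."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(docs: dict[str, str]) -> list[tuple[str, str, float]]:
    """Score every pair of documents and return the pairs, best match first."""
    vectors = {name: term_frequencies(text) for name, text in docs.items()}
    names = sorted(vectors)
    pairs = [
        (x, y, cosine_similarity(vectors[x], vectors[y]))
        for i, x in enumerate(names)
        for y in names[i + 1:]
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

For three short texts such as "clustering of scientific publications", "clustering scientific papers" and "quality of service in networks", the first two share two terms and come out as the most similar pair (score about 0.58). A real system would more likely weight terms (for example with TF-IDF) and remove stop words first.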
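Slide 11 keeps entities and their n:m relations in separate link tables, and several derived tables (such as co_author and bib_coupling) can be computed from those links. The sketch below recreates a small subset in SQLite, storing GUIDs as TEXT, and derives co-author pairs from pub_aut and bibliographically coupled pairs from citation. It assumes that a citation row means "publication1_id cites publication2_id" and that co_author and bib_coupling count shared publications and shared references; the slide shows only the table names, so both definitions are assumptions, and the demo rows are made up.

```python
import sqlite3

# Hypothetical subset of the schema on slide 11. GUIDs are stored as TEXT
# because SQLite has no GUID type; column names follow the slide.
SCHEMA = """
CREATE TABLE publication (id TEXT PRIMARY KEY, title VARCHAR(512));
CREATE TABLE author (id TEXT PRIMARY KEY, firstname VARCHAR(512), lastname VARCHAR(512));
CREATE TABLE pub_aut (publication_id TEXT REFERENCES publication(id),
                      author_id TEXT REFERENCES author(id));
-- Assumption: publication1_id cites publication2_id.
CREATE TABLE citation (publication1_id TEXT REFERENCES publication(id),
                       publication2_id TEXT REFERENCES publication(id));
"""

# Co-authors: two distinct authors linked to the same publication.
CO_AUTHOR = """
SELECT a.author_id, b.author_id, COUNT(*) AS shared
FROM pub_aut a JOIN pub_aut b
  ON a.publication_id = b.publication_id AND a.author_id < b.author_id
GROUP BY a.author_id, b.author_id
ORDER BY shared DESC, a.author_id, b.author_id
"""

# Bibliographic coupling: two publications that cite the same publication.
BIB_COUPLING = """
SELECT x.publication1_id, y.publication1_id, COUNT(*) AS shared
FROM citation x JOIN citation y
  ON x.publication2_id = y.publication2_id
 AND x.publication1_id < y.publication1_id
GROUP BY x.publication1_id, y.publication1_id
ORDER BY shared DESC, x.publication1_id, y.publication1_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with a few made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO publication VALUES (?, ?)",
                    [("p1", "Paper one"), ("p2", "Paper two"), ("p3", "Paper three")])
    con.executemany("INSERT INTO author VALUES (?, ?, ?)",
                    [("a1", "Ada", "A"), ("a2", "Bob", "B"), ("a3", "Cy", "C")])
    con.executemany("INSERT INTO pub_aut VALUES (?, ?)",
                    [("p1", "a1"), ("p1", "a2"), ("p2", "a1"), ("p2", "a2"), ("p3", "a3")])
    con.executemany("INSERT INTO citation VALUES (?, ?)",
                    [("p2", "p1"), ("p3", "p1")])
    return con
```

With these rows, `CO_AUTHOR` returns `[("a1", "a2", 2)]` and `BIB_COUPLING` returns `[("p2", "p3", 1)]`. Co-citation would be the mirror query: join citation on publication1_id and pair the cited publications instead.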
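Slide 14 splits the round trip into two message sequences: sequence a stores the uploaded PDF and submits a processing thread, and sequence b runs the analysis stages in order. The sketch below mirrors that split with Python's `ThreadPoolExecutor`. PDF conversion, ParsCit, language detection, lemmatization and the Solr/DB writes are replaced by trivial hypothetical stand-ins (an in-memory dict as file storage, a stop-word filter as noun extraction). Only the control flow and a frequency-based ReduceToTopNouns step are illustrated, and that frequency criterion is itself an assumption.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical stop-word list; stands in for real noun extraction.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "is", "we"}


@dataclass
class Publication:
    path: str
    top_nouns: list[str] = field(default_factory=list)


def get_text(path: str, storage: dict[str, str]) -> str:
    """Stand-in for PDFToText (b/2): read the already extracted text."""
    return storage[path]


def extract_nouns(text: str) -> list[str]:
    """Stand-in for Lemmatizer + NounExtraction (b/6 to b/8):
    lower-cased alphabetic words that are not stop words."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]


def reduce_to_top_nouns(nouns: list[str], k: int) -> list[str]:
    """ReduceToTopNouns (b/9), assumed here to keep the k most frequent nouns.
    Ties keep the order in which the nouns first appeared."""
    return [noun for noun, _ in Counter(nouns).most_common(k)]


def process(path: str, storage: dict[str, str], k: int = 3) -> Publication:
    """Sequence b: run the stages one after another for a single document."""
    nouns = extract_nouns(get_text(path, storage))
    return Publication(path=path, top_nouns=reduce_to_top_nouns(nouns, k))


def add_pdfs(contents: dict[str, str], workers: int = 2) -> list[Publication]:
    """Sequence a: store every document, then submit one processing task each."""
    # writeToFS (a/2): an in-memory dict stands in for FileStorage.
    storage = {f"/tmp/knowaan/{name}": text for name, text in contents.items()}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # createThread/submitThread (a/3): one task per stored document.
        futures = [pool.submit(process, path, storage) for path in storage]
        return [future.result() for future in futures]
```

`add_pdfs` returns one `Publication` per input in submission order, because the results are collected from the futures in the order they were submitted, even if the threads finish in a different order.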