Project group


PUSHPIN
  Supporting Scholarly Awareness
in Publications and Social Networks




          University of Paderborn
      Computer Science Education Group
            Wolfgang Reinhardt
CLASSIC RESEARCH
                     +
WEB 2.0 / SEMANTIC WEB / SOCIAL NETWORKS
                     +
  NEW METHODS AND METHODOLOGIES
                     =
       RESEARCH 2.0 & PG PUSHPIN


          Wolfgang Reinhardt - wolle@upb.de - Universität Paderborn
GOALS OF THE PROJECT
             GROUP
• Data     Mining in scientific publications

• Who’s     writing about what? Who’s writing with whom?

• Clustering    & similarity measures, Recommendations, Experts

• Connections      to Social Networking sites (ginkgo)

• visual   analytics, visualizations

• Extension     of the knowAAN architecture & analysis of large
 data sets
RESULTS OF PG KNOWAAN

• Java-based backend that allows automatic analysis of
 publications (metadata extraction, text analysis, relations
 between publications, and more)

• Clustering   and similarity detection

  • currently   first test with Hadoop & Mahout

• Rails-, JavaScript-, CSS-based            frontend for navigation

• Examples:
CO-AUTHOR NETWORKS




LOCATION OF AUTHORS




WORD CLOUDS




BIBLIOMETRIC NETWORKS




20. OCTOBER 2011
      4:45PM
GINKGO
• conference   management tool + social network

• Goal:

  • check submitted publications for plagiarized content, topical
   and social connections

  • Recommendations          (users, events, publications)


                    http://ginkgosem.com
GENERAL FRAMEWORK
PEOPLE


• Prof. Johannes   Magenheim

• Wolfgang   Reinhardt

• Tobias Varlemann
GOALS OF A PROJECT GROUP
• self-organization   to the greatest extent

• systematic    assignment of roles and responsibilities

• finding and fostering special talents

• process    oriented personnel placement like in industry

• regular   presentations of work progress

• creation   of interim and final reports

• working    on the edge of science
TIMEFRAME

• 18.10.2011    - 31.10.2012 (54 weeks)

• 30   ECTS = 900 hours of work (approx. 17h / week)

• Seminar    phase until January 2012

• Creativity   workshops in January

• Core   implementation phase from February 2012 onwards

  • agile   Development (4 milestones, 4 iterations per milestone)
REQUIREMENTS

• active   participation

  • check    UPB mails at least daily

  • good communication skills

  • team    work

• creativity     in design and implementation

• testing   ;)
TOOLS

• SVN    and Trac
                                         #pgpushpin
• Blog

• Twitter   (if you like)

• Mendeley     for exchange of research papers

• Delicious   for social bookmarks
SEMINAR PHASE

• each   one of you works on one topic

  • theoretical   framework, applications, prototypes

• regular   meetings with supervisors

• regular   blogging at http://pgpushpin.wordpress.com

• presentation    in mid January 2012 (25 minutes plus discussion)

• article   due at end January 2012 (approx. 16-24 pages)
TOPICS FOR THE
SEMINAR PHASE
1. HTML5 and JavaScript Frameworks
2. Visual Analytics
3. Agile Software Development in Small Teams
4. Trend detection and visualization
5. Text processing
6. Metadata extraction from research papers
7. Text similarities
8. Distributed computing with Hadoop 1
9. Distributed computing with Hadoop 2
10. Developing Multitouch Table Applications
11. Clustering of text documents
12. Plagiarism detection
13. Social Network Analysis
14. Faceted Search User Interfaces
15. Browser-based visualization of large networks
16. Scientific recommender systems
ALL TOPICS ARE FOCUSED
ON SCHOLARLY OUTPUT

 E.G. SCIENTIFIC PAPERS,
       RESEARCHER
   COLLABORATION
HTML5 AND JAVASCRIPT
         FRAMEWORKS
• development      of sustainable web applications (responsive
 design)

• current   and coming standards

  • web workers, local storage, WebGL, server-side JS, web
    sockets

• Visualizations, Word   Clouds, time-dependent course

• Javascript   frameworks for visualizations, graphs etc.
VISUAL ANALYTICS
• information    / scientific visualization that allow reasoning

• visual   analytics and their application to research

  • cartography     / geovisualization

  • flow     visualization

  • diagrammatic     reasoning

  • state   of the art and mockups for new developments

    • tools/frameworks      for realization (browser-based)
AGILE SOFTWARE DEVELOP.
        IN SMALL TEAMS
• agile software development and project management in small
 teams

• application   to the project group (roles and requirements)

• TDD, BDD, FDD

• Scrum, eXtreme     programming, Kanban

• Pair    Programming
TREND DETECTION AND
    VISUALIZATION & SEARCH
• trend   spotting and visualization & forecasting

• which   topics are gaining ground and which are on the decline

• which   networks are expanding, which are saturated

• ThemeRiver     - StreamGraph visualizations

• Custom    Search Applications (Solr and its extensions)

  • semantic   search, linked data approaches
TEXT PROCESSING

• PDF   text extraction (get rid of headers and footers)

• Part-of-speech   detection, lemmatizing text, stemming

• classification, topic   extraction and knowledge discovery
 (untrained)

• LDA   from Mahout

• usage of Apache OpenNLP & Apache Mahout for
 prototypes
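A minimal sketch of the "get rid of headers and footers" step, assuming the PDF text has already been extracted into one list of lines per page. The function name `strip_repeated_lines` and the 60 % threshold are illustrative assumptions, not part of knowAAN or any PDF library.

```python
from collections import Counter


def strip_repeated_lines(pages, min_fraction=0.6):
    """Drop lines that occur on at least min_fraction of all pages.

    pages is a list of pages, each page a list of text lines.
    Lines that repeat across most pages are treated as running
    headers or footers (a heuristic, not a guarantee).
    """
    page_count = len(pages)
    if page_count < 2:
        return pages  # nothing to compare against

    # Count on how many pages each distinct (stripped) line appears.
    occurrences = Counter()
    for page in pages:
        occurrences.update({line.strip() for line in page if line.strip()})

    repeated = {
        line for line, n in occurrences.items()
        if n / page_count >= min_fraction
    }
    return [
        [line for line in page if line.strip() not in repeated]
        for page in pages
    ]


# Hypothetical three-page paper with a running header.
pages = [
    ["Journal of Examples", "Introduction", "1"],
    ["Journal of Examples", "Related work", "2"],
    ["Journal of Examples", "Conclusion", "3"],
]
cleaned = strip_repeated_lines(pages)
```

Page numbers differ from page to page, so this heuristic leaves them in place; they would need a separate rule.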
METADATA EXTRACTION
      FROM RESEARCH PAPERS
• How   to best extract metadata from research papers?

• Parscit   and others (?)

• Conditional Random Fields -- CRF++
• Support Vector Machines
  (good mathematical knowledge needed)
• Only selected information is relevant
• extract geo locations from papers
TEXT SIMILARITIES

• Vector   Space Model & Term Document Matrix

• LSA   / LSI with SVD

• methods for calculating text-based similarities

 • possibility   for live calculations

 • temporary      files

• usage    of Apache Mahout for prototypes
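To make the vector space model concrete, here is a small sketch in plain Python (not Mahout): each document becomes a term-frequency vector, and similarity is the cosine of the angle between two vectors. Tokenization by whitespace and the three example documents are simplifying assumptions.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term frequencies (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical paper titles; each Counter is one column of a
# term-document matrix.
docs = {
    "p1": "social network analysis of co-author networks",
    "p2": "analysis of co-author networks in research",
    "p3": "multitouch table applications",
}
vectors = {name: term_vector(text) for name, text in docs.items()}

sim_12 = cosine_similarity(vectors["p1"], vectors["p2"])  # shared vocabulary
sim_13 = cosine_similarity(vectors["p1"], vectors["p3"])  # no shared terms
```

A prototype would typically replace raw counts with TF-IDF weights and, for LSA / LSI, reduce the matrix with an SVD before comparing documents.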
DISTRIBUTED COMPUTING
       WITH HADOOP 1
• MapReduce

• Hadoop

• HBase

• HDFS

• usage   of Apache Mahout for prototypes
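The MapReduce idea can be sketched without a cluster. The following plain-Python word count only imitates the three stages (map, shuffle, reduce) in a single process; it does not use the Hadoop API, and all function names are illustrative.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(documents):
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


counts = word_count({
    "d1": "hadoop stores data in hdfs",
    "d2": "hadoop runs mapreduce jobs on data",
})
```

On a real cluster the map and reduce calls run on different machines and the framework performs the shuffle; the structure of the computation stays the same.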
DISTRIBUTED COMPUTING
        WITH HADOOP 2
• MapReduce

• Hadoop

• Hive   Data Warehousing

• Job   Orchestration (e.g. with Zookeeper)

• Pig   Data Flow

• usage    of Apache Mahout for prototypes
DEVELOPING MULTITOUCH
     TABLE APPLICATIONS
• http://www.youtube.com/watch?v=f1X5ffRrde8

• C#   and .Net 4.0, Visual Studio 2010

• WPF     and Surface SDK

• Fiducials

• build simulation, mockups of possible applications, state-of-the-art
 presentation

• http://www.microsoft.com/silverlight/pivotviewer/
CLUSTERING OF TEXT
             DOCUMENTS
• Methods   for analyzing large collections of texts

 • k-means, single-link, full-link, canopy

 • visualization   opportunities

• how   to add documents to a large clustering

• usage   of Apache Mahout for prototypes
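As an illustration of k-means, here is a compact pure-Python sketch. It assumes documents are already represented as numeric vectors (here toy 2-D points) and uses the first k points as initial centroids, which is a simplification; Mahout and other libraries use better initialization and scale to much larger collections.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def kmeans(points, k, iterations=10):
    """Return (cluster label per point, centroids)."""
    centroids = list(points[:k])  # naive init: first k points
    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = mean(members)
    return assignment, centroids


# Toy document vectors, e.g. weights for two terms.
points = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1), (0.1, 0.9)]
labels, centroids = kmeans(points, k=2)
```

Adding a new document to an existing clustering can then be approximated by assigning it to the nearest centroid without re-running all iterations.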
PLAGIARISM DETECTION

• How    to detect potentially plagiarized content?

• Ethical   discussion on (self-)plagiarism

• text   breakdown in elements (sections, paragraphs, sentences)

• n-grams

• internal   and external plagiarism detection
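A small sketch of n-gram based external detection: both texts are broken into word trigrams, and the score is the fraction of the suspicious text's trigrams that also occur in the source (often called containment). The function names and example sentences are illustrative assumptions.

```python
def word_ngrams(text, n=3):
    """Set of word n-grams of a text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams found in the source."""
    suspicious_grams = word_ngrams(suspicious, n)
    if not suspicious_grams:
        return 0.0
    shared = suspicious_grams & word_ngrams(source, n)
    return len(shared) / len(suspicious_grams)


source = "the quick brown fox jumps over the lazy dog"
copied = "a quick brown fox jumps over the fence"
unrelated = "agile teams deliver software in short iterations"

score_copied = containment(copied, source)
score_unrelated = containment(unrelated, source)
```

A high score only flags a passage for human review; deciding whether it is actually plagiarism (or legitimate quotation or self-reuse) is outside what the measure can tell.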
SOCIAL NETWORK ANALYSIS

• Social   Network Theory

• measures     from SNA

  • existing   examples of research applications

• bibliometrics   and scientometrics

• take   real conference series as example
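One of the simplest SNA measures, degree centrality, can be sketched on a co-author network as follows. The author names and paper list are made up for illustration; a real analysis would read them from the metadata of a conference series.

```python
from itertools import combinations


def coauthor_graph(papers):
    """Build an undirected co-author graph from lists of author names."""
    graph = {}
    for authors in papers:
        unique = sorted(set(authors))
        for author in unique:
            graph.setdefault(author, set())  # keep single authors as nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of all other authors that each author has written with."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical author lists, one per paper.
papers = [["Ada", "Ben"], ["Ada", "Cem"], ["Ada", "Dev"], ["Ben", "Cem"]]
centrality = degree_centrality(coauthor_graph(papers))
```

Here Ada has written with every other author and therefore gets the maximum value of 1.0; measures such as betweenness or closeness would need path computations on the same graph.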
FACETED SEARCH &
                INTERFACE EVAL
• Best   practices and design recommendations

• frameworks     for development

• enclosure    / APIs

• only   work on JSON data & no direct DB access

• Java   / ASP .Net / SEAM ....

• own    prototype
BROWSER-BASED VISUALIZ.
      OF LARGE NETWORKS
• level   of detail

• WebGL, web          workers

• Gephi

• visualize   properties

• allow   faceted search

• should    be working on tablets
SCIENTIFIC RECOMMENDER
            SYSTEMS
• state   of the art

  • item-based and collaborative filtering / hybrid
    recommenders

• algorithms, visualizations

• existing   applications in research

• usage      of Apache Mahout for prototypes
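A minimal sketch of item-based collaborative filtering in plain Python (not Mahout). It assumes binary "researcher bookmarked paper" data; the user names, paper ids and the scoring rule (sum of item-item cosine similarities) are illustrative choices.

```python
import math


def item_vectors(ratings):
    """Invert user -> {item: score} into item -> {user: score}."""
    items = {}
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=3):
    """Rank unseen items by their similarity to the user's own items."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in items.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(vector, items[liked]) * score
            for liked, score in seen.items()
        )
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Hypothetical bookmarks: 1 means the researcher saved the paper.
ratings = {
    "ann": {"paper_a": 1, "paper_b": 1},
    "bob": {"paper_a": 1, "paper_b": 1, "paper_c": 1},
    "cat": {"paper_c": 1, "paper_d": 1},
    "dan": {"paper_d": 1},
}
suggestions = recommend(ratings, "ann")
```

A hybrid recommender would combine this score with content-based signals such as the text similarity between papers.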
NEXT STEPS
• vote   for three topics until Wednesday, 8pm

  • mail   with favorite topic, 2nd and 3rd place

  • decision   on Friday

• create Wordpress, Delicious    and Mendeley account

• final presentation of PG knowAAN this Thursday, 4:45pm
 in F0.231

• first   meetings with supervisors next two weeks
wolfgang reinhardt  university of paderborn



                                                social media               sna
twitter        recommendations
                                                 awareness
research networks
                                                  bibliometrics
  artefact-actor-networks
                                                                     ginkgo
                            research 2.0
                 www.isitjustme.de        www.ginkgosem.com
  @wollepb                           @wolfgang.reinhardt

How to Give a Domain for a Field in Odoo 17
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 

1st meeting of PG PUSHPIN

  • 1. Project group PUSHPIN Supporting Scholarly Awareness in Publications and Social Networks University of Paderborn Computer Science Education Group Wolfgang Reinhardt
  • 2. CLASSIC RESEARCH + WEB 2.0 / SEMANTIC WEB / SOCIAL NETWORKS + NEW METHODS AND METHODOLOGIES = RESEARCH 2.0 & PG PUSHPIN Wolfgang Reinhardt - wolle@upb.de - Universität Paderborn
  • 3. GOALS OF THE PROJECT GROUP • Data Mining in scientific publications • Who’s writing about what? Who’s writing with whom? • Clustering & similarity measures, Recommendations, Experts • Connections to Social Networking sites (ginkgo) • visual analytics, visualizations • Extension of the knowAAN architecture & analysis of large data sets
  • 4. RESULTS OF PG KNOWAAN • Java-based backend that allows automatic analysis of publications (metadata extraction, text analysis, relations between publications, and many more) • Clustering and similarity detection • currently first tests with Hadoop & Mahout • Rails-, JavaScript-, CSS-based frontend for navigation • Examples:
  • 5. CO-AUTHOR NETWORKS
  • 6. LOCATION OF AUTHORS
  • 7. WORD CLOUDS
  • 8. BIBLIOMETRIC NETWORKS
  • 10. GINKGO • conference management tool + social network • Goal: • check submitted publications for plagiarized content, topical and social connections • Recommendations (users, events, publications) http://ginkgosem.com
  • 12. PEOPLE • Prof. Johannes Magenheim • Wolfgang Reinhardt • Tobias Varlemann
  • 13. GOALS OF A PROJECT GROUP • self-organization to the greatest possible extent • systematic assignment of roles and responsibilities • finding and facilitating special talents • process-oriented personnel placement as in industry • regular presentations of work progress • creation of interim and final reports • working on the edge of science
  • 14. TIMEFRAME • 18.10.2011 - 31.10.2012 (54 weeks) • 30 ECTS = 900 hours of work (approx. 17h / week) • Seminar phase until January 2012 • Creativity workshops in January • Core implementation phase from February 2012 onwards • agile development (4 milestones, 4 iterations per milestone)
  • 15. REQUIREMENTS • active participation • check UPB mails at least daily • good communication skills • teamwork • creativity in design and implementation • testing ;)
  • 16. TOOLS • SVN and Trac #pgpushpin • Blog • Twitter (if you like) • Mendeley for exchange of research papers • Delicious for social bookmarks
  • 18. SEMINAR PHASE • each one of you works on one topic • theoretical framework, applications, prototypes • regular meetings with supervisors • regular blogging at http://pgpushpin.wordpress.com • presentation in mid-January 2012 (25 minutes plus discussion) • article due at the end of January 2012 (approx. 16-24 pages)
  • 20. 1. HTML5 and JavaScript Frameworks • 2. Visual Analytics • 3. Agile Software Development in Small Teams • 4. Trend detection and visualization • 5. Text processing • 6. Metadata extraction from research papers • 7. Text similarities • 8. Distributed computing with Hadoop 1 • 9. Distributed computing with Hadoop 2 • 10. Developing Multitouch Table Applications • 11. Clustering of text documents • 12. Plagiarism detection • 13. Social Network Analysis • 14. Faceted Search User Interfaces • 15. Browser-based visualization of large networks • 16. Scientific recommender systems
  • 21. ALL TOPICS ARE FOCUSED ON SCHOLARLY OUTPUT E.G. SCIENTIFIC PAPERS, RESEARCHER COLLABORATION
  • 22. HTML5 AND JAVASCRIPT FRAMEWORKS • development of sustainable web applications (responsive design) • current and coming standards • web workers, local storage, WebGL, server-side JS, web sockets • Visualizations, Word Clouds, time-dependent course • JavaScript frameworks for visualizations, graphs etc.
  • 23. VISUAL ANALYTICS • information / scientific visualization that allows reasoning • visual analytics and their application to research • cartography / geovisualization • flow visualization • diagrammatic reasoning • state of the art and mockups for new developments • tools/frameworks for realization (browser-based)
  • 24. AGILE SOFTWARE DEVELOP. IN SMALL TEAMS • agile software development and project management in small teams • application to the project group (roles and requirements) • TDD, BDD, FDD • Scrum, eXtreme programming, Kanban • Pair Programming
  • 25. TREND DETECTION AND VISUALIZATION & SEARCH • trend spotting and visualization & forecasting • which topics are gaining ground and which are on the decline • which networks are expanding, which are saturated • ThemeRiver - StreamGraph visualizations • Custom Search Applications (Solr and its extensions) • semantic search, linked data approaches
  • 26. TEXT PROCESSING • PDF text extraction (get rid of headers and footers) • Part-of-speech detection, lemmatizing text, stemming • classification, topic extraction and knowledge discovery (untrained) • LDA from Mahout • usage of Apache OpenNLP & Apache Mahout for prototypes
  • 27. METADATA EXTRACTION FROM RESEARCH PAPERS • How to best extract metadata from research papers? • ParsCit and others (?) • Conditional Random Fields -- CRF++ • Support Vector Machines • good mathematical knowledge needed • only selected information is relevant • extract geo locations from papers
  • 28. TEXT SIMILARITIES • Vector Space Model & Term Document Matrix • LSA / LSI with SVD • methods for calculating text-based similarities • possibility for live calculations • temporary files • usage of Apache Mahout for prototypes
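
To make the vector space model from the text similarities slide concrete, here is a minimal sketch in plain Python rather than Apache Mahout. It assumes whitespace tokenization and raw term frequencies (no TF-IDF weighting, no LSA); the function names `term_vector` and `cosine_similarity` and the sample strings are illustrative only.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical paper titles, used only to exercise the functions.
    p1 = term_vector("clustering of scientific publications")
    p2 = term_vector("clustering of text documents")
    print(round(cosine_similarity(p1, p2), 3))
```

Each row of a term-document matrix corresponds to one such vector; a prototype on real data would likely add stop-word removal and a weighting scheme before comparing documents.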
  • 29. DISTRIBUTED COMPUTING WITH HADOOP 1 • MapReduce • Hadoop • HBase • HDFS • usage of Apache Mahout for prototypes
  • 30. DISTRIBUTED COMPUTING WITH HADOOP 2 • MapReduce • Hadoop • Hive Data Warehousing • Job Orchestration (e.g. with Zookeeper) • Pig Data Flow • usage of Apache Mahout for prototypes
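
The MapReduce model listed on both Hadoop slides can be sketched as a single-process word count. This is not Hadoop code: it only imitates the map, shuffle and reduce stages in plain Python, and the names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (word, 1) pair per token of a document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: collapse the values of one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    """Run the three stages sequentially over a dict of id -> text."""
    pairs = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical mini-corpus.
    print(word_count({"d1": "hadoop mahout hadoop", "d2": "mahout pig"}))
```

In a real cluster the map and reduce calls would run in parallel on different nodes and the shuffle would move data across the network; the sequential version only shows the data flow.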
  • 31. DEVELOPING MULTITOUCH TABLE APPLICATIONS • http://www.youtube.com/watch?v=f1X5ffRrde8 • C# and .Net 4.0, Visual Studio 2010 • WPF and Surface SDK • Fiducials • build simulation, mockups of possible applications, state-of-the-art presentation • http://www.microsoft.com/silverlight/pivotviewer/
  • 32. CLUSTERING OF TEXT DOCUMENTS • Methods for analyzing large collections of texts • k-means, single-link, full-link, canopy • visualization opportunities • how to add documents to a large clustering • usage of Apache Mahout for prototypes
  • 33. PLAGIARISM DETECTION • How to detect potentially plagiarized content? • Ethical discussion on (self-)plagiarism • text breakdown in elements (sections, paragraphs, sentences) • n-grams • internal and external plagiarism detection
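
One simple way to use the n-grams mentioned on the plagiarism detection slide is to compare the sets of word n-grams of two passages. The sketch below assumes word trigrams and a Jaccard overlap score; both choices, like the function names, are illustrative and not taken from the slides.

```python
def word_ngrams(text, n=3):
    """Set of word n-grams of a passage (assumption: whitespace tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(suspicious, source, n=3):
    """Jaccard overlap of the two n-gram sets, between 0.0 and 1.0."""
    a = word_ngrams(suspicious, n)
    b = word_ngrams(source, n)
    if not a or not b:
        return 0.0  # passages shorter than n words cannot be compared
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical sentences; a high score only flags a pair for review.
    s1 = "we present a new method for clustering documents"
    s2 = "we present a new method for ranking authors"
    print(round(ngram_overlap(s1, s2), 3))
```

Applied per section, paragraph or sentence, such a score supports external detection against a reference corpus; a threshold for what counts as suspicious would have to be chosen empirically.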
  • 34. SOCIAL NETWORK ANALYSIS • Social Network Theory • measures from SNA • existing examples of research applications • bibliometrics and scientometrics • take real conference series as example
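
As one example of the "measures from SNA", the following sketch builds an undirected co-author network from author lists and computes normalized degree centrality. The input format (one list of author names per paper), the function names and the author names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_graph(papers):
    """Undirected co-author network: author -> set of co-authors."""
    graph = defaultdict(set)
    for authors in papers:
        for author in authors:
            graph[author]  # touch the key so solo authors become nodes
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Share of the other nodes each author is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


if __name__ == "__main__":
    # Hypothetical author lists of three papers.
    papers = [["Ada", "Ben"], ["Ada", "Cem"], ["Dee"]]
    for author, score in sorted(degree_centrality(coauthor_graph(papers)).items()):
        print(author, round(score, 2))
```

Degree centrality is only the simplest such measure; betweenness or closeness need shortest-path computations and are usually taken from a graph library.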
  • 35. FACETED SEARCH & INTERFACE EVAL • Best practices and design recommendations • frameworks for development • enclosure / APIs • only work on JSON data & no direct DB access • Java / ASP .Net / SEAM .... • own prototype
  • 36. BROWSER-BASED VISUALIZ. OF LARGE NETWORKS • level of detail • WebGL, web workers • Gephi • visualize properties • allow faceted search • should be working on tablets
  • 37. SCIENTIFIC RECOMMENDER SYSTEMS • state of the art • item-based and collaborative filtering / hybrid recommenders • algorithms, visualizations • existing applications in research • usage of Apache Mahout for prototypes
  • 39. NEXT STEPS • vote for three topics until Wednesday, 8pm • mail with favorite topic, 2nd and 3rd place • decision on Friday • create Wordpress, Delicious and Mendeley accounts • final presentation of PG knowAAN this Thursday, 4.45pm in F0.231 • first meetings with supervisors in the next two weeks
  • 40. wolfgang reinhardt university of paderborn social media sna twitter recommendations awareness research networks bibliometrics artefact-actor-networks ginkgo research 2.0 www.isitjustme.de www.ginkgosem.com @wollepb @wolfgang.reinhardt