A Collaboration Recommender
      Based on Linked Open Data
    Conforming to the VIVO Ontology
Anup Sawant, Hugh J. Devlin, Noshir Contractor (Northwestern)
        Brandyn J. Kusenda, David Eichmann (Iowa)




                                   VIVO 2012 Miami, Florida USA



     This research was supported by the following grants: National Science Foundation grants
            CNS-1010904, OCI-0904356, IIS-0838564, UL1RR024146-06S2 and NIH CTSA awards
                                  UL1RR025741, 5UL1RR025741-04S3
SONIC C-IKNOW VIVO Recommender
            Outline
• Motivation & Project overview
• MTML collaboration recommendation heuristics
• Report on our practical experience in building
  collaboration recommender systems
• Importance of relational data in recommending
  collaborations, citation in particular
• Recommendations, Future Work, Questions,
  Comments, Suggestions
• Acknowledge Contributors, Collaborators, and Tools
Ascendance of Teams
Studies of 19.9 million research articles over 5 decades as recorded in the Web of Science
database, and an additional 2.1 million patent records from 1975-2005 found three important
facts.

1. For virtually all fields, research is increasingly done in teams

2. Teams typically produce more highly cited research than individuals do (accounting for
self-citations), and this team advantage is increasing over time.

3. Teams now produce the exceptionally high impact research, even where that distinction
was once the domain of solo authors.




                 Sources: Wuchty, Jones, and Uzzi, 2007a, 2007b
Ascendance of Virtual Teams
The trend toward virtual communities was not driven by a growth in teamwork by
scientists working with other co-located scientists. Using the Web of Science
database to analyze the collaboration arrangements of over 4,000,000 papers over
a 30 year period, Jones, Wuchty, Uzzi found that:

1. Team science is increasingly composed of co-authors located at different
   universities.

2. These “virtual communities of scholars” produce higher impact work than
   comparable co-located teams or solo scientists.

3. This change is true for all fields and team sizes, as well as for research done at
   elite universities




                     Source: Jones, Wuchty, Uzzi (2008)
Findings for all proposal collaborations
          Explaining Proposal Collaboration Relation (p*/ERGM results)

          Effects                                                          Full model (N=2,186)
Control   Isolates (single author)                                          5.447*
Control   Edge (proposal collaboration relation)                           -6.751*
Control   Weighted degree (negative measure of preferential attachment)     4.623*
H1        Gender (Female)                                                   0.021
H2        Tenure (Years since PhD)                                          0.002*
H3        Institution Tier (Top 10% universities)                          -0.098*
H4        H-index                                                          -0.014*
H5        Co-authorship                                                     2.431*
H6        Citation relation                                                 1.132*
          * Indicates p<0.05

Researchers are more likely to be familiar with, and to collaborate again with, those with whom
they share a collaboration history (co-authorship) or those they cite.

            Lungeanu, Huang, Contractor (2012) “A network perspective on success in collaboration: Stop
            citing me for our own good?”, Academy of Management
SONIC C-IKNOW VIVO Recommender
           Project Goals
• Port the SONIC collaboration recommendation heuristics
  to VIVO
• Gain practical experience in building systems that use
   – Linked Open Data (LOD)
   – SPARQL query language
• Cross-institutional recommending
   – Generalize the SONIC collaboration recommendation prototype from a
     single institution (Northwestern) to multiple institutions
   – Explore use of distributed, federated queries
• Technology adoption study of the utilization and impact
  of our social-science grounded recommendation
  heuristics
WHY DO WE CREATE, MAINTAIN, DISSOLVE, AND RECONSTITUTE OUR COMMUNICATION AND KNOWLEDGE NETWORKS?
Social Drivers:
               Why do we create and sustain networks?

         • Theories of self-interest
         • Theories of social and resource exchange
         • Theories of mutual interest and collective action
         • Theories of contagion
         • Theories of balance
         • Theories of homophily
         • Theories of proximity
Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses
 about organizational networks: An analytic framework and empirical example. Academy of
 Management Review
Multi-theoretical, Multi-level (MTML)
  Collaboration Recommendation Heuristics
Heuristic            Social theory         Relations                 Metric
Affiliation          proximity             affiliation               neighbor
                                           coauthorship              neighbor
Cocitation           mutual interest       cocitation                neighbor
                                           coauthorship              neighbor
Most Qualified       self-interest         citation                  h-index
                                           authorship
Friend of a friend   balance               coauthorship              distance
                                                                     count of geodesics
Social Exchange      reciprocity           citation                  dyadic in-degree
Follow the crowd     contagion             coauthorship + citation   centrality
                                           coauthorship              distance
Birds of a feather   homophily             (attributes)              count
Mobilizing           collective action     coauthorship + citation   shortest path
                                                                     betweenness
Feeling lucky        probabilistic model   coauthorship              p*/ERGM
                                           citation
                     Monge, P. R. and N. S. Contractor (2003)
                     Theories of communication networks NY:
                     Oxford University Press
Affiliation Heuristic
• The ‘Affiliation’ score is proportional to the
  number of experts who are in the same
  department as the seeker but have not
  previously collaborated with the seeker
• A form of proximity theory – social relations
  are (at least in part) opportunistic

   “We both work in the same department, so we might want to
    collaborate in the future.”
   Example - Works in the same department (Entomology and
    Nematology) but never coauthored.
Affiliation Recommendation
[Venn diagram of three sets: Qualified Experts, Coauthors, Same Affiliation]
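The selection pictured above can be read as a set operation: qualified experts who share the seeker's department, minus the seeker's previous coauthors. The following is only a minimal sketch of that reading. The function name, the dictionary inputs, and the sample people are hypothetical and are not taken from the recommender's code.

```python
def affiliation_candidates(seeker, qualified_experts, department_of, coauthors_of):
    """Qualified experts in the seeker's department who never coauthored with the seeker.

    All inputs are illustrative in-memory structures (assumptions):
      qualified_experts: set of person ids matching the search
      department_of:     person id -> department name
      coauthors_of:      person id -> set of previous coauthors
    """
    seeker_dept = department_of[seeker]
    same_dept = {p for p in qualified_experts if department_of.get(p) == seeker_dept}
    # Remove previous coauthors and the seeker, leaving only new potential collaborators.
    return same_dept - coauthors_of.get(seeker, set()) - {seeker}


department_of = {
    "seeker": "Entomology and Nematology",
    "a": "Entomology and Nematology",
    "b": "Entomology and Nematology",
    "c": "Chemistry",
}
coauthors_of = {"seeker": {"b"}}
candidates = affiliation_candidates("seeker", {"a", "b", "c"}, department_of, coauthors_of)
```

Here only "a" remains: "b" is already a coauthor and "c" works in another department.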
Co-Citation Heuristic
• The Co-Citation score is proportional to the
  number of times the seeker is co-cited with an
  identified expert
• A Cognitive metric: 3rd party rating of similarity
• Mutual interest theory
  – Sherif, M. (1958) "Superordinate Goals in the Reduction of Intergroup
     Conflict."
   “I have been co-cited with a qualified person quite a few times so
    I might want to collaborate with him in future.”
   Example – Co-cited with you 3 times.
                  (specifically disallows previous co-authors)
Co-citation Recommendation
[Venn diagram of three sets: Qualified Experts, Coauthors, Co-cited]
H-index
•   A scientist has index ‘h’ if ‘h’ of his/her N papers referencing the query term have
    at least h citations each, and the other (N-h) papers have no more than h citations
    each.




[Figure: h-index illustration (image source: Wikipedia)]




        Hirsch, J. E. (2005) An index to quantify an individual's scientific research output
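As a small illustration of the definition above, the h-index can be computed by ranking a scientist's papers by citation count and finding the last rank whose paper has at least that many citations. This is a generic sketch, not code from the recommender; the function name and the sample counts are made up.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have at least `rank` citations
        else:
            break         # counts only decrease from here, so no larger h is possible
    return h


example = h_index([10, 8, 5, 4, 3])
```

With these five papers the top four each have at least 4 citations, while the fifth has only 3, so the index is 4.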
Qualified H-index
• A scientist has a “qualified h index,” that is, an h-index qualified by a given
  concept, based on the number of their publications which are associated
  with that concept as a keyword
Most Qualified Heuristic
• The ‘Most Qualified’ score is proportional to
  the expert’s “Qualified h-index”
• Self-interest theory
  – Simon, Herbert (1957) “A Behavioral Model of Rational Choice”
  – MacDonald, C. and Ounis, I. (2006) “Voting for candidates: adapting data fusion
    techniques for an expert search task”


   “I like to work with someone who is most useful to me and seems to have a lot of
    expertise to offer.”
   Example – 2 of this expert’s articles that include the query term have each been
    cited at least 2 times.
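One way to read the "qualified h-index" is as an ordinary h-index computed only over the expert's articles tagged with the query concept. The sketch below assumes each article is a dictionary with a `keywords` set and a `citations` count; that record layout and the sample data are hypothetical.

```python
def qualified_h_index(articles, concept):
    """h-index restricted to the articles whose keywords include `concept`."""
    counts = sorted(
        (a["citations"] for a in articles if concept in a["keywords"]),
        reverse=True,
    )
    # With counts in descending order, `count >= rank` holds exactly for the first h ranks.
    return sum(1 for rank, count in enumerate(counts, start=1) if count >= rank)


articles = [
    {"keywords": {"nematology"}, "citations": 5},
    {"keywords": {"nematology", "ecology"}, "citations": 2},
    {"keywords": {"ecology"}, "citations": 40},
    {"keywords": {"nematology"}, "citations": 0},
]
score = qualified_h_index(articles, "nematology")
```

For "nematology" the matching citation counts are 5, 2 and 0, so two articles have at least 2 citations each and the score is 2. The highly cited "ecology"-only article does not count toward it.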
VIVO Ontology
       Representation of Concepts

• Research Areas (associated with researchers)
• Subject Areas (associated with articles)
• Free Text Keywords (associated with articles)
Friend of a Friend Heuristic
• The ‘Friend of a Friend’ score is proportional to the
  number of distinct paths through which the expert is
  indirectly connected to the seeker, and favors experts
  close to the seeker in the collaboration network.
• Balance theory, also known as “closing open triangles”
   – Monge, P. R. and N. S. Contractor (2003). Theories of communication
     networks.


   “I like to work with someone I have not previously worked with. If I give
    our mutual friend as a reference, they’re more likely to accept.”
   Example - Connected indirectly through Hoy, Marjorie Ann via the
    co-authorship network.
Friend of a Friend …
• Network: (global) Collaboration
  – (scalar) Expert attributes
     • Path length: distance d from seeker u to expert e
     • Number of geodesics n from seeker u to expert e


             $f_{obj}(u, e) \propto \frac{n_{sp}(u, e)}{d(u, e)^{2}}$
         (specifically disallows previous co-authors)
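The two quantities in this score, the distance d and the number of geodesics (shortest paths) n, can both be obtained from a single breadth-first search. The sketch below assumes an undirected collaboration network stored as a dictionary of neighbor sets and scores an expert as n / d², returning 0 for previous co-authors (d = 1) and unreachable experts. The names and the tiny network are illustrative only.

```python
from collections import deque


def distance_and_geodesics(graph, source, target):
    """Return (shortest-path length, number of shortest paths), or (None, 0) if unreachable."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:                  # first visit fixes the distance
                dist[nbr] = dist[node] + 1
                paths[nbr] = paths[node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:    # another shortest route into nbr
                paths[nbr] += paths[node]
    if target not in dist:
        return None, 0
    return dist[target], paths[target]


def friend_of_friend_score(graph, seeker, expert):
    d, n_sp = distance_and_geodesics(graph, seeker, expert)
    if d is None or d < 2:
        return 0.0   # unreachable, the seeker, or a previous co-author
    return n_sp / d ** 2


graph = {"u": {"a", "b"}, "a": {"u", "e"}, "b": {"u", "e"}, "e": {"a", "b"}}
score = friend_of_friend_score(graph, "u", "e")
```

In this network "e" is two steps from "u" through two different mutual coauthors, so the score is 2 / 2² = 0.5.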
Social Exchange Heuristic
• The ‘Social Exchange’ score is proportional to
  the number of articles c authored by the
  expert e which cite the seeker u
• Reciprocity theory
  – Blau, P. M. (2006) Exchange and power in social life.
                     $f_{obj}(u, e) \propto c(u, e)$

   “I’ve helped them in the past, so they’re more likely to help me
    now.”
   Example – Cited your work in 3 articles.
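The count c(u, e) can be sketched directly from authorship and article-article citation data: it is the number of the expert's articles that cite at least one article by the seeker. The two dictionaries below (`authors_of`, `cites`) are an assumed in-memory representation, not the recommender's data model.

```python
def social_exchange_score(seeker, expert, authors_of, cites):
    """Number of articles authored by `expert` that cite any article by `seeker`.

    authors_of: article id -> set of author ids          (assumed layout)
    cites:      citing article id -> set of cited article ids
    """
    seeker_articles = {a for a, authors in authors_of.items() if seeker in authors}
    return sum(
        1
        for article, authors in authors_of.items()
        if expert in authors and cites.get(article, set()) & seeker_articles
    )


authors_of = {"p1": {"u"}, "p2": {"u"}, "q1": {"e"}, "q2": {"e"}, "q3": {"e", "x"}}
cites = {"q1": {"p1", "p2"}, "q2": {"p2"}, "q3": {"z"}}
score = social_exchange_score("u", "e", authors_of, cites)
```

Here "q1" and "q2" cite the seeker's work, so the score is 2. An article that cites several of the seeker's papers is still counted once, which matches the "cited your work in N articles" wording of the example.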
Follow the Crowd Heuristic
• The ‘Follow the Crowd’ score is proportional to
  the expert’s overall popularity in terms of
  collaboration and being cited, and favors experts
  close to the seeker in the collaboration network.
• Contagion theory
  – Krackhardt, D. and Brass, D. J. (1994) Intraorganizational networks: the micro
    side.
  – Krackhardt, D. M. (1986) Cognitive social structures.

   “They seem to be the most qualified person since many others are
    working with them.”
   Example - Co-authored or cited by 5 people and is within 3 step(s)
    from you via Co-authorship and Citation Network.
Follow the Crowd …
            $f_{obj}(u, e) \propto \frac{deg_{in}(e)}{d(u, e)}$
• deg_in(e): expert’s in-degree in the combined
  network (collaboration + citation)
• d(u, e): distance from seeker u to expert e in the
  collaboration network if connected, max(d)
  otherwise
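A minimal sketch of this score follows, under two assumptions that the slide does not spell out: in-degrees in the combined network are supplied precomputed, and max(d) is taken to be the largest finite distance from the seeker in the collaboration network. The function names and the sample network are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def follow_the_crowd_scores(collab, in_degree, seeker):
    """Score each expert as in-degree divided by distance from the seeker."""
    dist = bfs_distances(collab, seeker)
    # Assumption: unreachable experts get the largest finite distance (at least 1).
    max_d = max(max(dist.values()), 1)
    return {
        expert: deg / dist.get(expert, max_d)
        for expert, deg in in_degree.items()
        if expert != seeker
    }


collab = {"u": {"a"}, "a": {"u", "b"}, "b": {"a"}, "c": set()}
in_degree = {"a": 4, "b": 6, "c": 5}
scores = follow_the_crowd_scores(collab, in_degree, "u")
```

Expert "b" is more popular than "a" but one step further away, so "a" ranks first (4.0 versus 3.0). The disconnected expert "c" is penalised with the fallback distance of 2.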
Birds of a Feather Heuristic
• The ‘Birds of a Feather’ score is proportional to the (weighted w)
  number of attributes a shared between the seeker u and the
  expert e, such as moniker (title), department, grad school and
  major field of study
• Homophily theory
   –   Foucault Welles, B., A. Van Devender, et al. (2010) Is a “Friend” a Friend? Investigating the Structure
       of Friendship Networks in Virtual Worlds
• No network measures

                          $f_{obj}(u, e) \propto \sum_{k} w_k \, a_k(u, e)$
    “I find it easier to communicate with someone who has things in common
     with me.”
    Example - Shares one or more of the following attributes : moniker, work
     department, grad school and major field of study.
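The weighted sum can be sketched as follows, assuming that a_k(u, e) is 1 when seeker and expert have the same value for attribute k and 0 otherwise. The attribute keys, weights, and profiles are illustrative and are not taken from the recommender.

```python
def birds_of_a_feather_score(seeker, expert, weights):
    """Sum of the weights w_k over attributes k that seeker and expert share."""
    return sum(
        w
        for attr, w in weights.items()
        # A missing attribute never counts as shared.
        if seeker.get(attr) is not None and seeker.get(attr) == expert.get(attr)
    )


weights = {"moniker": 1.0, "department": 2.0, "grad_school": 1.0, "major_field": 1.0}
seeker = {
    "moniker": "Professor",
    "department": "Chemistry",
    "grad_school": "University X",
    "major_field": "Organic chemistry",
}
expert = {"moniker": "Professor", "department": "Chemistry", "grad_school": "University Y"}
score = birds_of_a_feather_score(seeker, expert, weights)
```

The two profiles share moniker and department, so the score is 1.0 + 2.0 = 3.0. Setting every weight to 1 reduces the score to a plain count of shared attributes.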
Mobilizing Heuristic
• The ‘Mobilizing’ score favors experts who are brokers and close to
  the seeker in the union of the collaboration and citation networks.
• Theory of Collective Action
   – Coleman, J. S. (1966) “Individual interests and collective action.”
   – Laumann, E. O. and F. U. Pappi (1976) Networks of collective action

       “He seems to be connected to lots of qualified experts and can help me make
        more useful connections.”
       Example – Qualified expert who is a broker among other experts.

                          $f_{obj}(u, e) \propto \frac{inDeg(e)}{outDeg(e)} \; \frac{bet(e)}{d(u, e)}$
   –    fobj(u,e) : Objective function of user u and expert e
   –    inDeg(e) : in-degree of expert in union of the Collaboration and Citation networks.
   –    outDeg(e) : out-degree of expert in union of the Collaboration and Citation networks.
   –    d(u,e) : seeker to expert distance in union of the Collaboration and Citation networks.
   –    bet(e) : expert’s betweenness centrality in union of the Collaboration and Citation networks, see
       Wasserman, S. and K. Faust (1995) Social Network Analysis: Methods and Applications
Feeling Lucky Heuristic
• The ‘Feeling Lucky’ score is an estimate of the probability of
  collaboration using a p*/Exponential Random Graph
  Model (ERGM) of scientific team formation.
• A Probabilistic Model of relationship formation
   – Wasserman, S. and G. Robins (2003) An introduction to random graphs,
     dependence graphs, and p*
• Factors affecting probability
   – In-Degree Centrality of expert in the union of Collaboration and Citation networks
   – Publication count of expert
   – Similarity (~ “birds of a feather”)
        –   Moniker
        –   Work department
        –   Grad school
        –   Major Field of Study
   – Number of times collaborated with seeker
   – Number of times cited seeker
Findings for all proposal collaborations
          Explaining Proposal Collaboration Relation (p*/ERGM results)

          Effects                                                          Full model (N=2,186)
Control   Isolates (single author)                                          5.447*
Control   Edge (proposal collaboration relation)                           -6.751*
Control   Weighted degree (negative measure of preferential attachment)     4.623*
H1        Gender (Female)                                                   0.021
H2        Tenure (Years since PhD)                                          0.002*
H3        Institution Tier (Top 10% universities)                          -0.098*
H4        H-index                                                          -0.014*
H5        Co-authorship                                                     2.431*
H6        Citation relation                                                 1.132*
          * Indicates p<0.05

Researchers are more likely to be familiar with, and to collaborate again with, those with whom
they share a collaboration history (co-authorship) or those they cite.

            Lungeanu, Huang, Contractor (2012) “A network perspective on success in collaboration: Stop
            citing me for our own good?”, Academy of Management
Scientometric Relations
           Bibliometric Relations
• Authorship relations (author-article)
   – Primary evidence of historical collaboration
     behavior
• Citation relations (article-article)
   – An important leading indicator of future
     collaboration behavior
Bibliometric Relations
                        Descriptions

Domain-Range      Relation        Directed/Undirected   Magnitude
author-article    authorship      directed              N
author-author     co-authorship   undirected            Y
article-article   citation        directed              N
author-author     citation        directed              Y
article-article   co-citation     undirected            Y
author-author     co-citation     undirected            Y
Citation-related Relations
              Dependencies
[Dependency diagram with four nodes: Article-Article Citation, Article-Article Co-Citation, Author-Author Citation, Author-Author Co-Citation]


  Garfield, Eugene (1955) "Citation indexes for science"
  M. M. Kessler (1963) "Bibliographic coupling between scientific papers"
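To make the dependencies concrete, the sketch below derives weighted co-citation relations from article-article citation data plus authorship. It assumes that two articles are co-cited once for every article that cites both, and that two authors are co-cited once for every citing article whose reference list contains work by both. The input layout and sample ids are hypothetical.

```python
from collections import Counter
from itertools import combinations


def article_cocitation(cites):
    """cites: citing article -> set of cited articles. Returns pair -> co-citation count."""
    counts = Counter()
    for cited in cites.values():
        for a, b in combinations(sorted(cited), 2):
            counts[(a, b)] += 1
    return counts


def author_cocitation(cites, authors_of):
    """Lift co-citation to authors via authorship (authors_of: article -> set of authors)."""
    counts = Counter()
    for cited in cites.values():
        cited_authors = set()
        for article in cited:
            cited_authors |= authors_of.get(article, set())
        for a, b in combinations(sorted(cited_authors), 2):
            counts[(a, b)] += 1
    return counts


cites = {"c1": {"p1", "p2"}, "c2": {"p1", "p2", "p3"}}
authors_of = {"p1": {"ann"}, "p2": {"bob"}, "p3": {"ann"}}
article_pairs = article_cocitation(cites)
author_pairs = author_cocitation(cites, authors_of)
```

Both derived relations are undirected and carry a magnitude (the count), in line with the table of bibliometric relations. In this example "ann" and "bob" are co-cited twice.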
Citation-related Relations
   Four Useful Primitive Operations
• Authorship-related (derived from VIVO)
  1. Given an author A, find all articles by A
     getArticles(authorURI)
  2. Given an article A, find all authors of A
     getAuthors(articleURI)
• Citation-related (derived from PubMed)
  3. Given an article A, find all articles which cite A
     getArticleArticleCitationFrom(articleID)
  4. Given an article A, find all articles cited by A
     getArticleArticleCitationTo(articleID)
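The four primitives can be mimicked over in-memory indexes to show how they compose. This is only a sketch: in the deck the authorship data comes from VIVO and the citation data from PubMed, whereas here both are plain Python tuples and a single id is used for each article. The class name and the `citing_authors` helper are hypothetical.

```python
from collections import defaultdict


class BibliographicIndex:
    """Toy stand-in for the VIVO and PubMed lookups (assumed in-memory data)."""

    def __init__(self, authorships, citations):
        self._articles_by_author = defaultdict(set)
        self._authors_by_article = defaultdict(set)
        self._cited_by = defaultdict(set)
        self._cites = defaultdict(set)
        for author, article in authorships:
            self._articles_by_author[author].add(article)
            self._authors_by_article[article].add(author)
        for citing, cited in citations:
            self._cites[citing].add(cited)
            self._cited_by[cited].add(citing)

    def getArticles(self, authorURI):
        return set(self._articles_by_author.get(authorURI, ()))

    def getAuthors(self, articleURI):
        return set(self._authors_by_article.get(articleURI, ()))

    def getArticleArticleCitationFrom(self, articleID):
        """Articles which cite the given article."""
        return set(self._cited_by.get(articleID, ()))

    def getArticleArticleCitationTo(self, articleID):
        """Articles cited by the given article."""
        return set(self._cites.get(articleID, ()))


def citing_authors(index, author):
    """Author-author citation: everyone who has cited any article by `author`."""
    result = set()
    for article in index.getArticles(author):
        for citing in index.getArticleArticleCitationFrom(article):
            result |= index.getAuthors(citing)
    return result - {author}


index = BibliographicIndex(
    authorships=[("ann", "p1"), ("bob", "p2"), ("cat", "p2"), ("ann", "p3")],
    citations=[("p2", "p1"), ("p3", "p1")],
)
who_cites_ann = citing_authors(index, "ann")
```

Chaining primitives 1, 3 and 2 yields the author-author citation relation. In this sample, "bob" and "cat" cite "ann" through "p2", and the self-citation from "p3" is dropped.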
Linking Scientometric Data
       VIVO Recommender Sources

Data category    VIVO                 PubMed
Researcher ids   Very strong          Very weak
Article ids      Some PubMed Ids      Very strong
Citation data    little or none       Good
Scope            University faculty   International, 1809-present
Author Representation
               VIVO vs. PubMed
Prof. Alan R. Katritzky, Department of Chemistry, University of Florida

UF VIVO: http://vivo.ufl.edu/individual/n3622
PubMed:  AR Katritzky; Alan Roy Katritzky; Alan R Katritzky; A R Katritzky
Linking UF VIVO to PubMed
                  Approach Diagram
• VIVO: Author1, Author2, Author3, Author4 are linked by authorship relations to
  Article1, Article2, Article3, Article4; Article1 and Article3 carry a PubMed ID
• PubMed: Article1 and Article3 are linked by a citation
Linking UF VIVO to PubMed
        Publication coverage
                    • 8852 publications in UF VIVO
                    • 8037 distinct PubMed ids associated with UF VIVO publications
                    • ~90% of UF VIVO’s articles key into PubMed, making article-article
                      citation data available using Linked Open Data
[Pie chart: With PubMed ID vs. Without PubMed ID]
Linking UF VIVO to PubMed
                 Faculty coverage
                                • 6578 Faculty Members in UF VIVO
                                • 990 (15%) of UF Faculty Members have at least one publication in UF VIVO
                                • 906 UF Faculty Members have at least one publication in PubMed
                                • Therefore using our approach (VIVO+PubMed mash-up) just 14% of
                                  UF Faculty Members have the possibility of having article-article
                                  citation data (and hence author-author citation data) available
[Pie chart: With at least one PubMed ID vs. no pubs or no pubs with PubMed ID]
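The percentages quoted on the two coverage slides follow directly from the reported counts. The short check below only restates that arithmetic; the helper name is hypothetical.

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, as a percentage with one decimal."""
    return round(100 * part / whole, 1)


publication_coverage = percent(8037, 8852)       # PubMed ids vs. UF VIVO publications
faculty_with_publications = percent(990, 6578)   # faculty with any publication in UF VIVO
faculty_with_pubmed = percent(906, 6578)         # faculty with a publication in PubMed
```

These come out at roughly 90.8%, 15.1% and 13.8%, consistent with the rounded figures of about 90%, 15% and 14% on the slides.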
Cross-Institutional Search
         Previous Work (VIVO 2011)
• Direct2Experts
  – http://direct2experts.org/
  – Distributed query
  – Links to a researcher’s home RNS
  – Weber GM, Barnett W, Conlon M, Eichmann D, Kibbe W, Falk-Krzesinski
    H, Halaas M, Johnson L, Meeks E, Mitchell D, Schleyer T, Stallings
    S, Warden M, Kahlon M (2011) Direct2Experts: a pilot national network to
    demonstrate interoperability among research-networking platforms
• VIVO Search
  – http://beta.vivosearch.org/
  – Centralized index of multiple sites
SONIC C-IKNOW VIVO Recommender
  SPARQL Query Language for RDF




 Just Say NO! to Web Crawling
SONIC C-IKNOW VIVO
            Collaboration Recommender
[Architecture diagram; labeled elements:]
• SONIC C-IKNOW VIVO Collaboration Recommender client: Web browser (PC, Mac, Smart Phone, tablet)
• SONIC servers: SONIC C-IKNOW VIVO Collaboration Recommender server; p*/ERGM server running R (statnet)
• Community of interest, User profiles, Multiple saved search criteria
• Remote servers: VIVO (Florida), VIVO (Cornell), PubMed (Iowa)
• Data flows: SPARQL (profiles, publications, citations, keywords); Ranked recommendations
Lessons learned
• Researcher Networking Systems (RNSs) should
  take article-article citation data seriously
• Adding a robust SPARQL endpoint to each
  VIVO-compliant RNS facilitates publishing and
  sharing linked open data
• Available free and open source software
  (FOSS) tools are mature and more than
  adequate to begin building interesting
  applications on RNSs
Lessons learned …
                VIVO Ontology
• Embrace the existing support in the already
  included bibo ontology for article-article
  citation data and populate the data
• Add researcher attributes
  – Year of last degree
  – Gender
Future Work
• Technology adoption study for an online
  collaboration recommendation tool for
  research scientists
• p*/ERGM probabilistic recommendations
• Improve navigation through the concept space
  using an ontology such as MeSH
• Recommend entities
SONIC C-IKNOW VIVO Recommender
                Demonstration
• http://ciknow1.northwestern.edu/vivorecommender/

• Migrating soon to:
  http://ciknow.northwestern.edu/vivorecommender/

• GitHub:
  http://github.com/soniclab
  http://github.com/soniclab/vivo-recommender
SONIC C-IKNOW VIVO Recommender
    Open Source Software Stack
• Java – programming language
• Apache Jena
   – RDF interface
   – ARQ: SPARQL support
• Java Universal Network/Graph Framework (JUNG) –
  social network analysis (SNA) algorithms
   – Centrality measures
   – Degree of nodes
   – etc
• JUNIT – unit testing and quality assurance
• Data-Driven Documents (D3) - visualization
SONIC C-IKNOW VIVO Recommender
          Our Collaborators
• University of Florida
   – Mike Conlon
   – Nicholas Rejack
   – Stephen Williams
• University of Iowa
   – David Eichmann
   – Brandyn Kusenda
• Cornell University
   –   Jon Corson-Rikert
   –   Brian Caruso
   –   Christopher Manly
   –   John Fereira
SONIC C-IKNOW VIVO Recommender
             SONIC Contributors
• Anup Sawant
• Hugh Devlin
• Joe Gilborne
• Willem Pieterson
• Jinling Li
• Noshir Contractor

Weitere ähnliche Inhalte

Ähnlich wie Collaboration Recommender

Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...Michael Levine-Clark
 
Evolving and emerging scholarly communication services in libraries: public a...
Evolving and emerging scholarly communication services in libraries: public a...Evolving and emerging scholarly communication services in libraries: public a...
Evolving and emerging scholarly communication services in libraries: public a...Claire Stewart
 
Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...jodischneider
 
Beyond the Factor: Talking about Research Impact
Beyond the Factor: Talking about Research ImpactBeyond the Factor: Talking about Research Impact
Beyond the Factor: Talking about Research ImpactClaire Stewart
 
Allenjcochran 02 ab,bib,cplan
Allenjcochran 02 ab,bib,cplanAllenjcochran 02 ab,bib,cplan
Allenjcochran 02 ab,bib,cplanAllen Cochran
 
Data Citation Update
Data Citation UpdateData Citation Update
Data Citation UpdateJisc RDM
 
Literature Review Handout - Carnegie Mellon University Global Communication C...
Literature Review Handout - Carnegie Mellon University Global Communication C...Literature Review Handout - Carnegie Mellon University Global Communication C...
Literature Review Handout - Carnegie Mellon University Global Communication C...Jonathan Underwood
 
A Citation-Based Recommender System For Scholarly Paper Recommendation
A Citation-Based Recommender System For Scholarly Paper RecommendationA Citation-Based Recommender System For Scholarly Paper Recommendation
A Citation-Based Recommender System For Scholarly Paper RecommendationDaniel Wachtel
 
Defense Ates Gursimsek Mutlimodal Semiotics and Collaborative Design
Defense Ates Gursimsek Mutlimodal Semiotics and Collaborative DesignDefense Ates Gursimsek Mutlimodal Semiotics and Collaborative Design
Defense Ates Gursimsek Mutlimodal Semiotics and Collaborative DesignRobin Teigland
 
An Empirical Appraisal Of Canadian Doctoral Dissertations Using Grounded Theo...
An Empirical Appraisal Of Canadian Doctoral Dissertations Using Grounded Theo...An Empirical Appraisal Of Canadian Doctoral Dissertations Using Grounded Theo...
An Empirical Appraisal Of Canadian Doctoral Dissertations Using Grounded Theo...James Heller
 
The Research Data Alliance: Creating the culture and technology for an intern...
The Research Data Alliance: Creating the culture and technology for an intern...The Research Data Alliance: Creating the culture and technology for an intern...
The Research Data Alliance: Creating the culture and technology for an intern...Research Data Alliance
 
Network analyses of psychological science
Network analyses of psychological scienceNetwork analyses of psychological science
Network analyses of psychological scienceKevin Lanning
 
God's property
God's propertyGod's property
God's propertySoushilove
 
Reliability and Comparability of Peer Review Results
Reliability and Comparability of Peer Review ResultsReliability and Comparability of Peer Review Results
Reliability and Comparability of Peer Review ResultsNadine Rons
 
2014 09-04-foster-metricsworkshopslides
2014 09-04-foster-metricsworkshopslides2014 09-04-foster-metricsworkshopslides
2014 09-04-foster-metricsworkshopslidesNathalie Cornée
 
Grounded Theory: an Introduction (updated Jan 2011)
Grounded Theory: an Introduction (updated Jan 2011)Grounded Theory: an Introduction (updated Jan 2011)
Grounded Theory: an Introduction (updated Jan 2011)Hora Tjitra
 
Open Access: Trends and opportunities from the publisher's perspective
Open Access: Trends and opportunities from the publisher's perspectiveOpen Access: Trends and opportunities from the publisher's perspective
Open Access: Trends and opportunities from the publisher's perspectiveCaroline Sutton
 

Ähnlich wie Collaboration Recommender (20)

Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
Levine-Clark, Michael, “Citation Indexes,” Seminario Entre Pares, Puebla, Mex...
 
Evolving and emerging scholarly communication services in libraries: public a...
Evolving and emerging scholarly communication services in libraries: public a...Evolving and emerging scholarly communication services in libraries: public a...
Evolving and emerging scholarly communication services in libraries: public a...
 
Towards knowledge maintenance in scientific digital libraries with the keysto...

Collaboration Recommender

  • 1. A Collaboration Recommender Based on Linked Open Data Conforming to the VIVO Ontology Anup Sawant, Hugh J. Devlin, Noshir Contractor (Northwestern) Brandyn J. Kusenda, David Eichmann (Iowa) VIVO 2012 Miami, Florida USA This research was supported by the following grants: National Science Foundation grants CNS-1010904, OCI-0904356, IIS-0838564, UL1RR024146-06S2 and NIH CTSA awards UL1RR025741, 5UL1RR025741-04S3
  • 2. SONIC C-IKNOW VIVO Recommender Outline • Motivation & Project overview • MTML collaboration recommendation heuristics • Report on our practical experience in building collaboration recommender systems • Importance of relational data in recommending collaborations, citation in particular • Recommendations, Future Work, Questions, Comments, Suggestions • Acknowledge Contributors, Collaborators, and Tools
  • 3. Ascendance of Teams Studies of 19.9 million research articles over 5 decades as recorded in the Web of Science database, and an additional 2.1 million patent records from 1975-2005 found three important facts. 1. For virtually all fields, research is increasingly done in teams 2. Teams typically produce more highly cited research than individuals do (accounting for self-citations), and this team advantage is increasing over time. 3. Teams now produce the exceptionally high impact research, even where that distinction was once the domain of solo authors. Sources: Wuchty, Jones, and Uzzi, 2007a, 2007b
  • 4. Ascendance of Virtual Teams The trend toward virtual communities was not driven by a growth in teamwork by scientists working with other co-located scientists. Using the Web of Science database to analyze the collaboration arrangements of over 4,000,000 papers over a 30 year period, Jones, Wuchty, Uzzi found that: 1. Team science is increasingly composed of co-authors located at different universities. 2. These “virtual communities of scholars” produce higher impact work than comparable co-located teams or solo scientists. 3. This change is true for all fields and team sizes, as well as for research done at elite universities Source: Jones, Wuchty, Uzzi (2008)
  • 5. Findings for all proposal collaborations. Explaining Proposal Collaboration Relation (p*/ERGM results), full model (N=2,186):
      – Control: Isolates (single author): 5.447*
      – Control: Edge (proposal collaboration relation): -6.751*
      – Control: Weighted degree (negative measure of preferential attachment): 4.623*
      – H1: Gender (Female): 0.021
      – H2: Tenure (Years since PhD): 0.002*
      – H3: Institution Tier (Top 10% universities): -0.098*
      – H4: H-index: -0.014*
      – H5: Co-authorship: 2.431*
      – H6: Citation relation: 1.132*
      – * Indicates p<0.05
      – Researchers are more likely to be familiar with, and to collaborate again with, those with whom they share a collaboration history (co-authorship) or those they cite.
      – Lungeanu, Huang, Contractor (2012) “A network perspective on success in collaboration: Stop citing me for our own good?”, Academy of Management
  • 6. SONIC C-IKNOW VIVO Recommender Project Goals • Port the SONIC collaboration recommendation heuristics to VIVO • Gain practical experience in building systems that use – Linked Open Data (LOD) – SPARQL query language • Cross-institutional recommending – Generalize the SONIC collaboration recommendation prototype from a single institution (Northwestern) to multiple institutions – Explore use of distributed, federated queries • Technology adoption study of the utilization and impact of our social-science grounded recommendation heuristics
  • 7. WHY DO WE CREATE, MAINTAIN, DISSOLVE, AND RECONSTITUTE OUR COMMUNICATION AND KNOWLEDGE NETWORKS?
  • 8. Social Drivers: Why do we create and sustain networks? • Theories of self-interest • Theories of social and resource exchange • Theories of mutual interest and collective action • Theories of contagion • Theories of balance • Theories of homophily • Theories of proximity Contractor, N. S., Wasserman, S. & Faust, K. (2006). Testing multi-theoretical multilevel hypotheses about organizational networks: An analytic framework and empirical example. Academy of Management Review
  • 9. Multi-theoretical, Multi-level (MTML) Collaboration Recommendation Heuristics (Heuristic; Social theory; Relations; Metric)
      – Affiliation; proximity; affiliation, coauthorship; neighbor, neighbor
      – Cocitation; mutual interest; cocitation, coauthorship; neighbor, neighbor
      – Most Qualified; self-interest; citation, authorship; h-index
      – Friend of a friend; balance; coauthorship; distance, count of geodesics
      – Social Exchange; reciprocity; citation; dyadic in-degree
      – Follow the crowd; contagion; coauthorship + citation, coauthorship; centrality, distance
      – Birds of a feather; homophily; (attributes); count
      – Mobilizing; collective action; coauthorship + citation; shortest path, betweenness
      – Feeling lucky; probabilistic model; coauthorship, citation; p*/ERGM
      – Monge, P. R. and N. S. Contractor (2003) Theories of communication networks NY: Oxford University Press
  • 10. Affiliation Heuristic • The ‘Affiliation’ score is proportional to the number of experts in the same department as the seeker who have not previously collaborated with the seeker • A form of proximity theory – social relations are (at least in part) opportunistic – “We both work in the same department so we might want to collaborate in future.” – Example - Works in the same department (Entomology and Nematology) but never coauthored.
  • 11. Affiliation Recommendation (diagram: Qualified Experts, Coauthors, Same Affiliation)
  • 12. Co-Citation Heuristic • The Co-Citation score is proportional to the number of times the seeker is co-cited with an identified expert • A cognitive metric: 3rd party rating of similarity • Mutual interest theory – Sherif, M. (1958) "Superordinate Goals in the Reduction of Intergroup Conflict." – “I have been co-cited with a qualified person quite a few times so I might want to collaborate with them in future.” – Example – Co-cited with you 3 times. (specifically disallows previous co-authors)
  • 13. Co-citation Recommendation (diagram: Qualified Experts, Coauthors, Co-cited)
  • 14. H-index • A scientist has index ‘h’ if ‘h’ of his/her N papers referencing the query term have at least h citations each, and the other (N-h) papers have no more than h citations each. (image source: Wikipedia) Hirsch, J. E. (2005) An index to quantify an individual's scientific research output
  • 15. Qualified H-index • A scientist has a “qualified h index,” that is, an h-index qualified by a given concept, based on the number of their publications which are associated with that concept as a keyword
  • 16. Most Qualified Heuristic • The ‘Most Qualified’ score is proportional to the expert’s “Qualified h-index” • Self-interest theory – Simon, Herbert (1957). “A Behavioral Model of Rational Choice” – MacDonald, C. and Ounis, I. (2006) “Voting for candidates: adapting data fusion techniques for an expert search task” – “I like to work with someone who is most useful to me and seems to have a lot of expertise to offer.” – Example – 2 of this expert’s articles that include the query term have been cited at least 2 times each.
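The qualified h-index from slides 14 to 16 can be sketched in a few lines. This is only an illustration under stated assumptions: the `papers` records, their `keywords` and `citations` fields, and the function names are hypothetical and are not taken from the recommender's code base.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def qualified_h_index(papers, concept):
    """h-index restricted to the papers tagged with the given concept."""
    counts = [p["citations"] for p in papers if concept in p["keywords"]]
    return h_index(counts)


# Hypothetical toy data: one expert's papers with keywords and citation counts.
papers = [
    {"keywords": {"nematology", "ecology"}, "citations": 7},
    {"keywords": {"nematology"}, "citations": 2},
    {"keywords": {"ecology"}, "citations": 40},
    {"keywords": {"nematology"}, "citations": 1},
]

score = qualified_h_index(papers, "nematology")
```

With this toy data the expert has three papers on the query concept, two of which are cited at least twice, which mirrors the wording of the slide's example.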
  • 17. VIVO Ontology Representation of Concepts • Research Areas (associated with researchers) • Subject Areas (associated with articles) • Free Text Keywords (associated with articles)
  • 18. Friend of a Friend Heuristic • The ‘Friend of a Friend’ score is proportional to the number of distinct paths through which the expert is indirectly connected to the seeker, and favors experts close to the seeker in the collaboration network. • Balance theory, AKA “closing open triangles” – Monge, P. R. and N. S. Contractor (2003). Theories of communication networks. – “I like to work with someone I have not previously worked with. If I give our mutual friend as a reference, they’re more likely to accept.” – Example - Connected indirectly through Hoy, Marjorie Ann via Co-authorship network.
  • 19. Friend of a Friend … • Network: (global) Collaboration – (scalar) Expert attributes • Path length: distance d from seeker u to expert e • Number of geodesics n_sp from seeker u to expert e • $f_{obj}(u,e) = \frac{\sqrt{n_{sp}(u,e)}}{d(u,e)^{2}}$ (specifically disallows previous co-authors)
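A minimal sketch of the Friend of a Friend score, following the speaker notes: the square root of the number of shortest paths divided by the square of the distance. The toy graph, the function names, and the choice to score previous co-authors (distance 1) and unreachable experts as 0 are assumptions made for illustration.

```python
import math
from collections import deque


def geodesics(graph, source):
    """BFS from source: distance and number of shortest paths per reachable node."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                # First time we reach nbr: inherit the path count of node.
                dist[nbr] = dist[node] + 1
                paths[nbr] = paths[node]
                queue.append(nbr)
            elif dist[nbr] == dist[node] + 1:
                # Another shortest route into nbr through node.
                paths[nbr] += paths[node]
    return dist, paths


def foaf_score(graph, seeker, expert):
    """sqrt(number of geodesics) / distance**2 on the co-authorship graph."""
    dist, paths = geodesics(graph, seeker)
    d = dist.get(expert)
    if d is None or d < 2:
        # Assumption: unreachable experts, the seeker, and previous
        # co-authors (distance 1) are not recommended.
        return 0.0
    return math.sqrt(paths[expert]) / d ** 2


# Hypothetical undirected co-authorship graph as adjacency lists.
coauthors = {
    "u": ["a", "b", "c"],
    "a": ["u", "e"],
    "b": ["u", "e"],
    "c": ["u", "f"],
    "e": ["a", "b"],
    "f": ["c"],
}
```

Here expert `e` is reachable through two mutual co-authors and expert `f` through one, so `e` scores higher even though both are two steps from the seeker `u`.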
  • 20. Social Exchange Heuristic • The ‘Social Exchange’ score is proportional to the number of articles c authored by the expert e which cite the seeker u • Reciprocity theory – Blau, P. M. (2006) Exchange and power in social life. • $f_{obj}(u,e) = c(u,e)$ – “I’ve helped them in the past, so they’re more likely to help me now.” – Example – Cited your work in 3 articles.
  • 21. Follow the Crowd Heuristic • The ‘Follow the Crowd’ score is proportional to the expert’s overall popularity in terms of collaboration and being cited, and favors experts close to the seeker in the collaboration network. • Contagion theory – Krackhardt, D. and Brass, D. J. (1994) Intraorganizational networks: the micro side. – Krackhardt, D. M. (1986) Cognitive social structures. – “They seem to be the most qualified person since many others are working with them.” – Example - Co-authored or cited by 5 people and is within 3 step(s) from you via Co-authorship and Citation Network.
  • 22. Follow the Crowd … • $f_{obj}(u,e) = \frac{\mathrm{inDeg}(e)}{d(u,e)}$ • inDeg: Expert’s in-degree in the combined network (Collaboration + citation) • d: distance from seeker u to expert e in the collaboration network if connected, max(d) otherwise
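The Follow the Crowd score (the expert's in-degree in the combined network divided by the collaboration distance) might be sketched as follows. Several details are assumptions: in-degree is taken to be the number of distinct people who co-authored with or cited the expert, and "max(d)" is read as the largest finite distance from the seeker. The data and function names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node of an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def in_degree(coauthors, cited_by, expert):
    """Distinct people who co-authored with or cited the expert."""
    return len(set(coauthors.get(expert, ())) | set(cited_by.get(expert, ())))


def follow_the_crowd(coauthors, cited_by, seeker, expert):
    """inDeg(e) / d(u, e), with a fallback distance for unconnected experts."""
    if expert == seeker:
        return 0.0
    dist = bfs_distances(coauthors, seeker)
    # Assumption: max(d) is the largest finite distance from the seeker,
    # floored at 1 so an isolated seeker does not cause division by zero.
    fallback = max(max(dist.values()), 1)
    d = dist.get(expert, fallback)
    return in_degree(coauthors, cited_by, expert) / d


# Hypothetical data: co-authorship adjacency lists and, per author, who cites them.
coauthors = {
    "u": ["a"], "a": ["u", "e"], "e": ["a", "b"], "b": ["e"],
    "z": ["y"], "y": ["z"],
}
cited_by = {"e": ["u", "c", "d"], "z": ["u"]}
```

In this toy network `e` is co-authored or cited by five people and sits two steps from `u`, while `z` is not connected to `u` at all and therefore falls back to the maximum distance of 3.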
  • 23. Birds of a Feather Heuristic • The ‘Birds of a Feather’ score is proportional to the (weighted w) number of attributes a shared between the seeker u and the expert e, such as moniker (title), department, grad school and major field of study • Homophily theory – Foucault Welles, B., A. Van Devender, et al. (2010) Is a “Friend” a Friend? Investigating the Structure of Friendship Networks in Virtual Worlds • No network measures • $f_{obj}(u,e) = \sum_{k} w_k \, a_k(u,e)$ – “I find it easier to communicate with someone who has things in common with me.” – Example - Shares one or more of the following attributes: moniker, work department, grad school and major field of study.
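The Birds of a Feather score is a weighted count of shared attributes. The sketch below assumes each profile is a plain dictionary and that a_k(u, e) is 1 when attribute k is present and equal in both profiles and 0 otherwise; the attribute keys and weights are hypothetical.

```python
def birds_of_a_feather(seeker, expert, weights):
    """Sum of the weights of the attributes the two profiles share."""
    score = 0.0
    for attribute, weight in weights.items():
        value = seeker.get(attribute)
        # A missing attribute never counts as a match.
        if value is not None and value == expert.get(attribute):
            score += weight
    return score


# Hypothetical weights and profiles.
weights = {"moniker": 1.0, "department": 2.0, "grad_school": 1.0, "major": 2.0}
seeker = {"moniker": "Professor", "department": "Chemistry",
          "grad_school": "Iowa", "major": "Organic Chemistry"}
expert = {"moniker": "Professor", "department": "Chemistry",
          "grad_school": "Cornell"}

score = birds_of_a_feather(seeker, expert, weights)
```

The two profiles share a moniker and a department, so the score is the sum of those two weights; no network measure is involved.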
  • 24. Mobilizing Heuristic • The ‘Mobilizing’ score favors experts who are brokers and close to the seeker in the union of the collaboration and citation networks. • Theory of Collective Action – Coleman, J. S. (1966) “Individual interests and collective action.” – Laumann, E. O. and F. U. Pappi (1976) Networks of collective action – “He seems to be connected to lots of qualified experts and can help me make more useful connections.” – Example – Qualified expert who is a broker among other experts. • $f_{obj}(u,e) = \frac{\mathrm{inDeg}(e) \cdot \mathrm{bet}(e)}{\mathrm{outDeg}(e) \cdot d(u,e)}$ – fobj(u,e) : Objective function of user u and expert e – inDeg(e) : in-degree of expert in union of the Collaboration and Citation networks. – outDeg(e) : out-degree of expert in union of the Collaboration and Citation networks. – d(u,e) : seeker to expert distance in union of the Collaboration and Citation networks. – bet(e) : expert’s betweenness centrality in union of the Collaboration and Citation networks, see Wasserman, S. and K. Faust (1995) Social Network Analysis: Methods and Applications
  • 25. Feeling Lucky Heuristic • The ‘Feeling Lucky’ score is an estimate of the probability of collaboration using a p*/Exponential Random Graph Model (ERGM) of scientific team formation. • A Probabilistic Model of relationship formation – Wasserman, S. and G. Robins (2003) An introduction to random graphs, dependence graphs, and p* • Factors affecting probability – In-Degree Centrality of expert in the union of Collaboration and Citation networks – Publication count of expert – Similarity (~ “birds of a feather”): moniker, work department, grad school, major field of study – Number of times collaborated with seeker – Number of times cited seeker
  • 26. Findings for all proposal collaborations. Explaining Proposal Collaboration Relation (p*/ERGM results), full model (N=2,186):
      – Control: Isolates (single author): 5.447*
      – Control: Edge (proposal collaboration relation): -6.751*
      – Control: Weighted degree (negative measure of preferential attachment): 4.623*
      – H1: Gender (Female): 0.021
      – H2: Tenure (Years since PhD): 0.002*
      – H3: Institution Tier (Top 10% universities): -0.098*
      – H4: H-index: -0.014*
      – H5: Co-authorship: 2.431*
      – H6: Citation relation: 1.132*
      – * Indicates p<0.05
      – Researchers are more likely to be familiar with, and to collaborate again with, those with whom they share a collaboration history (co-authorship) or those they cite.
      – Lungeanu, Huang, Contractor (2012) “A network perspective on success in collaboration: Stop citing me for our own good?”, Academy of Management
  • 27. Scientometric Relations Bibliometric Relations • Authorship relations (author-article) – Primary evidence of historical collaboration behavior • Citation relations (article-article) – An important leading indicator of future collaboration behavior
  • 28. Bibliometric Relations: Descriptions (Domain-Range; Relation; Directed/Undirected; Magnitude)
      – author-article; authorship; directed; N
      – author-author; co-authorship; undirected; Y
      – article-article; citation; directed; N
      – author-author; citation; directed; Y
      – article-article; co-citation; undirected; Y
      – author-author; co-citation; undirected; Y
  • 29. Citation-related Relations: Dependencies (diagram: Article-Article Citation is the source relation; Article-Article Co-Citation, Author-Author Citation and Author-Author Co-Citation are derived from it) Garfield, Eugene (1955) "Citation indexes for science" M. M. Kessler (1963) "Bibliographic coupling between scientific papers"
  • 30. Citation-related Relations Four Useful Primitive Operations • Authorship-related (derived from VIVO) 1. Given an author A, find all articles by A getArticles(authorURI) 2. Given an article A, find all authors of A getAuthors(articleURI) • Citation-related (derived from PubMed) 3. Given an article A, find all articles which cite A getArticleArticleCitationFrom(articleID) 4. Given an article A, find all articles cited by A getArticleArticleCitationTo(articleID)
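To illustrate how the derived author-author relations follow from the four primitive operations on slide 30, here is a small in-memory sketch. The dictionaries stand in for VIVO authorship data and PubMed citation data; the snake_case function names, the toy identifiers, and the two derived functions are assumptions for illustration, not the project's actual API or SPARQL queries.

```python
# Hypothetical toy data: author -> articles, and citing article -> cited articles.
AUTHORSHIP = {
    "seeker": {"p1", "p2"},
    "expert": {"p3", "p4"},
    "other": {"p5"},
}
CITES = {
    "p3": {"p1"},
    "p4": {"p1", "p2"},
    "p5": {"p2", "p3"},
}


def get_articles(author):
    """Primitive 1: all articles by the given author."""
    return AUTHORSHIP.get(author, set())


def get_authors(article):
    """Primitive 2: all authors of the given article."""
    return {a for a, articles in AUTHORSHIP.items() if article in articles}


def get_citations_from(article):
    """Primitive 3: all articles which cite the given article."""
    return {citing for citing, cited in CITES.items() if article in cited}


def get_citations_to(article):
    """Primitive 4: all articles cited by the given article."""
    return CITES.get(article, set())


def citing_articles(author):
    """All articles that cite at least one article by the author."""
    result = set()
    for article in get_articles(author):
        result |= get_citations_from(article)
    return result


def social_exchange(seeker, expert):
    """Author-author citation: number of the expert's articles citing the seeker."""
    return len(citing_articles(seeker) & get_articles(expert))


def co_citation(seeker, expert):
    """Author-author co-citation: articles citing both the seeker and the expert."""
    return len(citing_articles(seeker) & citing_articles(expert))
```

In the toy data both of the expert's articles cite the seeker, and a single third-party article (`p5`) cites both authors, which is the kind of magnitude the Social Exchange and Co-Citation heuristics rank on.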
  • 31. Linking Scientometric Data: VIVO Recommender Sources (Data category; VIVO; PubMed)
      – Researcher ids; Very strong; Very weak
      – Article ids; Some PubMed Ids; Very strong
      – Citation data; little or none; Good
      – Scope; University faculty; International, 1809-present
  • 32. Author Representation: VIVO vs. PubMed. Prof. Alan R. Katritzky, Department of Chemistry, University of Florida
      – UF VIVO: http://vivo.ufl.edu/individual/n3622
      – PubMed: AR Katritzky; Alan Roy Katritzky; Alan R Katritzky; A R Katritzky
  • 33. Linking UF VIVO to PubMed: Approach Diagram • VIVO: Author1 to Author4 are linked by authorship relations to Article1 to Article4, some of which carry a PubMed ID • PubMed: citation links between the articles that have PubMed IDs (Article1, Article3)
  • 34. Linking UF VIVO to PubMed Publication coverage • 8852 publications in UF VIVO • 8037 distinct PubMed ids associated with UF VIVO publications • ~90% of UF VIVO’s articles key into PubMed, making article-article citation data available using Linked Open Data (chart: With PubMed ID vs. Without PubMed ID)
  • 35. Linking UF VIVO to PubMed Faculty coverage • 6578 Faculty Members in UF VIVO • 990 (15%) of UF Faculty Members have at least one publication in UF VIVO • 906 UF Faculty Members have at least one publication in PubMed • Therefore using our approach (VIVO+PubMed mash-up) just 14% of UF Faculty Members have the possibility of having article-article citation data (and hence author-author citation data) available (chart: With at least one PubMed ID vs. no pubs or no pubs with PubMed ID)
  • 36. Cross-Institutional Search Previous Work (VIVO 2011) • Direct2Experts – http://direct2experts.org/ – Distributed query – Links to a researcher’s home RNS – Weber GM, Barnett W, Conlon M, Eichmann D, Kibbe W, Falk-Krzesinski H, Halaas M, Johnson L, Meeks E, Mitchell D, Schleyer T, Stallings S, Warden M, Kahlon M (2011) Direct2Experts: a pilot national network to demonstrate interoperability among research-networking platforms • VIVO Search – http://beta.vivosearch.org/ – Centralized index of multiple sites
  • 37. SONIC C-IKNOW VIVO Recommender SPARQL Query Language for RDF Just Say NO! to Web Crawling
  • 38. SONIC C-IKNOW VIVO Collaboration Recommender (architecture diagram) • Web browser (PC, Mac, Smart Phone, tablet) running the SONIC C-IKNOW VIVO Collaboration Recommender client, which receives ranked recommendations • SONIC servers: SONIC C-IKNOW VIVO Collaboration Recommender server, p*/ERGM server (R, statnet), user profiles with multiple saved search criteria • Remote servers queried via SPARQL for the community of interest (profiles, publications, citations, keywords): VIVO (Florida), VIVO (Cornell), PubMed (Iowa)
  • 39. Lessons learned • Researcher Networking Systems (RNSs) should take article-article citation data seriously • Adding a robust SPARQL endpoint to each VIVO-compliant RNS facilitates publishing and sharing linked open data • Available free and open source software (FOSS) tools are mature and more than adequate to begin building interesting applications on RNSs
  • 40. Lessons learned … VIVO Ontology • Embrace the existing support in the already included bibo ontology for article-article citation data and populate the data • Add researcher attributes – Year of last degree – Gender
  • 41. Future Work • Technology adoption study for an online collaboration recommendation tool for research scientists • p*/ERGM probabilistic recommendations • Improve navigation through the concept space using an ontology such as MeSH • Recommend entities
  • 42. SONIC C-IKNOW VIVO Recommender Demonstration • http://ciknow1.northwestern.edu/vivorecommender/ • Migrating soon to: http://ciknow.northwestern.edu/vivorecommender/ • GitHub: http://github.com/soniclab http://github.com/soniclab/vivo-recommender
  • 43. SONIC C-IKNOW VIVO Recommender Open Source Software Stack • Java – programming language • Apache Jena – RDF interface – ARQ: SPARQL support • Java Universal Network/Graph Framework (JUNG) – social network analysis (SNA) algorithms – Centrality measures – Degree of nodes – etc. • JUnit – unit testing and quality assurance • Data-Driven Documents (D3) – visualization
  • 44. SONIC C-IKNOW VIVO Recommender Our Collaborators • University of Florida – Mike Conlon – Nicholas Rejack – Stephen Williams • University of Iowa – David Eichmann – Brandyn Kusenda • Cornell University – Jon Corson-Rikert – Brian Caruso – Christopher Manly – John Fereira
  • 45. SONIC C-IKNOW VIVO Recommender SONIC Contributors • Anup Sawant • Hugh Devlin • Joe Gilborne • Willem Pieterson • Jinling Li • Noshir Contractor

Editor's notes

  1. Every study SONIC conducts on collaboration patterns in science reveals that co-authorship data, and then citation data, are important predictors of future collaboration. Prof. Contractor may have more to say about this research on Friday. For now here is one such study, enabled by a unique, brief peek into grant proposal activity offered to our lab by the NSF.
  2. Our overarching goal is to conduct a technology adoption study for an online collaboration recommendation tool for research scientists, with an emphasis on testing and validating the efficacy of and user satisfaction with a suite of recommendation heuristics drawn from social science research.
  3. This table serves as a topic slide for the next section of our presentation. We view collaboration recommendation systems as a critical application area to test our ideas of the factors contributing to high-impact collaborations in science.
  4. FOAF is proportional to the number of distinct shortest paths between the seeker and the expert, so that an expert with many distinct connections is preferred to one with fewer; and more specifically, to the square root of the number of shortest paths, so that for example the 11th shortest path does not contribute as much as the 2nd; and inversely proportional to the square of the distance, so that closer experts are preferred over more distant experts.
  5. the popularity contest
  6. A similarity measure; our birds of a feather heuristic is perhaps the recommending heuristic that would be most recognizable to those familiar with traditional approaches to recommending.
  7. Every study SONIC conducts on collaboration patterns in science reveals that co-authorship data, and then citation data, are important predictors of future collaboration. Prof. Contractor may have more to say about this research on Friday. For now here is one such study, enabled by a unique, brief peek into grant proposal activity offered to our lab by the NSF.
  8. We find a number of bibliographic relations to be useful in recommending collaborations. The “Magnitude” column indicates whether a relation may be considered to have a magnitude, for example, the number of papers co-authored by two authors.
  9. Quality co-authorship relations can be derived from accurate and complete author-article relations; similarly, variants of citation and co-citation depend critically on the accuracy and completeness of article-article citation data.
  10. From our experience researcher networking systems (RNSs) need to be designed to include efficient implementations of four primitive data retrieval operations on bibliometric data in order to facilitate collaboration recommendations.
  11. SONIC’s C-IKNOW VIVO Recommender is a so-called “mash-up” of heterogeneous, linked open data (LOD) sources: one or more VIVO instances, supplemented by citation data from PubMed, courtesy of a SPARQL endpoint implemented by our collaborators at the University of Iowa. VIVO’s strength is disambiguation of researchers, while PubMed’s strength is disambiguating publications.
  12. For an example of the different strengths and weaknesses of VIVO and PubMed, consider the representation of Prof. Alan Katritzky of the department of Chemistry at the University of Florida (the most prolific author at UFL). Following LOD best practices, the UFL VIVO team assigned a unique URI to Prof. Katritzky, while in PubMed he is represented by at least 4 different character strings. Similarly, it would be trivial to find, in any given VIVO instance, duplicate publications, perhaps with slight variations in the title. Heuristics for the disambiguation of names are an active area of research. In our work we are more concerned with the recommendation heuristics, so we do not employ any disambiguation heuristics, instead taking an institution’s VIVO as the authority on its faculty’s publications, and taking PubMed as the authority for article-article citation data.
  13. An overview of our approach to supplementing VIVO authorship data with article-article citation data from PubMed. For example, here Author1 and Author2 cite Author3. Note that not all researchers have articles, not all researchers have articles with PubMed IDs, and not all articles in PubMed have citation links. A consequence of this approach is that the recommender is most useful in the biomedical sciences.
  14. The UF VIVO system administration team has done an excellent job of associating PubMed identifiers to publications in their VIVO instance. Notes: 34,313 authorship instances (author-article relations), an average of 3.8 authors/article.
  15. On the other hand most UF faculty members have no publications listed in UF’s VIVO. Then, only a subset of articles in PubMed, specifically those comprising PubMed Central, that is, the full-text articles, have article-article citation data in machine-readable form; the “References” section of an article is encoded by human curators as part of the process of ingestion of an article into PubMed Central.
  16. After being inspired by these two divergent approaches to cross-institutional recommending at VIVO 2011, we hoped to find something of a middle ground.
  17. We embraced SPARQL for our project, targeting specifically the data we need at query time, as opposed to alternative approaches involving web crawling or harvesting RDF from dereferencing URIs.
  18. The easy availability of high-quality, free and open source software tools greatly reduces the cost of getting started building sophisticated applications on top of rich linked open data such as VIVO. We must acknowledge the contributions of all those who worked on these projects, without whom our work would not be possible.