Presentation about our community-driven approach to eliciting and estimating reputation, given at the Altmetrics Workshop at the WebSci 2011 conference in Koblenz, Germany.
1. UCount: A community-driven approach for measuring Scientific Reputation Altmetrics Workshop / websci2011 Cristhian Parra University of Trento, Italy parra@disi.unitn.it
3. What is Scientific Reputation? Scientific reputation is the social evaluation (opinion) by the scientific community of a researcher or their contributions, given a certain criterion (typically scientific impact).
4. Main Goal To understand how and why reputation is formed within and across scientific communities.
5. Science is an Economy of Reputation [Whitley 2000] Motivation Improve support for Decision Making Readership Affiliation Bibliometrics
9. Results Surveys: the correlation between bibliometric indicators and reputation is always in the range (-0.5, 0.5). Research position contests: on the CNRS dataset, the same result as in the surveys; on the Italian dataset, around 50% prediction effectiveness for all metrics. Bibliometrics are not a good descriptor of real reputation.
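The correlation check behind this slide can be sketched as follows. This is a minimal, self-contained illustration of computing a Spearman rank correlation between a bibliometric indicator and survey reputation scores; the data values are invented for the example, not taken from the surveys or contests.

```python
# Sketch: Spearman rank correlation between a bibliometric indicator
# (e.g. the h-index) and survey-based reputation scores.

def ranks(values):
    """Average ranks (1-based); ties get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

h_index = [12, 30, 7, 18, 25, 10]            # hypothetical indicator values
reputation = [3.1, 4.0, 2.5, 4.2, 3.3, 2.9]  # hypothetical survey scores

print(round(spearman(h_index, reputation), 2))
```

A value well inside (-0.5, 0.5), as reported on the slide, would indicate that the indicator tracks surveyed reputation only loosely.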
12. UCount Eliciting Reputation Been there Peer Review based assessment (Research Position Contests) Surveys Community oriented Surveys Peer Review Feedback
13. UCount Surveys List of candidates: DBLP coauthorship graph (ICST affinity: shortest path + Jaccard), editorial boards, Palsberg's list of top-h researchers. http://icst.org/UCount-Survey/ http://icst.org/icst-transactions/ http://www.cs.ucla.edu/~palsberg/h-number.html
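One plausible reading of "shortest path + Jaccard" can be sketched on a toy coauthorship graph. The slide does not give the exact affinity formula, so the combination of the two signals below is an illustrative assumption, as are the graph and function names.

```python
# Sketch of a coauthorship-based affinity in the spirit of the slide:
# graph distance (shortest path) plus coauthor-set overlap (Jaccard).
from collections import deque

graph = {  # toy coauthorship graph: author -> set of coauthors
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

def shortest_path_len(g, src, dst):
    """BFS distance between two authors; None if disconnected."""
    if src == dst:
        return 0
    seen, frontier, dist = {src}, deque([src]), 0
    while frontier:
        dist += 1
        for _ in range(len(frontier)):
            for nxt in g[frontier.popleft()]:
                if nxt == dst:
                    return dist
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return None

def jaccard(g, a, b):
    """Overlap of the two authors' coauthor sets."""
    union = g[a] | g[b]
    return len(g[a] & g[b]) / len(union) if union else 0.0

def affinity(g, a, b):
    """Illustrative combination: closer in the graph and more shared
    coauthors -> higher affinity (the weighting here is assumed)."""
    d = shortest_path_len(g, a, b)
    if d is None:
        return 0.0
    return 1.0 / d + jaccard(g, a, b)

print(affinity(graph, "A", "B"))
```

In a candidate-selection setting, authors with the highest affinity to a survey respondent would be the ones the respondent is most likely able to evaluate.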
17. Reverse Engineering Approaches Decision trees: no tree with more than 60% accuracy. Unsupervised methods: genetic algorithms applied to the CNRS dataset improved correlation by 15% on average (running for only 5 minutes), with strongly improved correlation for the fields Research Management and Politics. Next: applying machine learning techniques, exploring other techniques (e.g. neural networks), obtaining other types of features (e.g. keynotes, advisory networks). http://code.google.com/p/revengrep/ https://github.com/cdparra/melquiades/
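The genetic-algorithm idea can be illustrated with a minimal sketch: evolve the weights of a linear feature combination so that the combined score correlates better with a reputation ranking. The dataset, fitness function, and GA parameters below are all illustrative assumptions, not the actual CNRS setup or the revengrep/melquiades implementation.

```python
# Minimal genetic-algorithm sketch: search for feature weights whose
# linear combination best correlates with (made-up) reputation scores.
import random

random.seed(42)

# toy dataset: rows = researchers, columns = (h_index, pubs, citations)
features = [(10, 40, 300), (25, 90, 900), (5, 20, 80),
            (18, 60, 500), (30, 70, 1200), (8, 35, 150)]
reputation = [2.8, 4.1, 2.0, 3.5, 4.6, 2.5]  # hypothetical survey scores

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def fitness(w):
    """Correlation between the weighted score and reputation."""
    scores = [sum(wi * fi for wi, fi in zip(w, row)) for row in features]
    return pearson(scores, reputation)

def evolve(pop_size=30, generations=50):
    pop = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)    # crossover: average parents
            children.append([(x + y) / 2 + random.gauss(0, 0.1)
                             for x, y in zip(a, b)])  # Gaussian mutation
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("best weights:", [round(w, 2) for w in best])
```

On real data, the evolved weights would play the role of a candidate explanation of how the community implicitly combines indicators.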
18. Reverse Engineering Problem (2) Possible examples of combinations: a single feature with the highest correlation to reputation (e.g. h-index for Databases, readership for Social Informatics); a linear combination of features; a complex logic algorithm (e.g. a decision tree).
22. References
Lamont (2009). How professors think: Inside the curious world of academic judgment.
Bollen et al. (2009). A principal component analysis of 39 scientific impact measures.
Sabater et al. (2005). Review on computational trust and reputation models.
Hirsch (2005). An index to quantify an individual's scientific research output.
Castelfranchi (2002). Social trust: A cognitive approach.
Priem et al. (2010). Alt-metrics: A manifesto.
Mann et al. (2006). Bibliometric impact measures leveraging topic analysis.
Mussi et al. (2010). Discovering scientific communities using conference network.
Nazri et al. (2007). Journal impact factor.
Bergstrom (2007). Eigenfactor: Measuring the value and prestige of scholarly journals.
Bar-Ilan (2008). Informetrics at the beginning of the 21st century: A review.
Jensen et al. (2009). Testing bibliometric indicators by their prediction of scientists' promotions.
Kulasegarah et al. (2010). Comparison of the h-index with standard bibliometric indicators to rank influential otolaryngologists in Europe and North America.
Katsaros et al. (2008). Evaluating Greek departments of computer science/engineering using bibliometric indices.
Whitley (2000). The intellectual and social organization of the sciences. Oxford: Oxford University Press.
41. First year in one slide :) Paper at CLEI, 2010 (**) http://project.liquidpub.org/karaku (**) http://project.liquidpub.org/resman
42. [Architecture diagram] Social networking services (Mendeley, CiteULike, Connotea, Delicious) and digital libraries feed the SRS API and SRS repository; data crawling (DBLP, Scopus, Xplore) produces source dumps that pass through a staging area into the target DB during data loading and cleaning.
44. Data Acquisition [Architecture diagram] Storage; data acquisition layer (off-line acquisition, on-demand acquisition); adapter layer; data sources: DBLP, MAS, CiteULike, Delicious, Twitter.
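The adapter-layer idea from this slide can be sketched as a common interface that hides whether a source is read from an off-line dump (e.g. DBLP) or queried on demand (e.g. CiteULike). The class and method names below are illustrative assumptions, and the on-demand adapter is stubbed rather than making a real HTTP call.

```python
# Sketch of an adapter layer over heterogeneous data sources, so the
# acquisition layer can treat off-line dumps and live APIs uniformly.
from abc import ABC, abstractmethod

class SourceAdapter(ABC):
    @abstractmethod
    def fetch(self, author: str) -> dict:
        """Return raw metadata for an author from this source."""

class DblpDumpAdapter(SourceAdapter):
    """Off-line acquisition: reads from a previously loaded dump."""
    def __init__(self, dump: dict):
        self.dump = dump

    def fetch(self, author):
        return self.dump.get(author, {})

class CiteULikeAdapter(SourceAdapter):
    """On-demand acquisition: would call the live API (stubbed here)."""
    def fetch(self, author):
        return {"bookmarks": 0}  # placeholder instead of a real HTTP call

def acquire(author, adapters):
    """Merge per-source records into one staging-area record."""
    record = {"author": author}
    for name, adapter in adapters.items():
        record[name] = adapter.fetch(author)
    return record

adapters = {"dblp": DblpDumpAdapter({"J. Doe": {"papers": 12}}),
            "citeulike": CiteULikeAdapter()}
print(acquire("J. Doe", adapters))
```

Adding a new source (say, Twitter) then only means writing one more adapter, leaving the acquisition and storage layers untouched.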
45. How professors think [Lamont 2009] Correlation experiments [Jensen 2009, Kulasegarah 2010, Katsaros 2008] There is no direct study of reputation in the research evaluation process.
Presenter's notes
Good afternoon everyone. My name is Cristhian Parra, and today I will present the work we are pushing forward in Trento to first capture and later estimate reputation in academia.
The most basic definition of reputation goes as follows: reputation (in this case, scientific) is the social evaluation by a group of entities (the scientific community) of a person, group of persons, organization, or artifact (here, researchers and their contributions) on a certain criterion (most frequently, scientific impact). And why is this of any importance?
With this title, we want to refer to the two main elements of the proposal. The first element is "understanding", which refers to the main goal of the proposal: to understand the way reputation is formed within and across scientific communities. Very few people will doubt the reputation of people like Einstein in physics, Turing in CS, or, more recently, Aho in CS (famous to us students for his Dragon Book). Their good reputation is safe, in a way. Yet few people can precisely explain why this happens, or what exactly makes researchers hold such a good opinion of some of their peers. This leads us to the second element of our proposal, the fundamental problem we need to solve in order to reach the goal: reverse engineering scientific reputation. How can we derive the main aspects that affect the reputation of researchers in the minds of people?
Because science is basically an economy of reputation, where the reward for contributing to science is fundamentally building up your reputation. And this reputation is mainly based on your scientific impact, which is a multi-dimensional construct that cannot be adequately measured by any single indicator [9]. It might depend on features ranging from citation-based bibliometrics to newer web-based readership or download counts, Twitter counts, or simply the reputation of your affiliation or collaborators. These features can be both objective (e.g. bibliometrics) and subjective (e.g. affiliation), and they are highly dependent on the communities: some communities might be more or less subjective than others. Researchers will understand the criteria behind their own reputation. Researchers will also understand how this reputation varies across communities. All this understanding will help to ease the pressure of the publish-or-perish culture. In general, it will improve support for decision making in evaluation processes.
Weak positive linear dependence with respect to the h-index (with self-citations); medium positive linear dependence with respect to the number of publications.
Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (ICST)
Measure the difference in reputation across different communities; validation of results. The challenges are basically the following. First, we need to get reputation information; that is, we need to know the opinion researchers have about other researchers. Second, we need to understand what features characterize researchers or their work in computer science. Examples of features are indicators such as the total number of publications, and other information that can give an idea of the quality of a scientist's work (e.g. keynote talks, awards, grants, affiliation, etc.). Then, we need to find a way of representing and collecting these features; that is, we need to crawl the web, academic libraries, search engines, etc. looking for this information. Once we have all the data, the next step is to effectively derive and represent the reputation logic behind a particular ranking. Finally, the big challenge is to validate the work: to measure how much our derived reputation algorithms can actually help researchers make better decisions.
Now, I'm sure you are all thinking: "Why do we want to do this?" Yes, and no.
• Researchers will understand the criteria behind their own reputation, allowing them to know what really matters when it comes to research impact: that is, which indicators contribute most to the researchers' opinion of reputation.
• Researchers will also understand how this reputation varies across communities, giving an important input for the always difficult problem of cross-community comparisons.
• This understanding will draw on data sources that include traditional but also social indicators (e.g. LiquidPub, CiteULike, Mendeley, etc.), which means that our results will naturally extend metrics beyond citations, helping to identify ways to measure scientific reputation in accurate terms (i.e. closer to the real opinion of people).
• All this understanding will help to ease the pressure of the publish-or-perish culture and allow scientists to better focus on what is really important.
In our case, because we want to analyze reputation in the context of science, we need to understand research evaluation, because in order to come up with an opinion about a peer in science, what we do is evaluate them. In research evaluation, not only researchers are the subject of evaluation, but also their contributions (papers), the dissemination venues such as journals and conferences, and the institutions. To do so, we have been using two main methods: committees (such as those of peer review) and quantitative analysis (such as bibliometric indicators).