
From Bibliometrics to Cybermetrics - a book chapter by Nicola de Bellis

Assistant Professor at University of Tsukuba
24 Jul 2014

  1. NICOLA DE BELLIS PRESENTED BY XANAT V. MEZA
  2.  The web exhibits a citation structure, with links between web pages being similar to bibliographic citations.  Thanks to markup languages, the information units composing a text can be marked and made recognizable by a label that facilitates their automatic connection with the full text of the cited document.
  3.  Disciplinary databases:  Chemical Abstracts Service (CAS)  SAO/NASA Astrophysics Data System (ADS)  SPIRES HEP database  MathSciNet  CiteSeer  IEEE Xplore  Citebase  Citations in Economics
  4.  Multidisciplinary databases:  Web of Science  Google Scholar  Scopus  The relevance of a webpage to a user query can be estimated by looking at the link rates and topology of the other pages pointing to it.
  5.  PageRank:  Google’s ranking algorithm.  It assigns different “prestige” scores to individual pages according to their position in the overall network.  More weight is assigned to the pages receiving more links.  An “authority” is a page that receives many links from quality “hubs” (like a citation classic).  A quality “hub” is a page providing many links to “authorities” (like a good review paper).
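  A minimal sketch of the idea behind PageRank (a toy power-iteration implementation on an invented three-page link graph, not Google's actual code): each page's score is repeatedly redistributed along its outgoing links, with a damping factor modelling random jumps to arbitrary pages.

      def pagerank(links, damping=0.85, iterations=50):
          """links: dict mapping each page to the list of pages it links to."""
          pages = list(links)
          n = len(pages)
          rank = {p: 1.0 / n for p in pages}
          for _ in range(iterations):
              new_rank = {p: (1.0 - damping) / n for p in pages}
              for page, outlinks in links.items():
                  if not outlinks:                      # dangling page: spread its rank evenly
                      for p in pages:
                          new_rank[p] += damping * rank[page] / n
                  else:
                      share = damping * rank[page] / len(outlinks)
                      for target in outlinks:
                          new_rank[target] += share
              rank = new_rank
          return rank

      toy_web = {"hub": ["a", "b"], "a": ["b"], "b": ["a"]}   # "hub" links out to both pages
      print(pagerank(toy_web))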
  6.  Advantages:  The immediacy of online scientific literature amounts to an information revolution.  The web significantly helps to increase citation impact, and local online usage has become one of the best predictors of future citations.  Less gate-keeping.  Disadvantages:  Citations concentrate on fewer distinct articles.  Citations tend to concentrate on more recent publications.
  7.  How to quantify the Web-wide cognitive and social life of scientific literature?  The impact of a set of documents outside the ISI circuit can be estimated by:  Counting, by means of usage mining techniques, the number of document views or downloads over a certain period of time  Interviewing a significant sample of readers  Counting, by means of search engines’ facilities, the number of links to the website hosting the documents  Identifying and counting, as ISI indexes do, the bibliographic citations to those documents from non-ISI sources.
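  The first option above, counting document views or downloads by usage mining, can be sketched roughly as follows (the log format and the /papers/ URL layout are invented for illustration):

      import re
      from collections import Counter

      PDF_REQUEST = re.compile(r'"GET (/papers/\S+\.pdf) HTTP')   # assumed URL layout

      def count_downloads(log_lines):
          """Count full-text downloads per document from web server log lines."""
          downloads = Counter()
          for line in log_lines:
              match = PDF_REQUEST.search(line)
              if match:
                  downloads[match.group(1)] += 1
          return downloads

      sample_log = [
          '1.2.3.4 - - [24/Jul/2014] "GET /papers/chapter7.pdf HTTP/1.1" 200',
          '5.6.7.8 - - [25/Jul/2014] "GET /papers/chapter7.pdf HTTP/1.1" 200',
      ]
      print(count_downloads(sample_log))   # Counter({'/papers/chapter7.pdf': 2})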
  8.  Standards and protocols have been developed in the context of national and international projects to standardize the recording and reporting of online usage statistics:  COUNTER (Counting Online Usage of Networked Electronic Resources)  SUSHI (Standardized Usage Statistics Harvesting Initiative)  MESUR (Metrics from Scholarly Usage of Resources)
  9.  Peer-reviewed open access journals appeared in the 1980s, for example New Horizons in Adult Education, Psycoloquy, Postmodern Culture and Surfaces.  In the 1990s RePEc (Research Papers in Economics), Medline/PubMed Central and CogPrints were started or opened to the public.  In 1991 Ginsparg set up arXiv, a central preprint and postprint repository, initially only for high-energy physics.
  10.  Under the slogan “Public access to publicly funded research”, the Open Access movement has, since the late 1990s, developed theoretical and business models along with technical infrastructure to support the free online dissemination of peer-reviewed scientific literature.
  11.  There are two options for authors following this way of publication:  Submit a paper directly to an OA journal, which peer-reviews and makes freely available all of its contents for all users while shifting editorial costs onto the author or the funding institution.  There are over 3,200 OA journals in the Directory of Open Access Journals (www.doaj.org)  Keep publishing in traditional journals, but archive a peer-reviewed version of the same content in an openly accessible repository.
  12.  A goal of the OA movement has been to demonstrate that open access substantially increases research impact:  In 2001, Lawrence provided evidence that citation rates in a sample of computer science conference articles were significantly correlated with their level of accessibility.  In 2007, Harnad and Brody’s team detected an OA citation advantage across all disciplines in a twelve-year sample of ISI articles (1992-2003): the citation impact was 25 to 250% higher for OA papers.
  13.  Counter-arguments:  Subjectivity factor in the selection of postable items  Increased visibility  Readership  Shelf-exposition  Best authors tend to be overrepresented  Self-selection bias postulate
  14.  A 2007 paper by Moed performs a citation analysis of papers posted to arXiv’s condensed matter section before being published in scientific journals and compares the results with those of a parallel citation analysis of unposted articles published in the same journals.  Articles posted to the preprint server are indeed more cited than unposted ones, but the effect varies with the papers’ age.  The citation advantage of many OA papers fades into the individual performance of the authors themselves through their publishing strategy.
  15.  Two studies on the citation impact of OA journals indexed in the Web of Science appeared in 2004. The impact factor of ISI-indexed OA journals was lower than that of non-OA journals.  Despite this evidence, there are important reasons to support OA journals:  Shortening the paths between invisible colleges and turning them into real-time collaboration networks will increase the speed and effectiveness of scientific communication.  In smaller research areas, it increases the opportunity to pursue research goals.  It will allow the shaping of new ideas in constant interplay with other scientists with similar interests.
  16.  Harnad proposes a multidimensional, field-sensitive, and carefully validated open access scientometrics, taking advantage of open access materials. The key is…  Metadata: a set of encoded data attached to information units, processed by the automatic indexing system to help identify, retrieve, and manage them in an effective fashion.  But there needs to be a metadata standard; to date, indexing algorithms have failed.
  17.  www.citebase.org is an indexing system for OA repositories. It was developed by Brody’s team in the UK in 2001. It uses the OAI Protocol for Metadata Harvesting.  The Citebase software parses the bibliographic references of the full-text papers hosted by the servers and, every time a reference matches the full text of another paper in the same repository, it creates a link.  A Usage/Citation Impact Correlator produces a correlation table comparing the number of times an article has been cited with the approximate number of times it has been downloaded.
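  The Usage/Citation Impact Correlator idea of comparing download counts with citation counts can be illustrated with a small Pearson-correlation sketch (the figures below are made up, not Citebase data):

      from math import sqrt

      def pearson(xs, ys):
          """Pearson correlation coefficient between two equal-length lists."""
          n = len(xs)
          mean_x, mean_y = sum(xs) / n, sum(ys) / n
          cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
          sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
          return cov / (sd_x * sd_y)

      downloads = [120, 45, 300, 10, 80]   # invented download counts per article
      citations = [15, 4, 40, 1, 9]        # invented citation counts for the same articles
      print(round(pearson(downloads, citations), 2))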
  18.  CiteSeer, formerly ResearchIndex (citeseer.ist.psu.edu), is a digital library search and management system developed in the US.  It gathers research article preprints and postprints from several distributed nodes of the open access Web through web crawling techniques.  It extracts the context surrounding each citation in the body of the paper.
  19.  The new Web Citation Index, based on CiteSeer technology, was officially launched in 2005.  It covers materials from OA repositories that meet quality criteria, such as:  arXiv.  The Caltech Collection of Open Digital Archives.  The Australian National University Eprints Repository.  The NASA Langley Technical Library Digital Repository.  The open access content in Digital Commons.  OAI-compliant institutional repository services.  Citebase and CiteSeer are not yet advisable tools for bibliometric evaluation; they are still pilot projects.
  20.  The probability that a webpage is included in a search engine database increases as the web crawler fetches other pages linking to it.  But!  Links do not acknowledge intellectual debts.  They lack peer review.  Links are not indelible footprints in the landscape of recorded scholarly activity.
  21.  The quantitative study of the Web is divided into:  1. Complex network analysis, which investigates the topological properties of the Internet and the Web as particular cases of an evolving complex network.  2. Hyperlink network analysis, which interprets the connections between websites as technological symbols of social ties among individuals, groups, organizations and nations.  3. Webometrics, which extends to the web space concepts and methods originally developed in the field of bibliometrics.
  22.  The web’s topological structure, i.e. the number and distribution of links between the nodes, initially played a crucial role in understanding a wide range of issues:  The way users surf the Web.  The ease with which they gather information.  The formation of Web communities as clusters of highly interacting nodes.  The spread of ideas, innovations, hacking attacks, and computer viruses.
  23.  Theoretical physicists have recently shifted their attention to the dynamics of the structure through the progressive addition or removal of nodes and links.  The key element in the modeling exercise is the graph:  What kind of graph is the Web?  What pattern, if any, is revealed by the hyperlink distribution among the nodes?  Do the links tend to be evenly distributed?  If not, why not?
  24.  In the late 1950s, when Erdős and Rényi supplied graph theory with a coherent probabilistic foundation, the conviction gained ground that complex social and natural systems could be represented, in mathematical terms, by random graphs.  Each node of a random graph has an equal probability of acquiring a link, and the frequency distribution of links among nodes is conveniently described by a Poisson probability distribution.
  25.  In random graphs, there is a dominant average number of links per node called the network’s “scale”. It is an upper threshold that prevents the system from having nodes with a disproportionately higher number of links.  Nodes are not clustered and display statistically short distances between each other.  Empirical evidence seemed to contradict this model because the structure of complex networks was somewhere between a totally regular graph and a random graph.
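  A small illustration of the random-graph model just described, using the networkx library (parameters chosen arbitrarily): links are placed with equal probability, so node degrees cluster tightly around the average, the network's "scale".

      import networkx as nx
      from collections import Counter

      G = nx.erdos_renyi_graph(n=1000, p=0.01, seed=42)   # 1000 nodes, 1% link probability
      degrees = [d for _, d in G.degree()]
      print("average degree:", sum(degrees) / len(degrees))          # close to 10
      print("degree counts:", sorted(Counter(degrees).items())[:5])  # Poisson-like spread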
  26.  In 1998, Watts and Strogatz proposed a model of complex networks based on the small world.  A small world is said to exist whenever members of any large group are connected to each other through short chains of intermediate acquaintances.
  27.  The path to small worlds:  In the 1950s, Pool and Kochen made mathematical descriptions of social contact based on statistical mechanics methods, encompassing graph-theoretic models and Monte Carlo simulations.  In 1967, Milgram initiated a series of experiments to test the small world conjecture in real social networks. He found that, on average, the acquaintance chain required to connect two random individuals is composed of about six links.
  28.  In 1998, Watts and Strogatz showed that a complex network is a small world displaying both the highly clustered sets of nodes typical of regular graphs and the small path lengths between any two nodes typical of random graphs.  They computed the clustering coefficient and recognized the importance of short cuts.  Further experiments suggested that, on average, documents on the web are about nineteen clicks away from each other.
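  A sketch of the Watts-Strogatz model with networkx (arbitrary parameters): a few rewired “short cut” links preserve the high clustering of a regular lattice while sharply reducing the average distance between nodes.

      import networkx as nx

      # 1000 nodes, each joined to its 10 nearest neighbours, 5% of links rewired
      ws = nx.connected_watts_strogatz_graph(n=1000, k=10, p=0.05, seed=42)
      print("clustering coefficient:", round(nx.average_clustering(ws), 3))
      print("average path length:", round(nx.average_shortest_path_length(ws), 2))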
  29.  In 1999, Albert and Barabási proposed an alternative class of models for the large-scale properties of complex networks.  Networks grow by the addition of new nodes linking to already existing ones.  This addition follows a mechanism of preferential attachment that replicates the Matthew Effect.  This means that new nodes have a higher probability of linking to highly connected nodes than to poorly connected or isolated ones.
  30.  P(n) is the probability that a node has n links.  An experiment in 1999 confirmed that the World Wide Web is a scale-free network governed by the power law P(n) = 1 / n^a, where a is a constant exponent.
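  The preferential-attachment mechanism and the resulting heavy-tailed, power-law-like degree distribution can be simulated with networkx (the parameters are illustrative):

      import networkx as nx
      from collections import Counter

      # Each new node attaches to 3 existing nodes, preferring already well-connected ones
      ba = nx.barabasi_albert_graph(n=10000, m=3, seed=42)
      degree_counts = Counter(d for _, d in ba.degree())
      print("nodes with the minimum degree 3:", degree_counts[3])      # the vast majority
      print("largest degrees (hubs):", sorted(dict(ba.degree()).values())[-5:])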
  31.  Nowadays the network has increasingly come to represent not simply a communication facility, but a tool for building online collaboration platforms where new knowledge can be created, modified, and negotiated, in a sort of virtual laboratory without walls.  Sociologists have been applying Social Network Analysis (SNA) to the World Wide Web’s hyperlink texture since 1997. This is called Hyperlink Network Analysis (HNA).
  32.  Objectives:  Check whether the hyperlink network is organized around central websites which play the role of hubs.  Centrality measures are computed by counting the number of ingoing and outgoing links for a given website (indegree and outdegree centrality).  Centrality also has an aspect of “closeness”, intended to single out the website with the shortest paths to all others.  Betweenness estimates the frequency with which a website falls on the paths connecting other sites.
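  The centrality measures listed above can be computed with networkx on a small invented directed hyperlink network (sites A to E, each edge meaning “links to”):

      import networkx as nx

      H = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("C", "E")])
      print("indegree:", dict(H.in_degree()))              # links received by each site
      print("outdegree:", dict(H.out_degree()))            # links given by each site
      print("closeness:", nx.closeness_centrality(H))      # how near a site is to the others
      print("betweenness:", nx.betweenness_centrality(H))  # how often a site lies between others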
  33.  HNA techniques have been promisingly applied in case studies dealing with topics such as e-commerce; social movements; and interpersonal, interorganizational, and international communication.  But can links be used as proxies for scientific communication flows and as building blocks of new, web-inclusive scientometric indicators of research prominence?
  34.  In 1995, Bossy suggested that the digital network layer offered an unprecedented source of information on the scholarly sociocognitive activities that precede publication output.  This meant moving from bibliographic citations to webpages, websites and links from universities, departments, research institutes and individual scientists’ webpages.  At first, AltaVista was used.
  35.  In 1995, an algorithm for co-word mapping by Prabowo and Thelwall was used by Leydesdorff and Curran to identify the connectivity patterns of the Triple Helix.  The Web Impact Factor (WIF) of a site or area of the Web, introduced by Ingwersen in 1998, may be defined as a measure of the frequency with which the average webpage of the site has been linked to at a certain point in time.
  36.  S is the site.  I is the total number of pages linking to the site (self-links included).  P is the number of webpages published in S that are indexed by the search engine.  Worked example: WIF(S) = I / P = 100 / 50 = 2
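  The definition above translates directly into a small function (the 100 and 50 figures are the slide's illustrative numbers, not real data):

      def web_impact_factor(inlinking_pages, indexed_pages):
          """WIF(S) = I / P: pages linking to site S divided by S's indexed pages."""
          return inlinking_pages / indexed_pages

      print(web_impact_factor(100, 50))   # 2.0, as in the worked example above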
  37.  But where do link data come from? How reliable and valid are the tools for gathering them?  Commercial search engines do not return a reliable and consistent picture of global and local connectivity rates over time because:  Search engines crawl and index only a small portion of the World Wide Web; there is an “invisible web”.  Different search engines use distinct crawling algorithms.  The overlap between competing search engines’ databases is small.
  38.  The WIF is also not a very good bibliometric measure, due to content variability and structural instability:  The number of links can be spuriously inflated by a huge number of unlinkable files, and a page’s content can be formatted as a single page or split across several pages.  Webpages also lack coding standardization and their half-life is variable.  For longitudinal studies, www.archive.org can be used.
  39.  Since 2000 the Academic Web Link Database Project has been collecting link data relative to the academic web spaces of New Zealand, Australia, UK, Spain, China and Taiwan.  Mike Thelwall’s Alternative Document Models (ADMs) allow modulating link analysis by truncating the linking URLs at a higher level than that of the web page:  Directory  Domain  Site
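  A rough sketch of the ADM idea: collapse linking URLs to the directory, domain, or site level before counting, so that many pages from one source are not counted as many separate links (the grouping rules below are simplified assumptions, not Thelwall's exact implementation).

      from urllib.parse import urlparse

      def truncate(url, level):
          """Reduce a linking URL to the chosen Alternative Document Model level."""
          parts = urlparse(url)
          if level == "directory":
              return parts.netloc + parts.path.rsplit("/", 1)[0] + "/"
          if level == "domain":
              return parts.netloc
          if level == "site":                 # simplistic: keep the last two name labels
              return ".".join(parts.netloc.split(".")[-2:])
          return url

      url = "http://www.example.edu/physics/staff/page.html"
      print(truncate(url, "directory"))   # www.example.edu/physics/staff/
      print(truncate(url, "domain"))      # www.example.edu
      print(truncate(url, "site"))        # example.edu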
  40.  The Webometrics Ranking of World Universities (www.webometrics.info) was launched in 2004 in Spain.  It ranks the web domains of academic and research organizations according to the volume, visibility and impact of their content.  It applies the WIF to capture the ratio between visibility, measured by inlink rates returned by commercial search engines, and size, measured by the number of hosted web pages.
  41.  Two additional measures, dubbed the Rich File and Scholar indexes, capture the volume of potentially relevant academic output in standard formats:  Adobe Portable Document Format .pdf  Adobe PostScript .ps  Microsoft Word Document .doc  Microsoft PowerPoint .ppt  and the number of papers and citations for each academic domain in Google Scholar.
  42.  Thelwall and colleagues’ methodology of link analysis also investigates the patterns of connections between groups of academic sites at the national level.  University websites have been found to be relatively more stable than other cyber-traces in longitudinal studies.  But we have to remember that web visibility and academic performance are different affairs.
  43.  Bibliometricians usually resort to direct surveys of webmasters’ reasons for linking, or to hyperlink context and content analysis, to investigate the psychological side of the link generation process.  Links are usually meant to facilitate navigation toward loosely structured and generically useful information, or to suggest related resources.  But links alone are not sufficient to pin down communication patterns on the Web, and their statistical analysis will probably follow the same path as citation analysis.

Editor's notes

  1. http://creativecommons.org.nz/wp-content/uploads/2011/10/openaccess.jpg
  2. PostScript is a file format that lets you print visually rich documents reliably.