A presentation on Google Scholar, the Webometrics ranking of higher institutions and Open Access to research publications. The presentation details the parameters Google Scholar uses for indexing research publications and what they imply for the visibility of scholars, their institutions and their Webometrics rank.
Understanding the Depth of Google Scholar and its Implication for Webometrics Ranking of Higher Institutions
1. Adegbilero-Iwari, Idowu is:
An IFLA/OCLC Fellow,
An Emerging Technology Librarian,
A DeGruyter Open Access Funding Board Member and a
Understanding the Depth of Google Scholar and its
Implication for Webometrics Ranking of Higher
Institutions
Presented by Adegbilero-Iwari, Idowu
At the 2016 Open Access Week Programme, Elizade University, Ilara-Mokin,
Ondo State, Nigeria
Date: 25th October, 2016
2. Open Access, briefly
• Open Access is the
free, immediate,
online availability
of research
articles, coupled
with the rights to
use these articles
fully in the digital
environment
• Ways authors can provide
open access:
• self-archiving their journal
articles in an open
access repository, also known
as 'green' open access, or
• publishing in an open access
journal, known as 'gold' open
access, often with the payment
of an article processing charge (APC)
4. Open Access according to Peter Suber
• Peter Suber of Harvard Library’s OSC and
author of Open Access has it that,
“The basic idea of OA is simple: Make research
literature available online without price barriers
and without most permission barriers.”
• If this was said in Harvard, then we must sing
it in Africa
5. But the Dilemma
• Institutions in resource-poor countries are unlikely
to be able to subscribe to paywalled journals or
databases
• In the same vein, their scholars are unlikely to be able
to pay the more than $1,000 that a high impact factor
journal will ask for as Open Access charges
• The result? Research outputs of such nations will
continue to be in the dark, or their scholars will fall
for predatory OA publishers charging as little as $100
6. And Some help? The website that shows you this
image, that is it!
http://whyopenresearch.org/
7. Reducing publishing costs: Tips from the Site
i. Find a no-cost open access journal:
as of 2014, over 70% of journals
indexed in the Directory of Open
Access Journals charged no author fees
ii. Find a low-cost open access journal
iii. Request a waiver
8. Archiving and Publishing Colours
• According to Bill Hubbard of SHERPA’s Repository
Support Project, we have
• Green: can archive pre-print and post-print
• Blue: can archive post-print (i.e. final draft post-
refereeing)
• Yellow: can archive pre-print (i.e. pre-refereeing)
• White: archiving not formally supported
• Open Access Publishing, i.e. Gold Publishing or the Gold
route to open access: the author pays the cost of article
publication and the work is freely available.
9. Open Access and Webometrics Ranking of World
Universities
• The overall goals are Visibility and Impact
• Thus, the relationship between OA and the
Webometrics Ranking (The Ranking) is direct
• The aims of the Ranking are:
• 1. To improve the Web presence of research
and academic institutions
• 2. To promote Open Access to research
10. Webometrics Ranking
• The Ranking measures the strength of
universities’ web presence using their:
• A. web domain
• B. Sub-pages
• C. Rich files
• D. Scholarly articles
11. Webometrics
• Largest academic ranking of Higher Education
Institutions
• Performed by Cybermetrics Lab (Spanish
National Research Council, CSIC)
• Started in 2004 based on ARWU and released
twice per year since 2006
• Global in scope and based on the web
presence and impact of universities
12. The Ranking does
not:
• evaluate websites, their design or usability
or the popularity of their contents according
to the number of visits or visitors
• Rather, it measures:
• All three parts of the universities’ mission: research,
teaching and “the economic relevance of the
technology transfer to industry, the
community engagement”
13. Categories of the Webometrics
Ranking
i. The Ranking Web of World Universities
(Green)
ii. The Ranking of Institutional Repositories
(Red)
iii. The Ranking of Hospitals (Grey)
iv. The Ranking of Research Centers (Blue)
v. The Ranking of Business Schools* (Orange)
14. Why Learn Google Scholar?
• The Google Scholar score carries a 30% weight
in both rankings i and ii above
• It is thus worth understanding the depth of
Google Scholar
16. Google Scholar (GS)
• An indexer*
• A machine (Search Engine)
• A research tool
• A researcher’s ladder to the top
• Or, as a researcher, why search Google when
you have Google Scholar???
21. Google Scholar: H-Index (Hirsch Index
or Hirsch Number)
• The h-index is an author-level metric that
attempts to measure both
the productivity and citation impact of
the publications of a scientist or scholar based
on the set of the scientist's most cited papers
and the number of citations that they have
received in other publications
22. Google Scholar: H-Index (Hirsch Index
or Hirsch Number)
• The index can also be applied to the
productivity and impact of a scholarly
journal as well as a group of scientists, such as
a department or university or country
• The index was suggested in 2005 by Jorge E.
Hirsch, a physicist at University of California,
San Diego as a tool for determining theoretical
physicists' relative quality
23. H-Index simply defined
• A scholar with an index of h has
published h papers, each of which has been
cited in other papers at least h times
• The h-index reflects both the number of
publications and the number of citations per
publication
24. H-index calculated
• If f is the function that corresponds to the
number of citations for each publication, we
compute the h index as follows:
• First, we order the values of f from the largest
to the smallest value.
• Then, we look for the last position in which f is
greater than or equal to the position (we
call this position h)
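The two steps above can be sketched in a few lines of Python. This is only an illustration of the definition on this slide, not the code used by Google Scholar, Scopus or Web of Science; the function name h_index is chosen here for the example.

```python
def h_index(citations):
    """Return the h-index for a list of per-publication citation counts."""
    # Step 1: order the citation counts from largest to smallest.
    ranked = sorted(citations, reverse=True)
    h = 0
    # Step 2: find the last (1-based) position whose count is >= the position.
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break  # counts only decrease, so no later position can qualify
    return h


# Six publications with 11, 9, 7, 4, 3 and 2 citations.
print(h_index([11, 9, 7, 4, 3, 2]))  # prints 4
```

The fourth paper has 4 citations, which is still at least its position, while the fifth has only 3, so the loop stops at 4.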
25. H-index cont’d
• Example
• If we have a researcher with 6 publications A, B,
C, D, E and F with 11, 9, 7, 4, 3 and 2 citations,
respectively, the h-index is equal to 4 because the
4th publication has 4 citations and the 5th has
only 3
• Tools for measuring H-Index:
Web of Science
Scopus
Google Scholar
28. Google Scholar Metrics
• Google Scholar Metrics provide an easy way for authors to quickly gauge the visibility and influence
of recent articles in scholarly publications.
• Scholar Metrics summarize recent citations to many publications, to help authors as they consider
where to publish their new research.
Coverage of Publications
• Scholar Metrics currently cover articles published between 2011 and 2015, both inclusive. The
metrics are based on citations from all articles that were indexed in Google Scholar in June 2016.
Included Publications:
• journal articles from websites that follow Scholar’s inclusion guidelines;
• selected conference articles in Computer Science and Electrical Engineering;
• preprints from arXiv, SSRN, NBER and RePEc - for these sites, metrics are computed for individual
collections, e.g., "arXiv Superconductivity (cond-mat.supr-con)" or "CEPR Discussion Papers".
Excluded Publications:
• court opinions, patents, books, and dissertations;
• publications with fewer than 100 articles published between 2011 and 2015;
• publications that received no citations to articles published between 2011 and 2015.
31. GS as Indexer: Getting Included
Channel 1: Individual Author’s
Website
e.g.,
“www.example.edu/~professor/jpdr2009.pdf;
and add a link to it on your publications page,
such as
www.example.edu/~professor/publications.html.”
Criteria for Inclusion
• the full text of your paper is in a
PDF file that ends with ".pdf",
• the title of the paper appears in
a large font on top of the first
page,
• the authors of the paper are
listed right below the title on a
separate line, and
• there's a bibliography section
titled, e.g., "References" or
"Bibliography" at the end.
Once these are done, GS search robots should
normally find your paper and include it.
32. Channel 2
Institutional Repositories
• Institutional repositories should use the latest
version of popular repository software such as
Eprints (eprints.org), Digital Commons
(digitalcommons.bepress.com), or DSpace
(dspace.org) to host researchers’ papers.
• Repositories must be configured for indexing
in Google Scholar
33. Channel 3
Journal Publishers
• Three options:
• Use established journal hosting services, e.g., Atypon and
Highwire
• Or
• Use Aggregators that host many journals on a single
website, such as JSTOR or SciELO only if they support full-
text indexing in GS
• Or
• Use Open Journal Systems (OJS) software that's available
for download from the Public Knowledge Project (PKP) if
you have technical expertise to manage your site
34. Guidelines for Contents
• The content of your
website needs to meet
two basic criteria:
• 1. Scholarly articles:
journal papers, conference
papers, technical reports,
or their drafts,
dissertations, pre-prints,
post-prints, or abstracts
• 2. Abstract shown (or
contains full-text of article)
Things the site must avoid:
• It must not require users (or search
robots) to sign in, install special
software, accept disclaimers,
dismiss popup or interstitial
advertisements, click on links or
buttons, or scroll down the page
before they can read the entire
abstract of the paper.
35. Crawl Guidelines
1. File formats must be HTML
or PDF with searchable text,
not exceeding 5MB
2. Good browse interface for
search robots to discover
your articles' URLs
Note:
Just like Google Search, GS uses
automated software, known as
"robots" or "crawlers", to fetch
your files for inclusion in the
search results.
Guide for organizing a website containing a small number of publications:
• list all articles on a single HTML page, such as
www.example.edu/~professor/publications.html, and include links to their full text in the
PDF format
For sites containing thousands of publications:
• list them by the date of publication or the date of record entry, instead of relying on
browse-by-author or browse-by-keyword interfaces
• create an additional browse interface that lists only the articles added in the last two
weeks
• use of Flash, JavaScript, or form-based navigation makes it hard for Google's automated system
to find your articles, so add a browse-by-date interface that uses only simple HTML GET links
if your site uses any of these.
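As a rough illustration of a browse-by-date interface built only from plain links, the sketch below generates a static HTML list, newest article first. The function name, the (date, title, URL) record layout and the sample records are all hypothetical; real repository software will have its own way of producing such a page.

```python
from datetime import date
from html import escape


def browse_by_date_page(articles):
    """Build a plain HTML page listing articles newest first.

    `articles` is a list of (publication_date, title, url) tuples; this
    layout is an assumption made for the sketch.
    """
    rows = []
    for published, title, url in sorted(articles, key=lambda a: a[0], reverse=True):
        # Plain <a href> links only: no JavaScript, Flash or forms.
        rows.append(
            '<li>{d}: <a href="{u}">{t}</a></li>'.format(
                d=published.isoformat(),
                u=escape(url, quote=True),
                t=escape(title),
            )
        )
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>"


# Placeholder records following the example.edu pattern used in the slides.
articles = [
    (date(2016, 3, 1), "Paper A", "http://www.example.edu/~professor/a.pdf"),
    (date(2016, 10, 25), "Paper B", "http://www.example.edu/~professor/b.pdf"),
]
page = browse_by_date_page(articles)
print(page)
```

Because the page is static HTML, a crawler can follow every link with ordinary GET requests.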
36. Crawl Guidelines cont’d
3. Website availability: the site must be available at all
times to both crawlers and
users
4. Robots exclusion protocol:
While your robots.txt should block robots from accessing large
dynamically generated spaces that aren't useful in the
discovery of your articles, such as shopping carts and
comment forms, your website must NOT block
Google's search robots from accessing your articles or
your browse URLs.
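One way to sanity-check a robots.txt against this rule is Python's standard urllib.robotparser module. The sketch below is a minimal example: the robots.txt rules, the paths and the host are made up for illustration, and the user-agent name "Googlebot" is an assumption about which crawler name applies to your site.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: block the shopping cart and comment forms,
# leave articles and browse pages open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /comments/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "http://www.example.edu" + path)


print(crawlable("/articles/paper1.pdf"))  # True: articles stay reachable
print(crawlable("/browse/2016/"))         # True: browse URLs stay reachable
print(crawlable("/cart/checkout"))        # False: dynamic space is blocked
```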
37. Indexing Guidelines
• Things to do:
1. When preparing article URLs:
Each paper must have its own unique URL
in order for it to be included in Google
Scholar. Place each article and each
abstract in a separate HTML or PDF file.
2.a. When configuring the meta-tags:
Configure your repository or journal
management software to export
bibliographic data in HTML "<meta>" tags,
e.g.:
• The title tag, e.g., citation_title or
DC.title, must contain the title of the
paper
• The publication date tag, e.g.,
citation_publication_date or DC.issued,
must contain the date of publication
Note:
• GS uses automated software,
known as "parsers", to identify
bibliographic data of your
papers, as well as references
between the papers.
• Incorrect identification of
bibliographic data or
references will lead to poor
indexing of your site.
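To make the meta-tag export concrete, here is a small sketch that renders bibliographic fields as HTML "<meta>" tags. The tag names are the ones quoted on this slide; the helper name meta_tags, the sample title and the date format shown are assumptions for illustration, and repository or journal software would normally generate these tags for you.

```python
from html import escape


def meta_tags(fields):
    """Render a dict of tag name -> value as HTML <meta> tags, one per line."""
    return "\n".join(
        '<meta name="{}" content="{}">'.format(
            escape(name, quote=True), escape(value, quote=True)
        )
        for name, value in fields.items()
    )


# Placeholder values; only the tag names come from the slide.
tags = meta_tags({
    "citation_title": "An Example Paper Title",
    "citation_publication_date": "2016/10/25",
})
print(tags)
```

The output would go inside the "<head>" of the page that carries the article or its abstract.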
38. 2.b. Indexing of content
without the meta-tags
If it is not possible
to implement the HTML
"<meta>" tags, e.g., if your
papers are only available in
the PDF format, then the
document needs to be
visually laid out according to
the following conventions:
i. The title of the paper must be the
largest chunk of text on top of the
page, say font size 24
ii. The authors of the paper must be
listed right before or right after the
title, in a slightly smaller font that is
still larger than normal text, say 16-23
iii. Include a bibliographic citation to a
published version of the paper on a
line by itself, and place it inside the
header or the footer of the first page
in the PDF file or, if unpublished,
include the full date of its present
version on a line by itself
iv. Avoid use of Type 3 fonts in PDF files,
because they're often generated with
missing or incorrect font size and
character encoding information
39. 3. When Marking
the References
• Mark the section of the paper
that contains references to other
works with a standard heading,
such as "References" or
"Bibliography", on a line just by
itself
• Individual references inside this
section should be either
numbered "1. - 2. - 3." or "[1] -
[2] - [3]" in PDF, or put inside an
"<ol>" list in HTML.
• The text of each reference must
be a formal bibliographic
citation in a commonly used
format, without free-form
commentary.
Note:
references are identified
automatically by the parser
software; they're not entered
or corrected by human
operators
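Since the parser is automatic, it can help to check a reference list against the numbering convention before publishing. The sketch below is a simple self-check, not Google Scholar's parser: it assumes each reference has already been joined onto a single line, and the function name references_numbered is made up for this example.

```python
import re

# Accept "1." style or "[1]" style markers at the start of a line.
NUMBERED = re.compile(r"^\s*(?:(\d+)\.|\[(\d+)\])\s+\S")


def references_numbered(lines):
    """Return True if every non-empty line carries the next sequential number."""
    expected = 1
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines between references
        match = NUMBERED.match(line)
        if not match:
            return False
        number = int(match.group(1) or match.group(2))
        if number != expected:
            return False
        expected += 1
    return expected > 1  # an empty list does not count as numbered


# Placeholder references used only to exercise the check.
print(references_numbered(["[1] A. Author. First title. 2015.",
                           "[2] B. Author. Second title. 2016."]))  # True
print(references_numbered(["A. Author. First title. 2015."]))       # False
```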
40. Bibliography
1. Google Scholar
https://scholar.google.com/intl/en/scholar/inclusion.html#indexing
2. Wikipedia https://en.wikipedia.org/wiki/H-index#i10-index
3. Bill Hubbard
http://www.sherpa.ac.uk/documents/sherpaplusdocs/Nottingham-colour-guide.pdf
4. Peter Suber https://osc.hul.harvard.edu/policies/
5. Cornell University Library
http://guides.library.cornell.edu/c.php?g=32272&p=203391
6. SPARC http://www.sparc.arl.org/issues/open-access#sthash.EzrFGvc1.dpuf
7. Why Open Research? http://whyopenresearch.org/costs.html
8. http://www.123rf.com/photo_7911221_3d-man-on-bicycle.html
9. http://www.deviantart.com/tag/asante