This presentation examines submission data for the SIGITE conference between the years 2007-2012. SIGITE is an ACM computing conference on IT education. It describes which external factors and which internal characteristics of the submissions are related to eventual reviewer ratings. Ramifications of the findings for future authors and conference organizers are also discussed. The full paper is available at http://dl.acm.org/citation.cfm?id=2656450.2656465
A longitudinal examination of SIGITE conference submission data
1. A LONGITUDINAL EXAMINATION OF
SIGITE CONFERENCE SUBMISSION DATA
2007‐2012
Presentation for SIGITE 2014 by Randy Connolly, Janet Miller, and Rob Friedman
2. THE ABSTRACT
This paper examines
submission data for the
SIGITE conference between
the years 2007‐2012.
It examines which external
factors and which internal
characteristics of the
submissions are related to
eventual reviewer ratings.
Ramifications of the findings
for future authors and
conference organizers are
also discussed.
3. RELATED WORK
Peer review is the main quality control
mechanism within the academic
sciences and is used for assessing the
merits of a written work as well as for
ensuring the standards of the academic
field.
4. PEER REVIEW
Enjoys broad support, yet …
BIAS PROBLEMS
• Author/Institution status
• Asymmetrical power
relations
SOLUTIONS
•Single‐Blind Reviews (SBR)
•Double‐Blind Reviews (DBR)
SIGITE 2007‐2012
Used Double‐Blind reviews
5. RESEARCH ON SBR AND DBR
RELIABILITY
ISSUES
VALIDITY
ISSUES
6. PEER REVIEW OFTEN LACKS RELIABILITY
That is, reviewers often differ strongly about the merits of any given paper.
7. PEER REVIEW OFTEN LACKS VALIDITY
There is often little relationship between the judgments of
reviewers and the subsequent judgments of the relevant
larger scholarly community as defined by eventual citations.
8. SOME RESEARCH
DISAGREES
Others have found that there is indeed a
“statistically significant association between
selection decisions and the applicants' scientific
achievements, if quantity and impact of research
publications are used as a criterion for scientific
achievement”
9. Our Study
PROVIDES A UNIQUE ADDITION TO THIS
LITERATURE
Unlike earlier work, our study assesses reviews and submissions for a single
international computing conference across an extended time period (2007‐2012).
It assesses the reliability of the peer review process at SIGITE by examining both
internal and external factors; the combination of these analyses is also unique.
This paper also provides some innovation in the measures it uses to assess the
validity of the peer review process.
10. METHOD
From 2007 to 2012, the ACM SIGITE
conference used the same “Grinnell”
submission system as the larger
SIGCSE and ITiCSE education
conferences.
This web‐based system was used by
authors to submit their work, by
reviewers to review submissions, and
by program committees to evaluate
reviews and to organize the eventual
conference program.
11. DATA COLLECTION
STEP 1
Individual Access databases used by the submission system for each year had to be merged into a single file.
STEP 2
Since the 2007-2010 conferences used a slightly different process, the data had to be normalized.
STEP 3
Other relevant data (e.g., number of references, citation rates, etc.) were manually gathered.
STEP 4
Data were further manipulated in Excel and then exported and statistically analyzed using SPSS.
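Steps 1 and 2 above (merge per-year exports, normalize differing schemas) can be sketched in Python. Everything below is illustrative: the CSV contents, column names, and the COLUMN_MAPS renaming table are invented, since the presentation does not describe the real Access schemas, and the Excel/SPSS steps are out of scope.

```python
import csv
import io

# Hypothetical per-year exports from the Access databases; the column
# names are made up for illustration, not the real submission schema.
YEAR_2007 = "paper_id,title,overall\n1,Teaching IT,5\n"
YEAR_2012 = "id,paper_title,overall_rating\n9,Web Security,4\n"

# Step 2: map each year's column names onto one normalized schema.
COLUMN_MAPS = {
    2007: {"paper_id": "paper_id", "title": "title", "overall": "overall"},
    2012: {"id": "paper_id", "paper_title": "title",
           "overall_rating": "overall"},
}

def normalize(raw_csv, year):
    """Rename a year's columns to the shared schema and tag the year."""
    mapping = COLUMN_MAPS[year]
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({mapping[k]: v for k, v in row.items()} | {"year": year})
    return rows

# Step 1: merge every year's records into a single data set.
merged = normalize(YEAR_2007, 2007) + normalize(YEAR_2012, 2012)
for rec in merged:
    print(rec["year"], rec["paper_id"], rec["title"], rec["overall"])
```

The point of the renaming table is that each year's quirks live in one place; adding a year means adding one dictionary entry, not touching the merge code.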
12. RESULTS
Over the six years, there were 1026
reviews from 192 different reviewers,
and 508 authors were involved in
submitting a total of 332 papers.
The 2010 version of the conference
had the lowest number of paper
submissions (n=37), while 2012
had the largest (n=87).
13. AUTHOR AND PAPER INFORMATION
Who were our authors and how did they do on their papers?
14. PAPERS WERE SUBMITTED FROM 32 DIFFERENT COUNTRIES
USA (n=378), Canada (n=24), Saudi Arabia (n=14), Pakistan (n=8), Italy (n=8), United Arab Emirates (n=8), Finland (n=7), Korea (n=7)
15. Acceptance
Rate (74.1%)
However, this acceptance figure is
not representative of the true
acceptance rate of SIGITE,
because the review process was
altered back in 2011.
From 2007‐2010 there was a
separate abstract submission
stage, which helped reduce the
eventual number of rejected
papers during those years.
16. Actual acceptance rates were:
41% (2007)
63% (2008)
68% (2009)
49% (2010)
52% (2011)
58% (2012)
17. Single author: 31%; two authors: 38%; three authors: 15%; four+ authors: 16%
There was no difference in acceptance rates between multi-author and single-author papers.
18. PAPER CATEGORIES
What were our papers about?
19. CATEGORIES BY IT PILLAR
21. REVIEWER INFORMATION
Who were our reviewers?
22. REVIEWER INFORMATION
1026 reviews from 192 reviewers (3.11 reviews per paper)
70% of papers were reviewed by 3 or 4 reviewers
23. INTERESTING FINDING
The number of reviews a paper had was negatively correlated with its
probability of being accepted to the conference.
Generally speaking, the more reviews a paper had, the less likely it was to
be accepted!
24. RATING INFORMATION
What did the ratings look like?
25. FIVE CATEGORIES
Reviewers supplied a rating between 1 and 6 for five different categories
Technical: 3.62 mean; Organization: 3.86 mean; Originality: 3.70 mean; Significance: 3.75 mean; Overall: 3.60 mean
26. OVERALL RATING
Rating definitions and number received
Overall Rating Description N %
1 Deficient 51 5.0%
2 Below Average 192 18.7%
3 Average 223 21.7%
4 Very Good 254 24.8%
5 Outstanding 267 26.0%
6 Exceptional 39 3.8%
Total 1026 100.0%
27. INTERESTING FINDING
These subcategory ratings were significantly correlated (p < .001) with the overall rating.
Additional post-hoc testing showed significant relationships between every one of these
four factors and every level of overall rating, which suggested strong internal reliability
for each of the reviewers (i.e., each reviewer was consistent with him/herself).
Generally speaking, this means that the subcategory ratings were not really needed.
28. REVIEWER VARIABILITY
Central tendency statistics for these ratings alone do not adequately capture the
variability of reviewer scoring for poor, average, and excellent papers.
29. REVIEWER VARIABILITY
Combination of min vs. max overall rating (330 papers)

Min \ Max    1    2    3    4    5    6 |   N
1            2    5    8   10   14    2 |  41
2                 8   23   29   47   10 | 117
3                     11   21   51    5 |  88
4                          16   31   14 |  61
5                               16    5 |  21
6                                     2 |   2
30. INTERESTING FINDING
While the overall statistics exhibited a strong tendency towards the mean,
paper ratings can vary considerably from reviewer to reviewer.
Based on these findings, it is recommended that future program
committees individually consider papers where rating scores deviate by 2
or more rating points.
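The recommended committee rule above can be expressed in a few lines of Python; the ratings dictionary below is invented sample data, not the study's data set.

```python
# Flag any paper whose overall ratings (1-6 scale) differ by 2 or more
# points across its reviewers, per the recommendation above.
ratings_by_paper = {
    "paper-A": [4, 5, 4],     # spread 1 -> no discussion needed
    "paper-B": [1, 4, 5],     # spread 4 -> flag for individual discussion
    "paper-C": [3, 3, 5, 2],  # spread 3 -> flag
}

def needs_discussion(ratings, threshold=2):
    """True when reviewer scores deviate by `threshold` or more points."""
    return max(ratings) - min(ratings) >= threshold

flagged = sorted(p for p, r in ratings_by_paper.items()
                 if needs_discussion(r))
print(flagged)  # the papers the committee should consider individually
```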
31. FACTORS AFFECTING RATING
What things affect reviewer ratings?
32. REVIEWER CHARACTERISTICS
Here we looked at two
characteristics that may
impact reviewer ratings:
1. familiarity with the
subject being reviewed
2. regional location.
33. REVIEWER FAMILIARITY
FAMILIARITY
•For each review, reviewers
assigned themselves a
familiarity rating of low,
medium, or high
ANALYSIS
•We performed ANOVA tests
to see if the reviewer’s
familiarity affected their
ratings.
THERE WERE NO DIFFERENCES BETWEEN GROUPS
This supports findings of other researchers
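The one-way ANOVA used here can be sketched with only the Python standard library. The familiarity groups and their 1-6 ratings below are made-up examples; a real analysis would compare the resulting F statistic against the F distribution's critical value, as SPSS does.

```python
from statistics import fmean

# Invented sample ratings grouped by self-reported reviewer familiarity.
groups = {
    "low":    [3, 4, 4, 2, 5],
    "medium": [4, 3, 5, 4, 3],
    "high":   [5, 4, 3, 4, 4],
}

def one_way_anova_f(samples):
    """Return the F statistic: between-group over within-group variance."""
    all_values = [x for g in samples for x in g]
    grand_mean = fmean(all_values)
    k, n = len(samples), len(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in samples)
    ss_within = sum((x - fmean(g)) ** 2 for g in samples for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = one_way_anova_f(list(groups.values()))
# A small F (well below the critical value) means no detectable difference
# between familiarity groups, which is what the study found.
print(round(f_stat, 3))
```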
34. WHAT ABOUT
REVIEWER LOCATION?
35. REVIEWER LOCATION
English-speaking: n=903; Europe: n=53; everywhere else: n=70
We found no differences between regions
36. TEXTUAL CHARACTERISTICS
We compared several
quantitative textual
measures on a subset of our
papers to see if any of them
were related to reviewers’
overall ratings.
The readability indices that
we tested included the
following:
the percentage of complex
words, the Flesch-Kincaid
Reading Ease Index, the
Gunning Fog Score, the
SMOG Index, and the
Coleman-Liau Index.
All of these indices are
meant to measure the
reading difficulty of a block
of text.
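As a concrete illustration of one such index, here is a rough Flesch Reading Ease calculator. The syllable counter is a crude vowel-group heuristic, so its scores are approximate and will differ from a polished implementation.

```python
import re

def count_syllables(word):
    """Approximate syllables as runs of vowels (minimum 1 per word)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

sample_text = ("Peer review is the main quality control mechanism. "
               "Reviewers differ.")
print(round(flesch_reading_ease(sample_text), 1))
```

Higher scores mean easier reading; dense academic prose typically lands well below everyday text, which matches the paper-level means reported on the next slide.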
37. TEXTUAL CHARACTERISTICS
The results

Characteristic | Significant | Correlation
Total number of words in paper (n=55; M=3152.22) | No | r = 0.264, p = 0.052
Readability indices of paper (n=55; M=39.33) | No | r = -0.016, p = 0.909
Readability indices of abstract (n=34; M=30.96) | No | r = -0.083, p = 0.641
Total # of words in abstract (n=159; M=115.13) | Yes | r = 0.379, p < .001
Number of references in paper (n=159; M=16.47) | Yes | r = 0.270, p = 0.001
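The r values in this table are Pearson correlations, which are straightforward to compute. The abstract-length and rating lists below are toy data, and the table's p-values come from a t test on r with n - 2 degrees of freedom, which is only set up (not evaluated) here since it needs a t distribution.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

abstract_words = [80, 95, 110, 120, 150, 170]  # toy values
overall_rating = [2, 3, 3, 4, 4, 5]            # toy values

r = pearson_r(abstract_words, overall_rating)
# Test statistic whose p-value comes from the t distribution (df = n - 2):
t = r * sqrt((len(abstract_words) - 2) / (1 - r * r))
print(round(r, 3), round(t, 3))
```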
38. INTERESTING FINDING
We were not surprised to find that the number of references in a paper
affected reviewer ratings.
We were surprised to discover that the length of the abstract also affected
reviewer ratings!
39. PEER REVIEW VALIDITY
How accurate were our reviewers?
40. WHAT IS VALIDITY?
Validity refers to the degree to which a reviewer’s
ratings of a paper are reflective of the paper’s
actual value.
While this may be the goal of all peer
review, it is difficult to measure
objectively.
Perhaps the easiest way to assess the
academic impact and quality of a paper is
to examine the paper’s eventual citation
count.
We grouped all the accepted papers
(n=245) into four quartiles based on
average overall rating.
We then took a random sampling of 96
papers from all six years, with an even
number from each year and each quartile.
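The quartile-and-sampling step can be sketched as follows. The paper tuples are synthetic stand-ins for the 245 accepted papers, and the study's additional balancing by year is simplified here to a per-quartile draw.

```python
import random
from statistics import quantiles

random.seed(42)
# Synthetic (id, year, mean_overall_rating) tuples for 245 accepted papers.
papers = [(i, 2007 + i % 6, round(random.uniform(1, 6), 2))
          for i in range(245)]

ratings = [p[2] for p in papers]
q1, q2, q3 = quantiles(ratings, n=4)  # the three quartile cut points

def quartile(rating):
    return 1 if rating <= q1 else 2 if rating <= q2 else 3 if rating <= q3 else 4

by_quartile = {q: [] for q in (1, 2, 3, 4)}
for p in papers:
    by_quartile[quartile(p[2])].append(p)

# 96 papers total -> 24 drawn at random from each quartile.
sample = [p for q in (1, 2, 3, 4)
          for p in random.sample(by_quartile[q], 24)]
print(len(sample))  # prints 96
```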
41. For each of these 96 papers, we gathered the number of citations from Google Scholar as well as the number of downloads from the ACM Digital Library, and then checked whether reviewer ratings were reflective of citations or downloads.
42. VALIDITY MEASURES
Did the peer review process at SIGITE predict the longer-term impact of the paper?

Characteristic | Significant | Correlation
Number of Google Scholar citations (n=96; M=4.60) | No | r = 0.121, p = 0.241
Cumulative ACM DL downloads to date (n=96; M=239.61) | No | r = 0.096, p = 0.351
Number of ACM DL downloads in past year (n=96; M=37.23) | No | r = 0.023, p = 0.822
43. This study has several limitations.
Our data set contained six years of data for a
computing education conference: such
conferences arguably have a unique set of
reviewers and authors in comparison to
“normal” computing conferences.
As such, there may be limits to the
generalizability of our results.
It is also important to recognize that
correlations are not the same as causation.
44. OTHER LIMITATIONS
In the future, we hope also to examine
whether reviewer reliability is related to
the experience level of the reviewer.
We would like to also fine tune our
validity analysis by seeing if correlations
differ for the top or bottom quartile of
papers.
46. SIGNIFICANT VARIABILITY IN REVIEWER RATINGS
Reviewer #1: 4 | Reviewer #2: 5 | Reviewer #3: 1 | Reviewer #4: 3 | Reviewer #5: 2
Future program chairs would be advised to control
for this variability by increasing the number of
reviewers per paper.
47. We need 4.0 reviewers per paper in the future.
48. EXTERNAL FACTORS DID NOT MATTER
Happily, there was no evidence that the nationality of the reviewer or the author
(or whether they were native English speakers) played a statistically
significant role in the eventual ratings a paper received.
49. SOME TEXTUAL FACTORS DID MATTER
Significant: number of references
Significant: number of words in abstract
No significance: total number of words in paper
No significance: readability indices
50. WHY THE ABSTRACT?
We were quite surprised to find that
the number of words in the abstract
was statistically significant.
Presumably, reviewers read the
abstract particularly carefully.
As such, our results suggest that
erring on the side of abstract brevity
is usually a mistake; it is important
for authors to make sure the abstract
contains sufficient information.
51. We also found that the number of
references was significant.
Acceptance probability rose with the number of references:
papers with almost no or very few references tended toward rejection,
while those with sufficient or many references tended toward acceptance.
52. AVG # OF REFERENCES PER PAPER
SIGITE: 16.47
ACM Digital Library: 21.26
Science Citation Index: 34.36
53. OBVIOUS CONCLUSIONS
Making a concerted effort at increasing citations is likely to improve a
paper’s ratings with reviewers.
It should be emphasized that the number of citations is not itself the cause of
lower or higher reviewer ratings.
Rather, the number of citations is likely a proxy measure for determining if
the paper under review is a properly researched paper that is connected to
the broader scholarly community.
54. Final Conclusion
VALIDITY
We did not find any connection between reviewers’ ratings of a paper and its
subsequent academic impact (measured by citations) or practical impact (measured by
ACM Digital Library downloads).
This might seem to be a disturbing result.
However, other research in this area also found no correlation between reviewer ratings
and subsequent academic impact.
It is important to remember that, “the aim of the peer review process is not the selection
of high impact papers, but is simply to filter junk papers and accept only the ones above
a certain quality threshold”.
55. FUTURE WORK
We hope to extend our analysis to
include not only more recent years, but
also to include more fine‐grained
examinations of the different factors
affecting peer review at the SIGITE
conference.