genealogy-tree-academic

Genealogy Tree: An Academic Lineage for Authors and Their
Advisors, Siblings and Students
Snehanshu Saha, Gouri Ginde, Sourav Poddar, Sandra Anil, Saijal Shrivastava,
Somya Bansal, Archana Mathur, Harika Samala, Namita Chaukimath, Shobhit Kumar
October 27, 2016
Abstract
A genealogy tree gives information about a researcher and his scholastic lineage, which is of paramount
importance in today's world of computer technology. Gaining an insight into academic genealogy could help
PhD students and early-career academics in the field achieve academic socialization within the
discipline by making explicit connections that may be influential. Awareness of his scientific heritage gives
the user a broader perspective on his own research project. This paper puts forth a software model that creates
the genealogy tree of any academician. The software is intended to become a reference tool, made more reliable
through contributions from its users.
1 INTRODUCTION
Genealogy is an account of the descent of a person, family or group from an ancestor or from older forms. It is the
study of the history of the past and present members of a family or families. Historical records are used for
genealogical research; the ideal sources are original records, mostly primary or firsthand information, and the
conclusions that can be drawn from them. Source citation is also important when conducting genealogical
research. Academic genealogy traces the mentoring relationships of doctoral students. A genealogy tree can
be formulated on this basis, where a student is considered a child and his adviser the parent. A student can have
multiple advisers. Students of the same adviser belong to a common sibling network in the tree. This tree traces
the academic pedigree of each entity in it.
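As a minimal sketch of the structure described above, the Python class below stores each adviser-student link as a parent-child edge and derives the sibling network from shared advisers. Because a student can have several advisers, the structure is in general a directed acyclic graph rather than a strict tree. The class name `GenealogyTree`, its methods and the sample names are hypothetical illustrations, not part of the software model presented in this paper.

```python
from collections import defaultdict


class GenealogyTree:
    """Hypothetical academic genealogy: advisers are parents, students are children.

    A student may have several advisers, so the structure is a directed
    acyclic graph rather than a strict tree.
    """

    def __init__(self) -> None:
        self.advisers: dict[str, set[str]] = defaultdict(set)  # student -> advisers
        self.students: dict[str, set[str]] = defaultdict(set)  # adviser -> students

    def add_advising(self, adviser: str, student: str) -> None:
        """Record one adviser -> student (parent -> child) edge."""
        self.advisers[student].add(adviser)
        self.students[adviser].add(student)

    def siblings(self, student: str) -> set[str]:
        """Students who share at least one adviser with `student`."""
        result: set[str] = set()
        for adviser in self.advisers.get(student, set()):
            result |= self.students[adviser]
        result.discard(student)  # a student is not his own sibling
        return result


tree = GenealogyTree()
tree.add_advising("Adviser1", "StudentA")
tree.add_advising("Adviser1", "StudentB")
tree.add_advising("Adviser2", "StudentB")  # StudentB has two advisers
tree.add_advising("Adviser2", "StudentC")
print(sorted(tree.siblings("StudentB")))  # ['StudentA', 'StudentC']
```

Keeping both directions (student to advisers and adviser to students) makes sibling lookup a union over the student's advisers rather than a scan of every record.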
Over the years the number of people pursuing a PhD has increased, leading to an exponential growth of the
academic genealogy tree. With this rising number, keeping track of and documenting the scholastic relationships
between scientists has become difficult. An attempt in this direction has been made by the American Mathematical
Society through the Mathematics Genealogy Project, whose objective is to catalog the complete mathematics
community. For each author it gives his ancestry and lineage in the tree, along with his dissertation and the
year in which the degree was awarded. This paper puts forth a similar software model for the discipline
of Computer Science. It would hold information about all the scientists who have contributed to the field at
the research level.
The database is built from contributions by the scientists, who input details such as their dissertation and the
year and institute in which they procured their degree. This database is then searched based on user input. The
tree obtained can be based on two criteria: author or domain. The tree based on author describes the author's
heritage and his descendants, and also provides details of the author's degree. Computer Science can be perceived
as an umbrella housing a large number of domains, each of which has multiple research areas within it.
The domain-based tree traces the complete hierarchy of scientists who have contributed to that domain.
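The two search criteria can be illustrated with a small sketch, assuming each contributed record stores the dissertation, year, institute, domain and adviser names mentioned above. The `Scientist` record and the functions `ancestors`, `descendants` and `domain_tree` are hypothetical; the author-based tree is taken here to be the author's ancestors plus his descendants, and the domain-based tree simply restricts the adviser links to scientists recorded under one domain.

```python
from dataclasses import dataclass, field


@dataclass
class Scientist:
    """One contributed record (fields are illustrative assumptions)."""
    name: str
    dissertation: str
    year: int
    institute: str
    domain: str
    advisers: list[str] = field(default_factory=list)


def ancestors(db: dict[str, Scientist], name: str) -> set[str]:
    """Advisers, advisers' advisers, and so on (the author's heritage)."""
    seen: set[str] = set()
    stack = list(db[name].advisers)
    while stack:
        person = stack.pop()
        if person not in seen:
            seen.add(person)
            if person in db:  # an adviser may not have contributed a record yet
                stack.extend(db[person].advisers)
    return seen


def descendants(db: dict[str, Scientist], name: str) -> set[str]:
    """Students, students' students, and so on."""
    children: dict[str, list[str]] = {}
    for s in db.values():
        for a in s.advisers:
            children.setdefault(a, []).append(s.name)
    seen: set[str] = set()
    stack = list(children.get(name, []))
    while stack:
        person = stack.pop()
        if person not in seen:
            seen.add(person)
            stack.extend(children.get(person, []))
    return seen


def domain_tree(db: dict[str, Scientist], domain: str) -> dict[str, list[str]]:
    """Adviser links restricted to scientists recorded under one domain."""
    return {s.name: s.advisers for s in db.values() if s.domain == domain}


db = {
    s.name: s
    for s in [
        Scientist("P", "Thesis P", 1980, "Univ X", "Databases"),
        Scientist("Q", "Thesis Q", 1995, "Univ Y", "Databases", ["P"]),
        Scientist("R", "Thesis R", 2010, "Univ Z", "Machine Learning", ["Q"]),
    ]
}
print(sorted(ancestors(db, "R")))    # ['P', 'Q']
print(sorted(descendants(db, "P")))  # ['Q', 'R']
print(domain_tree(db, "Databases"))  # {'P': [], 'Q': ['P']}
```

An iterative stack with a `seen` set is used so that a scientist reachable through two advisers is visited only once.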
2 COMMUNITY DETECTION MODEL
In this section we discuss the concepts used for detecting communities among authors by analyzing their citations.
We also discuss the different cases encountered during the process of community detection.
∗This work was supported by PES Institute of Technology Bangalore South and Indian Institute of Technology Patna in
the form of funding for research associate Gouri Ginde.
†The authors are affiliated with the Faculty of Computer Science and Engineering and the Center for Applied
Mathematical Modeling and Simulation (CAMMS), PESIT South Campus, Bangalore, India.

• Community
A network is said to have community structure if its nodes can be easily grouped into sets such that each
set of nodes is densely connected internally and only sparsely connected to the other sets.
• Community Detection
The adjacency matrix c[i][j] is an author-id matrix in which the value at the intersection of the ith row and
the jth column is the number of times author i cites author j. The total citation count of each author i is
denoted t[i].
    A   B   C   D   E   F   G   H
A  25  21  18   0   0   0   0   0
B  17   3   0  23   0   0   0   0
C  25   0  15  15  10   0   0   0
D   0  22   5  53   0  10  16   0
E   0   0   0   0   0   0  20   0
F  12   0   0   7   0   0   0   0
G   0   0   0   0   0   4   0   0
H   6   0   0   0   0   0   0  41
Figure 1: Author Citation Matrix for Sample Graph
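The matrix in Figure 1 can be built by counting raw citation records. The sketch below assumes each record is a hypothetical (citing author, cited author) pair; `build_citation_matrix` and `self_citations` are illustrative helpers, and `FIG1` is Figure 1 transcribed row by row.

```python
def build_citation_matrix(authors, records):
    """Count hypothetical (citing, cited) records into c[i][j]."""
    index = {a: k for k, a in enumerate(authors)}
    n = len(authors)
    c = [[0] * n for _ in range(n)]
    for citing, cited in records:
        c[index[citing]][index[cited]] += 1  # author i cites author j once more
    return c


def self_citations(c):
    """Diagonal entries c[i][i] are the self-citations of author i."""
    return [c[i][i] for i in range(len(c))]


# Figure 1 transcribed row by row (rows: citing author i, columns: cited author j)
FIG1 = [
    [25, 21, 18, 0, 0, 0, 0, 0],
    [17, 3, 0, 23, 0, 0, 0, 0],
    [25, 0, 15, 15, 10, 0, 0, 0],
    [0, 22, 5, 53, 0, 10, 16, 0],
    [0, 0, 0, 0, 0, 0, 20, 0],
    [12, 0, 0, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 41],
]

records = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "A")]
print(build_citation_matrix(["A", "B"], records))  # [[1, 2], [1, 0]]
print(self_citations(FIG1))  # [25, 3, 15, 53, 0, 0, 0, 41]
```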
Input: an adjacency matrix c[i][j] representing citation information, the author ids, and the total citations t[i]
Output: an equivalence class of authors
c[i][j] represents the number of citations made by author i to author j
The diagonal entry c[i][i] represents the self-citations of author i, stored as x
k ← 0; s ← 0; corrupt_author_count ← 0; selfcite_author_count ← 0
for every author id i in the matrix c do
    x ← c[i][i]
    if x ≥ 0.5 · t[i] then
        corrupt_author_count ← corrupt_author_count + 1
        selfcite_author_count ← selfcite_author_count + 1
        add author id i to the list r_self          ▷ forming the list of relationships
    end if
end for
for every author id i in the matrix c do
    for every author id j ≠ i in the matrix c do
        if c[i][j] > 0.5 · t[j] then
            k ← k + 1
            if c[j][i] > 0.5 · t[i] then
                s ← s + 1
                corrupt_author_count ← corrupt_author_count + 2
                r_bidirectional[author id i] ← author id j
            else
                add author id i to the list r_unidirectional
            end if
        end if
    end for
end for
if k = s = z then
    ▷ forming the dictionary for the mafia network
    r_mafia[author id i] ← {z_1, z_2, z_3, ..., z_n}
end if
Forming relationships from the outputs r_self, r_unidirectional, r_bidirectional and r_mafia gives the community network.
• The algorithm checks, for every author id i, whether the number of self-citations of i reaches the threshold
(50 percent) of its total citations.
• When the self-citations of an author reach or exceed this threshold percentage of total citations, the author is
said to be corrupt: the corrupt author count is incremented by 1, and the self-cite author count is also
incremented by 1.
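A runnable Python sketch of the algorithm above follows. It rests on several assumptions that the text does not fix: t[i] is supplied by the caller (the example takes it as the row sum, i.e. all citations made by author i, which is only one possible reading); diagonal entries are skipped in the pair loop; the reverse test uses c[j][i], matching the definition of a binary relation as mutual citation; a unidirectional relationship adds 1 to the corrupt author count, as stated in the description of the cases; and r_mafia is keyed by the alphabetically first author in the bidirectional relationships. The function name `detect_communities` and the demo matrix are hypothetical.

```python
def detect_communities(authors, c, t, z, ratio=0.5):
    """Sketch of the self-citation and pairwise-citation checks.

    Assumptions (not fixed by the paper): t[i] is given by the caller, the
    reverse test uses c[j][i], a unidirectional pair adds 1 to the corrupt
    count, and r_mafia is keyed by the alphabetically first member.
    """
    n = len(authors)
    corrupt = selfcite = 0
    r_self = []
    # Pass 1: authors whose self-citations (diagonal) reach ratio * t[i]
    for i in range(n):
        if c[i][i] >= ratio * t[i]:
            corrupt += 1
            selfcite += 1
            r_self.append(authors[i])

    # Pass 2: classify heavy citations between distinct authors
    k = s = 0
    r_bidirectional = {}  # one partner per key, as in a plain dictionary
    r_unidirectional = []
    for i in range(n):
        for j in range(n):
            if i == j or c[i][j] <= ratio * t[j]:
                continue
            k += 1
            if c[j][i] > ratio * t[i]:  # j also cites i above the threshold
                s += 1
                corrupt += 2
                r_bidirectional[authors[i]] = authors[j]
            else:
                corrupt += 1
                r_unidirectional.append(authors[i])

    # Mafia dictionary: one member acts as the key for all the others
    r_mafia = {}
    if k == s == z and r_bidirectional:
        members = sorted(set(r_bidirectional) | set(r_bidirectional.values()))
        r_mafia[members[0]] = members[1:]

    return {
        "corrupt_author_count": corrupt,
        "selfcite_author_count": selfcite,
        "r_self": r_self,
        "r_bidirectional": r_bidirectional,
        "r_unidirectional": r_unidirectional,
        "r_mafia": r_mafia,
        "k": k,
        "s": s,
    }


# Hypothetical three-author example; t is taken as each row sum (an assumption)
authors_demo = ["X", "Y", "Z"]
c_demo = [
    [1, 8, 0],
    [9, 1, 0],
    [5, 0, 6],
]
t_demo = [sum(row) for row in c_demo]  # [9, 10, 11]
result = detect_communities(authors_demo, c_demo, t_demo, z=2)
print(result["r_self"], result["r_bidirectional"], result["r_unidirectional"])
# ['Z'] {'X': 'Y', 'Y': 'X'} ['Z']
```

Because the loops visit every ordered pair, a mutual pair such as X and Y is counted twice in k and s, exactly as in the pseudocode. With the demo matrix k = 3 and s = 2 differ, so no mafia dictionary is formed for z = 2.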

[Figure: directed network of authors A to H; each edge is labelled with a citation count from the author citation matrix]
Figure 2: Sample Author Network
• r_self is the list of authors whose self-citations exceed the given threshold value; every author id i that
satisfies the if condition is added to this list.
• Variable k counts the author pairs (i, j) for which author i cites author j more than the threshold value;
k is incremented for every such pair.
• Variable s counts those pairs for which the relationship also holds in the opposite direction, i.e. the
relationship is bidirectional; s is incremented for every such pair.
• If a bidirectional relationship exists between author i and author j, the corrupt author count is incremented
by 2 and the author ids of i and j are added to the dictionary r_bidirectional as a key-value pair, where the
author id of i is the key and the author id of j is the value.
• If a unidirectional relationship exists between author i and author j, the corrupt author count is incremented
by 1 and the author id of i is added to the list r_unidirectional, which keeps track of unidirectional relationships.
• If k and s are both equal to the numeric parameter z, the authors are added to the dictionary r_mafia, in which
any one of the author ids of the network acts as a key to access all the other authors.
• LOCAL CITE: Consider an author with 200 citations. If 70 percent of them (140 citations) come from siblings,
then list all the citations from authors who collaborated with this author, and also list all the citations
from other authors.
• If author A cites author B and author B also cites author A, a binary relation is said to exist between
author A and author B. This information is represented in the form of the matrix c[i][j], and the binary
relation is represented with 1.
• SUSPECTED AUTHOR: An author holding comparable binary relationships is said to be a suspected author.
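The local-cite case reduces to a proportion check: 70 percent of 200 citations is 0.7 × 200 = 140 citations from siblings. The sketch below assumes the citing author of each citation is known and that the set of siblings (or collaborators) is given; `local_cite` and the names in the example are hypothetical. The percentage is compared with integer arithmetic so that the boundary case (exactly 70 percent) is not affected by floating-point rounding.

```python
def local_cite(citing_authors, siblings, percent=70):
    """Return (sibling citations, other citations) if at least `percent`
    percent of the citations come from siblings, otherwise None."""
    from_siblings = [a for a in citing_authors if a in siblings]
    others = [a for a in citing_authors if a not in siblings]
    # Integer comparison: 100 * sibling count >= percent * total count
    if not citing_authors or 100 * len(from_siblings) < percent * len(citing_authors):
        return None
    return from_siblings, others


# 200 hypothetical citations: 140 from siblings S1 and S2, 60 from others
citations = ["S1"] * 90 + ["S2"] * 50 + ["O1"] * 60
split = local_cite(citations, {"S1", "S2"})
print(len(split[0]), len(split[1]))  # 140 60
```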

    A   B   C   D   E   F   G   H
A  25  21  18   0   0   0   0   0
B  17   3   0  23   0   0   0   0
C  25   0  15  15  10   0   0   0
D   0  22   5  53   0  10  16   0
E   0   0   0   0   0   0  20   0
F  12   0   0   7   0   0   0   0
G   0   0   0   0   0   4   0   0
H   6   0   0   0   0   0   0  41
Figure 3: Author Citation Matrix for Sample Graph
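The binary relation can be read off the citation matrix mechanically: entry (i, j) is 1 when author i cites author j and author j cites author i, for distinct authors. The sketch below applies this to Figure 3, transcribed as `FIG3`; the helper name `binary_relation` is hypothetical, and diagonal self-citations are excluded because the relation is defined between two authors. Counting the 1s in each row gives the number of binary relations each author holds, which is one possible input when judging which authors hold comparable relationships.

```python
AUTHORS = "ABCDEFGH"
# Figure 3 transcribed row by row (rows: citing author, columns: cited author)
FIG3 = [
    [25, 21, 18, 0, 0, 0, 0, 0],
    [17, 3, 0, 23, 0, 0, 0, 0],
    [25, 0, 15, 15, 10, 0, 0, 0],
    [0, 22, 5, 53, 0, 10, 16, 0],
    [0, 0, 0, 0, 0, 0, 20, 0],
    [12, 0, 0, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 41],
]


def binary_relation(c):
    """1 where two distinct authors cite each other, 0 elsewhere."""
    n = len(c)
    return [
        [1 if i != j and c[i][j] > 0 and c[j][i] > 0 else 0 for j in range(n)]
        for i in range(n)
    ]


b = binary_relation(FIG3)
pairs = [(AUTHORS[i], AUTHORS[j]) for i in range(8) for j in range(i + 1, 8) if b[i][j]]
relations_per_author = {AUTHORS[i]: sum(b[i]) for i in range(8)}
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('D', 'F')]
print(relations_per_author)
```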
Algorithm 1 MAFIA IDENTIFICATION
Input: a large collection of citation data, represented by the matrix M
Output: the threshold for each suspected author and the binary relations between suspected authors
To calculate the threshold for an author in the suspected list L:
for each author i in list L do
    for each author j in list L do
        Threshold ← ∑ c[i][j] ÷ (number of suspected authors)
        if Threshold < x and c[i][j] > defined value then
            ▷ x is calculated from the trend among siblings
            ▷ the defined value is obtained from the trend algorithm
            the author is said to be involved in a mafia
        end if
    end for
end for
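Algorithm 1 leaves x and the defined value to a trend algorithm that is not specified here, so the Python sketch below takes both as plain parameters. It also assumes that the threshold for author i is the sum of i's citations to the other suspected authors divided by the number of suspected authors, and that a pair (i, j) is flagged only when both conditions hold. The function name `mafia_identification`, the suspected list and the parameter values in the example are hypothetical; the suspected list is taken to be the authors that hold a binary relation in Figure 3.

```python
AUTHORS = "ABCDEFGH"
# Figure 3 transcribed row by row (rows: citing author, columns: cited author)
FIG3 = [
    [25, 21, 18, 0, 0, 0, 0, 0],
    [17, 3, 0, 23, 0, 0, 0, 0],
    [25, 0, 15, 15, 10, 0, 0, 0],
    [0, 22, 5, 53, 0, 10, 16, 0],
    [0, 0, 0, 0, 0, 0, 20, 0],
    [12, 0, 0, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 41],
]


def mafia_identification(authors, c, suspected, x, defined_value):
    """Return per-author thresholds and the (i, j) pairs flagged as mafia links."""
    idx = {a: k for k, a in enumerate(authors)}
    thresholds = {}
    flagged = []
    for a in suspected:
        i = idx[a]
        # Threshold: citations from a to the other suspected authors, divided by |L|
        thresholds[a] = sum(c[i][idx[b]] for b in suspected if b != a) / len(suspected)
        for b in suspected:
            j = idx[b]
            # Both conditions of Algorithm 1 must hold for the pair to be flagged
            if b != a and thresholds[a] < x and c[i][j] > defined_value:
                flagged.append((a, b))
    return thresholds, flagged


thresholds, flagged = mafia_identification(
    AUTHORS, FIG3, ["A", "B", "C", "D", "F"], x=10, defined_value=20
)
print(thresholds)  # {'A': 7.8, 'B': 8.0, 'C': 8.0, 'D': 7.4, 'F': 3.8}
print(flagged)     # [('A', 'B'), ('B', 'D'), ('C', 'A'), ('D', 'B')]
```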
