Genealogy Tree: An Academic Lineage for Authors and Their
Advisors, Siblings and Students
Snehanshu Saha, Gouri Ginde, Sourav Poddar, Sandra Anil, Saijal Shrivastava,
Somya Bansal, Archana Mathur, Harika Samala, Namita Chaukimath, Shobhit Kumar
October 27, 2016
Abstract
A genealogy tree gives information about a researcher and his scholastic lineage, which is of paramount
importance in today's world of computer technology. Gaining insight into academic genealogy can help
PhD students and early-career academics achieve academic socialization within the discipline by making
explicit connections that may be influential. Awareness of his scientific heritage gives the user a broader
perspective on his own research project. The paper puts forth a software model that creates the genealogy
tree of any academician. This software intends to become a reference tool, made more reliable through the
contributions of its users.
1 INTRODUCTION
Genealogy is an account of the descent of a person, family or group from an ancestor or from older forms. It is
the study of the history of the past and present members of a family or families. Historical records are used for
genealogical research. The ideal sources are original records, mostly primary or firsthand information, and the
conclusions that can be drawn from them. Source citation is also important when conducting genealogical
research. Academic genealogy traces the mentoring relationships of doctoral students. A genealogy tree can
be formulated on this basis, where a student is considered a child with his adviser as the parent. A student can
have multiple advisers. Students of the same adviser belong to a common sibling network in the tree. This tree
traces the academic pedigree of each entity in it.
Over the years the number of people pursuing a PhD has increased, leading to an exponential growth of the
academic genealogical tree. With this rising number, keeping track of and documenting scholastic relationships
between scientists has become difficult. An attempt in this direction has been made by the American Mathematical
Society by means of their Mathematics Genealogy Project. Their objective is to catalog the complete mathematics
community. It gives information about an author, his ancestry and lineage in the tree, along with his dissertation
and the year in which the degree was awarded. This paper puts forth a similar software model for the department
of Computer Science. It would hold information about all the scientists who have contributed to the field at
research level.
The database is built from contributions by scientists, who enter details such as their dissertation and the year
and institute at which the degree was obtained. This database is then searched based on user input. The tree
obtained can be based on one of two criteria: author or domain. The author-based tree describes the author's
heritage and his descendants. Details about the author's degree are also provided in this genealogical tree.
Computer Science can be perceived as an umbrella housing a large number of domains, each with multiple
research areas. The domain-based tree traces the complete hierarchy of scientists who have contributed to that domain.
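
As a rough illustration of the adviser, sibling and descendant lookups described in this section, the following Python sketch stores each scholar together with a list of advisers. The `Scholar` record, its field names, the helper functions and the sample names are all hypothetical; the paper does not specify a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Scholar:
    # Hypothetical record; field names are illustrative only.
    name: str
    dissertation: str = ""
    year: Optional[int] = None
    institute: str = ""
    domain: str = ""
    advisers: List[str] = field(default_factory=list)


def students_of(db: Dict[str, Scholar], adviser: str) -> List[str]:
    """Direct students (children) of an adviser."""
    return sorted(n for n, s in db.items() if adviser in s.advisers)


def siblings_of(db: Dict[str, Scholar], name: str) -> List[str]:
    """Scholars who share at least one adviser with `name`."""
    sibs = set()
    for adviser in db[name].advisers:
        sibs.update(students_of(db, adviser))
    sibs.discard(name)
    return sorted(sibs)


def ancestors_of(db: Dict[str, Scholar], name: str) -> List[str]:
    """All advisers reachable by following adviser links upwards."""
    seen, stack = set(), list(db[name].advisers)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            if current in db:
                stack.extend(db[current].advisers)
    return sorted(seen)


def descendants_of(db: Dict[str, Scholar], name: str) -> List[str]:
    """All students reachable by following student links downwards."""
    seen, stack = set(), students_of(db, name)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(students_of(db, current))
    return sorted(seen)


# Invented sample data: Dee has two advisers, Ben and Cy.
db = {
    "Ada": Scholar("Ada"),
    "Ben": Scholar("Ben", advisers=["Ada"]),
    "Cy": Scholar("Cy", advisers=["Ada"]),
    "Dee": Scholar("Dee", advisers=["Ben", "Cy"]),
}
```

Because a student may have several advisers, the structure is strictly a directed acyclic graph rather than a tree, which is why both traversals keep a set of visited names.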
2 COMMUNITY DETECTION MODEL
In this section we discuss the concepts used for detecting communities among authors by counting citations. We
also discuss the different cases encountered during the process of community detection.
*This work was supported by PES Institute of Technology Bangalore South and Indian Institute of Technology Patna in the form of
funding for research associate Gouri Ginde.
†Authors are affiliated to the Faculty of Computer Science and Engineering and the Center for Applied Mathematical Modeling and
Simulation (CAMMS), PESIT South Campus, Bangalore, India.
• Community
A network is said to have community structure if its nodes can be easily grouped into sets such that each
set of nodes is densely connected internally.
• Community Detection
The adjacency matrix c[i][j] is an author id matrix in which the value at the intersection of the ith row and
jth column is the number of times author i cites author j. The total citations of each author are represented
by t[i].
       A    B    C    D    E    F    G    H
  A   25   21   18    0    0    0    0    0
  B   17    3    0   23    0    0    0    0
  C   25    0   15   15   10    0    0    0
  D    0   22    5   53    0   10   16    0
  E    0    0    0    0    0    0   20    0
  F   12    0    0    7    0    0    0    0
  G    0    0    0    0    0    4    0    0
  H    6    0    0    0    0    0    0   41
Figure 1: Author Citation Matrix for Sample Graph
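
To make the notation concrete, the sketch below encodes the matrix of Figure 1 in Python and derives per-author totals. The text does not say whether t[i] counts the citations made by author i or the citations received by author i, so both are computed here; the function names are illustrative and not taken from the paper.

```python
AUTHORS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# C[i][j] = number of times author i cites author j (Figure 1).
C = [
    [25, 21, 18, 0, 0, 0, 0, 0],
    [17, 3, 0, 23, 0, 0, 0, 0],
    [25, 0, 15, 15, 10, 0, 0, 0],
    [0, 22, 5, 53, 0, 10, 16, 0],
    [0, 0, 0, 0, 0, 0, 20, 0],
    [12, 0, 0, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 41],
]


def citations_made(c):
    """Row sums: total citations issued by each author."""
    return [sum(row) for row in c]


def citations_received(c):
    """Column sums: total citations received by each author."""
    return [sum(col) for col in zip(*c)]


def self_citations(c):
    """Diagonal entries: how often each author cites themselves."""
    return [c[i][i] for i in range(len(c))]


made = citations_made(C)
received = citations_received(C)
own = self_citations(C)
```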
Input: An adjacency matrix c[i][j] representing citation information, author ids, total citations t[i]
Output: An equivalence class of authors
c[i][j] represents the number of citations made by author i to author j
Diagonal entries c[i][i] represent the self-citations of author i, stored as x
for every author id i in the matrix c[i][j] do
    for every author id j in the matrix c[i][j] do
        if x >= 0.5 * t[i] then
            corrupt_author_count += 1
            selfcite_author_count += 1
            Forming list for relationships
            r_self = author id
        end if
    end for
end for
for every author id i in the matrix c[i][j] do
    for every author id j in the matrix c[i][j] do
        if c[i][j] > 0.5 * t[j] then
            k = k + 1
            if c[i][j] > 0.5 * t[i] then
                s = s + 1
                corrupt_author_count += 2
                r_bidirectional = author id i : author id j
            else
                r_unidirectional = author id i
            end if
        end if
    end for
end for
if k = s = z then
    Forming dictionary for Mafia Network
    r_mafia = author id i : z_1, z_2, z_3, ..., z_n
end if
Forming relationships using the outputs r_self, r_unidirectional, r_bidirectional and r_mafia gives the community network.
• This algorithm checks, for every author id i, whether the number of self-citations of i exceeds the threshold
fraction of its total citations.
• When the number of self-citations of an author exceeds the threshold percentage of total citations, the
author is said to be corrupt: corrupt_author_count is incremented by 1 and selfcite_author_count is
incremented by 1.
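
The self-citation check described above can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: it assumes that t[i] is the number of citations received by author i (the column sum, self-citations included), which the text does not state, and the function name `self_citing_authors` is hypothetical. The 0.5 threshold and the `>=` comparison are taken from the pseudocode.

```python
AUTHORS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# C[i][j] = number of times author i cites author j (Figure 1).
C = [
    [25, 21, 18, 0, 0, 0, 0, 0],
    [17, 3, 0, 23, 0, 0, 0, 0],
    [25, 0, 15, 15, 10, 0, 0, 0],
    [0, 22, 5, 53, 0, 10, 16, 0],
    [0, 0, 0, 0, 0, 0, 20, 0],
    [12, 0, 0, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 41],
]


def self_citing_authors(c, authors, threshold=0.5):
    """Return the authors whose self-citations reach threshold * t[i]."""
    n = len(c)
    # Assumption: t[i] = citations received by author i (column sum).
    t = [sum(c[row][col] for row in range(n)) for col in range(n)]
    r_self = []
    for i in range(n):
        x = c[i][i]  # self-citations of author i
        # Skip authors with no citations so that 0 >= 0 does not flag them.
        if t[i] > 0 and x >= threshold * t[i]:
            r_self.append(authors[i])
    return r_self


r_self = self_citing_authors(C, AUTHORS)
selfcite_author_count = len(r_self)
```

On the sample matrix this flags D (53 self-citations out of 98 citations received) and H (41 out of 41).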
[Figure: directed graph on author nodes A to H; edges are labelled with citation counts]
Figure 2: Sample Author Network
• r_self is a list of authors who have self-cited more than the given threshold value. Each author id i that
satisfies the if condition is added to this list.
• Variable k keeps count of the authors who have been cited more than the threshold value by author j.
• Variable s keeps count of the authors who have been cited more than the threshold value by author i.
• For every author i cited more than the threshold value by author j, increment k.
• For every author j cited more than the threshold value by author i, increment s.
• If a bidirectional relationship exists between author i and author j, corrupt_author_count is incremented by 2
and the author ids of i and j are added to the dictionary r_bidirectional as a key-value pair, where the author
id of i is the key and the author id of j is the value.
• If a unidirectional relationship exists between author i and author j, corrupt_author_count is incremented by
1 and the author id of i is added to the list r_unidirectional, which keeps track of unidirectional relationships.
• If k and s are both equal to the numeric parameter z, the authors are added to the dictionary r_mafia, where
any one of the author ids of the network acts as a key to access all the other authors.
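
The two threshold tests behind k, s and the unidirectional and bidirectional lists can be transcribed into Python roughly as below. Several points are assumptions, not statements from the paper: t[i] is taken to be the citations received by author i, diagonal entries are skipped because self-citations are handled by the separate r_self check, and r_bidirectional maps each key to a list so that one author can hold several partners. The final mafia step is left out because the parameter z is not defined in the text.

```python
AUTHORS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# C[i][j] = number of times author i cites author j (Figure 1).
C = [
    [25, 21, 18, 0, 0, 0, 0, 0],
    [17, 3, 0, 23, 0, 0, 0, 0],
    [25, 0, 15, 15, 10, 0, 0, 0],
    [0, 22, 5, 53, 0, 10, 16, 0],
    [0, 0, 0, 0, 0, 0, 20, 0],
    [12, 0, 0, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 41],
]


def classify_relationships(c, authors, threshold=0.5):
    """Apply the tests c[i][j] > threshold*t[j] and c[i][j] > threshold*t[i]."""
    n = len(c)
    # Assumption: t[i] = citations received by author i (column sum).
    t = [sum(c[row][col] for row in range(n)) for col in range(n)]
    r_unidirectional, r_bidirectional = [], {}
    k = s = corrupt_author_count = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: self-citations are checked separately
            if c[i][j] > threshold * t[j]:
                k += 1
                if c[i][j] > threshold * t[i]:
                    s += 1
                    corrupt_author_count += 2
                    r_bidirectional.setdefault(authors[i], []).append(authors[j])
                else:
                    corrupt_author_count += 1
                    if authors[i] not in r_unidirectional:
                        r_unidirectional.append(authors[i])
    return r_unidirectional, r_bidirectional, k, s, corrupt_author_count


uni, bi, k, s, corrupt = classify_relationships(C, AUTHORS)
```

Under these assumptions the sample matrix yields unidirectional entries for C and D and a single bidirectional entry with key E and value G.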
• LOCAL CITE: Consider an author who has 200 citations. If 70 percent of these citations (140) come from
siblings, list all the citations from authors who collaborated with this author and also list all citations from
others.
• If author A cites author B and author B also cites author A, a binary relation is said to exist between
author A and author B. This information is represented in the form of the matrix c[i][j], and the binary relation
is represented with 1.
• SUSPECTED AUTHOR: An author holding comparable binary relationships is said to be a suspected author.
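
The binary relation defined in the list above can be derived from the citation matrix as in the following Python sketch. The function names are hypothetical, and skipping the diagonal (so that self-citation is not counted as a relation) is an assumption. The text does not say how many binary relations make an author "suspected", so the sketch only counts them per author.

```python
AUTHORS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# C[i][j] = number of times author i cites author j (Figure 1).
C = [
    [25, 21, 18, 0, 0, 0, 0, 0],
    [17, 3, 0, 23, 0, 0, 0, 0],
    [25, 0, 15, 15, 10, 0, 0, 0],
    [0, 22, 5, 53, 0, 10, 16, 0],
    [0, 0, 0, 0, 0, 0, 20, 0],
    [12, 0, 0, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 41],
]


def binary_relation_matrix(c):
    """b[i][j] = 1 when i cites j and j cites i (i != j), else 0."""
    n = len(c)
    return [
        [1 if i != j and c[i][j] > 0 and c[j][i] > 0 else 0 for j in range(n)]
        for i in range(n)
    ]


def mutual_pairs(c, authors):
    """Unordered author pairs that cite each other."""
    b = binary_relation_matrix(c)
    n = len(c)
    return [(authors[i], authors[j])
            for i in range(n) for j in range(i + 1, n) if b[i][j]]


pairs = mutual_pairs(C, AUTHORS)
# Number of binary relations held by each author.
relation_counts = dict(zip(AUTHORS, (sum(row) for row in binary_relation_matrix(C))))
```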
       A    B    C    D    E    F    G    H
  A   25   21   18    0    0    0    0    0
  B   17    3    0   23    0    0    0    0
  C   25    0   15   15   10    0    0    0
  D    0   22    5   53    0   10   16    0
  E    0    0    0    0    0    0   20    0
  F   12    0    0    7    0    0    0    0
  G    0    0    0    0    0    4    0    0
  H    6    0    0    0    0    0    0   41
Figure 3: Author Citation Matrix for Sample Graph
Algorithm 1 MAFIA IDENTIFICATION
Input: Collection of large data sets of citation information, represented by matrix M
Output: The calculated threshold and the binary relations identified between suspected authors
To calculate the threshold for an author in the suspected list L:
for i in list L do
    for j in list L do
        Threshold ← ∑ c[i][j] ÷ (number of suspected authors)
        if Threshold < x                  (x is calculated from the trend among siblings)
           and c[i][j] > Defined value    (obtained from the trend algorithm) then
            the author is said to be involved in mafia
        end if
    end for
end for
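
One possible reading of Algorithm 1 is sketched below in Python. It rests on several assumptions that the paper leaves open: the sum runs over citations between distinct suspected authors, the divisor is the size of the suspected list, and both `x` and `defined_value` are supplied by the caller because the trend algorithm that produces them is not given. The suspected list and the numeric values in the example are invented for illustration only.

```python
AUTHORS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# C[i][j] = number of times author i cites author j (Figure 3).
C = [
    [25, 21, 18, 0, 0, 0, 0, 0],
    [17, 3, 0, 23, 0, 0, 0, 0],
    [25, 0, 15, 15, 10, 0, 0, 0],
    [0, 22, 5, 53, 0, 10, 16, 0],
    [0, 0, 0, 0, 0, 0, 20, 0],
    [12, 0, 0, 7, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 4, 0, 0],
    [6, 0, 0, 0, 0, 0, 0, 41],
]


def mafia_pairs(c, authors, suspected, x, defined_value):
    """Return (threshold, flagged citing pairs) for a list of suspected authors."""
    idx = [authors.index(name) for name in suspected]
    if not idx:
        return 0.0, []
    # Assumption: sum citations between distinct suspected authors only.
    total = sum(c[i][j] for i in idx for j in idx if i != j)
    threshold = total / len(idx)
    flagged = []
    if threshold < x:  # x: hypothetical value from the sibling trend
        for i in idx:
            for j in idx:
                # defined_value: hypothetical output of the trend algorithm
                if i != j and c[i][j] > defined_value:
                    flagged.append((authors[i], authors[j]))
    return threshold, flagged


# Invented inputs, for illustration only.
threshold, flagged = mafia_pairs(C, AUTHORS, ["A", "B", "C", "D"],
                                 x=40, defined_value=20)
```

With these invented inputs the threshold is 146 / 4 = 36.5, which is below x, and the pairs whose citation count exceeds 20 are flagged.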
