This article (presented at the ER2016 conference) proposes a conceptual schema providing a holistic view of conference-related information (e.g., authors, papers, committees and topics). This schema is automatically and incrementally populated with data available online.
A number of data analysis and visualization algorithms are applied on top of this data to provide meaningful information to prospective authors, PC members and conference steering committees.
MetaScience: Holistic Approach for Research Modeling and Analysis
1. MetaScience
A Holistic Approach for Research Modeling
Valerio Cosentino, Javier Cánovas & Jordi Cabot
ICREA – Open University of Catalonia
@softmodeling modeling-languages.com
8. -Data vs Information
-Non-trivial analysis: Integration of different sources
-Conceptual Schema as unifying representation
-Separation of data collection from data analysis
21. Historical / temporal view, e.g. PC Analysis
ARE PC MEMBERS ACTIVE IN THE CONFERENCE?
60 out of the 99 members from 2015 did not publish in the previous 3 editions
ARE ACTIVE MEMBERS BEING IGNORED?
Only 7 researchers published consistently from 2012 to 2014
3 of them were PC members in 2015, while the remaining 4 were not
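A minimal sketch of how the two PC questions above could be computed, assuming the author sets per edition and the PC member set have already been extracted. The function name and the toy data are hypothetical and are not taken from the MetaScience code base.

```python
from typing import Dict, Set, Tuple


def pc_activity(
    pc_members: Set[str], authors_by_year: Dict[int, Set[str]]
) -> Tuple[Set[str], Set[str]]:
    """Compare a PC with the authors of previous editions.

    Returns (inactive_pc, ignored_authors):
      inactive_pc     - PC members with no paper in any of the given editions
      ignored_authors - researchers who published in every given edition
                        but are not PC members
    Assumes authors_by_year contains at least one edition.
    """
    published_any = set().union(*authors_by_year.values())
    published_all = set.intersection(*authors_by_year.values())
    return pc_members - published_any, published_all - pc_members


# Toy data, invented for illustration only
authors_by_year = {
    2012: {"ana", "bo", "cy"},
    2013: {"ana", "bo", "dee"},
    2014: {"ana", "bo", "eve"},
}
pc_2015 = {"ana", "fay", "gus"}

inactive, ignored = pc_activity(pc_2015, authors_by_year)
print(sorted(inactive))  # ['fay', 'gus']
print(sorted(ignored))   # ['bo']
```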
22. Newcomers
% of papers with all authors new to the main track
% of papers with all authors new to the conf
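One possible way to compute such a newcomer percentage is sketched below. It assumes a "newcomer" is an author with no paper in any earlier edition in the data set, which is only one of several reasonable definitions. The data layout and names are hypothetical.

```python
from typing import Dict, List, Set


def newcomer_paper_ratio(papers_by_year: Dict[int, List[Set[str]]], year: int) -> float:
    """Percentage of papers in `year` whose authors ALL lack papers in earlier editions."""
    earlier_authors: Set[str] = set()
    for edition, papers in papers_by_year.items():
        if edition < year:
            for authors in papers:
                earlier_authors |= authors

    papers = papers_by_year[year]
    if not papers:
        return 0.0
    all_new = sum(1 for authors in papers if authors.isdisjoint(earlier_authors))
    return 100.0 * all_new / len(papers)


# Toy data, invented for illustration only: each paper is a set of author names
papers_by_year = {
    2014: [{"ana", "bo"}, {"cy"}],
    2015: [{"ana", "dee"}, {"eve", "fay"}, {"gus"}, {"bo"}],
}
ratio_2015 = newcomer_paper_ratio(papers_by_year, 2015)
print(ratio_2015)  # 50.0
```

Restricting `papers_by_year` to main-track papers, or to all tracks, would give the two variants shown on the slide.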
23. NL / Topic analysis, e.g. Top-30 keywords for the last 10 editions
From paper abstracts / From topics of interest
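A keyword ranking of this kind can be approximated with a plain word count. The sketch below is an assumption about the general approach, not the actual MetaScience analysis: the tokenizer, the tiny stopword list and the sample texts are all invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Deliberately tiny stopword list; a real analysis would need a fuller one
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to", "we", "with"}


def top_keywords(texts: Iterable[str], k: int = 30) -> List[Tuple[str, int]]:
    """Return the k most frequent non-stopword tokens across all texts."""
    counts: Counter = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(k)


# Invented sample abstracts
abstracts = [
    "Business process variability with conceptual schemas",
    "Mining a process from event logs",
    "Process compliance for conceptual schemas",
]
top = top_keywords(abstracts, k=1)
print(top)  # [('process', 3)]
```

Running the same function over abstracts and over the topics of interest from the calls for papers would give the two rankings being compared.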
34. And even more challenges….
Paper classification
o Not clear distinction of paper types
o Changes in the characteristics from one edition to another (e.g. number of pages for short papers)
Committee / topics data
o Conference edition web sites may no longer be available
Partial solution: Wayback Machine
o Committee data similar but there is no common “standard”
Entity resolution
o Researchers can use different names
Partial solution: DBLP provides aliases
o Researcher names may appear misspelled (mostly in committee data)
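The entity resolution challenges above (aliases and misspellings) could be approached as sketched here, assuming an alias table is already available (for example, built from DBLP aliases). The fuzzy matching step uses Python's standard `difflib`, which is a swapped-in technique for illustration; the names, the cutoff value and the helper functions are hypothetical.

```python
import difflib
import unicodedata
from typing import Dict, Optional


def normalize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def resolve(name: str, aliases: Dict[str, str], cutoff: float = 0.85) -> Optional[str]:
    """Map a raw name to a canonical researcher name, or None if nothing is close enough.

    `aliases` maps every known spelling (already normalized) to a canonical name.
    """
    key = normalize(name)
    if key in aliases:  # exact alias hit
        return aliases[key]
    # Fall back to approximate matching to catch misspellings
    close = difflib.get_close_matches(key, list(aliases), n=1, cutoff=cutoff)
    return aliases[close[0]] if close else None


# Invented alias table
aliases = {
    normalize("María Gómez"): "María Gómez",
    normalize("M. Gómez"): "María Gómez",
    normalize("Li Wei"): "Li Wei",
}

print(resolve("Maria  Gomez", aliases))  # María Gómez (accent and spacing differences)
print(resolve("Maria Gomes", aliases))   # María Gómez (misspelling)
print(resolve("John Smith", aliases))    # None
```

A high cutoff keeps false merges rare at the cost of missing some misspellings; the right value depends on the data.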
There is a duality living inside every one of us. Every member of the research community plays two different roles: the role of the researcher and the role of the evaluator of the research work done by others.
And, like it or not, both are very important for the health of the community.
For example, this helps the SC make the right decisions regarding the future of the conference, and helps authors choose a conference to target.
But today we’re going to talk about something else. How did we get involved in this?
Even if many of my academic colleagues don’t believe so, there is life beyond papers, so we got interested in understanding how we could have a greater impact.
If you can do science, I can do meta-science. We can eat our own dog food. In the rest of the paper we’ll show how we use conceptual modeling to represent and then analyze communities.
Size of the community doesn’t really matter. Or at least it’s much more important how the community is internally structured
Raw community data is not enough to get any meaningful information
Still, looking at raw community data is a mess. A good community analysis is not trivial to do.
For any
Mention that it’s incremental
Gitana https://github.com/SOM-Research/Gitana
The Web Crawler relies on Selenium
Not all of them are implemented! This is just to give an idea of the challenges.
Microsoft offers the Academic Knowledge API
I’ll now show some of the analyses that can be done. Obviously these are just examples; once you have the data, you can calculate anything you want to know.
Our point is that these analyses are useful for the SC to make the right decisions regarding the future of the conference, but also for authors in order to choose a conference to target (e.g. how easy it is for new authors to enter a conference, how easy it is to become a PC member, ...)
Example of a single metric and its positive trend
We’re not getting any younger
Caution: PC members not publishing in the main conference may still publish in workshops or have other responsibilities
Active members being ignored may have their co-authors in the PC so their expertise can be
Whether these numbers are good or bad also depends on the comparison with other conferences.
It also depends a lot on what exactly you consider a newcomer
Process and business do not show up so strongly in the call for papers
While reverse and enterprise only appear in the enterprise
This can be helpful to evaluate whether the call for papers reflects the reality of the conference
They have more advanced mathematical models that rely on bipartite graphs to calculate more advanced emergent properties of the graph
Then, we have other research tools that can help to actually improve WordPress itself
MetaScience reuses some components of another of our tools, Gitana, for the analysis of software projects (presented here last year)
GEXF: Graph Exchange XML Format
Mecana calculates some metrics on the database data but others are easier to calculate on the graph data
There are several exporters depending on what we want to calculate
We can use Gephi to directly visualize the graphs but we are also developing our own visualization component
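To give an idea of what a graph exporter might look like, here is a minimal sketch that builds a co-authorship graph and serializes it as GEXF using only the Python standard library. Choosing a co-authorship graph, the function name and the toy data are assumptions made for illustration; the actual MetaScience exporters are not shown in the slides.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations
from typing import Iterable, Set

GEXF_NS = "http://www.gexf.net/1.2draft"


def coauthorship_gexf(papers: Iterable[Set[str]]) -> str:
    """Serialize a weighted, undirected co-authorship graph as a GEXF 1.2 string.

    Each paper is a set of author names; edge weight = number of shared papers.
    """
    weights: Counter = Counter()
    authors: Set[str] = set()
    for paper_authors in papers:
        authors |= paper_authors
        for a, b in combinations(sorted(paper_authors), 2):
            weights[(a, b)] += 1

    ids = {name: str(i) for i, name in enumerate(sorted(authors))}
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"mode": "static", "defaultedgetype": "undirected"})
    nodes = ET.SubElement(graph, "nodes")
    for name, node_id in ids.items():
        ET.SubElement(nodes, "node", {"id": node_id, "label": name})
    edges = ET.SubElement(graph, "edges")
    for i, ((a, b), w) in enumerate(sorted(weights.items())):
        ET.SubElement(
            edges, "edge",
            {"id": str(i), "source": ids[a], "target": ids[b], "weight": str(w)},
        )
    return ET.tostring(gexf, encoding="unicode")


# Toy data, invented for illustration only
papers = [{"ana", "bo"}, {"ana", "bo", "cy"}]
gexf_xml = coauthorship_gexf(papers)
print(gexf_xml)
```

Writing `gexf_xml` to a `.gexf` file should give something that Gephi can open directly.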
This can be installed and deployed on your own server
Partial online service
We managed to get the whole university blocked from Google Scholar. As you can imagine, we didn’t make many friends. But no, this was not the reason why I escaped from France and went back to Barcelona
Without APIs we are limited regarding the information we can use. Sure, it would need to be anonymized, but it would be really useful to have data on the review process and the rejected papers
I hope we can then work together to solve some of these challenges and build the tools we need to better understand ourselves and make sure ER continues being a great conference for many years