Early Detection and Forecasting of Research Trends

Angelo Salatino

Presentation associated with the paper I presented at ISWC 2015

Early Detection and Forecasting of
Research Trends
Angelo Antonio Salatino
@angelosalatino
Advisors:
Prof. Enrico Motta
Dr. Francesco Osborne
ISWC 2015 – Doctoral Consortium

Problem

Who cares?
• Researchers: following the evolution of
the research environment
• Academic publishers: promoting up-to-date
and interesting content
• Companies: early intelligence on
potentially important research trends to
remain at the forefront of innovation
• Funding bodies: improved understanding
of the research landscape

State of the art: Trend detection
• Topic evolution using bibliometric analysis:
– Content analysis
• Topic extraction
• Main terms in documents
– Citation analysis
– Main limitation: cannot detect new trends
early enough in the lifecycle
[Wu et al. 2011, Bolelli et al. 2009, He et al. 2009]

State of the art: Forecasting impact
• Impact based on number of publications and
authors associated with topics
• Approaches based on exponential
smoothing, simple moving average and
machine learning
• Limitations:
– These approaches do not work at the embryonic
and early stages
– They only use a limited set of data sources
[Budi et al. 2012, Jun et al. 2010, Tseng et al. 2009]
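As a rough illustration of the two statistical baselines named above, the sketch below forecasts next year's publication count for a topic with a simple moving average and with simple exponential smoothing. It is a minimal sketch, not the implementation used in the cited works; the function names, the window size, the smoothing factor `alpha` and the yearly counts are all hypothetical.

```python
def simple_moving_average(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: level = alpha * x + (1 - alpha) * level.

    The final smoothed level is used as the one-step-ahead forecast.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


if __name__ == "__main__":
    # Hypothetical yearly paper counts for one topic (illustrative only).
    papers_per_year = [2, 3, 5, 8, 13, 21]
    print(simple_moving_average(papers_per_year, window=3))  # 14.0
    print(exponential_smoothing(papers_per_year, alpha=0.5))  # 15.21875
```

With only a handful of near-zero counts, as in a topic's embryonic stage, both forecasts can do little more than echo the sparse history, which is one way to see the limitation listed on the slide.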

Planned approach
Wider range of data sources:
comprehensive knowledge base integrating
both scholarly data and social media

Planned approach
Focus on discovering patterns emerging from the
research dynamics:
– For example, before the Semantic Web
emerged explicitly as a research area, we
could identify new interesting dynamics
involving authors from different research
areas such as knowledge representation,
agent systems, hypertext and databases.
– Creation of a model that takes into
account all the discovered patterns, which
may involve different entities (e.g.,
authors, venues, topics, communities)
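One way to make the Semantic Web example concrete is to count, year by year, how many authors publish on each pair of different areas. The sketch below does this from a list of papers; the input format `(year, authors, topics)`, the function name and the toy records are hypothetical, and nothing in the slides says the planned model uses this exact measure.

```python
from collections import defaultdict
from itertools import combinations


def cross_area_authors(papers):
    """Count, per year, the authors who publish on each pair of topics.

    papers: iterable of (year, authors, topics) tuples (hypothetical format).
    Returns {year: {(topic_a, topic_b): number_of_authors}}.
    """
    # year -> author -> set of topics that author published on in that year
    topics_by_author = defaultdict(lambda: defaultdict(set))
    for year, authors, topics in papers:
        for author in authors:
            topics_by_author[year][author].update(topics)

    result = {}
    for year, by_author in topics_by_author.items():
        counts = defaultdict(int)
        for topics in by_author.values():
            # Each author bridging two areas contributes one to that pair.
            for pair in combinations(sorted(topics), 2):
                counts[pair] += 1
        result[year] = dict(counts)
    return result


if __name__ == "__main__":
    # Toy records loosely inspired by the areas named on the slide.
    papers = [
        (1998, ["alice"], ["knowledge representation"]),
        (1998, ["bob"], ["hypertext"]),
        (1999, ["alice", "bob"], ["knowledge representation", "hypertext"]),
        (1999, ["carol"], ["agent systems", "knowledge representation"]),
    ]
    for year, pairs in sorted(cross_area_authors(papers).items()):
        print(year, pairs)
```

In this toy input no author bridges two areas in 1998, while in 1999 two authors connect hypertext with knowledge representation and one connects agent systems with knowledge representation; a rising count of this kind is the sort of cross-area dynamic the slide describes.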

Initial study
• Goal: To identify the dynamics that may
indicate the emergence of a new topic
• Approach:
– Integration of the keyword network and the semantic
topic network (Klink-2, Osborne et al. @ ISWC
2015)
– Analysis of the evolution over time of sub-networks
that will generate new topics vs. a control
group of established topics.
• Debutant group (new topics)
• Non-debutant group (established topics)
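The integration step can be pictured as follows: keywords linked by `sameAs` in the semantic topic network are mapped to one canonical topic before yearly co-occurrences are counted, so that synonyms do not split the signal. This is a minimal sketch under stated assumptions: the hand-written `same_as` dictionary stands in for the relationships Klink-2 would provide, only single-hop `sameAs` links are resolved, `subAreaOf` is ignored, and all names and records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def topic_cooccurrence(papers, same_as):
    """Build a yearly co-occurrence network over canonical topics.

    papers: iterable of (year, keywords) pairs (hypothetical format).
    same_as: dict mapping a keyword to its canonical topic (single hop only).
    Returns {year: Counter({(topic_a, topic_b): number_of_papers})}.
    """
    per_year = {}
    for year, keywords in papers:
        # Collapse synonyms first; the set avoids counting a topic twice.
        topics = sorted({same_as.get(k, k) for k in keywords})
        counter = per_year.setdefault(year, Counter())
        for pair in combinations(topics, 2):
            counter[pair] += 1
    return per_year


if __name__ == "__main__":
    same_as = {"ontologies": "ontology"}  # hypothetical sameAs link
    papers = [
        (2000, ["ontology", "world wide web"]),
        (2000, ["ontologies", "world wide web"]),
        (2001, ["ontology", "ontologies"]),
    ]
    print(topic_cooccurrence(papers, same_as))
    print(topic_cooccurrence(papers, {}))  # without the semantic layer
```

With the mapping, the two papers from 2000 add up to a single link of weight 2 between "ontology" and "world wide web", and the 2001 paper no longer produces a spurious link between "ontology" and "ontologies"; without it, the same evidence is spread over separate keyword pairs.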

Preliminary results
• My analysis indicates that for debutant topics there is
intense activity among the most co-occurring
keywords, which are normally established topics
• My hypothesis is that I can use this understanding for
the early detection of new topics on the basis of the
activity of established topics
Student’s t-test on the two distributions:
• p-value = 2.81 × 10^-83
• null hypothesis can be rejected
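For readers unfamiliar with the test, the sketch below computes the two-sample Student's t statistic with pooled variance, which compares the means of two distributions such as the debutant and non-debutant ones. It uses only the Python standard library and made-up sample values; it does not reproduce the slide's p-value, which comes from the author's data.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic assuming equal variances.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    # Pooled variance combines the two sample variances weighted by their dof.
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / dof
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, dof


if __name__ == "__main__":
    # Hypothetical "pace of collaboration" values for the two groups.
    debutant = [0.4, 0.6, 0.5, 0.7]
    non_debutant = [0.0, 0.1, -0.1, 0.0]
    t, dof = student_t(debutant, non_debutant)
    print(f"t = {t:.2f}, dof = {dof}")
```

To obtain the p-value, the statistic is compared with a t distribution with `dof` degrees of freedom; in practice, `scipy.stats.ttest_ind(a, b)` performs the same equal-variance test and returns both the statistic and the p-value.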

Evaluation plan
• Quantitative: retrospective analysis and
detection of historical trends
• Qualitative: informal feedback from
domain experts, including senior editors
and publishers at Springer, on the system's
suggestions for future trends

Reflections
• So far, my initial experiments have provided
promising results that confirm the initial
hypotheses
• The adoption of semantic technologies
has helped to improve these
results

Next steps
• Analyse dynamics in other networks (e.g.,
authors, communities and venues)
• Integration of social media data

Speaker notes

The research environment evolves rapidly: new research areas emerge while others fade out, which makes it difficult to keep up with these dynamics. At present, the main emerging areas are identified either automatically or semi-automatically using systems such as Rexplore, Saffron, ArnetMiner, MAS, Google Scholar, Faceted DBLP and CiteSeer. If we look at how the number of papers on a topic evolves over time, as for the Semantic Web in the figure, we can recognise three main stages: embryonic, early and recognised. It can be argued that many topics first exist in an embryonic form, often as a combination of other topics, before being officially identified and named by researchers. For example, the Semantic Web emerged as a common area for researchers working on Artificial Intelligence, the WWW and Knowledge-Based Systems before being acknowledged and labelled in the 2001 paper by Tim Berners-Lee. The early stage begins when a group of scientists agree on some theories related to the topic, build their own conceptual framework, and potentially give birth to a new scientific community. Finally, in the recognised stage, many authors are aware of the topic and start working on it, producing results and publishing research papers. The problem is that all the aforementioned systems can detect a trend only once the research area is already recognised, not before, and they need some years to make sense of it. Moreover, there are no systems able to forecast the impact of a topic in its early stage. My goal is to identify research trends, make sense of them and forecast their impact.
Who is really interested? Researchers need to stay up to date on the evolution of the research environment because they are interested in new trends related to their topics. For academic publishers and editors, knowing about emerging topics in advance is crucial for offering the most up-to-date and interesting content. For example, an editor can gain a competitive advantage by being the first to recognise the importance of a new trend and publishing a special issue or a journal about it; indeed, my PhD project is supported by Springer-Verlag. Institutional funding bodies and companies also need to be aware of research developments and promising research trends: knowing future trends allows them to make important investments ahead of time.
This problem can be analysed from two points of view: detecting topic trends and forecasting the impact of topics. For trend detection, current approaches use bibliometric analysis to extract either topics or main terms from the text, and then analyse the evolution of these topics by investigating the citation network. The main limitation of these approaches is that the content on a specific topic must first be produced and then cited, so it takes years before they can detect the trend.
For impact forecasting, existing approaches define impact as the number of publications and authors associated with a topic, and are mainly based on statistical techniques such as exponential smoothing and simple moving average, as well as machine learning algorithms. Their main limitations are that they do not work in the first phases of a topic's evolution and that they employ a limited set of features. However, it can be argued that a definition of impact that also draws on social media data can improve forecasting and allow it to be performed on a shorter timescale.
First, I will integrate a variety of heterogeneous data sources, including scholarly data and social media data, to create a comprehensive knowledge base. This knowledge base will use an ontology to describe the relationships between research elements.
Afterwards, I will focus on analysing patterns that can lead to the emergence of a new research topic. For example, before Tim Berners-Lee officially named the Semantic Web as a research area, we could already identify that AI, the WWW and knowledge-based systems (KBS) were sharing knowledge in this new common area. Because scholarly data store information about papers, many research elements, such as topics, authors, communities, venues and organisations, can be inferred from them. These elements are inherently interconnected: an author writes papers on certain topics and belongs to a community that is connected to a topic. These relationships can be analysed diachronically to discover the dynamics that lead to the emergence of new topics, and I can then design a comprehensive model that takes all the discovered patterns into account.
I conducted an initial study, using only scholarly data, to identify the dynamics that may lead to the emergence of a new topic. To do so, I first combined the keyword network and the semantic topic network available in the Rexplore database. In the keyword network, nodes represent the keywords tagged in papers, and the link between two keywords represents the number of papers in which the two keywords co-occur in each year. The semantic topic network is also a network of keywords, but here they are connected by semantic relationships such as subAreaOf and sameAs, which together form a hierarchy of research topics. Next, I conducted a diachronic analysis on the portions of this joint network related to two different kinds of topics: debutant and non-debutant.
The aim of this diachronic analysis of the joint network (whose semantic layer is a taxonomy of topics connected by relationships extracted by Klink) was to test whether the creation of novel topics is actually correlated with an increase in the pace of collaboration between already existing topics. The results show that the pace of collaboration between topics is higher in the portion of the network related to the debutant group than in the portion related to the non-debutant group. The figure shows the two distributions of the pace of collaboration between topics over time: the green line is for the non-debutant group and the blue line for the debutant group. The distribution for the non-debutant group is centred on zero, meaning that overall this group shows no increase in collaboration, while the distribution for the debutant group is shifted towards positive values, showing that here the pace of collaboration is increasing. Moreover, applying Student's t-test to the two distributions allows us to reject the null hypothesis that the two groups do not differ. For this reason, I believe that this understanding can be applied to detect the emergence of new topics on the basis of established ones.
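The notes do not define the pace of collaboration formally. Purely as an assumption for illustration, the sketch below takes it to be the least-squares slope of the yearly co-occurrence counts of a pair of topics: a slope near zero corresponds to the flat behaviour of the non-debutant group, and a positive slope to the increasing collaboration seen in the debutant group. The function name and the counts are hypothetical.

```python
def pace_of_collaboration(yearly_counts):
    """Least-squares slope of co-occurrence counts over consecutive years.

    Hypothetical definition used only for illustration.
    """
    n = len(yearly_counts)
    if n < 2:
        raise ValueError("need at least two years of counts")
    x_mean = (n - 1) / 2  # mean of the year indices 0..n-1
    y_mean = sum(yearly_counts) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(yearly_counts))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


if __name__ == "__main__":
    # Hypothetical yearly co-occurrence counts for two pairs of topics.
    stable_pair = [10, 11, 9, 10]   # e.g. around an established topic
    growing_pair = [1, 3, 6, 10]    # e.g. before a debutant topic appears
    print(pace_of_collaboration(stable_pair))   # -0.2
    print(pace_of_collaboration(growing_pair))  # 3.0
```

Per-pair slopes computed this way for each group would form the two distributions that a t-test can then compare.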
I plan to evaluate my work from both a quantitative and a qualitative perspective. Quantitatively, I will use historical data to estimate metrics such as precision, recall and F-measure. Qualitatively, I intend to collect informal feedback on the suggested future trends from domain experts, such as senior editors and publishers at Springer.
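For the retrospective, quantitative evaluation, these metrics can be computed by comparing the set of topics the system flags as emerging with the set of topics that actually emerged in the historical data. The sketch below is a minimal version of that comparison; the function name and topic labels are hypothetical, and F-measure is taken here as the balanced F1 score.

```python
def precision_recall_f1(predicted, actual):
    """Compare predicted emerging topics with those that actually emerged."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    predicted = {"topic_a", "topic_b", "topic_c", "topic_d"}  # flagged by the system
    actual = {"topic_a", "topic_b", "topic_e"}                # emerged historically
    p, r, f = precision_recall_f1(predicted, actual)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Using sets means a topic flagged twice is counted once; a finer-grained evaluation could also score how early each topic was flagged, which matters for early detection.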
The initial experiments provided promising results, confirming the initial hypotheses about the emergence of new topics. In addition, the adoption of semantic technologies, such as the semantic topic network, helped to improve these results.
As a next step, I aim to analyse the dynamics of other research elements, such as authors, communities and venues, that can lead to the emergence of new research topics, and to integrate entities from social media such as tweets and blog posts.
