@openaire_eu
Explore, model, analyze and visualize systematic research in OpenAIRE
… via text and data mining (topic modeling)
A bird’s eye view
Natalia Manola
University of Athens
Athena Research & Innovation Center
Open Science FAIR, Athens, 6-8 Sept, 2017
• The global research community generates ~2.5 million new scholarly articles per year (English only). Source: The STM Report (2015)
• … one paper published every 12 seconds…
• 70,000 papers published on a single protein, the tumor suppressor p53. Source: Spangler et al., Automated Hypothesis Generation Based on Mining Scientific Literature, 2014
Big volumes of data (publications ARE data in TDM)
Meta research:
Research analytics
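Mining scientific/scholarly literature
[Diagram: network of linked entities. Author (name, institution) writes Paper (title, key words, topics, bag of words, venue); Paper is citing / cited by Paper; User (queries, downloads, sessions) searches for Paper; Funding (name, grant no., start and end dates) is attached through an "is funded" link. Question marks on "is related" links between pairs of papers, users and authors.]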
Thematic analysis: What are the topics / concepts?
Entity Resolution: Do they refer to the same person?
Similarity analysis & link prediction: Is it related?
Analyze the role of funding
Get or recommend relevant content: ranking & similarity analysis
Structuring Effects: Identify & model research communities
Attribute Prediction: What could be the (possible) venue?
Research impact & timeliness
WHY
Number of publications rising…
New models, new insights, better decisions
Real output vs project & call descriptions
Analyze large collections of documents and meta-data to:
• Assess research collaboration: authorship network analysis
• Identify active areas of research: discover hidden themes (topics)
• Understand what is actually produced
• Discover clusters and communities
• Identify emerging research areas
• Assess coverage, identify gaps or new challenges
Mining scientific/scholarly literature WHY
• Interconnected (linked) entities characterized by TEXT
• Related side information & links (e.g., taxonomies, venues, projects / research areas, citations, authors)
• Side-information:
  • structured or unstructured attributes, links / relations and meta-data
  • form networks: e.g., authorship network, citation network, …
  • incomplete or missing, noisy or not related to textual attributes
Probabilistic Multi-View Topic Modeling of Text-Augmented Heterogeneous Information Networks
HOW
Multi-View vs Text only: interpretability and coverage
MV_HDP: two example topics, one block per topic, one line per view

Topic: “Topic Modeling”
• Text: topic, latent, lda, document, dirichlet, probabilistic, mining, semantic, allocation, generative, word, mixture, topical, corpus, plsa, bayesian, unsupervised,..
• Citations (ranked list of citation net nodes): “Dynamic topic models”, “Topics over time”, “Joint latent topic models for text and citations”, “Topic modeling”, “Probabilistic topic models”, “Probabilistic latent semantic indexing”,…
• Taxonomy: H.3.3 IR: Information Search and Retrieval, H.3.1 IR: Content Analysis and Indexing, H.2.8 DB MNGMT: Database Applications, I.2.6 AI: Learning, I.2.7 AI: Natural Language Processing, I.5.1 PAT.REC.: Models
• Keywords: topic modeling, latent dirichlet allocation, latent semantic analysis, generative model, text mining
• Venues: SIGKDD, WSDM, CIKM

Topic: “Cloud/Distributed computing & Big Data Analytics”
• Text: mapreduce, big, hadoop, analytics, cluster, map, scalable, datasets, queries, cloud, intensive, jobs, databases, massive, google, job, scalability, node, computations, mining, hdfs, hive, machine, workloads, volume,…
• Citations (ranked list of citation net nodes): “A comparison of approaches to large-scale data analysis”, “Pig latin”, “Mesos”, “DryadLINQ”, “PREGEL”, “CIEL”, “Improving MapReduce performance in heterogeneous environments”, “MapReduce Online”, “MapReduce Merge”,..
• Taxonomy: H.2.4 DB MNGMT: Systems, D.1.3 PROGR.TECHNIQUES: Concurrent Programming, C.2.4 COMP.-COMM. NETS: Distributed Systems, H.2.8 DB MNGMT: Database Applications, H.3.4 INFO STORAGE AND RETRIEVAL: Systems and Software
• Keywords: big data, Map-Reduce, hadoop, cloud computing, distributed computing, data analytics, machine learning, parallel processing
• Venues: SIGMOD, BigSystem, CloudCP, EUROSEC, EUROSYS,..

Good metadata is important
What is involved?
1. ENRICH & PRE-PROCESS
• Extract features and annotate (enrich) content using NLP, Named Entity Recognition & Semantic Annotation
• Tokenize, remove stop words
• Refine stop words for specific domain
2. FIND TOPICS
• Identify topics: distribution over words & “side” information
• Automatic topic curation & entitling
• Assign topics to publications
• Evaluate & categorize topics
• Assess topic labels
3. CALCULATE TRENDS & SIMILARITIES
• Calculate topic proportions & trends of objects based on their publications
• Calculate similarity among different entities based on various metrics
• Analyze & Validate the results
4. VISUALIZE
• Create WEB interactive visualization with data driven graphs, charts and layouts
• Design optimal views
• Validate modeling results
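The pre-processing step (tokenize, remove stop words, refine stop words for a specific domain) could be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the talk: the stop-word list, the function names and the 0.9 document-frequency cutoff are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical generic stop-word list; a real pipeline would use a fuller one.
GENERIC_STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is",
                      "of", "on", "the", "to", "we", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def domain_stop_words(token_lists, max_doc_ratio=0.9):
    """Return tokens that occur in more than max_doc_ratio of the documents."""
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each token once per document
    limit = max_doc_ratio * len(token_lists)
    return {tok for tok, df in doc_freq.items() if df > limit}


def preprocess(docs, max_doc_ratio=0.9):
    """Tokenize, drop generic stop words, then drop corpus-specific ones."""
    token_lists = [[t for t in tokenize(d) if t not in GENERIC_STOP_WORDS]
                   for d in docs]
    extra = domain_stop_words(token_lists, max_doc_ratio)
    return [[t for t in tokens if t not in extra] for tokens in token_lists]


# Made-up abstracts, only to exercise the functions.
docs = [
    "We propose a topic model for the analysis of scholarly text",
    "We propose a scalable MapReduce job for the analysis of big data",
    "We propose a citation network for the analysis of research impact",
]
bags = preprocess(docs)
print(bags[0])
```

Here "propose" and "analysis" appear in every document, so they are treated as domain stop words and removed along with the generic ones.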
What is the result?
1. Linked information
How often is “Topic Modeling” encountered?
Rank  TopicId  Weight  Title
230   18       0.0028  Data management & file systems
231   132      0.0027  Image processing: Face & emotion recognition, facial animation
232   373      0.0027  Project management & software development
233   138      0.0027  Self-adaptive systems & autonomic computing
234   360      0.0026  S/W development, management & maintenance
235   96       0.0026  Gender differences (analysis, studies)
236   271      0.0025  Haptic technology, feedback & multimodal user interaction
237   322      0.0025  Information extraction, Named entity recognition, disambiguation, cleaning
238   348      0.0025  cognitive psychology, cognitive and mental models
240   74       0.0025  HCI: Touch screen interaction & interactive surfaces
241   382      0.0025  Topic Modelling
242   230      0.0025  Trust & reputation analysis and management (IOT, Web, recom. systems)
243   2        0.0025  Wikipedia & collaborative editing
245   15       0.0025  Crowdsourcing & human computation
246   273      0.0024  Automatic programming, refactoring & transformations
248   323      0.0024  Reliability, fault tolerance and recovery
249   113      0.0024  Online / computational advertising
Out of 382
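A ranking like the one above can be derived from per-publication topic proportions. The slide does not say how Weight is computed, so the sketch below assumes it is the average share of a topic over all publications; the function names and the sample proportions are made up for illustration.

```python
def topic_weights(doc_topics):
    """Average each topic's proportion over all documents (assumed definition)."""
    totals = {}
    for proportions in doc_topics.values():
        for topic_id, p in proportions.items():
            totals[topic_id] = totals.get(topic_id, 0.0) + p
    n_docs = len(doc_topics)
    return {topic_id: total / n_docs for topic_id, total in totals.items()}


def rank_topics(weights):
    """Return (rank, topic_id, weight) tuples, heaviest topic first."""
    ordered = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [(rank, tid, w) for rank, (tid, w) in enumerate(ordered, start=1)]


# Hypothetical document-topic proportions (not the data behind the slide).
doc_topics = {
    "pub-1": {382: 0.75, 350: 0.25},
    "pub-2": {350: 0.5, 15: 0.5},
    "pub-3": {350: 1.0},
    "pub-4": {15: 0.25, 382: 0.25, 350: 0.5},
}
ranking = rank_topics(topic_weights(doc_topics))
for rank, topic_id, weight in ranking:
    print(rank, topic_id, weight)
```

Looking up a single topic's rank, as the slide does for “Topic Modeling”, is then a scan of `ranking` for its topic id.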
Is it trendy?
TopicId  Weight  Trend  Journal  Confer  Title
15       0.003   27.89  0.068    0.035   Crowdsourcing & human computation
194      0.004   23.56  0.077    0.011   Cloud Computing, Storage & Virtualization
201      0.004   10.82  0.119    0.066   Social network analysis: influence, info diffusion, communities
350      0.006   10.54  0.057    0.022   Distributed (Big) Data analytics (cloud, MapReduce)
41       0.005   9.86   0.135    0.019   Mobile applications
68       0.004   9.72   0.078    0.049   Social media analysis (twitter, blogs, news feed)
366      0.003   8.65   0.126    0.070   Persuasive technologies, gamification, user engagement
61       0.003   8.24   0.135    0.044   Wearable computing, technology & activity recognition
40       0.002   7.72   0.096    0.100   ICT in developing countries (India)
341      0.004   6.78   0.120    0.029   GPU computing
133      0.006   6.27   0.096    0.085   Recommendation, personalization and collaborative filtering
134      0.002   6.2    0.144    0.077   Flash memory structures, storage & systems
22       0.001   6.04   0.123    0.101   HCI: Organic & Flexible user interfaces
74       0.003   5.87   0.205    0.118   HCI: Touch screen interaction & interactive surfaces
2        0.003   5.33   0.079    0.083   Wikipedia & collaborative editing
52       0.013   5.15   0.156    0.082   HCI design & user experience
266      0.002   4.95   0.057    0.047   Sentiment analysis & opinion mining
10       0.006   4.91   0.082    0.048   Image retrieval & object recognition
382      0.003   4.57   0.111    0.069   Topic Modelling
228      0.003   3.92   0.128    0.094   Software product line engineering
100      0.005   3.88   0.115    0.037   Social tagging, annotation & tag recommendation
294      0.005   3.34   0.066    0.170   Robotics, human-robot interaction, anthropomorphism
Top 20
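The slide does not say how the Trend column is computed. One simple way to quantify whether a topic is rising or declining, shown here purely as an assumption and not as the measure behind the table, is the least-squares slope of the topic's yearly share of publications. The function name and the yearly figures are hypothetical.

```python
def linear_trend(yearly_share):
    """Least-squares slope of a topic's share against the year.

    yearly_share maps year -> topic share and needs at least two years.
    A positive slope suggests a rising topic, a negative one a declining topic.
    """
    years = sorted(yearly_share)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yearly_share[y] for y in years) / n
    cov = sum((y - mean_x) * (yearly_share[y] - mean_y) for y in years)
    var = sum((y - mean_x) ** 2 for y in years)
    return cov / var


# Made-up yearly shares for two hypothetical topics.
rising = {2010: 0.001, 2011: 0.002, 2012: 0.003, 2013: 0.004}
declining = {2010: 0.004, 2011: 0.003, 2012: 0.002, 2013: 0.001}

print(linear_trend(rising) > 0)     # rising topic: positive slope
print(linear_trend(declining) < 0)  # declining topic: negative slope
```

The values in the table are on a different scale, so the actual measure is probably normalized or defined differently; the sketch only illustrates the idea of scoring topics by change over time.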
Concept driven search
PubId    Weight  Title
1646242  0.72    Dynamic hyperparameter optimization for bayesian topical trend analysis
1871521  0.67    Latent interest-topic model
2505555  0.64    On handling textual errors in latent document modeling
2398646  0.63    Automatic labeling hierarchical topics
1458337  0.63    Combining concept hierarchies and statistical topic models
2348335  0.63    Group matrix factorization for scalable topic modeling
2009977  0.63    Mining topics on participations for community discovery
1835890  0.62    Topic models with power-law using Pitman-Yor process
2398483  0.61    Hierarchical topic integration through semi-supervised hierarchical topic modeling
1150482  0.60    A mixture model for contextual text mining
1963244  0.60    Investigating topic models for social media user recommendation
1281249  0.60    Multiscale topic tomography
2086739  0.59    Sequential Modeling of Topic Dynamics with Multiple Timescales
1572095  0.59    A latent topic model for linked documents
2188143  0.59    Latent contextual indexing of annotated documents
1859210  0.58    Topic models vs. unstructured data
1487045  0.58    Linked Topic and Interest Model for Web Forums
2609471  0.58    Probabilistic text modeling with orthogonalized topics
2396861  0.57    Modeling topic hierarchies with the recursive chinese restaurant process
2433438  0.57    Group sparse topical coding
1935880  0.57    Trend analysis model
1390546  0.56    Improving text classification accuracy using topic modeling over an additional corpus
1553410  0.55    Accounting for burstiness in topic models
View top 23 most related publications to “Topic Modeling”
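Concept-driven search of this kind amounts to ranking publications by their weight for the chosen topic. The sketch below assumes each publication carries a mapping from topic id to weight; that data layout and the function name are assumptions. The first three weights are taken from the list above, topic id 382 is the “Topic Modelling” id from the earlier tables, and the last entry is made up.

```python
import heapq


def top_publications(pub_topic_weight, topic_id, k):
    """Return the k (pub_id, weight) pairs with the highest weight for topic_id."""
    scored = ((pub_id, weights[topic_id])
              for pub_id, weights in pub_topic_weight.items()
              if topic_id in weights)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


pub_topic_weight = {
    1646242: {382: 0.72},
    1871521: {382: 0.67},
    2505555: {382: 0.64},
    9999999: {350: 0.90},  # hypothetical publication on a different topic
}
top2 = top_publications(pub_topic_weight, topic_id=382, k=2)
print(top2)
```

Publications with no weight for the requested topic are skipped rather than scored as zero, which keeps the result list limited to genuinely related items.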
Visualization
Trendy, old-fashioned, common topics
Trendy topics: Distributed (Big) Data analytics; HCI design & user experience; GPU
Trendy topics (compare topics): HCI design & user experience; GPU; Distributed (Big) Data analytics
Old-fashioned topics: Relational DBs; Programming
Do we need another venue? Trendy, but evenly spread across many journals AND conferences
TopicId  Weight  Trend  Journal  Confer  Exclusivity  Title
15       0.003   27.89  0.068    0.035   0.103        Crowdsourcing & human computation
194      0.004   23.56  0.077    0.011   0.088        Cloud Computing, Storage & Virtualization
201      0.004   10.82  0.119    0.066   0.185        Social network analysis: influence, info diffusion, communities
350      0.006   10.54  0.057    0.022   0.079        Distributed (Big) Data analytics (cloud, MapReduce)
41       0.005   9.86   0.135    0.019   0.154        Mobile applications
68       0.004   9.72   0.078    0.049   0.127        Social media analysis (twitter, blogs, news feed)
366      0.003   8.65   0.126    0.070   0.196        Persuasive technologies, gamification, user engagement
61       0.003   8.24   0.135    0.044   0.179        Wearable computing, technology & activity recognition
40       0.002   7.72   0.096    0.100   0.196        ICT in developing countries (India)
341      0.004   6.78   0.120    0.029   0.149        GPU computing
133      0.006   6.27   0.096    0.085   0.181        Recommendation, personalization and collaborative filtering
134      0.002   6.2    0.144    0.077   0.221        Flash memory structures, storage & systems
22       0.001   6.04   0.123    0.101   0.224        HCI: Organic & Flexible user interfaces
74       0.003   5.87   0.205    0.118   0.323        HCI: Touch screen interaction & interactive surfaces
2        0.003   5.33   0.079    0.083   0.162        Wikipedia & collaborative editing
52       0.013   5.15   0.156    0.082   0.238        HCI design & user experience
266      0.002   4.95   0.057    0.047   0.104        Sentiment analysis & opinion mining
10       0.006   4.91   0.082    0.048   0.130        Image retrieval & object recognition
382      0.003   4.57   0.111    0.069   0.180        Topic Modelling
228      0.003   3.92   0.128    0.094   0.222        Software product line engineering
100      0.005   3.88   0.115    0.037   0.152        Social tagging, annotation & tag recommendation
294      0.005   3.34   0.066    0.170   0.236        Robotics, human-robot interaction, anthropomorphism
[Chart: Exclusivity vs Trend. x-axis: Trend (0 to 30); y-axis: Exclusivity (0 to 0.35)]
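The slide does not define Exclusivity, but in the figures it lists, each Exclusivity value equals the sum of the topic's Journal and Confer values. The short check below illustrates that observed relation on four rows copied from the slide; treating the sum as the definition is an assumption, and the function name is hypothetical.

```python
# (topic_id, journal, confer, exclusivity as listed on the slide)
rows = [
    (15, 0.068, 0.035, 0.103),   # Crowdsourcing & human computation
    (194, 0.077, 0.011, 0.088),  # Cloud Computing, Storage & Virtualization
    (74, 0.205, 0.118, 0.323),   # HCI: Touch screen interaction
    (382, 0.111, 0.069, 0.180),  # Topic Modelling
]


def exclusivity(journal, confer):
    """Observed relation: Journal + Confer, rounded to three decimals."""
    return round(journal + confer, 3)


# Topic ids whose listed value differs from the computed sum.
mismatches = [tid for tid, j, c, listed in rows if exclusivity(j, c) != listed]
print(mismatches)
```

An empty `mismatches` list means the listed values agree with the sum for every row checked.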
Important but declining (?): Genetic algorithms; P2P networks & content distribution
Topic birth, death & fluctuation over time (chart label: Genetic algorithms)
Authors Similarity Analysis
• NODES represent Authors
• LINKS represent topic based similarity
• Root ACM Categories (level 0)
• Highlighted Author, Similar Authors, Topics
+ FEATURES
• Zoom for drill down
• Search and filtering
• Dynamic configuration of thresholds
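A graph whose nodes are authors and whose links represent topic based similarity, with a configurable threshold, could be built along these lines. The slide does not name the similarity metric; cosine similarity over each author's topic proportions is used here as an assumption, and the author names and proportions are made up.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse topic vectors (dict: topic -> weight)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_links(author_topics, threshold):
    """Return (author, author, score) links whose score reaches the threshold."""
    links = []
    for a, b in combinations(sorted(author_topics), 2):
        score = cosine(author_topics[a], author_topics[b])
        if score >= threshold:
            links.append((a, b, score))
    return links


# Hypothetical author-topic proportions.
author_topics = {
    "author_a": {382: 0.8, 350: 0.2},
    "author_b": {382: 0.7, 350: 0.3},
    "author_c": {341: 1.0},
}
links = similarity_links(author_topics, threshold=0.5)
print(links)
```

Raising or lowering `threshold` thins or thickens the graph, which is the effect a dynamically configurable threshold has in an interactive view.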
Categories correlations
What is the potential?
• Funders and institutions to assess research impact over time
  • Especially useful when combined with non-research data
  • OpenAIRE data and services already used by EC for ex-post FP7 evaluation
• Policy makers
  • Binding research to societal policy decisions
• Scholarly societies
  • Determine new conferences / merge existing ones. Introduce new themes…
  • New portal services (concept search)
• Publishers (incl. institutional publications)
  • Create, adapt journals…
Scratching the surface…
Easing technological barriers
Hubs of scientific content for TDM
OpenAIRE new dashboards to facilitate full content retrieval
Open Access
Licensing issues
Terms of agreement
Thank you!
Natalia Manola
natalia@di.uoa.gr
+30 210 9876 432
Skype: natalia.manola

Weitere ähnliche Inhalte

Was ist angesagt?

The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is here
Connected Data World
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
Attila Barta
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
IJERA Editor
 

Was ist angesagt? (20)

Introduction to Big Data: Smart Factory
Introduction to Big Data: Smart FactoryIntroduction to Big Data: Smart Factory
Introduction to Big Data: Smart Factory
 
Structured Content Meets Taxonomy
Structured Content Meets TaxonomyStructured Content Meets Taxonomy
Structured Content Meets Taxonomy
 
GTU GeekDay Data Science and Applications
GTU GeekDay Data Science and ApplicationsGTU GeekDay Data Science and Applications
GTU GeekDay Data Science and Applications
 
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data PlatformPredictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform
 
Data science innovations
Data science innovations Data science innovations
Data science innovations
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
 
Data Science applications in business
Data Science applications in businessData Science applications in business
Data Science applications in business
 
The years of the graph: The future of the future is here
The years of the graph: The future of the future is hereThe years of the graph: The future of the future is here
The years of the graph: The future of the future is here
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Scalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AIScalable Predictive Analysis and The Trend with Big Data & AI
Scalable Predictive Analysis and The Trend with Big Data & AI
 
INF2190_W1_2016_public
INF2190_W1_2016_publicINF2190_W1_2016_public
INF2190_W1_2016_public
 
Yahoo's Knowledge Graph - 2014 slides
Yahoo's Knowledge Graph - 2014 slidesYahoo's Knowledge Graph - 2014 slides
Yahoo's Knowledge Graph - 2014 slides
 
Frequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social MediaFrequent Item set Mining of Big Data for Social Media
Frequent Item set Mining of Big Data for Social Media
 
Enterprise Knowledge Graph
Enterprise Knowledge GraphEnterprise Knowledge Graph
Enterprise Knowledge Graph
 
“Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services” “Semantic Technologies for Smart Services”
“Semantic Technologies for Smart Services”
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best PracticesNeo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
Neo4j Graph Data Science Training - June 9 & 10 - Slides #7 GDS Best Practices
 
Understanding Cognitive Applications: A Framework - Sue Feldman
Understanding Cognitive Applications:  A Framework - Sue FeldmanUnderstanding Cognitive Applications:  A Framework - Sue Feldman
Understanding Cognitive Applications: A Framework - Sue Feldman
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
 
Massive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and ApplicationsMassive Data Analysis- Challenges and Applications
Massive Data Analysis- Challenges and Applications
 

Ähnlich wie OSFair2017 training | Explore, model, analyze and visualize systematic research in OpenAIRE

Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Semantic Web Company
 
Vellino presentationtocisti
Vellino presentationtocistiVellino presentationtocisti
Vellino presentationtocisti
Andre Vellino
 
Contractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACContractor-Borner-SNA-SAC
Contractor-Borner-SNA-SAC
webuploader
 

Ähnlich wie OSFair2017 training | Explore, model, analyze and visualize systematic research in OpenAIRE (20)

[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...
[DSC Croatia 22] Writing scientific papers about data science projects - Mirj...
 
Scientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an OverviewScientific Knowledge Graphs: an Overview
Scientific Knowledge Graphs: an Overview
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
 
The Science of Data Science
The Science of Data Science The Science of Data Science
The Science of Data Science
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Official resume titash_mandal_
Official resume titash_mandal_Official resume titash_mandal_
Official resume titash_mandal_
 
Vellino presentationtocisti
Vellino presentationtocistiVellino presentationtocisti
Vellino presentationtocisti
 
Semantic Text Processing Powered by Wikipedia
Semantic Text Processing Powered by WikipediaSemantic Text Processing Powered by Wikipedia
Semantic Text Processing Powered by Wikipedia
 
Data science technology overview
Data science technology overviewData science technology overview
Data science technology overview
 
Session 0.0 poster minutes madness
Session 0.0   poster minutes madnessSession 0.0   poster minutes madness
Session 0.0 poster minutes madness
 
Contractor-Borner-SNA-SAC
Contractor-Borner-SNA-SACContractor-Borner-SNA-SAC
Contractor-Borner-SNA-SAC
 
Text mining and machine learning
Text mining and machine learningText mining and machine learning
Text mining and machine learning
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of Scientists
 
Linked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so farLinked Open Data about Springer Nature conferences. The story so far
Linked Open Data about Springer Nature conferences. The story so far
 
unit 1 DATA MINING.ppt
unit 1 DATA MINING.pptunit 1 DATA MINING.ppt
unit 1 DATA MINING.ppt
 
Semantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital LibrariesSemantic Linking & Retrieval for Digital Libraries
Semantic Linking & Retrieval for Digital Libraries
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
IRJET- Deduplication Detection for Similarity in Document Analysis Via Vector...
 

Mehr von Open Science Fair

OSFair2017 Worksop | NUCLEUS project - Are you ready to perform in RRI ecosys...
OSFair2017 Worksop | NUCLEUS project - Are you ready to perform in RRI ecosys...OSFair2017 Worksop | NUCLEUS project - Are you ready to perform in RRI ecosys...
OSFair2017 Worksop | NUCLEUS project - Are you ready to perform in RRI ecosys...
Open Science Fair
 
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
Open Science Fair
 
OSFair2017 Workshop | Research lifecycle in Arts, Humanities and Social Sciences
OSFair2017 Workshop | Research lifecycle in Arts, Humanities and Social SciencesOSFair2017 Workshop | Research lifecycle in Arts, Humanities and Social Sciences
OSFair2017 Workshop | Research lifecycle in Arts, Humanities and Social Sciences
Open Science Fair
 

Mehr von Open Science Fair (20)

OSFair2017 workshop | Monitoring open science trends in europe
OSFair2017 workshop | Monitoring open science trends in europeOSFair2017 workshop | Monitoring open science trends in europe
OSFair2017 workshop | Monitoring open science trends in europe
 
OSFair2017 Worksop | NUCLEUS project - Are you ready to perform in RRI ecosys...
OSFair2017 Worksop | NUCLEUS project - Are you ready to perform in RRI ecosys...OSFair2017 Worksop | NUCLEUS project - Are you ready to perform in RRI ecosys...
OSFair2017 Worksop | NUCLEUS project - Are you ready to perform in RRI ecosys...
 
OSFair2017 Workshop | Data Analytics meets Social Sciences: New Frontiers of ...
OSFair2017 Workshop | Data Analytics meets Social Sciences: New Frontiers of ...OSFair2017 Workshop | Data Analytics meets Social Sciences: New Frontiers of ...
OSFair2017 Workshop | Data Analytics meets Social Sciences: New Frontiers of ...
 
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
OSFair2017 Workshop | Building a global knowledge commons - ramping up reposi...
 
OSFair2017 Workshop | Research lifecycle in Arts, Humanities and Social Sciences
OSFair2017 Workshop | Research lifecycle in Arts, Humanities and Social SciencesOSFair2017 Workshop | Research lifecycle in Arts, Humanities and Social Sciences
OSFair2017 Workshop | Research lifecycle in Arts, Humanities and Social Sciences
 
OSFair2017 Workshop | Towards a Policy Framework for the European Open Scienc...
OSFair2017 Workshop | Towards a Policy Framework for the European Open Scienc...OSFair2017 Workshop | Towards a Policy Framework for the European Open Scienc...
OSFair2017 Workshop | Towards a Policy Framework for the European Open Scienc...
 
OSFair2017 Workshop | Big Mechanism: deep reading for cancer biology
OSFair2017 Workshop | Big Mechanism: deep reading for cancer biologyOSFair2017 Workshop | Big Mechanism: deep reading for cancer biology
OSFair2017 Workshop | Big Mechanism: deep reading for cancer biology
 
OSFair2017 Workshop | Text mining
OSFair2017 Workshop | Text miningOSFair2017 Workshop | Text mining
OSFair2017 Workshop | Text mining
 
OSFair2017 Workshop | EOSCpilot governance
OSFair2017 Workshop | EOSCpilot governanceOSFair2017 Workshop | EOSCpilot governance
OSFair2017 Workshop | EOSCpilot governance
 
OSFair2017 Workshop | Brokering services facilitating interoperability and da...
OSFair2017 Workshop | Brokering services facilitating interoperability and da...OSFair2017 Workshop | Brokering services facilitating interoperability and da...
OSFair2017 Workshop | Brokering services facilitating interoperability and da...
 
OSFair2017 Workshop | Service provisioning for excellent sciences
OSFair2017 Workshop | Service provisioning for excellent sciencesOSFair2017 Workshop | Service provisioning for excellent sciences
OSFair2017 Workshop | Service provisioning for excellent sciences
 
OSFair2017 Theatrical Workshop | Are you ready to perform in the rri ecosystem
OSFair2017 Theatrical Workshop | Are you ready to perform in the rri ecosystemOSFair2017 Theatrical Workshop | Are you ready to perform in the rri ecosystem
OSFair2017 Theatrical Workshop | Are you ready to perform in the rri ecosystem
 
OSFair2017 Theatrical Workshop | Nucleus H2020 EU project
OSFair2017 Theatrical Workshop | Nucleus H2020 EU projectOSFair2017 Theatrical Workshop | Nucleus H2020 EU project
OSFair2017 Theatrical Workshop | Nucleus H2020 EU project
 
OSFair2017 Workshop | Open Knowledge Maps, A visual interface to the world's ...
OSFair2017 Workshop | Open Knowledge Maps, A visual interface to the world's ...OSFair2017 Workshop | Open Knowledge Maps, A visual interface to the world's ...
OSFair2017 Workshop | Open Knowledge Maps, A visual interface to the world's ...
 
OSFair2017 Training | Reproducibility in critical care research
OSFair2017 Training | Reproducibility in critical care researchOSFair2017 Training | Reproducibility in critical care research
OSFair2017 Training | Reproducibility in critical care research
 
OSFair2017 Training | Big data and evidence-based medicine in Greece
OSFair2017 Training | Big data and evidence-based medicine in GreeceOSFair2017 Training | Big data and evidence-based medicine in Greece
OSFair2017 Training | Big data and evidence-based medicine in Greece
 
OSFair2017 Training | What is Open Science and why should I care?
OSFair2017 Training | What is Open Science and why should I care?OSFair2017 Training | What is Open Science and why should I care?
OSFair2017 Training | What is Open Science and why should I care?
 
OSFair2017 Training | OpenAIRE monitoring services, EC FP7 & H2020 & other na...
OSFair2017 Training | OpenAIRE monitoring services, EC FP7 & H2020 & other na...OSFair2017 Training | OpenAIRE monitoring services, EC FP7 & H2020 & other na...
OSFair2017 Training | OpenAIRE monitoring services, EC FP7 & H2020 & other na...
 
OSFair2017 Training | Designing & implementing open access, open data & open ...
OSFair2017 Training | Designing & implementing open access, open data & open ...OSFair2017 Training | Designing & implementing open access, open data & open ...
OSFair2017 Training | Designing & implementing open access, open data & open ...
 
OSFair2017 Training | Best practice in Open Science
OSFair2017 Training | Best practice in Open ScienceOSFair2017 Training | Best practice in Open Science
OSFair2017 Training | Best practice in Open Science
 

Kürzlich hochgeladen

Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
Areesha Ahmad
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 

OSFair2017 training | Explore, model, analyze and visualize systematic research in OpenAIRE

  • 1. @openaire_eu. Explore, model, analyze and visualize systematic research in OpenAIRE … via text and data mining (topic modeling). A bird’s eye view. Natalia Manola, University of Athens, Athena Research & Innovation Center. Open Science FAIR, Athens, 6-8 Sept, 2017
  • 2. Big volumes of data (publications ARE data in TDM)
      • The global research community generates ~2.5 million new scholarly articles per year (English only). Source: The STM Report (2015).
      • … one paper published every 12 seconds …
      • 70,000 papers published on a single protein, the tumor suppressor p53. Source: Spangler et al., Automated Hypothesis Generation based on Mining Scientific Literature, 2014.
  • 3. Meta research: Research analytics
  • 4. Mining scientific/scholarly literature: WHY
      [Diagram of linked entities: Author (Name, Institution), Paper (Title, Key Words, Topics, Words (BoWs), Venue), User (Queries, Downloads, Sessions), Funding (Name, Grant No, Start - End); relations: Writes, Search for, Citing, Cited, Is Related, Is Funded; unknown links marked "?"]
      • Thematic analysis: What are the topics / concepts?
      • Entity Resolution: Do they refer to the same person?
      • Similarity analysis & link prediction: Is it related?
      • Analyze the role of funding
      • Get or recommend relevant content: ranking & similarity analysis
      • Structuring Effects: Identify & model research communities
      • Attribute Prediction: What could be the (possible) venue?
      • Research impact & timeliness
  • 5. Mining scientific/scholarly literature: WHY
      Number of publications rising… New models, new insights, better decisions. Real output vs project & call descriptions.
      Analyze large collections of documents and meta-data to:
      • Assess research collaboration: authorship network analysis
      • Identify active areas of research: discover hidden themes (topics)
      • Understand what is actually produced
      • Discover clusters and communities
      • Identify emerging research areas
      • Assess coverage, identify gaps or new challenges
  • 6. HOW: Probabilistic Multi-View Topic Modeling of Text-Augmented Heterogeneous Information Networks
      • Interconnected (linked) entities characterized by TEXT
      • Related side information & links (e.g., taxonomies, venues, projects / research areas, citations, authors)
      • Side-information:
          • structured or unstructured attributes, links / relations and meta-data
          • form networks: e.g., authorship network, citation network, …
          • incomplete or missing, noisy or not related to textual attributes
  • 7. Multi-View vs Text only: interpretability and coverage (MV_HDP). Good metadata is important.
      Topic: “Topic Modeling”
          Text: topic, latent, lda, document, dirichlet, probabilistic, mining, semantic, allocation, generative, word, mixture, topical, corpus, plsa, bayesian, unsupervised, …
          Citations (ranked list of citation net nodes): “Dynamic topic models”, “Topics over time”, “Joint latent topic models for text and citations”, “Topic modeling”, “Probabilistic topic models”, “Probabilistic latent semantic indexing”, …
          Taxonomy: H.3.3 IR: Information Search and Retrieval, H.3.1 IR: Content Analysis and Indexing, H.2.8 DB MNGMT: Database Applications, I.2.6 AI: Learning, I.2.7 AI: Natural Language Processing, I.5.1 PAT.REC.: Models
          Keywords: topic modeling, latent dirichlet allocation, latent semantic analysis, generative model, text mining
          Venues: SIGKDD, WSDM, CIKM
      Topic: “Cloud/Distributed computing & Big Data Analytics”
          Text: mapreduce, big, hadoop, analytics, cluster, map, scalable, datasets, queries, cloud, intensive, jobs, databases, massive, google, job, scalability, node, computations, mining, hdfs, hive, machine, workloads, volume, …
          Citations (ranked list of citation net nodes): “A comparison of approaches to large-scale data analysis”, “Pig latin”, “Mesos”, “DryadLINQ”, “PREGEL”, “CIEL”, “Improving MapReduce performance in heterogeneous environments”, “MapReduce Online”, “MapReduce Merge”, …
          Taxonomy: H.2.4 DB MNGMT: Systems, D.1.3 PROGR.TECHNIQUES: Concurrent Programming, C.2.4 COMP.-COMM. NETS: Distributed Systems, H.2.8 DB MNGMT: Database Applications, H.3.4 INFO STORAGE AND RETRIEVAL: Systems and Software
          Keywords: big data, Map-Reduce, hadoop, cloud computing, distributed computing, data analytics, machine learning, parallel processing
          Venues: SIGMOD, BigSystem, CloudCP, EUROSEC, EUROSYS, …
  • 8. What is involved?
      1. ENRICH & PRE-PROCESS: Extract features and annotate (enrich) content using NLP, Named Entity Recognition & Semantic Annotation. Tokenize, remove stop words. Refine stop words for the specific domain.
      2. FIND TOPICS: Identify topics: distribution over words & “side” information. Automatic topic curation & entitling. Assign topics to publications. Evaluate & categorize topics. Assess topic labels.
      3. CALCULATE TRENDS & SIMILARITIES: Calculate topic proportions & trends of objects based on their publications. Calculate similarity among different entities based on various metrics. Analyze & validate the results.
      4. VISUALIZE: Create WEB interactive visualization with data driven graphs, charts and layouts. Design optimal views. Validate modeling results.
  • 9. What is the result?
  • 10. 1. Linked information
  • 11. How often is “Topic Modeling” encountered? (Out of 382)
      Rank | TopicId | Title | Weight
      230 | 18 | Data management & file systems | 0.0028
      231 | 132 | Image processing: Face & emotion recognition, facial animation | 0.0027
      232 | 373 | Project management & software development | 0.0027
      233 | 138 | Self-adaptive systems & autonomic computing | 0.0027
      234 | 360 | S/W development, management & maintenance | 0.0026
      235 | 96 | Gender differences (analysis, studies) | 0.0026
      236 | 271 | Haptic technology, feedback & multimodal user interaction | 0.0025
      237 | 322 | Information extraction, Named entity recognition, disambiguation, cleaning | 0.0025
      238 | 348 | cognitive psychology, cognitive and mental models | 0.0025
      240 | 74 | HCI: Touch screen interaction & interactive surfaces | 0.0025
      241 | 382 | Topic Modelling | 0.0025
      242 | 230 | Trust & reputation analysis and management (IOT, Web, recom. systems) | 0.0025
      243 | 2 | Wikipedia & collaborative editing | 0.0025
      245 | 15 | Crowdsourcing & human computation | 0.0025
      246 | 273 | Automatic programming, refactoring & transformations | 0.0024
      248 | 323 | Reliability, fault tolerance and recovery | 0.0024
      249 | 113 | Online / computational advertising | 0.0024
  • 12. Is it trendy? (Top 20)
      TopicId | Title | Weight | Trend | Journal | Conference
      15 | Crowdsourcing & human computation | 0.003 | 27.89 | 0.068 | 0.035
      194 | Cloud Computing, Storage & Virtualization | 0.004 | 23.56 | 0.077 | 0.011
      201 | Social network analysis: influence, info diffusion, communities | 0.004 | 10.82 | 0.119 | 0.066
      350 | Distributed (Big) Data analytics (cloud, MapReduce) | 0.006 | 10.54 | 0.057 | 0.022
      41 | Mobile applications | 0.005 | 9.86 | 0.135 | 0.019
      68 | Social media analysis (twitter, blogs, news feed) | 0.004 | 9.72 | 0.078 | 0.049
      366 | Persuasive technologies, gamification, user engagement | 0.003 | 8.65 | 0.126 | 0.070
      61 | Wearable computing, technology & activity recognition | 0.003 | 8.24 | 0.135 | 0.044
      40 | ICT in developing countries (India) | 0.002 | 7.72 | 0.096 | 0.100
      341 | GPU computing | 0.004 | 6.78 | 0.120 | 0.029
      133 | Recommendation, personalization and collaborative filtering | 0.006 | 6.27 | 0.096 | 0.085
      134 | Flash memory structures, storage & systems | 0.002 | 6.2 | 0.144 | 0.077
      22 | HCI: Organic & Flexible user interfaces | 0.001 | 6.04 | 0.123 | 0.101
      74 | HCI: Touch screen interaction & interactive surfaces | 0.003 | 5.87 | 0.205 | 0.118
      2 | Wikipedia & collaborative editing | 0.003 | 5.33 | 0.079 | 0.083
      52 | HCI design & user experience | 0.013 | 5.15 | 0.156 | 0.082
      266 | Sentiment analysis & opinion mining | 0.002 | 4.95 | 0.057 | 0.047
      10 | Image retrieval & object recognition | 0.006 | 4.91 | 0.082 | 0.048
      382 | Topic Modelling | 0.003 | 4.57 | 0.111 | 0.069
      228 | Software product line engineering | 0.003 | 3.92 | 0.128 | 0.094
      100 | Social tagging, annotation & tag recommendation | 0.005 | 3.88 | 0.115 | 0.037
      294 | Robotics, human-robot interaction, anthropomorphism | 0.005 | 3.34 | 0.066 | 0.170
  • 13. Concept driven search: view top 23 most related publications to “Topic Modeling”
      PubId | Weight | Title
      1646242 | 0.72 | Dynamic hyperparameter optimization for bayesian topical trend analysis
      1871521 | 0.67 | Latent interest-topic model
      2505555 | 0.64 | On handling textual errors in latent document modeling
      2398646 | 0.63 | Automatic labeling hierarchical topics
      1458337 | 0.63 | Combining concept hierarchies and statistical topic models
      2348335 | 0.63 | Group matrix factorization for scalable topic modeling
      2009977 | 0.63 | Mining topics on participations for community discovery
      1835890 | 0.62 | Topic models with power-law using Pitman-Yor process
      2398483 | 0.61 | Hierarchical topic integration through semi-supervised hierarchical topic modeling
      1150482 | 0.60 | A mixture model for contextual text mining
      1963244 | 0.60 | Investigating topic models for social media user recommendation
      1281249 | 0.60 | Multiscale topic tomography
      2086739 | 0.59 | Sequential Modeling of Topic Dynamics with Multiple Timescales
      1572095 | 0.59 | A latent topic model for linked documents
      2188143 | 0.59 | Latent contextual indexing of annotated documents
      1859210 | 0.58 | Topic models vs. unstructured data
      1487045 | 0.58 | Linked Topic and Interest Model for Web Forums
      2609471 | 0.58 | Probabilistic text modeling with orthogonalized topics
      2396861 | 0.57 | Modeling topic hierarchies with the recursive chinese restaurant process
      2433438 | 0.57 | Group sparse topical coding
      1935880 | 0.57 | Trend analysis model
      1390546 | 0.56 | Improving text classification accuracy using topic modeling over an additional corpus
      1553410 | 0.55 | Accounting for burstiness in topic models
  • 14. Visualization
  • 15. Trendy, old-fashioned, common topics
  • 16. Trendy topics: Distributed (Big) Data analytics; HCI design & user experience; GPU
  • 17. Trendy topics (compare topics): HCI design & user experience; GPU; Distributed (Big) Data analytics
  • 18. Old-fashioned topics: Relational DBs; Programming
  • 19. Do we need another venue? Trendy, but evenly spread across many journals AND conferences
      TopicId | Title | Weight | Trend | Journal | Conference | Exclusivity
      15 | Crowdsourcing & human computation | 0.003 | 27.89 | 0.068 | 0.035 | 0.103
      194 | Cloud Computing, Storage & Virtualization | 0.004 | 23.56 | 0.077 | 0.011 | 0.088
      201 | Social network analysis: influence, info diffusion, communities | 0.004 | 10.82 | 0.119 | 0.066 | 0.185
      350 | Distributed (Big) Data analytics (cloud, MapReduce) | 0.006 | 10.54 | 0.057 | 0.022 | 0.079
      41 | Mobile applications | 0.005 | 9.86 | 0.135 | 0.019 | 0.154
      68 | Social media analysis (twitter, blogs, news feed) | 0.004 | 9.72 | 0.078 | 0.049 | 0.127
      366 | Persuasive technologies, gamification, user engagement | 0.003 | 8.65 | 0.126 | 0.070 | 0.196
      61 | Wearable computing, technology & activity recognition | 0.003 | 8.24 | 0.135 | 0.044 | 0.179
      40 | ICT in developing countries (India) | 0.002 | 7.72 | 0.096 | 0.100 | 0.196
      341 | GPU computing | 0.004 | 6.78 | 0.120 | 0.029 | 0.149
      133 | Recommendation, personalization and collaborative filtering | 0.006 | 6.27 | 0.096 | 0.085 | 0.181
      134 | Flash memory structures, storage & systems | 0.002 | 6.2 | 0.144 | 0.077 | 0.221
      22 | HCI: Organic & Flexible user interfaces | 0.001 | 6.04 | 0.123 | 0.101 | 0.224
      74 | HCI: Touch screen interaction & interactive surfaces | 0.003 | 5.87 | 0.205 | 0.118 | 0.323
      2 | Wikipedia & collaborative editing | 0.003 | 5.33 | 0.079 | 0.083 | 0.162
      52 | HCI design & user experience | 0.013 | 5.15 | 0.156 | 0.082 | 0.238
      266 | Sentiment analysis & opinion mining | 0.002 | 4.95 | 0.057 | 0.047 | 0.104
      10 | Image retrieval & object recognition | 0.006 | 4.91 | 0.082 | 0.048 | 0.130
      382 | Topic Modelling | 0.003 | 4.57 | 0.111 | 0.069 | 0.180
      228 | Software product line engineering | 0.003 | 3.92 | 0.128 | 0.094 | 0.222
      (The Exclusivity column lists two further values, 0.152 and 0.236, with no matching row on this slide.)
      [Scatter plot: Exclusivity vs Trend; axes labelled Exclusivity (0 to 0.35) and Trend (0 to 30)]
  • 20. Important but declining (?): Genetic algorithms; P2P networks & content distribution
  • 21. Topic birth, death & fluctuation over time: Genetic algorithms
  • 22. Authors Similarity Analysis. NODES represent Authors; LINKS represent topic based similarity. Figure labels: Root ACM Categories (level 0), Similar Authors, Topics, Highlighted Author. +FEATURES: Zoom for drill down; Search and filtering; Dynamic configuration of thresholds
  • 23. Categories correlations
  • 24. What is the potential?
  • 25. Scratching the surface…
      • Funders and institutions to assess research impact over time
          • Especially useful when combined with non-research data
          • OpenAIRE data and services already used by EC for ex-post FP7 evaluation
      • Policy makers
          • Binding research to societal policy decisions
      • Scholarly societies
          • Determine new conferences / merge existing ones. Introduce new themes…
          • New portal services (concept search)
      • Publishers (incl. institutional publications)
          • Create, adapt journals…
  • 27. Thank you! Natalia Manola natalia@di.uoa.gr +30 210 9876 432 Skype: natalia.manola
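
The question raised on slide 19 ("Do we need another venue?") can be explored with a few lines of code. The sketch below is not the presenter's tooling. It copies five rows from that slide, checks a pattern that holds in the listed values (each Exclusivity figure equals Journal plus Conference to three decimals), and then picks topics with a high Trend and a low Exclusivity. Reading low Exclusivity as "evenly spread across venues" is an assumption, as are the thresholds `min_trend` and `max_exclusivity` and every name in the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicRow:
    """One row of the slide 19 table (only the columns used here)."""
    topic_id: int
    title: str
    trend: float
    journal: float
    conference: float
    exclusivity: float


# Values copied from slide 19; only a subset of the rows is included.
ROWS = [
    TopicRow(15, "Crowdsourcing & human computation", 27.89, 0.068, 0.035, 0.103),
    TopicRow(194, "Cloud Computing, Storage & Virtualization", 23.56, 0.077, 0.011, 0.088),
    TopicRow(350, "Distributed (Big) Data analytics (cloud, MapReduce)", 10.54, 0.057, 0.022, 0.079),
    TopicRow(74, "HCI: Touch screen interaction & interactive surfaces", 5.87, 0.205, 0.118, 0.323),
    TopicRow(382, "Topic Modelling", 4.57, 0.111, 0.069, 0.180),
]


def exclusivity_matches_sum(row: TopicRow, tol: float = 5e-4) -> bool:
    """True when Exclusivity equals Journal + Conference within rounding."""
    return abs(row.journal + row.conference - row.exclusivity) <= tol


def trendy_but_spread(rows, min_trend=10.0, max_exclusivity=0.11):
    """Topics with high trend and low exclusivity (thresholds are assumptions)."""
    picked = [r for r in rows
              if r.trend >= min_trend and r.exclusivity <= max_exclusivity]
    return sorted(picked, key=lambda r: r.trend, reverse=True)


if __name__ == "__main__":
    for row in trendy_but_spread(ROWS):
        print(row.topic_id, row.title, row.trend, row.exclusivity)
```

With other thresholds the selection changes, so the cut-offs would need to be chosen against the full 382-topic table rather than this excerpt.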

Editor's notes

  1. Examples of two multi-view topics from the ACM corpus analysis, demonstrating interpretability and coherence. The proposed MV_HDP (above) analyzes 5 views: text, relational (citation network) and side information (ACM CCS, Keywords & Venues), as shown in the left column. Multi-view topics are described using information from all views and include a ranked list of citation network nodes related to the topic. The text-only HDP-LDA baseline (bottom) cannot uncover specific topics like “Topic Modeling” or “Cloud/Distributed computing & Big Data Analytics”; instead it mixes in text mining related words in the first topic, and similarity search, indexing and MapReduce related words in the second.
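
To make the idea of a topic "described using information from all views" concrete, here is a minimal sketch of how one such topic could be held in memory. It is an illustration only, not the MV_HDP implementation: the class `MultiViewTopic`, its `describe` method and the view names are hypothetical, and the entries are the first few items listed for the “Topic Modeling” topic on slide 7.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MultiViewTopic:
    """A topic described by a ranked list of items per view (hypothetical structure)."""
    label: str
    views: Dict[str, List[str]] = field(default_factory=dict)

    def describe(self, top_n: int = 3) -> str:
        """Render the top_n items of every view, one view per line."""
        lines = [f"Topic: {self.label}"]
        for view, items in self.views.items():
            lines.append(f"  {view}: {', '.join(items[:top_n])}")
        return "\n".join(lines)


# Five views, as in the note: text, citation network, and three kinds of
# side information (taxonomy, keywords, venues). Items are taken from slide 7.
topic_modeling = MultiViewTopic(
    label="Topic Modeling",
    views={
        "text": ["topic", "latent", "lda", "document", "dirichlet"],
        "citations": ["Dynamic topic models", "Topics over time"],
        "taxonomy": [
            "H.3.3 IR: Information Search and Retrieval",
            "H.3.1 IR: Content Analysis and Indexing",
        ],
        "keywords": ["topic modeling", "latent dirichlet allocation"],
        "venues": ["SIGKDD", "WSDM", "CIKM"],
    },
)

if __name__ == "__main__":
    print(topic_modeling.describe(top_n=2))
```

A text-only topic would carry just the `text` entry, which is why it offers fewer cues for labelling than the five-view description.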