PhD Report
Svitlana Vakulenko
TU Wien
February 15, 2016
Overview
Status of thesis
Relation to other work
Next steps and ideas
Status of thesis
So far so...
2014
Topic Modeling
Event Extraction
2015
Target-dependent Sentiment Analysis
Information Diffusion
2016
Breaking News Detection
...
Topic Modeling
[Vakulenko et al., 2014, Herbst et al., 2014, Reuter et al., 2014]
@ University of Liechtenstein
Method: Latent Dirichlet Allocation (LDA) [Blei et al., 2003]
Datasets: iTunes, case studies, sustainability reports
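To make the LDA method concrete, here is a minimal sketch of collapsed Gibbs sampling for LDA using only the Python standard library. It is not the implementation used in the cited studies. The function names, the hyperparameter values (`alpha`, `beta`) and the toy documents are assumptions chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (topic_word, doc_topic), where
    topic_word[k] counts the words assigned to topic k and doc_topic[d][k]
    counts the tokens of document d assigned to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(num_topics) for _ in doc]
        assignments.append(z)
        for w, k in zip(doc, z):
            topic_word[k][w] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1
                # Resample the topic in proportion to
                # p(topic | document) * p(word | topic).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                topic_word[k][w] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1
    return topic_word, doc_topic


def top_words(topic_word, k, n=3):
    """Return the n most frequent words assigned to topic k."""
    return [w for w, _ in topic_word[k].most_common(n)]


# Hypothetical toy corpus standing in for app descriptions.
toy_docs = [
    "photo camera filter photo edit".split(),
    "camera filter edit share photo".split(),
    "budget bank expense budget money".split(),
    "money bank expense track budget".split(),
]
toy_topic_word, toy_doc_topic = lda_gibbs(toy_docs, num_topics=2)
for topic_id in range(2):
    print(topic_id, top_words(toy_topic_word, topic_id))
```

The per-document topic counts in `doc_topic` are the kind of output that can be compared against existing category labels, as in the correspondence chart on the next slide.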
Topic Modeling: Results [Vakulenko et al., 2014]¹
Figure: Correspondence chart showing the overlap of LDA topics and
iTunes categories
¹ https://ai.wu.ac.at/~vakulenko/
Event Extraction
[Katsios et al., 2015]
Summer School @ NCSR Demokritos
Project: REVEAL EU-FP7 2013-2016
Method: Relation Extraction (ClausIE)
Datasets: FACup, SNOW, World Cup (tweets)
Event Extraction: Results [Katsios et al., 2015]
Figure: Relations extracted from the FACup dataset
Target-dependent Sentiment Analysis
@ MODUL University Vienna
Method: POS tagging, dependency parsing, ML classifier (logistic
regression)
Datasets: MPQA (news articles), JDPA (product reviews)
Target-dependent Sentiment Analysis: Results
Information Diffusion
@ MODUL University Vienna
Project: PHEME EU-FP7 2014-2017
Method: Relation Extraction
Dataset: news articles, tweets
Information Diffusion: Results
Figure: s: president barack obama – p: state D – o:
Breaking News Detection
@ MODUL University Vienna
Project: InVID EU-Horizon 2016-2019
WP: Social Media Mining
Task: Emergent Topic Detection
Dataset: tweets
Status of thesis
Topics Events
Breaking News Sentiment Analysis
Information Diffusion
Relation to other work
State of the Art
Requirements
Newsworthiness
Scalability
Methodology
Data acquisition
Topic modeling
Event extraction
First story detection
State of the Art
The SNOW 2014 Data Challenge confirmed newsworthy topic detection
to be a challenging task [Papadopoulos et al., 2014]²:
F-score: 0.4, precision: 0.56, recall: 0.36 [Ifrim et al., 2014]
The limitations of the current state-of-the-art approaches include
early topic detection
topic relevance
topic representation
performance evaluation of the topic detection methods.
The most recent results reported in the related
work [Martin et al., 2015]
² [Van Canneyt et al., 2014, Martin and Göker, 2014, Burnside et al., 2014,
Petkos et al., 2014]
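As a reminder of how the three scores reported above relate to each other, the following sketch computes precision, recall and F-score for a set of detected topics against a reference set. The topic labels are hypothetical. Exact string matching between detected and reference topics is a simplifying assumption, not the matching procedure of the challenge.

```python
def precision_recall_f1(detected, reference):
    """Score detected topics against reference topics by exact match."""
    detected, reference = set(detected), set(reference)
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # The F-score is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical topic labels for one time slot.
detected_topics = ["earthquake", "election", "football", "weather"]
reference_topics = ["earthquake", "election", "strike"]
print(precision_recall_f1(detected_topics, reference_topics))
```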
Requirements: Newsworthiness
a set of topics for a given time slot ‘covered in mainstream
news sites’ [Papadopoulos et al., 2014]
‘the combination of novelty and
significance’ [Martin et al., 2015]
One common method to find novel (emerging or recently trending)
topics in a data stream is to look for bursts in the frequency
of keywords and phrases
(n-grams) [Martin et al., 2015, Martin and Göker, 2014,
Fujiki et al., 2004, Cataldi et al., 2010, Aiello et al., 2013].
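The burst idea can be sketched as follows: count in how many tweets of the current time slot each n-gram occurs, and compare that with its average count in earlier slots. The scoring ratio, the thresholds and the function names below are assumptions for illustration. The cited approaches use their own, more refined scoring functions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_frequency(tweets, n_values=(1, 2)):
    """Count in how many tweets each n-gram occurs."""
    df = Counter()
    for text in tweets:
        tokens = text.lower().split()
        seen = set()
        for n in n_values:
            seen.update(ngrams(tokens, n))
        df.update(seen)
    return df


def bursty_ngrams(current, history, min_df=2, threshold=2.0):
    """Return (n-gram, score) pairs that burst in the current time slot.

    current is a list of tweets, history a list of earlier slots
    (each a list of tweets). The score is the smoothed ratio of the
    current document frequency to the average past document frequency.
    """
    current_df = document_frequency(current)
    past_dfs = [document_frequency(slot) for slot in history]
    results = []
    for gram, df in current_df.items():
        if df < min_df:
            continue
        avg_past = sum(p[gram] for p in past_dfs) / len(past_dfs) if past_dfs else 0.0
        score = (df + 1) / (avg_past + 1)
        if score >= threshold:
            results.append((gram, score))
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Hypothetical time slots.
earlier_slots = [["good morning everyone", "good coffee"], ["good night"]]
current_slot = ["earthquake in city", "earthquake felt downtown", "good earthquake coverage"]
print(bursty_ngrams(current_slot, earlier_slots))
```

Frequent but steady terms such as "good" stay below the threshold, while a term that suddenly appears in many tweets is flagged.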
Requirements: Scalability
an important requirement when dealing with data streams
of high volume and velocity, e.g. Twitter
BNgram approach [Martin and Göker, 2014]: 2 minutes per
topic model for a 15-minute dataset of tweets
Methodology: Data acquisition
Twitter is the major source of news stream
data [Hu et al., 2012].
Only a few studies focus on data sources other than the Twitter
stream, e.g.
Wikipedia [Osborne et al., 2012, Steiner et al., 2013].
New: integration of other social media APIs and cross-media
retrieval, e.g.:
tweets → topics (events) → (YouTube) → videos
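The cross-media chain above can be read as a small pipeline in which topic detection and video search are pluggable steps. The sketch below shows only that data flow. The hashtag-based detector and the in-memory video index are hypothetical stand-ins. No real social media API is called.

```python
from collections import Counter


def hashtag_topics(tweets, min_count=2):
    """Toy topic detector: hashtags that occur at least min_count times."""
    counts = Counter(
        token.lower()
        for text in tweets
        for token in text.split()
        if token.startswith("#") and len(token) > 1
    )
    return [tag for tag, count in counts.most_common() if count >= min_count]


def cross_media_retrieval(tweets, detect_topics, search_videos, max_videos=3):
    """Map each detected topic to a list of videos found for it."""
    results = {}
    for topic in detect_topics(tweets):
        results[topic] = list(search_videos(topic))[:max_videos]
    return results


# Hypothetical in-memory index standing in for a video search service.
FAKE_VIDEO_INDEX = {"#quake": ["video-1", "video-2", "video-3", "video-4"]}


def fake_video_search(topic):
    """Look a topic up in the hypothetical index."""
    return FAKE_VIDEO_INDEX.get(topic, [])


sample_tweets = ["#quake hits the coast", "big #quake just now", "#lunch time"]
print(cross_media_retrieval(sample_tweets, hashtag_topics, fake_video_search))
```

Passing the two steps in as functions keeps the pipeline independent of any particular platform, so a different source or video service can be swapped in later.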
Methodology: Topic modeling
Topic detection approaches often involve
topic clustering
topic ranking
topic labeling
[Petkos et al., 2014, Martin and Göker, 2014,
Van Canneyt et al., 2014, Martin et al., 2015, Ifrim et al., 2014,
Elbagoury et al., 2015].
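The three steps listed above can be illustrated with a deliberately simple sketch: single-pass clustering by Jaccard similarity, ranking by cluster size, and labeling by the most frequent terms. The similarity measure, the threshold, the stopword list and the example tweets are assumptions. The cited systems use more elaborate variants of each step.

```python
from collections import Counter

STOPWORDS = {"the", "a", "in", "of", "to", "is", "and", "at", "on", "for"}


def tokenize(text):
    """Lowercase a tweet and return its set of non-stopword terms."""
    return {t for t in text.lower().split() if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tweets(tweets, threshold=0.3):
    """Clustering: add each tweet to its most similar cluster or start a new one."""
    clusters = []
    for text in tweets:
        tokens = tokenize(text)
        best, best_sim = None, 0.0
        for candidate in clusters:
            sim = jaccard(tokens, set(candidate["terms"]))
            if sim > best_sim:
                best, best_sim = candidate, sim
        if best is not None and best_sim >= threshold:
            best["tweets"].append(text)
            best["terms"].update(tokens)
        else:
            clusters.append({"tweets": [text], "terms": Counter(tokens)})
    return clusters


def rank_clusters(clusters):
    """Ranking: larger clusters first."""
    return sorted(clusters, key=lambda c: len(c["tweets"]), reverse=True)


def label_cluster(cluster, k=3):
    """Labeling: the k most frequent terms, ties broken alphabetically."""
    ordered = sorted(cluster["terms"].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:k]]


# Hypothetical tweets.
sample = [
    "earthquake hits city centre",
    "strong earthquake hits city",
    "new phone released today",
    "earthquake hits city again",
]
for topic in rank_clusters(cluster_tweets(sample)):
    print(len(topic["tweets"]), label_cluster(topic))
```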
Methodology: Event extraction
News stories are often centered around specific events (happenings),
which provide a natural way to group
them [Wu et al., 2015].
Several online services mine events from news
articles in different languages:
European Media Monitor³ [Pouliquen et al., 2008];
GDELT project⁴ [Leetaru and Schrodt, 2013];
Event Registry⁵ [Leban et al., 2014, Rupnik et al., 2015].
A few approaches to extracting open-domain events from tweets have been
proposed [Popescu et al., 2011, Ritter et al., 2012,
Katsios et al., 2015], but none of them supports cross-lingual
linking.
³ http://emm.newsbrief.eu
⁴ http://www.gdeltproject.org/
⁵ http://eventregistry.org
Methodology: First story detection
The task of first story detection (FSD) was proposed to identify
the first story about a certain event in a document
stream [Petrovic et al., 2012]. The state-of-the-art FSD
approaches compare documents, e.g. by similarity of TF-IDF
vectors or by Locality Sensitive Hashing (LSH)
[Petrovic et al., 2012, Phuvipadawat and Murata, 2010], to
determine whether a candidate document is close to existing documents
or constitutes a new event.
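A minimal sketch of the nearest-neighbour view of FSD: each incoming document is turned into a TF-IDF vector and compared with all earlier documents. If no earlier document is similar enough, it is flagged as a first story. The threshold value, the smoothed IDF formula and the example stream are assumptions. The exhaustive comparison used here is what LSH is meant to avoid on large streams.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_stories(stream, threshold=0.5):
    """Return one flag per document: True if it looks like a first story."""
    doc_freq = Counter()
    seen_vectors = []
    flags = []
    for n, text in enumerate(stream, start=1):
        tokens = text.lower().split()
        doc_freq.update(set(tokens))
        term_freq = Counter(tokens)
        # Smoothed IDF computed from the documents seen so far.
        vector = {
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        }
        nearest = max((cosine(vector, old) for old in seen_vectors), default=0.0)
        flags.append(nearest < threshold)
        seen_vectors.append(vector)
    return flags


# Hypothetical document stream.
sample_stream = [
    "plane crash near airport",
    "plane crash near the airport",
    "election results announced tonight",
]
print(first_stories(sample_stream))
```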
Next steps and ideas
Project: InVID EU-Horizon 2016-2019
WP: Social Media Mining
Deadline: June 2016 (deliverable)
Agenda:
Data acquisition
Breaking news detection
Evaluation framework: Twitter Trends, [Ifrim et al., 2014],
[Martin et al., 2015]
Methodology: topic modeling, event extraction, (semantic and
cross-lingual) ontology-based integration (e.g. BabelNet)
Progress: social media APIs integration proposal
Bibliography I
Aiello, L., Petkos, G., Martin, C., Corney, D., Papadopoulos,
S., Skraba, R., Göker, A., Kompatsiaris, I., and Jaimes, A.
(2013).
Sensing Trending Topics in Twitter.
IEEE Transactions on Multimedia, 15(6):1268–1282.
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003).
Latent Dirichlet Allocation.
The Journal of Machine Learning Research, 3:993–1022.
Burnside, G., Milioris, D., and Jacquet, P. (2014).
One Day in Twitter: Topic Detection Via Joint Complexity.
Cataldi, M., Di Caro, L., and Schifanella, C. (2010).
Emerging Topic Detection on Twitter Based on Temporal and
Social Terms Evaluation.
MDMKDD, pages 4:1–4:10.
Bibliography II
Elbagoury, A., Ibrahim, R., Farahat, A., Kamel, M., and
Karray, F. (2015).
Exemplar-Based Topic Detection in Twitter Streams.
In International AAAI Conference on Web and Social Media.
Fujiki, T., Nanno, T., Suzuki, Y., and Okumura, M. (2004).
Identification of bursts in a document stream.
In International Workshop on Knowledge Discovery in Data
Streams, pages 55–64.
Herbst, A., Simons, A., Brocke, J. v., Müller, O., Debortoli, S.,
and Vakulenko, S. (2014).
Identifying and Characterizing Topics in Enterprise Content
Management: a Latent Semantic Analysis of Vendor Case
Studies.
In 22nd European Conference on Information Systems, ECIS.
Bibliography III
Hu, M., Liu, S., Wei, F., Wu, Y., Stasko, J., and Ma, K.-L.
(2012).
Breaking news on twitter.
In Conference on Human Factors in Computing Systems,
pages 2751–2754.
Ifrim, G., Shi, B., and Brigadir, I. (2014).
Event detection in twitter using aggressive filtering and
hierarchical tweet clustering.
In SNOW-DC@ WWW, pages 33–40.
Katsios, G., Vakulenko, S., Krithara, A., and Paliouras, G.
(2015).
Towards open domain event extraction from twitter: Revealing
entity relations.
In DeRiVE@ ESWC, pages 35–46.
Bibliography IV
Leban, G., Fortuna, B., Brank, J., and Grobelnik, M. (2014).
Cross-lingual detection of world events from news articles.
In Proceedings of the ISWC, pages 21–24.
Leetaru, K. and Schrodt, P. A. (2013).
GDELT: Global data on events, location, and tone, 1979–2012.
In ISA Annual Convention, volume 2, page 4.
Martin, C., Corney, D., and Göker, A. (2015).
Mining Newsworthy Topics from Social Media.
In Advances in Social Media Analysis, pages 21–43.
Martin, C. and Göker, A. (2014).
Real-time topic detection with bursty n-grams: RGU’s
submission to the 2014 SNOW challenge.
In SNOW-DC@ WWW.
Bibliography V
Osborne, M., Petrovic, S., McCreadie, R., Macdonald, C., and
Ounis, I. (2012).
Bieber no more: First story detection using Twitter and
Wikipedia.
In TAIA.
Papadopoulos, S., Corney, D., and Aiello, L. M. (2014).
SNOW 2014 Data Challenge: Assessing the performance of news
topic detection methods in social media.
In SNOW-DC@ WWW, pages 1–8.
Petkos, G., Papadopoulos, S., and Kompatsiaris, Y. (2014).
Two-level message clustering for topic detection in twitter.
In SNOW-DC@ WWW, pages 49–56.
Bibliography VI
Petrovic, S., Osborne, M., and Lavrenko, V. (2012).
Using paraphrases for improving first story detection in news
and Twitter.
In Conference of the North American Chapter of the
Association for Computational Linguistics, pages 338–346.
Phuvipadawat, S. and Murata, T. (2010).
Breaking News Detection and Tracking in Twitter.
In International Conference on Web Intelligence and Intelligent
Agent Technology (WI-IAT), volume 3, pages 120–123.
Popescu, A.-M., Pennacchiotti, M., and Paranjpe, D. (2011).
Extracting events and event descriptions from twitter.
In WWW, pages 105–106.
Bibliography VII
Pouliquen, B., Steinberger, R., and Deguernel, O. (2008).
Story tracking: linking similar news over time and across
languages.
In Proceedings of the workshop on Multi-source Multilingual
Information Extraction and Summarization, pages 49–56.
Reuter, N., Vakulenko, S., Brocke, J. v., Debortoli, S., and
Müller, O. (2014).
Identifying the Role of Information Systems in Achieving
Energy-Related Environmental Sustainability using Text
Mining.
In 22nd European Conference on Information Systems, ECIS.
Ritter, A., Etzioni, O., Clark, S., and others (2012).
Open domain event extraction from twitter.
In SIGKDD, pages 1104–1112.
Bibliography VIII
Rupnik, J., Muhic, A., Leban, G., Skraba, P., Fortuna, B., and
Grobelnik, M. (2015).
News Across Languages - Cross-Lingual Document Similarity
and Event Tracking.
arXiv preprint arXiv:1512.07046.
Steiner, T., van Hooland, S., and Summers, E. (2013).
MJ No More: Using Concurrent Wikipedia Edit Spikes with
Social Network Plausibility Checks for Breaking News
Detection.
In WWW, pages 791–794.
Vakulenko, S., Müller, O., and Brocke, J. v. (2014).
Enriching iTunes App Store Categories via Topic Modeling.
In Proceedings of the International Conference on Information
Systems ICIS.
Bibliography IX
Van Canneyt, S., Feys, M., Schockaert, S., Demeester, T.,
Develder, C., and Dhoedt, B. (2014).
Detecting newsworthy topics in Twitter.
In SNOW-DC@ WWW, pages 1–8.
Wu, Z., Chen, L., and Giles, C. L. (2015).
Storybase: Towards Building a Knowledge Base for News
Events.
In ACL, pages 133–138.

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Vakulenko PhD Status Report - 16 February 2016

  • 1. PhD Report Svitlana Vakulenko TU Wien February 15, 2016
  • 2. Overview Status of thesis Relation to other work Next steps and ideas
  • 3. Status of thesis So far so. . . 2014 Topic Modeling Event Extraction 2015 Target-dependent Sentiment Analysis Information Diffusion 2016 Breaking News Detection . . .
  • 4. Topic Modeling [Vakulenko et al., 2014, Herbst et al., 2014, Reuter et al., 2014] @ University of Liechtenstein Method: Latent Dirichlet Allocation (LDA) [Blei et al., 2003] Datasets: iTunes, case studies, sustainability reports
  • 5. Topic Modeling: Results [Vakulenko et al., 2014] (footnote 1) Figure: Correspondence chart showing the overlap of LDA topics and iTunes categories. Footnote 1: https://ai.wu.ac.at/~vakulenko/
  • 6. Event Extraction [Katsios et al., 2015] Summer School @ NCSR Demokritos Project: REVEAL EU-FP7 2013-2016 Method: Relation Extraction (ClausIE) Datasets: FACup, SNOW, World Cup (tweets)
  • 7. Event Extraction: Results [Katsios et al., 2015] Figure : Relations extracted from FACup dataset
  • 8. Target-dependent Sentiment Analysis @ MODUL University Vienna Method: POS tagging, dependency parsing, ML classifier (Logistic Regression) Datasets: MPQA (news articles), JDPA (product reviews)
  • 10. Information Diffusion @ MODUL University Vienna Project: PHEME EU-FP7 2014-2017 Method: Relation Extraction Dataset: news articles, tweets
  • 11. Information Diffusion: Results Figure : s: president barack obama – p: state D – o:
  • 12. Breaking News Detection @ MODUL University Vienna Project: InVID EU-Horizon 2016-2019 WP: Social Media Mining Task: Emergent Topic Detection Dataset: tweets
  • 13. Status of thesis Topics Events Breaking News Sentiment Analysis Information Diffusion
  • 14. Relation to other work State of the Art Requirements Newsworthiness Scalability Methodology Data acquisition Topic modeling Event extraction First story detection
  • 15. State of the Art SNOW 2014 Data Challenge confirmed newsworthy topic detection to be a challenging task [Papadopoulos et al., 2014] (footnote 2): F-score: 0.4, Precision: 0.56, Recall: 0.36 [Ifrim et al., 2014]. The limitations of the current state-of-the-art approaches include: early topic detection; topic relevance; topic representation; performance evaluation of the topic detection methods. The most recent results reported in the related work [Martin et al., 2015] Footnote 2: [Van Canneyt et al., 2014, Martin and Göker, 2014, Burnside et al., 2014, Petkos et al., 2014]
  • 16. Requirements: Newsworthiness: a set of topics for a given time slot 'covered in mainstream news sites' [Papadopoulos et al., 2014]; 'the combination of novelty and significance' [Martin et al., 2015]. One common method to find novel (emerging or recent trending) topics from a data stream is looking for bursts in frequent occurrences of keywords and phrases (n-grams) [Martin et al., 2015, Martin and Göker, 2014, Fujiki et al., 2004, Cataldi et al., 2010, Aiello et al., 2013].
  • 17. Requirements: Scalability: an important requirement when dealing with data streams of high volume and velocity, e.g. Twitter. BNgram approach [Martin and Göker, 2014]: 2 minutes per topic model for a 15-minute dataset of tweets
  • 18. Methodology: Data acquisition Twitter is the major source of news stream data [Hu et al., 2012]. Only a few studies focus on data sources other than the Twitter stream, e.g. Wikipedia [Osborne et al., 2012, Steiner et al., 2013]. New: integration of other social media APIs and cross-media retrieval, e.g.: tweets → topics (events) → (YouTube) → videos
  • 19. Methodology: Topic modeling Topic detection approaches often involve topic clustering, topic ranking, and topic labeling [Petkos et al., 2014, Martin and Göker, 2014, Van Canneyt et al., 2014, Martin et al., 2015, Ifrim et al., 2014, Elbagoury et al., 2015].
  • 20. Methodology: Event extraction News stories are often centered around specific events (happenings), which provide a natural way to group them [Wu et al., 2015]. Several online services mine events from news articles in different languages: European Media Monitor (footnote 3) [Pouliquen et al., 2008]; GDELT project (footnote 4) [Leetaru and Schrodt, 2013]; Event Registry (footnote 5) [Leban et al., 2014, Rupnik et al., 2015]. A few approaches to extract open-domain events from tweets have been proposed [Popescu et al., 2011, Ritter et al., 2012, Katsios et al., 2015], but none of them supports cross-lingual linking. Footnote 3: http://emm.newsbrief.eu Footnote 4: http://www.gdeltproject.org/ Footnote 5: http://eventregistry.org
  • 21. Methodology: First story detection The task of first story detection (FSD) was proposed to identify the first story about a certain event from a document stream [Petrovic et al., 2012]. The state-of-the-art FSD approaches use similarity metrics over documents, such as TF-IDF vectors or Locality Sensitive Hashing (LSH) [Petrovic et al., 2012, Phuvipadawat and Murata, 2010], to determine if candidate documents are close to existing documents or could constitute a new event.
  • 22. Next steps and ideas Project: InVID EU-Horizon 2016-2019 WP: Social Media Mining Deadline: June 2016 (deliverable) Agenda: Data acquisition Breaking news detection Evaluation framework: Twitter Trends, [Ifrim et al., 2014] [Martin et al., 2015] Methodology: topic modeling, event extraction, (semantic and cross-lingual) ontology-based integration (e.g. BabelNet) Progress: social media APIs integration proposal
  • 23. Bibliography I Aiello, L., Petkos, G., Martin, C., Corney, D., Papadopoulos, S., Skraba, R., Göker, A., Kompatsiaris, I., and Jaimes, A. (2013). Sensing Trending Topics in Twitter. IEEE Transactions on Multimedia, 15(6):1268–1282. Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet Allocation. The Journal of Machine Learning Research, 3:993–1022. Burnside, G., Milioris, D., and Jacquet, P. (2014). One Day in Twitter: Topic Detection Via Joint Complexity. Cataldi, M., Di Caro, L., and Schifanella, C. (2010). Emerging Topic Detection on Twitter Based on Temporal and Social Terms Evaluation. MDMKDD, pages 4:1–4:10.
  • 24. Bibliography II Elbagoury, A., Ibrahim, R., Farahat, A., Kamel, M., and Karray, F. (2015). Exemplar-Based Topic Detection in Twitter Streams. In International AAAI Conference on Web and Social Media. Fujiki, T., Nanno, T., Suzuki, Y., and Okumura, M. (2004). Identification of bursts in a document stream. In International Workshop on Knowledge Discovery in Data Streams, pages 55–64. Herbst, A., Simons, A., Brocke, J. v., Müller, O., Debortoli, S., and Vakulenko, S. (2014). Identifying and Characterizing Topics in Enterprise Content Management: a Latent Semantic Analysis of Vendor Case studies. In 22nd European Conference on Information Systems, ECIS.
  • 25. Bibliography III Hu, M., Liu, S., Wei, F., Wu, Y., Stasko, J., and Ma, K.-L. (2012). Breaking news on twitter. In Conference on Human Factors in Computing Systems, pages 2751–2754. Ifrim, G., Shi, B., and Brigadir, I. (2014). Event detection in twitter using aggressive filtering and hierarchical tweet clustering. In SNOW-DC@ WWW, pages 33–40. Katsios, G., Vakulenko, S., Krithara, A., and Paliouras, G. (2015). Towards open domain event extraction from twitter: Revealing entity relations. In DeRiVE@ ESWC, pages 35–46.
  • 26. Bibliography IV Leban, G., Fortuna, B., Brank, J., and Grobelnik, M. (2014). Cross-lingual detection of world events from news articles. In Proceedings of the ISWC, pages 21–24. Leetaru, K. and Schrodt, P. A. (2013). Gdelt: Global data on events, location, and tone, 1979–2012. In ISA Annual Convention, volume 2, page 4. Martin, C., Corney, D., and Göker, A. (2015). Mining Newsworthy Topics from Social Media. In Advances in Social Media Analysis, pages 21–43. Martin, C. and Göker, A. (2014). Real-time topic detection with bursty n-grams: RGU’s submission to the 2014 SNOW challenge. In SNOW-DC@ WWW.
  • 27. Bibliography V Osborne, M., Petrovic, S., McCreadie, R., Macdonald, C., and Ounis, I. (2012). Bieber no more: First story detection using Twitter and Wikipedia. In TAIA. Papadopoulos, S., Corney, D., and Aiello, L. M. (2014). Snow 2014 data challenge: Assessing the performance of news topic detection methods in social media. In SNOW-DC@ WWW, pages 1–8. Petkos, G., Papadopoulos, S., and Kompatsiaris, Y. (2014). Two-level message clustering for topic detection in twitter. In SNOW-DC@ WWW, pages 49–56.
  • 28. Bibliography VI Petrovic, S., Osborne, M., and Lavrenko, V. (2012). Using paraphrases for improving first story detection in news and Twitter. In Conference of the North American Chapter of the Association for Computational Linguistics, pages 338–346. Phuvipadawat, S. and Murata, T. (2010). Breaking News Detection and Tracking in Twitter. In International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), volume 3, pages 120–123. Popescu, A.-M., Pennacchiotti, M., and Paranjpe, D. (2011). Extracting events and event descriptions from twitter. In WWW, pages 105–106.
  • 29. Bibliography VII Pouliquen, B., Steinberger, R., and Deguernel, O. (2008). Story tracking: linking similar news over time and across languages. In Proceedings of the workshop on Multi-source Multilingual Information Extraction and Summarization, pages 49–56. Reuter, N., Vakulenko, S., Brocke, J. v., Debortoli, S., and Müller, O. (2014). Identifying the Role of Information Systems in Achieving Energy-Related Environmental Sustainability using Text Mining. In 22nd European Conference on Information Systems, ECIS. Ritter, A., Etzioni, O., Clark, S., and others (2012). Open domain event extraction from twitter. In SIGKDD, pages 1104–1112.
  • 30. Bibliography VIII Rupnik, J., Muhic, A., Leban, G., Skraba, P., Fortuna, B., and Grobelnik, M. (2015). News Across Languages-Cross-Lingual Document Similarity and Event Tracking. arXiv preprint arXiv:1512.07046. Steiner, T., van Hooland, S., and Summers, E. (2013). MJ No More: Using Concurrent Wikipedia Edit Spikes with Social Network Plausibility Checks for Breaking News Detection. In WWW, pages 791–794. Vakulenko, S., Müller, O., and Brocke, J. v. (2014). Enriching iTunes App Store Categories via Topic Modeling. In Proceedings of the International Conference on Information Systems ICIS.
  • 31. Bibliography IX Van Canneyt, S., Feys, M., Schockaert, S., Demeester, T., Develder, C., and Dhoedt, B. (2014). Detecting newsworthy topics in Twitter. In SNOW-DC@ WWW, pages 1–8. Wu, Z., Chen, L., and Giles, C. L. (2015). Storybase: Towards Building a Knowledge Base for News Events. In ACL, pages 133–138.
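
Illustrative sketch for slide 21 (First story detection): the idea of comparing each incoming document against previously seen ones could look roughly like the Python below. This is a minimal sketch under stated assumptions, not the method of any cited paper: it uses an exhaustive nearest-neighbour search with cosine similarity over TF-IDF vectors instead of Locality Sensitive Hashing, and the class name `FirstStoryDetector`, the whitespace tokenizer, the smoothed IDF formula, and the `threshold` value of 0.5 are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real tweets need more care).
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class FirstStoryDetector:
    """Flags a document as a 'first story' if no earlier document is similar."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical novelty cut-off
        self.doc_freq = Counter()   # term -> number of documents containing it
        self.docs = []              # raw term counts of documents seen so far

    def _tfidf(self, counts, n_docs):
        # Smoothed IDF so that every weight stays positive.
        return {
            t: c * (math.log((n_docs + 1) / (self.doc_freq[t] + 1)) + 1.0)
            for t, c in counts.items()
        }

    def process(self, text):
        counts = Counter(tokenize(text))
        self.doc_freq.update(counts.keys())
        n_docs = len(self.docs) + 1
        query = self._tfidf(counts, n_docs)
        # Exhaustive search over all earlier documents (LSH would avoid this).
        max_sim = max(
            (cosine(query, self._tfidf(d, n_docs)) for d in self.docs),
            default=0.0,
        )
        self.docs.append(counts)
        return max_sim < self.threshold, max_sim


detector = FirstStoryDetector(threshold=0.5)
first_a, sim_a = detector.process("earthquake hits city centre")
first_b, sim_b = detector.process("earthquake hits city centre")
first_c, sim_c = detector.process("football final tonight")
```

In this sketch the first and third messages would be flagged as new stories and the repeated second message would not. The exhaustive comparison costs time linear in the number of stored documents per message, which is the scalability concern that motivates hashing-based approximations on high-volume streams.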