Literature Recommendation Software
Faruk Cankaya
Melike Keskin
Supervisor: Florian Schramm
Professor: Prof. Dr. Jürgen Ernstberger
April 15, 2021
Agenda
➢ Introduction
➢ Related Works
➢ Methodology
○ Data preparation
○ Topic Extraction
○ Finding similar papers
➢ Results
➢ Conclusion and Future Work
➢ Questions
Introduction
➢ Problem Statement
○ No preliminary data
○ Paragraph input
Introduction
➢ Keyword-based input (X)
➢ Reference-based recommendation (X)
➢ Most-cited papers (X)

Introduction
➢ Motivation
○ First recommender system based on just a paragraph of input
○ Paper recommendation for a specific area
○ Wide scope for trying different combinations of techniques
○ Makes thesis writing easier
○ Saves time
○ Specific domain
Related Works
○ Scienstein: A Research Paper Recommender System
■ Paper recommender
■ Hybrid filtering
■ Citation, author and source analysis
■ Preliminary data (citation analysis, author analysis, source analysis)
○ Science Concierge: A Fast Content-Based Recommendation System for Scientific Publications
■ Paper recommender
■ Content-based filtering
■ Topic Modeling
■ Preliminary data (users’ votes)
○ ScienceDirect: Topic Modeling Driven Content-Based Jobs Recommendation Engine for Recruitment Industry
■ Job recommender
■ Content-based filtering
■ Topic Modeling
■ Preliminary data (job description, user details)
Methodology
➢ Method Used
○ Content-based
○ Data Preprocessing
■ Cleaning + Tokenization + Stop Word Removal + Lemmatization
○ Topic modelling
■ LDA
■ NMF
○ Similarity Function
■ Cosine Similarity
Methodology
➢ Data preparation
○ Number of documents: ~12,000 papers
○ Candidate steps: tokenization, text cleaning, stop word removal, stemming, lemmatization, synonym replacement, POS tagging, etc.
○ Our model: Cleaning + Tokenization + Stop Word Removal + Lemmatization
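The chosen pipeline (cleaning, tokenization, stop word removal, lemmatization) can be sketched as below. This is only an illustration: the slides do not say which libraries were used, so `STOP_WORDS`, `LEMMAS` and `preprocess` are hypothetical names, and the tiny lookup table stands in for a real lemmatizer such as the ones in NLTK or spaCy.

```python
import re

# Hypothetical, deliberately tiny resources. A real pipeline would load a
# full stop word list and use a dictionary-based lemmatizer instead.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "we"}
LEMMAS = {"papers": "paper", "models": "model", "topics": "topic",
          "recommending": "recommend", "studies": "study"}


def preprocess(text):
    """Clean, tokenize, drop stop words, and lemmatize one paragraph."""
    # Cleaning: lowercase and replace everything except letters and whitespace.
    cleaned = re.sub(r"[^a-z\s]", " ", text.lower())
    # Tokenization: split on runs of whitespace.
    tokens = cleaned.split()
    # Stop word removal.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Lemmatization by table lookup; unknown words pass through unchanged.
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess("We are recommending 12 papers on the Topics of LDA!"))
# prints ['recommend', 'paper', 'topic', 'lda']
```

The same function would be applied both to every paper in the dataset and to the paragraph the user enters, so that both end up in the same vocabulary.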
Methodology
➢ Vectorization
○ Preprocessed input text → Vectorization (Bag of words, TF-IDF, ...) → Vectorized data
Methodology
➢ Vectorization
○ Bag-of-Words
○ TF-IDF
[Figure: document-term matrix with axes labeled "terms, features or corpus" and "items or documents"]
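A minimal sketch of the two vectorization options, assuming documents have already been preprocessed into token lists. The function names and the toy documents are illustrative, and the weighting uses the plain textbook form idf = ln(N / df); the slides do not state which TF-IDF variant was applied.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) with one row per document."""
    vocab = sorted({term for doc in docs for term in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tf_idf(counts):
    """Weight raw term counts by idf = ln(N / df)."""
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    return [[row[j] * math.log(n_docs / df[j]) for j in range(n_terms)]
            for row in counts]


# Toy corpus of already-preprocessed documents (made up for illustration).
docs = [["topic", "model", "paper"],
        ["paper", "recommend", "paper"],
        ["topic", "similarity"]]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
print(vocab)      # columns of the matrix
print(counts[1])  # bag-of-words row for the second document
```

Each row is one document and each column one term, which matches the matrix layout on the slide. With this idf form, a term that occurs in every document gets weight zero.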
Methodology
➢ Topic Extraction
○ Applied Topic Modeling Technique
■ LDA
■ NMF
Methodology
➢ Topic Extraction
○ Vectorized data → Topic Extraction → Terms in each topic; Topic probability of each document
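To make the two outputs concrete, here is a small NMF sketch using the classic multiplicative update rules for the squared-error objective. It is an assumption that this resembles what the authors ran: the slides name NMF and LDA but not an implementation, and LDA is not shown here. The function names, the toy matrix and the iteration count are illustrative. NMF yields non-negative weights rather than true probabilities, so the rows are normalized to sum to 1 to obtain a distribution-like vector per document.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor v (docs x terms) into w (docs x topics) and h (topics x terms)."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    # Strictly positive random start so multiplicative updates can move.
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_terms)]
             for k in range(n_topics)]
        # w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n_docs)]
    return w, h


def normalize_rows(m):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total if total > 0 else 0.0 for x in row])
    return out


# Toy document-term matrix: documents 0-1 and 2-3 use disjoint terms.
v = [[2, 1, 0, 0],
     [1, 2, 0, 0],
     [0, 0, 3, 1],
     [0, 0, 1, 3]]
w, h = nmf(v, n_topics=2)
doc_topics = normalize_rows(w)  # topic weights of each document
for row in doc_topics:
    print([round(x, 3) for x in row])
```

Rows of `h` play the role of "terms in each topic" (the largest entries are the topic's top terms), and rows of `doc_topics` play the role of "topic probability of each document".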
Methodology
➢ Prediction / Recommendation
○ Based on cosine similarity between the topic probability matrix of the dataset and the topic probability vector of the input
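The recommendation step can be sketched as ranking every paper by the cosine similarity between its topic vector and the topic vector of the input paragraph. The function names, the `top_n` parameter and the numbers below are made up for illustration and do not come from the presentation.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| |v|); defined as 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(input_topics, dataset_topics, top_n=3):
    """Return (paper index, score) pairs for the top_n most similar papers."""
    scores = [(cosine_similarity(input_topics, row), idx)
              for idx, row in enumerate(dataset_topics)]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scores[:top_n]]


# Hypothetical topic vectors: four papers, three topics.
dataset_topics = [[0.9, 0.1, 0.0],
                  [0.1, 0.8, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.0, 0.1, 0.9]]
input_topics = [0.9, 0.1, 0.0]  # topic vector of the user's paragraph

for idx, score in recommend(input_topics, dataset_topics, top_n=2):
    print(idx, round(score, 3))
```

Because cosine similarity ignores vector length, a short input paragraph and a long paper can still match closely as long as their topic mixtures point in the same direction.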
Results
➢ Effect of data preprocessing steps
Results
➢ Model Comparisons
Results
➢ Number of Words in User Input
Results
➢ Validation with user feedback
○ Before user feedback
■ Accuracy with content 3
● LDA is better than NMF
■ Accuracy with content 10
● NMF is better than LDA
○ After user feedback
■ NMF is better than LDA
Conclusion & Future Works
➢ Conclusion
○ Found the optimal data preprocessing model
■ Cleaning + Tokenization + Stop Word Removal + Lemmatization
○ Compared 2 different topic modelling techniques
■ LDA and NMF
○ Compared model accuracies
○ User ratings
■ Models with LDA and NMF
Conclusion & Future Works
➢ Future Works
○ Try other techniques such as BERT and check whether they score better in user rating feedback
○ Use user ratings to improve the recommendation system
○ Add new features to the website
○ Try different topic modelling techniques
○ Try different similarity functions
○ Train a model using the extracted topics
○ Tune the hyperparameters according to the new techniques
DEMO
➢ Web Site
Thank You
Questions?
