DATA SCIENCE LAB PROJECT
Master's Degree: Data Science
Accomplished by: A. Portaluppi, L. Ravazzi & M. Spandri
Academic year 2019-2020
INTRODUCTION
DATA SET: DEMS publications (Dipartimento di Economia, Metodi Quantitativi e Strategie di Impresa).
PURPOSES: Find the topics studied by DEMS university researchers.
TOOLS: Multidimensional scaling techniques and cluster analysis.
STEP BY STEP
DATA MANAGEMENT:
1. Exploration
2. Preprocessing
3. Data Cleaning
4. NLP
MULTIDIMENSIONAL SCALING:
1. Common Multidimensional Scaling
2. Metric Scaling
3. Sammon Mapping
CLUSTER ANALYSIS:
Prototype-Based: Fuzzy Algorithm
DATA VISUALIZATION:
RShiny Application
HOW TO MANAGE THE DATA SETS?
ID | TITLE | JOURNAL | ABSTRACT | ABSTRACT_ENG | KEYWORDS | KEYWORDS_ENG
235 | … | … | …

ID | DEMS_AUTHORS
235 | …
235 | …
235 | …
…
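The two tables share the ID column, and one publication can have several author rows. One possible way to combine them, sketched here under assumptions, is to group the author rows by ID and attach the resulting list to each publication. The list-of-dictionaries layout, the function name `attach_authors` and all sample values are hypothetical placeholders, not the DEMS data or the authors' actual procedure.

```python
from collections import defaultdict

# Hypothetical sample rows; the values are placeholders, not real DEMS data.
publications = [
    {"ID": 235, "TITLE": "Sample title", "JOURNAL": "Sample journal"},
    {"ID": 236, "TITLE": "Another title", "JOURNAL": "Another journal"},
]
authors = [
    {"ID": 235, "DEMS_AUTHORS": "Author A"},
    {"ID": 235, "DEMS_AUTHORS": "Author B"},
    {"ID": 235, "DEMS_AUTHORS": "Author C"},
]


def attach_authors(publications, authors):
    """Return one record per publication with a list of its DEMS authors."""
    by_id = defaultdict(list)
    for row in authors:
        by_id[row["ID"]].append(row["DEMS_AUTHORS"])
    # Publications without any author row get an empty list.
    return [dict(pub, DEMS_AUTHORS=by_id.get(pub["ID"], [])) for pub in publications]


merged = attach_authors(publications, authors)
```

Keeping the authors as a list preserves the one-to-many relationship without duplicating the publication text.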
HOW TO CHOOSE RECORDS?
• Journal articles
• Written by assistant professors etc.
• English language
30% of the documents are left.
What does "in English" mean?
Perfect! There is a field which specifies the language.
The language of an article is the language of its abstract (textcat function).
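To illustrate the idea of deriving an article's language from its abstract, here is a deliberately simplified sketch. It does not reproduce the textcat function named on the slide; it swaps in a much cruder heuristic that counts stop-words per language. The stop-word lists and the function name `guess_language` are hypothetical.

```python
import re

# Tiny, hypothetical stop-word lists; a real detector needs far richer profiles.
STOP_WORDS = {
    "english": {"the", "of", "and", "in", "to", "is", "we", "this", "that", "with"},
    "italian": {"il", "la", "di", "e", "che", "in", "per", "un", "una", "con"},
}


def guess_language(text):
    """Return the language with the most stop-word hits, or None if there are none."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOP_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


abstract = "We study the distribution of income in this paper"
language = guess_language(abstract)
```

Records whose abstract is not recognised as English would then be filtered out before the text processing step.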
NATURAL LANGUAGE PROCESSING
• Start with mixed texts (title, abstract, keywords and journal)
• Bag of words:
  1. Drop punctuation, stop-words and non-letter characters
  2. Convert everything to lower case
  3. Apply stemming
• Compute tf-idf
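The text preparation and weighting steps can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list is a tiny hypothetical one, `crude_stem` is a stand-in for a real stemmer, and the weight is one common tf-idf variant (relative term frequency times the natural log of N over document frequency), which may differ from the variant the authors used.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "of", "and", "in", "a", "for", "on"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lower-case, keep letter runs only, drop stop-words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    bags = [Counter(preprocess(doc)) for doc in documents]
    n_docs = len(bags)
    # Number of documents in which each term appears.
    doc_freq = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {term: (count / total) * math.log(n_docs / doc_freq[term])
             for term, count in bag.items()}
        )
    return weights


weights = tf_idf(["Clustering of markets", "Markets and policies"])
```

A term that occurs in every document gets weight zero, so only terms that distinguish documents contribute to the later distance computations.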
MULTIDIMENSIONAL SCALING
WHAT?
A function to project data from an N-dimensional space to 2 or 3 dimensions
WHY?
• Graphical approach (Clustering)
• Increase interpretability
HOW?
• Metric
• Non-metric
Start from a table of the texts and the number of occurrences of each term in the bag of words.
Choose a proximity measure.
Apply the desired technique.
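As an illustration of the proximity step, the sketch below computes pairwise distance matrices from a small term-count table using the Manhattan and Euclidean distances, the two measures named in this deck. The `counts` table is hypothetical, and the function names are chosen only for this example.

```python
import math


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(rows, metric):
    """Return the symmetric matrix of pairwise distances between rows."""
    return [[metric(u, v) for v in rows] for u in rows]


# Hypothetical term counts: one row per document, one column per term.
counts = [
    [2, 0, 1],
    [0, 1, 1],
    [2, 1, 0],
]
d_manhattan = distance_matrix(counts, manhattan)
d_euclidean = distance_matrix(counts, euclidean)
```

The resulting matrix is the input that a scaling technique tries to reproduce in 2 or 3 dimensions.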
We applied three techniques:
1. Common Multidimensional Scaling (Euclidean distance)
2. Metric Scaling
3. Sammon Mapping (Manhattan distance)
We will describe only the last one, because it gave good results.
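SAMMON MAPPING
Minimize Sammon Stress:
$$E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}}$$
where $d^{*}_{ij}$ is the distance between the i-th and j-th observation in the initial space, while $d_{ij}$ refers to the final space.
• For metric and non-metric data
• Non-linear transformation approach (different from PCA)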
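The Sammon stress sums, over all pairs of observations, the squared difference between the initial-space and final-space distances divided by the initial-space distance, and normalises by the total of the initial-space distances. The sketch below only evaluates this quantity for two given distance matrices; it does not perform the minimisation, and the function name and sample matrices are assumptions made for illustration.

```python
def sammon_stress(d_initial, d_final):
    """Sammon stress between two symmetric distance matrices (lists of lists)."""
    n = len(d_initial)
    scale = 0.0
    error = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if d_initial[i][j] == 0:
                raise ValueError("initial-space distances must be non-zero")
            scale += d_initial[i][j]
            error += (d_initial[i][j] - d_final[i][j]) ** 2 / d_initial[i][j]
    return error / scale


# Hypothetical distances in the initial space and in a 2-D projection.
d_initial = [[0, 3, 2], [3, 0, 3], [2, 3, 0]]
d_final = [[0, 2, 2], [2, 0, 3], [2, 3, 0]]
stress = sammon_stress(d_initial, d_final)
```

Dividing each squared error by the initial-space distance gives small distances more weight, so local structure is preserved more carefully than with an unweighted error.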
CLUSTERING
Since an article can touch on different topics, the clustering must be fuzzy.
Cluster labels are based on the fifteen most frequent words in the bag of words.
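Picking the most frequent words for a cluster can be sketched as below. How articles are assigned to a cluster for this count is not stated in the slides; the example simply assumes a hypothetical list of tokenised articles for one cluster, and uses n=2 instead of fifteen to keep the sample small.

```python
from collections import Counter


def top_words(documents, n=15):
    """Return the n most frequent words across tokenised documents."""
    counts = Counter(word for doc in documents for word in doc)
    return [word for word, _ in counts.most_common(n)]


# Hypothetical stemmed tokens for the articles assigned to one cluster.
cluster_docs = [
    ["market", "price", "energi"],
    ["market", "energi", "polici"],
    ["market", "risk"],
]
label_words = top_words(cluster_docs, n=2)
```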
Manhattan distance is used to build the clusters.
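In fuzzy clustering each article receives a degree of membership in every cluster rather than a single label. The sketch below shows the membership formula of fuzzy c-means, evaluated with the Manhattan distance, for fixed prototypes. It is an assumption that this matches the authors' algorithm: the slides only say "Prototype-Based: Fuzzy Algorithm". The fuzzifier `m`, the function names and the sample points are hypothetical.

```python
def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def fuzzy_memberships(point, prototypes, m=2.0):
    """Membership of `point` in each prototype's cluster; the values sum to 1."""
    distances = [manhattan(point, p) for p in prototypes]
    # A point that coincides with one or more prototypes belongs to them entirely.
    if 0 in distances:
        hits = distances.count(0)
        return [1.0 / hits if d == 0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


# Hypothetical 2-D coordinates, e.g. taken from a scaling output.
memberships = fuzzy_memberships([1, 0], [[0, 0], [3, 0]])
```

A point halfway between two prototypes gets equal membership in both, which is how an article touching two topics would show up.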
LABELS
DEMS: ECONOMICS, STATISTICS, BUSINESS STRATEGY
• Finance and Energy
• Economic Policy
• Macroeconomics
• Income Distribution
• Game Theory
• Health Statistics
• Pure Statistics
• Statistics and Finance
• Social Issues
• Industrial Economics
• Corporate Finance
MOVE TO RSHINY!
CONCLUSIONS
SUMMARY
• Multidimensional scaling is a powerful tool to visualize data.
• We found the main topics studied by DEMS researchers.
• MDS and clustering can show interesting patterns in data.
FUTURE DEVELOPMENTS
• Other techniques for scaling, such as Self-Organizing Maps.
• Other proximity measures for MDS.
• Consider not only single terms in the bag of words (Association Analysis).
THANK YOU FOR YOUR ATTENTION
