CMPE 255 Short Story Assignment

San Jose State University

A brief review of an IEEE research paper.

Authors
● Rani Megasari
● Erna Piantari
● Rizki Nugraha

Data
● Collected from communication media within the Computer Science Education
Department of FPMIPA UPI.
● The data used in this research are a collection of job vacancy descriptions in
Indonesian. Vacancies written in English are translated first.

Data Preprocessing
● Case Folding: cleans the raw text, for example by converting capital letters to
lowercase and removing numbers, punctuation, and email addresses from a job
vacancy.
● Tokenization: breaks a sentence down into individual words.
● Filtering: removes words that are considered unimportant (stop words) from
the input data.
● Stemming: removes the affixes of a word to reduce it to its base form.
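
The four preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `preprocess`, the tiny `STOP_WORDS` set, and the `stem` parameter are assumptions made for the example. Real stemming needs a language-specific stemmer, so the default here leaves words unchanged.

```python
import re

# Hypothetical, very small stop word list used only for this example.
STOP_WORDS = {"dan", "di", "ke", "yang"}


def preprocess(text, stop_words=STOP_WORDS, stem=lambda word: word):
    """Return the cleaned tokens of one job vacancy description."""
    # Case folding: lowercase, then drop emails, numbers and punctuation.
    text = text.lower()
    text = re.sub(r"\S+@\S+", " ", text)   # remove email addresses
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only basic letters and spaces
    # Tokenization: split the sentence into individual words.
    tokens = text.split()
    # Filtering: remove stop words.
    tokens = [t for t in tokens if t not in stop_words]
    # Stemming: placeholder hook; the default returns each word unchanged.
    return [stem(t) for t in tokens]


tokens = preprocess("Dibutuhkan 2 Programmer Python! Kirim CV ke hr@example.com")
print(tokens)
```

Emails are removed before punctuation so that the "@" pattern can still be matched.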

TF-IDF
● The TF*IDF algorithm weighs a keyword and assigns it an importance based on
the number of times it appears in the document (term frequency), discounted by
how common the keyword is across all documents (inverse document frequency).
● Scenario 1: without min_df
● Scenario 2: min_df = 10 (minimum document frequency of 10)
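
The two weighting scenarios can be illustrated with a small, self-contained sketch. It uses the plain textbook formula tf × log(N / df); libraries often apply smoothed variants, so the exact numbers may differ from the paper's. The function name `tfidf` and the toy documents are assumptions, and `min_df` is read here as "keep only terms that occur in at least that many documents".

```python
import math
from collections import Counter


def tfidf(docs, min_df=1):
    """docs is a list of token lists; returns (vocabulary, weight rows)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed meaning of min_df: drop terms found in fewer documents.
    vocab = sorted(t for t, count in df.items() if count >= min_df)
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[t] * idf[t] for t in vocab])
    return vocab, rows


docs = [
    ["python", "developer", "python"],
    ["java", "developer"],
    ["python", "tester"],
]
vocab_all, weights_all = tfidf(docs)            # like scenario 1: no min_df
vocab_min, weights_min = tfidf(docs, min_df=2)  # like scenario 2, with a toy threshold
print(vocab_all)
print(vocab_min)
```

Raising `min_df` shrinks the vocabulary, and therefore the dimension of each document vector, which is the effect the two scenarios compare.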

Feature Extraction with PCA
● Principal Component Analysis, or PCA, is a dimensionality-reduction method
that is often used to reduce the dimensionality of large data sets by
transforming a large set of variables into a smaller one that still contains most
of the information in the large set.
● The data are reduced to 2 components.
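
As a rough illustration of reducing data to 2 components, the sketch below centers the data, builds the covariance matrix, and finds the two leading directions with power iteration. This is an assumption-laden teaching example: the names `pca_2d` and `_power_iteration` are hypothetical, the method assumes the data really vary in at least two directions, and in practice a numerical library routine would normally be used instead.

```python
import math


def _power_iteration(matrix, iters):
    """Approximate the leading eigenvalue and eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]  # fixed, arbitrary start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca_2d(rows, iters=200):
    """Project each row onto the two directions of largest variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(2):
        eigval, v = _power_iteration(cov, iters)
        components.append(v)
        # Deflation: remove the direction just found before searching again.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]


rows = [(2, 0, 5), (-2, 0, 5), (0, 1, 5), (0, -1, 5)]
projected = pca_2d(rows)
print(projected)
```

In the toy data the third coordinate is constant, so it carries no information and disappears after the projection.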

K-Means Clustering
1. Choose the number of clusters, k.
2. Randomly select k data points as the initial centroid of each cluster.
3. Calculate the distance from each data point to every centroid; the closest
centroid determines cluster membership.
4. Calculate the mean of all members of a cluster to determine its new centroid.
Do the same for every cluster.
5. Repeat steps 3 and 4 until the cluster memberships no longer change.
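
The five steps above can be sketched in plain Python as follows. This is a minimal version for illustration: the function name `kmeans`, the `seed` and `max_iter` parameters, and the toy points are assumptions, and squared Euclidean distance is assumed as the distance measure.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Return (labels, centroids, passes) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Step 2: pick k data points at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for passes in range(1, max_iter + 1):
        # Step 3: assign every point to its closest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step 5: stop once the memberships no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids, passes


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, passes = kmeans(points, k=2)
print(labels, centroids, passes)
```

The returned `passes` counts assignment passes, including the final pass that only confirms nothing changed, so it may differ by one from other ways of counting iterations.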

Choosing the best number of clusters for K-Means
● Elbow Method
● Silhouette Coefficient
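
Both evaluation methods reduce to a score computed from the points and their cluster labels. The sketch below is a plain-Python illustration, assuming Euclidean distance: `sse` is the within-cluster sum of squared errors that the Elbow Method plots against k, and `silhouette` is the mean silhouette coefficient. The function names and toy data are assumptions, and `silhouette` expects at least two clusters.

```python
import math


def sse(points, labels):
    """Within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def silhouette(points, labels):
    """Mean silhouette coefficient; values near 1 mean well-separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]
error = sse(points, labels)
score = silhouette(points, labels)
print(error, score)
```

For the Elbow Method, `sse` would be computed for a range of k values and the k at the bend of the curve chosen; for the Silhouette Coefficient, the k with the highest score is preferred.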

Table: number of clusters and iterations required to converge

Elbow Method for Scenario 1 (without min_df)

Silhouette Coefficient Method for Scenario 1 (Without min_df)

Silhouette Coefficient Method for Scenario 2 (min_df = 10)

Conclusion
● Setting a minimum document frequency in the weighting step affects the clustering
result. In this research, scenario 2, with a minimum document frequency of 10, gives
the better result.
● In the second scenario, the Elbow and Silhouette Coefficient evaluations consistently
point to an optimum of k=3, and the clustering converges in only 5 iterations.
● The smaller the dimension produced during weighting, the better the clustering
result.
● However, the silhouette coefficient of both scenarios is still around 0.5, which means
that the grouping is not yet well separated, because a good value approaches 1.

About the Paper
● Paper link: https://ieeexplore.ieee.org/document/9392067
● Paper title: Graduates Profile Mapping based on Job Vacancy Information Clustering
● Published in: 2020 6th International Conference on Science in Information Technology (ICSITech)
● Date of conference: 21-22 Oct. 2020
● Date added to IEEE Xplore: 05 April 2021

Purpose of the Paper
● The paper proposes clustering, as a text mining technique, to group social media data about job vacancies and map the vacancies to eligible graduates.
● The purpose is to find specific patterns in the graduate profiles that companies often demand, in a setting where graduates search for talent within their alma mater.

Steps that are performed

Elbow Method for Scenario 2 (min_df = 10)
