2. Contents
Introduction
Algorithms
Feature Selection Techniques
Dataset
Expected Output Format
Proposed Project Pipeline
Feasibility Analysis
System Environment
System Design
Results and Discussions
Model Deployment
Git History
Conclusion
References
3. Introduction
• The proposed machine learning system helps students in the computer science
stream choose the right career path
• Recommendations are based on their scores in various subjects and their skills
in communication, coding, etc.
4. Literature Review
Title of the paper: Tanya V Yadalam, Vaishnavi M Gowda, Vanditha Shiva Kumar, Disha
Girish, “Career Recommendation Systems using Content based Filtering”,
Proceedings of the Fifth International Conference on Communication and
Electronics Systems (ICCES 2020)
Area of work: Career recommendation for registered computer stream students
Dataset: 17 columns and 20000 entries
Methodology/Strategy: The recommender takes form-based input from the user and
recommends the three best-matching career options. Prediction is based on the
data filled in by the user: cosine similarity is computed between the user's
previous preferences and the available jobs, and the jobs with the highest
similarity scores are recommended. The authors also aim to collect feedback
and comments and apply NLP to determine whether each one is positive, negative
or neutral, so that the recommender gives students better results.
Algorithm: NLP, Cosine Similarity, Content-Based Filtering
Result/Accuracy: Recommends the top three jobs by applying cosine similarity.
NLP determines whether a comment or feedback is positive, negative or neutral.
Advantages: Introduces features of security, reliability and transparency.
Recommends the top 3 jobs.
Users can give feedback and comments.
Limitations: Mainly intended for engineering students.
No data encryption mechanism.
Future Proposal: Can be extended to other branches such as business and arts.
The system can be implemented using a collaborative approach.
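The cosine-similarity ranking described for this paper can be sketched as follows. This is a minimal illustration, not the authors' implementation: the skill axes, job names, and scores are hypothetical, and only the standard library is used.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)

def top_jobs(user_vector, job_vectors, k=3):
    """Return the k job names whose profiles are most similar to the user."""
    scored = [(cosine_similarity(user_vector, vec), job)
              for job, vec in job_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [job for _, job in scored[:k]]

# Hypothetical skill axes: [coding, communication, maths, design]
jobs = {
    "Software Developer": [5, 2, 3, 1],
    "Business Analyst": [2, 5, 3, 1],
    "Data Scientist": [4, 2, 5, 1],
    "UI Designer": [2, 3, 1, 5],
}
user = [5, 2, 4, 1]
recommended = top_jobs(user, jobs)
print(recommended)
```

Because cosine similarity depends only on the direction of the vectors, a student with uniformly low scores and one with uniformly high scores in the same proportions receive the same ranking.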
5. Paper 2
Title of the paper: Vignesh S, Shivani Priyanka C, Shree Manju H, Mythili K, “An
Intelligent Career Guidance System using Machine Learning”, 2021 7th International
Conference on Advanced Computing & Communication Systems (ICACCS)
Area of work: Engineering department prediction for students after plus two.
Dataset: The dataset used for the machine learning model was developed manually. It
contains more than 500 rows, seven features, and five target labels, each
representing a specific department.
Methodology/Strategy: The framework consists of three modules.
The first is the skill set assessment module.
The second is the prediction module, where the prediction is made from the scores
obtained by the candidate.
The third and final module is the result analysis module, which presents a detailed
analysis of the candidate's performance in various formats.
Algorithm: KNN, SVM, Naïve Bayes, K-Means Clustering
Result/Accuracy: KNN: 0.9410
SVM: 0.8632
Naïve Bayes: 0.8714
Advantages: Lowers the chance of a candidate selecting a department in which the
candidate has a higher chance of failure.
Limitations: The skill set analysis may not reflect the students' exact knowledge.
The system is for plus two students, whose knowledge is limited to their
academics.
Future Proposal: The framework's accuracy is to be improved, additional features
can be used for recommending a suitable department, and the outliers of the
framework are to be removed gradually.
6. Paper 3
Title of the paper: Manar Qamhieh, Haya Sammaneh, Mona Nabil Demaidi, “PCRS: Personalized
Career-Path Recommender System for Engineering Students”. Supported by the An-Najah
National University, Palestine, Research Project, under grant ANNU-1920-Sc004. Published in 2020.
Area of work: Guidance system that helps high school students in the Palestinian community
choose an engineering discipline.
Dataset: The dataset was mainly collected from a research survey, stored in a database, and
analysed to create an association between personality types and engineering disciplines in
Palestine. The MBTI personality test consists of 21 questions randomly chosen from a dataset
of 70 questions.
Methodology/Strategy: There are 4 main phases:
1. Obtaining the student's personal information, including gender, high school grades in STEM
courses, and a list of extra-curricular interests.
2. Determining the student's personality type based on a self-administered personality test.
3. Processing the input data to construct a personal and academic profile for each student.
4. Building a fuzzy recommender system to provide students with a personalized, user-specific
ranking of engineering disciplines.
Algorithm: Fuzzy Logic
Result/Accuracy: The output of the PCRS application is a bar plot showing the suitability rates
of engineering disciplines after applying the system's fuzzy logic.
Advantages: Helpful for high school students in developing countries where educational and
professional guidance in schools is limited.
The bar plot representation presents the results to the user clearly and simply.
Users can compare the suitability rates of engineering disciplines.
Limitations: Time consuming, because for each student the processed data for a specific
engineering discipline is entered into the fuzzy logic, which determines a personalized
rate for it. This must be repeated for all seven engineering disciplines considered in PCRS.
Future Proposal: PCRS can be extended to consider more university departments and disciplines
other than engineering.
The recommendation can be enhanced to consider socio-economic factors such as employment
rates, economic situation and parents' background, especially in developing countries such as
Palestine.
7. Dataset
• The dataset contains 20000 records
• Students answer 24 questions related to their abilities in academics,
personality, and coding
• These 24 questions form the feature list
• The suggested job role is the class label
11. Algorithms
• Decision Tree: A classification technique that classifies records using a
tree-shaped, pictorial set of decision rules
• Attribute Selection Measures: Gini Index (a cost function used to evaluate
splits in the dataset; it is calculated by subtracting the sum of the squared
probabilities of each class from one) and Entropy (a measure of the randomness
in the information being processed)
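The two attribute selection measures can be written out directly from their definitions. The sketch below assumes class labels are given as a plain list; the job-role labels are hypothetical.

```python
import math
from collections import Counter

def gini_index(labels):
    """1 minus the sum of squared class probabilities."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

# Hypothetical node containing two job roles in equal proportion
mixed_node = ["Developer", "Developer", "Analyst", "Analyst"]
pure_node = ["Developer", "Developer", "Developer"]

print(gini_index(mixed_node), entropy(mixed_node))
print(gini_index(pure_node), entropy(pure_node))
```

Both measures are zero for a node that holds a single class and largest when the classes are evenly mixed, which is why a split that lowers them is preferred.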
12. • XGBoost: It is based on the gradient boosting algorithm
• Gradient boosting works on the basic principle of gradient descent. The
model is built using tree-based learners (Decision Trees)
• XGBoost prediction:
• Final prediction = base value (the starting prediction from the basic decision
tree) + LR*w1 + LR*w2 + ... + LR*wn
• Where LR = learning rate = eta
• w1 = residual value predicted by the 1st residual model
• wn = residual value predicted by the nth residual model
• XGBoost differs from other gradient boosting methods in its tuning parameters
• The main tuning parameters are:
• 1) Regularisation parameter (Lambda)
• 2) Threshold that defines auto pruning (Gamma)
• 3) Learning rate (eta)
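The final-prediction formula above is a plain weighted sum, which a few lines make concrete. This is only an arithmetic sketch of that formula with hypothetical numbers; it does not train any trees and does not use the XGBoost library.

```python
def boosted_prediction(base_value, residual_outputs, learning_rate):
    """Final prediction = base value + LR*w1 + LR*w2 + ... + LR*wn."""
    return base_value + sum(learning_rate * w for w in residual_outputs)

# Hypothetical values: a base prediction and the outputs of three residual models
base_value = 0.5
residual_outputs = [0.4, 0.2, 0.1]   # w1, w2, w3
eta = 0.3                            # learning rate

prediction = boosted_prediction(base_value, residual_outputs, eta)
print(round(prediction, 4))
```

A smaller learning rate shrinks each residual model's contribution, so more models are needed to reach the same prediction, but each individual model has less chance of overfitting.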
13. • SVM: Performs classification by finding the hyperplane that best separates
the data points of different classes.
• Searches for the linear optimal separating hyperplane (decision boundary)
• Finds the hyperplane using support vectors and margins
• Training tuples that fall on the margin boundaries on either side of the
hyperplane are the support vectors
• The farther the hyperplane is from the nearest data points, the larger its
margin; the hyperplane with the largest margin is the optimal one
• Tuning parameters: kernel, gamma, C (regularization parameter), random_state
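To make "margin" and "support vectors" concrete, the sketch below measures them for one fixed candidate hyperplane. It does not train an SVM or search for the optimal hyperplane; the points, labels, and hyperplane coefficients are hypothetical.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_and_support_vectors(w, b, points, labels, tol=1e-9):
    """Return the smallest label-adjusted distance and the points that attain it."""
    # Multiplying by the label (+1 or -1) makes correctly classified points positive
    distances = [y * signed_distance(w, b, x) for x, y in zip(points, labels)]
    margin = min(distances)
    support = [x for x, d in zip(points, distances) if abs(d - margin) <= tol]
    return margin, support

# Hypothetical two-class data and candidate hyperplane x1 + x2 - 5 = 0
points = [(1, 1), (2, 0), (4, 4), (5, 3), (0, 0), (6, 6)]
labels = [-1, -1, 1, 1, -1, 1]
w, b = (1, 1), -5

margin, support_vectors = margin_and_support_vectors(w, b, points, labels)
print(margin, support_vectors)
```

The points closest to the boundary are the ones returned as support vectors; moving or removing the farther points would not change the margin of this hyperplane. If a library such as scikit-learn is used (an assumption, since the slides do not name one), the listed parameters correspond to arguments of its `SVC` classifier.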
17. • Data collection: Find an appropriate dataset with suitable parameters
such as academic scores, specialization programs, analytic capabilities, and
personal details like hobbies, workshops, certifications, books of interest, etc.
• Data Pre-processing: Put the acquired dataset into an organized format by
cleaning null values, invalid data values, and unwanted data.
• OneHot Encoding: Apply techniques that convert categorical values in
the data into a numerical or ordinal format so that they can be provided to
machine learning algorithms.
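The one-hot step can be illustrated without any library. This is a minimal sketch assuming a single categorical column held as a list; the answer values are hypothetical, and a real pipeline would more likely use a library encoder.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1."""
    categories = sorted(set(values))              # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1                     # mark the column for this value
        rows.append(row)
    return categories, rows

# Hypothetical answers to one survey question
answers = ["yes", "no", "yes", "maybe"]
columns, encoded = one_hot_encode(answers)
print(columns)
print(encoded)
```

Each category gets its own column, so no artificial ordering between categories is introduced, unlike label or ordinal encoding where categories become ranked integers.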
18. Feasibility Analysis
• Technical Feasibility: The application is technically feasible because all the technical
resources required for its development and operation are easily available and
reliable. The code is written in Google Colab, where the commonly used libraries
are preinstalled, so they do not need to be installed separately.
• Economic Feasibility: The code runs on Google Colab, so the main running cost is
internet usage. Development of the system does not need a large amount of money,
so it is economically feasible.
• Operational Feasibility: Since the code is written on Google Colab, there is no need
to worry about installing the required libraries. A new user needs no special
skills to open and use the application.
19. System Environment
• Software Environment: The software used for the development of this application is the
following:
• Pandas, Python, Matplotlib, NumPy, LabelEncoder, OrdinalEncoder, SelectKBest,
Google Colab, Visual Studio, HTML & CSS, Flask, GitHub
• Hardware Environment:
• Processor: 2 GHz or faster (dual-core or quad-core will be much faster)
• Memory: 8 GB RAM or greater
• Disk space: 40 GB or greater
• Good internet connectivity
23. Results and Discussions
• The first accuracy calculated for XGBoost was 5.75. After applying label
encoding and ordinal encoding to the categorical values and the Chi-Squared
test as the feature selection technique, accuracy increased to 80.683. The saved
model xgboost.sav was loaded using the pickle package.
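The idea behind Chi-Squared feature selection can be sketched with the classical contingency-table statistic. This is an illustration under stated assumptions: the feature names and values are hypothetical, and library scorers (for example the one used with SelectKBest) may compute the statistic differently, so the numbers are not expected to match the project's.

```python
from collections import Counter

def chi_squared(feature_values, class_labels):
    """Chi-squared statistic between a categorical feature and the class label."""
    n = len(feature_values)
    observed = Counter(zip(feature_values, class_labels))
    feature_totals = Counter(feature_values)
    class_totals = Counter(class_labels)
    statistic = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in class_totals.items():
            expected = f_count * c_count / n      # count expected under independence
            statistic += (observed[(f, c)] - expected) ** 2 / expected
    return statistic

# Hypothetical encoded features and job-role labels
labels = ["dev", "dev", "analyst", "analyst"]
features = {
    "coding_skill": ["high", "high", "low", "low"],      # tracks the label exactly
    "likes_reading": ["yes", "no", "yes", "no"],         # unrelated to the label
}

scores = {name: chi_squared(values, labels) for name, values in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Features with a higher score depend more strongly on the class label, so keeping only the top-scoring ones removes questions that carry little information about the suggested job role.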
26. Conclusion
• Using this system, we predicted the job suited to a student based on
similarity score and gain of branch
• The proposed system gives students insight into their career options and
helps them choose the one that suits them.
27. References
• Vignesh S, Shivani Priyanka C, Shree Manju H, Mythili K, “An Intelligent Career
Guidance System using Machine Learning”, 2021 7th International Conference on
Advanced Computing & Communication Systems (ICACCS)
• Tanya V Yadalam, Vaishnavi M Gowda, Vanditha Shiva Kumar, Disha Girish, “Career
Recommendation Systems using Content based Filtering”, Proceedings of the Fifth
International Conference on Communication and Electronics Systems (ICCES 2020)
• Manar Qamhieh, Haya Sammaneh, Mona Nabil Demaidi, “PCRS: Personalized
Career-Path Recommender System for Engineering Students”. Supported by the
An-Najah National University, Palestine, Research Project, under grant
ANNU-1920-Sc004. Published in 2020.