SlideShare ist ein Scribd-Unternehmen logo
1 von 4
Spam Mail Prediction
Introduction
In the era of information technology, information sharing has become very easy
and fast. Many platforms are available for users to share information anywhere
across the world. Among all information sharing mediums, email is the simplest,
cheapest, and the most rapid method of information sharing worldwide. But, due
to their simplicity, emails are vulnerable to different kinds of attacks, and the most
common and dangerous one is spam. No one wants to receive emails not related
to their interest because they waste receivers’ time and resources. Besides, these
emails can have malicious content hidden in the form of attachments or URLs that
may lead to the host system’s security breaches. Spam is any irrelevant and
unwanted message or email sent by the attacker to a significant number of
recipients by using emails or any other medium of information sharing. So, it
requires an immense demand for the security of the email system. Spam emails
may carry viruses, rats, and Trojans. Attackers mostly use this technique for luring
users towards online services. They may send spam emails that contain
attachments with the multiple-file extension, packed URLs that lead the user to
malicious and spamming websites and end up with some sort of data or financial
fraud and identify theft. Many email providers allow their users to make keywords
base rules that automatically filter emails. Still, this approach is not very useful
because it is difficult, and users do not want to customize their emails, due to which
spammers attack their email accounts.
In the last few decades, Internet of things (IoT) has become a part of modern life
and is growing rapidly. IoT has become an essential component of smart cities.
There are a lot of IoT-based social media platforms and applications. Due to the
emergence of IoT, spamming problems are increasing at a high rate. The
researchers proposed various spam detection methods to detect and filter spam
and spammers. Mainly, the existing spam detection methods are divided into two
types: behaviour pattern-based approaches and semantic pattern-based
approaches. These approaches have their limitations and drawbacks. There has
been significant growth in spam emails, along with the rise of the Internet and
communication around the globe . Spams are generated from any location of the
world with the Internet’s help by hiding the attacker’s identity. There are a plenty
of antispam tools and techniques, but the spam rate is still very high. The most
dangerous spams are malicious emails containing links to malicious websites that
can harm the victim’s data. Spam emails can also slow down the server response
by filling up the memory or capacity of servers. To accurately detect spam emails
and avoid the rising email spam issues, every organization carefully evaluates the
available tools to tackle spam in their environment. Some famous mechanisms to
identify and analyze the incoming emails for spam detection are Whitelist/Blacklist
, mail header analysis, keyword checking, etc.
Methodology
In the following sections, creating of dataset, training of learning models, and data
preprocessing are explained.
Data Preprocessing
In machine learning (ML), the preprocessing phrase refers to organizing and
managing of raw data before using it to train and test different learning models. In
simplistic words, preprocessing is a ML data mining approach that turns raw data
into a usable and resourceful structure.
The very first step in the construction of a ML model is preprocessing, in which data
from the actual world, typically incomplete, imprecise, and inaccurate owing to
flaws and deficient, is morphed into a precise, accurate, and usable input variables
and trends
Feature Extraction
Feature extraction is the process of converting a large raw dataset into a more
manageable format. Any variable, attribute, or class can be extracted from the
dataset during this step, depending on the original dataset .
Feature extraction is a crucial step in training of the model, which helps in
producing more reliable and accurate results. During the feature extraction
process, out of the possible many attributes, the method of selecting some key
variables that properly characterize data is called feature selection . The model is
then constructed using these selected attributes or variables. If feature selection is
performed properly, in return, the model construction will take less time.
Logistic Regression Model
Logistic regression is one of the most popular Machine Learning algorithms, which
comes under the Supervised Learning technique. It is used for predicting the
categorical dependent variable using a given set of independent variables. Logistic
regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1,
true or False, etc. but instead of giving the exact value as 0 and 1, it gives the
probabilistic values which lie between 0 and 1. Logistic Regression is much similar
to the Linear Regression except that how they are used.
Linear Regression is used for solving Regression problems, whereas Logistic
regression is used for solving the classification problems. In Logistic regression,
instead of fitting a regression line, we fit an "S" shaped logistic function, which
predicts two maximum values (0 or 1). The curve from the logistic function indicates
the likelihood of something such as whether the cells are cancerous or not, a mouse
is obese or not based on its weight, etc. Logistic Regression is a significant machine
learning algorithm because it has the ability to provide probabilities and classify
new data using continuous and discrete datasets.
Conclusion
In this study, we reviewed machine learning approaches and their application to
the field of spam filtering. A review of the state of the art algorithms been applied
for classification of messages as either spam or ham is provided. The evolution of
spam messages over the years to evade filters was examined. The basic
architecture of email spam filter and the processes involved in filtering spam emails
were looked into. The study uses machine learning algorithms to detect them. We
have to use Logistic Regression Model here. In the study, a translated emails
dataset including spam and ham emails is generated from Kaggle. Accuracy,
precision, F-measure, and model loss are used as comparative measures to
examine performance. In addition, more recent artificial intelligent approaches
may also be considered to detect spams.

Weitere ähnliche Inhalte

Ähnlich wie Spam Mail Prediction Report.docx

Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailijsrd.com
 
An analysis on Filter for Spam Mail
An analysis on Filter for Spam MailAn analysis on Filter for Spam Mail
An analysis on Filter for Spam MailAM Publications
 
Identification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using VotingIdentification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using VotingEditor IJCATR
 
Analysis of an image spam in email based on content analysis
Analysis of an image spam in email based on content analysisAnalysis of an image spam in email based on content analysis
Analysis of an image spam in email based on content analysisijnlc
 
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...IJNSA Journal
 
Classification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social NetworkClassification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social NetworkIRJET Journal
 
A review of spam filtering and measures of antispam
A review of spam filtering and measures of antispamA review of spam filtering and measures of antispam
A review of spam filtering and measures of antispamAlexander Decker
 
IRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection SystemIRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection SystemIRJET Journal
 
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...ijaia
 
Overview of Anti-spam filtering Techniques
Overview of Anti-spam filtering TechniquesOverview of Anti-spam filtering Techniques
Overview of Anti-spam filtering TechniquesIRJET Journal
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptxAnush90
 
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...IRJET Journal
 
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMEMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMIRJET Journal
 
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYING
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYINGA MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYING
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYINGijaia
 
A Machine Learning Ensemble Model for the Detection of Cyberbullying
A Machine Learning Ensemble Model for the Detection of CyberbullyingA Machine Learning Ensemble Model for the Detection of Cyberbullying
A Machine Learning Ensemble Model for the Detection of Cyberbullyinggerogepatton
 
A Machine Learning Ensemble Model for the Detection of Cyberbullying
A Machine Learning Ensemble Model for the Detection of CyberbullyingA Machine Learning Ensemble Model for the Detection of Cyberbullying
A Machine Learning Ensemble Model for the Detection of Cyberbullyinggerogepatton
 

Ähnlich wie Spam Mail Prediction Report.docx (20)

Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
 
Research Report
Research ReportResearch Report
Research Report
 
An analysis on Filter for Spam Mail
An analysis on Filter for Spam MailAn analysis on Filter for Spam Mail
An analysis on Filter for Spam Mail
 
ACO-email spam filtering
ACO-email spam filtering ACO-email spam filtering
ACO-email spam filtering
 
M dgx mde0mde=
M dgx mde0mde=M dgx mde0mde=
M dgx mde0mde=
 
Identification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using VotingIdentification of Spam Emails from Valid Emails by Using Voting
Identification of Spam Emails from Valid Emails by Using Voting
 
Analysis of an image spam in email based on content analysis
Analysis of an image spam in email based on content analysisAnalysis of an image spam in email based on content analysis
Analysis of an image spam in email based on content analysis
 
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
 
Classification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social NetworkClassification Methods for Spam Detection in Online Social Network
Classification Methods for Spam Detection in Online Social Network
 
A review of spam filtering and measures of antispam
A review of spam filtering and measures of antispamA review of spam filtering and measures of antispam
A review of spam filtering and measures of antispam
 
B0940509
B0940509B0940509
B0940509
 
IRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection SystemIRJET- Suspicious Email Detection System
IRJET- Suspicious Email Detection System
 
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
A DATA MINING APPROACH FOR FILTERING OUT SOCIAL SPAMMERS IN LARGE-SCALE TWITT...
 
Overview of Anti-spam filtering Techniques
Overview of Anti-spam filtering TechniquesOverview of Anti-spam filtering Techniques
Overview of Anti-spam filtering Techniques
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptx
 
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
 
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHMEMAIL SPAM DETECTION USING HYBRID ALGORITHM
EMAIL SPAM DETECTION USING HYBRID ALGORITHM
 
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYING
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYINGA MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYING
A MACHINE LEARNING ENSEMBLE MODEL FOR THE DETECTION OF CYBERBULLYING
 
A Machine Learning Ensemble Model for the Detection of Cyberbullying
A Machine Learning Ensemble Model for the Detection of CyberbullyingA Machine Learning Ensemble Model for the Detection of Cyberbullying
A Machine Learning Ensemble Model for the Detection of Cyberbullying
 
A Machine Learning Ensemble Model for the Detection of Cyberbullying
A Machine Learning Ensemble Model for the Detection of CyberbullyingA Machine Learning Ensemble Model for the Detection of Cyberbullying
A Machine Learning Ensemble Model for the Detection of Cyberbullying
 

Mehr von Shubham Jaybhaye

Stochastic Gradient Decent (SGD).pptx
Stochastic Gradient Decent (SGD).pptxStochastic Gradient Decent (SGD).pptx
Stochastic Gradient Decent (SGD).pptxShubham Jaybhaye
 
YOLO ( You Only Look Once) Deep Learning.pptx
YOLO ( You Only Look Once) Deep Learning.pptxYOLO ( You Only Look Once) Deep Learning.pptx
YOLO ( You Only Look Once) Deep Learning.pptxShubham Jaybhaye
 
Banking Management System Report .docx
Banking Management System Report .docxBanking Management System Report .docx
Banking Management System Report .docxShubham Jaybhaye
 
Naïve Bayes Classifier Algorithm.pptx
Naïve Bayes Classifier Algorithm.pptxNaïve Bayes Classifier Algorithm.pptx
Naïve Bayes Classifier Algorithm.pptxShubham Jaybhaye
 

Mehr von Shubham Jaybhaye (6)

Stochastic Gradient Decent (SGD).pptx
Stochastic Gradient Decent (SGD).pptxStochastic Gradient Decent (SGD).pptx
Stochastic Gradient Decent (SGD).pptx
 
YOLO ( You Only Look Once) Deep Learning.pptx
YOLO ( You Only Look Once) Deep Learning.pptxYOLO ( You Only Look Once) Deep Learning.pptx
YOLO ( You Only Look Once) Deep Learning.pptx
 
Banking Management System Report .docx
Banking Management System Report .docxBanking Management System Report .docx
Banking Management System Report .docx
 
WEB Scraping.pptx
WEB Scraping.pptxWEB Scraping.pptx
WEB Scraping.pptx
 
Geopandas.pptx
Geopandas.pptxGeopandas.pptx
Geopandas.pptx
 
Naïve Bayes Classifier Algorithm.pptx
Naïve Bayes Classifier Algorithm.pptxNaïve Bayes Classifier Algorithm.pptx
Naïve Bayes Classifier Algorithm.pptx
 

Kürzlich hochgeladen

Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsCEPTES Software Inc
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Calllward7
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictJack Cole
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxStephen266013
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfEmmanuel Dauda
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdfvyankatesh1
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyRafigAliyev2
 
Data analytics courses in Nepal Presentation
Data analytics courses in Nepal PresentationData analytics courses in Nepal Presentation
Data analytics courses in Nepal Presentationanshikakulshreshtha11
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfMichaelSenkow
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理pyhepag
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group MeetingAlison Pitt
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfscitechtalktv
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理pyhepag
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Jon Hansen
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理cyebo
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp onlinebalibahu1313
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理pyhepag
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonPayment Village
 

Kürzlich hochgeladen (20)

Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
Data analytics courses in Nepal Presentation
Data analytics courses in Nepal PresentationData analytics courses in Nepal Presentation
Data analytics courses in Nepal Presentation
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 

Spam Mail Prediction Report.docx

  • 1. Spam Mail Prediction Introduction In the era of information technology, information sharing has become very easy and fast. Many platforms are available for users to share information anywhere across the world. Among all information sharing mediums, email is the simplest, cheapest, and the most rapid method of information sharing worldwide. But, due to their simplicity, emails are vulnerable to different kinds of attacks, and the most common and dangerous one is spam. No one wants to receive emails not related to their interest because they waste receivers’ time and resources. Besides, these emails can have malicious content hidden in the form of attachments or URLs that may lead to the host system’s security breaches. Spam is any irrelevant and unwanted message or email sent by the attacker to a significant number of recipients by using emails or any other medium of information sharing. So, it requires an immense demand for the security of the email system. Spam emails may carry viruses, rats, and Trojans. Attackers mostly use this technique for luring users towards online services. They may send spam emails that contain attachments with the multiple-file extension, packed URLs that lead the user to malicious and spamming websites and end up with some sort of data or financial fraud and identify theft. Many email providers allow their users to make keywords base rules that automatically filter emails. Still, this approach is not very useful because it is difficult, and users do not want to customize their emails, due to which spammers attack their email accounts. In the last few decades, Internet of things (IoT) has become a part of modern life and is growing rapidly. IoT has become an essential component of smart cities. There are a lot of IoT-based social media platforms and applications. Due to the emergence of IoT, spamming problems are increasing at a high rate. The researchers proposed various spam detection methods to detect and filter spam and spammers. Mainly, the existing spam detection methods are divided into two
  • 2. types: behaviour pattern-based approaches and semantic pattern-based approaches. These approaches have their limitations and drawbacks. There has been significant growth in spam emails, along with the rise of the Internet and communication around the globe . Spams are generated from any location of the world with the Internet’s help by hiding the attacker’s identity. There are a plenty of antispam tools and techniques, but the spam rate is still very high. The most dangerous spams are malicious emails containing links to malicious websites that can harm the victim’s data. Spam emails can also slow down the server response by filling up the memory or capacity of servers. To accurately detect spam emails and avoid the rising email spam issues, every organization carefully evaluates the available tools to tackle spam in their environment. Some famous mechanisms to identify and analyze the incoming emails for spam detection are Whitelist/Blacklist , mail header analysis, keyword checking, etc. Methodology In the following sections, creating of dataset, training of learning models, and data preprocessing are explained. Data Preprocessing In machine learning (ML), the preprocessing phrase refers to organizing and managing of raw data before using it to train and test different learning models. In simplistic words, preprocessing is a ML data mining approach that turns raw data into a usable and resourceful structure. The very first step in the construction of a ML model is preprocessing, in which data from the actual world, typically incomplete, imprecise, and inaccurate owing to flaws and deficient, is morphed into a precise, accurate, and usable input variables and trends
  • 3. Feature Extraction Feature extraction is the process of converting a large raw dataset into a more manageable format. Any variable, attribute, or class can be extracted from the dataset during this step, depending on the original dataset . Feature extraction is a crucial step in training of the model, which helps in producing more reliable and accurate results. During the feature extraction process, out of the possible many attributes, the method of selecting some key variables that properly characterize data is called feature selection . The model is then constructed using these selected attributes or variables. If feature selection is performed properly, in return, the model construction will take less time. Logistic Regression Model Logistic regression is one of the most popular Machine Learning algorithms, which comes under the Supervised Learning technique. It is used for predicting the categorical dependent variable using a given set of independent variables. Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, true or False, etc. but instead of giving the exact value as 0 and 1, it gives the probabilistic values which lie between 0 and 1. Logistic Regression is much similar to the Linear Regression except that how they are used. Linear Regression is used for solving Regression problems, whereas Logistic regression is used for solving the classification problems. In Logistic regression, instead of fitting a regression line, we fit an "S" shaped logistic function, which predicts two maximum values (0 or 1). The curve from the logistic function indicates the likelihood of something such as whether the cells are cancerous or not, a mouse is obese or not based on its weight, etc. Logistic Regression is a significant machine learning algorithm because it has the ability to provide probabilities and classify new data using continuous and discrete datasets. Conclusion
  • 4. In this study, we reviewed machine learning approaches and their application to the field of spam filtering. A review of the state of the art algorithms been applied for classification of messages as either spam or ham is provided. The evolution of spam messages over the years to evade filters was examined. The basic architecture of email spam filter and the processes involved in filtering spam emails were looked into. The study uses machine learning algorithms to detect them. We have to use Logistic Regression Model here. In the study, a translated emails dataset including spam and ham emails is generated from Kaggle. Accuracy, precision, F-measure, and model loss are used as comparative measures to examine performance. In addition, more recent artificial intelligent approaches may also be considered to detect spams.