Some (Non-)Universal Features
of Web Robot Traffic
Presentation by: Mahdieh Zabihimayvan
Advisor: Dr. Derek Doran
Department of Computer Science and Engineering, Kno.e.sis Research Center, Wright State
University Dayton, OH
Presentation outline:
www.knoesis.org/mahdieh
• Introduction
• Related work
• Proposed method
• Experiments
• Conclusion
• Future work
What is a Web robot?
Many modern Web-based technologies and services need to collect, study, and analyze information from massive Web repositories.
These technologies and services employ Web robots (also called Web crawlers) to collect and scrutinize the dynamic content that the repositories contain.
What is a Web robot? (cont…)
[Figure: % of Web robot requests on Web servers; values shown: 49.5% and 60%]
But why?
To keep data repositories up to date, contemporary Web robots need more comprehensive searches, more specialized functionality, and more frequent visits.
What is a Web robot? (cont…)
Benign Web robots carry out
useful tasks including:
• Web content archiving
• link and HTML validation
• search engine indexing
• website mirroring
Malicious Web robots threaten the performance, information privacy, and security of Web servers. For instance:
• harvesting e-mail addresses
• performing click fraud
• accessing information behind paywalls or login screens
Why should we characterize Web robot traffic?
• enable researchers to discover and compare the strategies different robots use in their navigation
• improve methods to distinguish between malicious and benign Web robots
• enable synthetic robot workloads for simulation studies that evaluate the capacity of a Web system
Related work on robot traffic characterization
• Dikaiakos et al. (2005): analyzing the activity of different robots belonging to Google, AltaVista, Inktomi, FastSearch, and CiteSeer
• D. Doran and S. S. Gokhale (2010): examining in more detail heavy-tailed trends in the Web robot traffic of a single Web server
• Calzarossa and Massari (2013): analyzing the properties of the traffic generated by some commercial Web robots
• Calzarossa and Massari (2013): characterizing the access patterns and navigation profiles of the clients of two Web servers
• Tan and Kumar (2002): proposing 26 features to distinguish between Web robots and human users
Limitations of our current understanding
• Most past studies examine traffic at a single Web server
  - Why this is not good
• Present understanding is based on studying a limited, selected number of Web robots
  - Why this is not good
• Major studies were carried out at least half a decade ago
  - Why this is not good
This Work
We seek to update our understanding of Web robot traffic.
Study design:
| Data set name | # of requests | # of sessions | Avg. session length (sec) | Avg. # of requests per session |
| --- | --- | --- | --- | --- |
| WSU | 5,232,765 | 25,680 | 551.15 | 97 |
| Pav | 115,211 | 7,756 | 397.83 | 15 |
| IR | 749,278 | 39,200 | 94.8 | 10 |
Sample Features
Behavioral features:
| Feature name | Description |
| --- | --- |
| %HEAD | % of requests using HEAD |
| %GET | % of requests using GET |
| %POST | % of requests using POST |
| %4XX | % of requests receiving a 4XX response |
| %SF-StatusCode | Switching factor of the status code (%) |
| %SF-HttpMethod | Switching factor of the HTTP method used in requests (%) |
Session features:
| Feature name | Description |
| --- | --- |
| #Requests | Number of HTTP requests sent |
| Session time | Time difference between the first and last requests |
| %Night | % of requests sent between 12 a.m. and 7 a.m. |
| %Day | % of requests sent between 7 a.m. and 11:59 p.m. |
| Data | Sum of data requested |
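As a rough illustration of how the sample features above could be computed for one session, here is a minimal Python sketch. It is not the authors' implementation. The input layout (a time-ordered list of `(timestamp, method, status, nbytes)` tuples), the function name `session_features`, and the definition of the switching factor as the percentage of consecutive request pairs whose value changes are all assumptions made for this example.

```python
from datetime import datetime


def session_features(requests):
    """Compute sample features for one session.

    `requests` is a hypothetical, time-ordered list of
    (timestamp: datetime, method: str, status: int, nbytes: int) tuples.
    """
    n = len(requests)
    if n == 0:
        raise ValueError("a session needs at least one request")

    def pct(count, total):
        # Percentage helper that tolerates an empty denominator.
        return 100.0 * count / total if total else 0.0

    times = [r[0] for r in requests]
    methods = [r[1].upper() for r in requests]
    statuses = [r[2] for r in requests]

    def switching_factor(values):
        # Assumed definition: % of consecutive request pairs whose value changes.
        changes = sum(1 for a, b in zip(values, values[1:]) if a != b)
        return pct(changes, len(values) - 1)

    # Night is taken as 12:00 a.m. up to (not including) 7:00 a.m.
    night = sum(1 for t in times if t.hour < 7)

    return {
        "#Requests": n,
        "Session time": (times[-1] - times[0]).total_seconds(),
        "%HEAD": pct(methods.count("HEAD"), n),
        "%GET": pct(methods.count("GET"), n),
        "%POST": pct(methods.count("POST"), n),
        "%4XX": pct(sum(1 for s in statuses if 400 <= s < 500), n),
        "%SF-StatusCode": switching_factor(statuses),
        "%SF-HttpMethod": switching_factor(methods),
        "%Night": pct(night, n),
        "%Day": pct(n - night, n),
        "Data": sum(r[3] for r in requests),
    }
```

Each session then yields one value per feature, and the distribution of those values across all sessions of a data set is what the later fitting steps characterize.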
Characterizing Web robot traffic
1. We consider a collection of feasible distributions that may characterize different features of Web robot traffic. The candidates are chosen to cover:
• discrete or continuous support
• symmetric distributions (the mean, median, and mode occur at the same point)
• asymmetric distributions (which allow heavy- and long-tailed trends)
| Description | Distributions |
| --- | --- |
| Discrete support | Binomial, Geometric, Poisson, Discrete uniform |
| Infinite, continuous support, symmetric | Logistic, Normal, Continuous uniform, Gaussian q, Bimodal |
| Infinite, continuous support, asymmetric | Lognormal, Exponential, Extreme value, Gamma, Generalized extreme value, Weibull, t location-scale, Generalized Pareto |
Characterizing Web Robot Traffic
2. Use maximum likelihood estimation to fit the parameters of each candidate distribution.
3. Apply Vuong's closeness test to every pair of candidate distributions to evaluate whether one fits the data better than the other.
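To make step 2 concrete, the sketch below fits two of the candidate distributions, Exponential and Lognormal, by maximum likelihood. These two are used only because their estimators have closed forms; the other candidates generally need numerical optimization, which is not shown. The function names are hypothetical, and the sample values must be strictly positive.

```python
import math


def fit_exponential(xs):
    """Closed-form MLE of the exponential rate: 1 / mean(xs)."""
    return len(xs) / sum(xs)


def fit_lognormal(xs):
    """Closed-form MLE of (mu, sigma) for a lognormal distribution."""
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / len(logs)
    # The MLE of the variance divides by n, not n - 1.
    var = sum((v - mu) ** 2 for v in logs) / len(logs)
    return mu, math.sqrt(var)


def loglik_exponential(xs, rate):
    """Pointwise log-likelihood of each observation under Exponential(rate)."""
    return [math.log(rate) - rate * x for x in xs]


def loglik_lognormal(xs, mu, sigma):
    """Pointwise log-likelihood of each observation under Lognormal(mu, sigma)."""
    return [
        -math.log(x * sigma * math.sqrt(2.0 * math.pi))
        - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
        for x in xs
    ]
```

The pointwise log-likelihoods are returned per observation rather than summed, because a pairwise comparison such as Vuong's test works on the per-observation differences between two fitted models.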
Vuong’s closeness test
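The content of this slide did not survive extraction, so the following is a sketch of the test in its commonly stated form for non-nested models, not a reconstruction of the slide. Given pointwise log-likelihoods of the same n observations under two fitted models, let d_i be their difference; the statistic is z = sqrt(n) * mean(d) / sd(d), which is approximately standard normal when the two models fit equally well. The function name is hypothetical, and no correction for the number of parameters is applied, which is an assumption of this example.

```python
import math


def vuong_test(loglik_a, loglik_b):
    """Return (z, two-sided p-value) comparing model A against model B.

    A large positive z favors model A; a large negative z favors model B.
    """
    if len(loglik_a) != len(loglik_b) or len(loglik_a) < 2:
        raise ValueError("need two equally long lists with at least 2 values")

    # Per-observation log-likelihood ratio.
    d = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / n
    if var == 0.0:
        raise ValueError("log-likelihood differences have zero variance")

    z = math.sqrt(n) * mean / math.sqrt(var)
    # Two-sided p-value under the standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Running this for every pair of candidate distributions and keeping the candidate that is never significantly worse than another is one way to select a best-fitting distribution per feature.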
Universal Web robot features
Intriguingly, many features of robot traffic follow the same type of distribution on all three Web servers around the world.
| Distribution name | Feature name |
| --- | --- |
| GP | Session time, %Night, %Day, %NullReferrer, #Requests, %HEAD, %GET, %304, %CSR, %Others |
| GEV | %Images, %BinaryDocs, %Multimedia, HTML/Image, %SF-FileType, %SF-csbytes, %SF-referrer |
GEV: Generalized Extreme Value
GP: Generalized Pareto
Non-Universal Web robot features
Yet many features follow different types of distributions depending on the Web server.
| Feature name | WSU | Pav | IR |
| --- | --- | --- | --- |
| Data | TLS | GEV | GEV |
| SD_RPD | GP | GP | LGC |
| %POST | GP | GP | TLS |
| %4XX | GP | GP | GEV |
| %2XX | GP | GP | GEV |
| %SF-StatusCode | GP | GP | TLS |
| %SF-HttpMethod | GP | GP | TLS |
| %Compressed | GEV | TLS | GP |
| %Exe | GP | TLS | LGC |
| %RD | GP | LGC | GEV |
TLS: t Location-Scale
GEV: Generalized Extreme Value
GP: Generalized Pareto
LGC: Logistic
Request Type Behaviors
We also note non-uniform request type patterns across the three Web servers.
To investigate the difference in %POST among the three data sets in more detail, we examine the HTTP methods used by Web robots on each server.
[Figure: Markov chains of the HTTP methods used by Web robots; panels: WSU, Pav, IR]
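A Markov chain over HTTP methods like the ones plotted for each server can be estimated by counting consecutive method pairs within sessions and normalizing each row. The sketch below assumes each session is available as an ordered list of method names; the function name and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities between HTTP methods.

    `sessions` is a list of sessions, each an ordered list of method names.
    Transitions are counted only within a session, never across sessions.
    """
    counts = defaultdict(Counter)
    for methods in sessions:
        for current, following in zip(methods, methods[1:]):
            counts[current][following] += 1

    # Normalize each row so the outgoing probabilities of a state sum to 1.
    return {
        current: {
            following: c / sum(row.values()) for following, c in row.items()
        }
        for current, row in counts.items()
    }


example = transition_probabilities([
    ["GET", "GET", "HEAD", "POST"],
    ["HEAD", "GET", "POST", "POST"],
])
```

In the resulting dictionary, an entry such as `example["POST"]["POST"]` is the self-loop probability of POST, which is the kind of quantity compared across the three data sets.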
Request Type Behaviors
Universal features:
1. The self-loop probabilities of HEAD and GET, and the transition probabilities between these two states, are approximately similar across the data sets, as expected of robots that simply request information.
2. A small but appreciable number of transitions to POST exist from HEAD (on all data sets) and from GET (on all data sets except IR).
• It is surprising to find robots submitting POST requests, which are used to submit resources to a Web server.
• Robots are more likely to make a HEAD request following a POST request, to get information about other resources before requesting them.
Non-universal features:
In IR, the transition probability from POST to POST is significantly lower. One possible reason is the security policies this university enforces against known robots that attempt to submit malicious resources to the server.
Summary of Key Findings
• Characterized 30 different features of Web robot traffic across three Web servers around the world.
• Conducted the experiments on three large data sets from three different countries.
• Found some features that follow similar heavy-tailed models and may well be universal across all Web robot traffic.
• Found some differences among the Web robots of the data sets.
Future Work
• Explore the theoretical implications of the similar and dissimilar features considered in this paper
• Investigate the intuition behind the contrasts in Web robot traffic
• Extend this study to characterize benign and malicious Web robots separately, which can help detect malicious Web robots and enhance the security of Web servers
• Conduct a similar characterization study on human Web traffic
Thank you for your attention!

Editor's notes

  1. This suggests that Web robot behavior may have changed remarkably since its characteristics were last studied.
  2. We consider symmetric distributions, in which the values of variables occur at regular frequencies and the mean, median, and mode occur at the same point, to investigate models where most outcomes cluster relatively close to the distribution's center. We also scrutinize asymmetric (skewed) distributions to examine the possibility of heavy- and long-tailed trends in a feature. For example, the Lognormal, Gamma, and Weibull distributions have parameters that control the skewness of the distribution.