Assignment 6 
 
Analyzing response time in Q&A websites 
 
Question and Answer (Q&A) sites such as StackOverflow, Yahoo! Answers, Naver, Quora,
LiveQnA, and WikiAnswers are becoming increasingly popular with the growth of the Web.
These are large collaborative production and social computing platforms aimed at
crowd-sourcing knowledge by allowing users to post and answer questions. They not only
provide a platform for experts to share their knowledge and gain recognition but also help
novice users solve their problems effectively. StackOverflow is one such community-driven
Q&A website, used by more than a million software developers who post and answer questions
related to computer programming. It is governed by a reputation system that rewards
users with reputation points, badges, extra privileges on the website, etc., according to the
usefulness of their posts. The usefulness of a question or an answer is largely determined by
the number of votes it receives.
 
In such a crowd-sourced system driven by a reputation mechanism, the response time of a
question, i.e., the time it takes to receive its first answer, plays an important role and
largely determines the popularity of the website. People who post questions want to know
when they can expect a response. In this assignment, we investigate whether, besides
several other factors, the tags of a question are strongly correlated with its response time.
Tagging involves askers selecting appropriate keywords (e.g., android, jquery, c#) to
broadly identify the domains to which their questions relate. There are also mechanisms
by which other users can subscribe to tags, search via tags, mark tags as favorites, etc. As a
result, tags should play a crucial role in how questions are answered and hence in determining
their response time.
 
Input Dataset: 
 
http://gaming.stackexchange.com/
(Dataset: https://archive.org/download/stackexchange/gaming.stackexchange.com.7z) is
a sister site of StackOverflow where questions related to gaming are discussed. We have
attached the data dump of the website up to 26th September, 2014. Download and unzip the
dataset and you will find the following files:
● Badges.xml 
● Comments.xml 
● PostHistory.xml 
● PostLinks.xml 
● Posts.xml 
● Tags.xml 
● Users.xml 
● Votes.xml 
 
Information about all the posts (questions and answers) and tags can be found in “Posts.xml”                             
and “Tags.xml” files respectively. Examples from each of the files are given below. 
 
Typical Question 
 
<row Id="7" PostTypeId="1" AcceptedAnswerId="10" CreationDate="2014-05-14T00:11:06.457"
Score="1" ViewCount="185" Body="&lt;p&gt;As a researcher and instructor, I'm looking for
open-source books (or similar materials) that provide a relatively thorough overview of data
science from an applied perspective. To be clear, I'm especially interested in a thorough
overview that provides material suitable for a college-level course, not particular pieces or
papers.&lt;/p&gt;&#xA;" OwnerUserId="36" LastEditorUserId="97"
LastEditDate="2014-05-16T13:45:00.237" LastActivityDate="2014-05-16T13:45:00.237"
Title="What open-source books (or other materials) provide a relatively thorough overview of
data science?" Tags="&lt;education&gt;&lt;open-source&gt;" AnswerCount="3"
CommentCount="4" FavoriteCount="1" ClosedDate="2014-05-14T08:40:54.950" ></row>
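
The Tags attribute stores all tags of a question in one escaped string, so it has to be split before any tag-based feature can be computed. The following is a minimal sketch of one way to do this with the StAX API from the Java standard library; the class name PostRowParser and its method names are hypothetical, and the XML in main is a shortened stand-in for a row of Posts.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the assignment hand-out.
final class PostRowParser {
    // Matches each <tag> token inside the decoded Tags attribute.
    private static final Pattern TAG = Pattern.compile("<([^<>]+)>");

    /** Splits a decoded Tags value such as "<education><open-source>" into tag names. */
    static List<String> splitTags(String tagsAttribute) {
        List<String> tags = new ArrayList<>();
        if (tagsAttribute == null) {
            return tags; // answers carry no Tags attribute
        }
        Matcher m = TAG.matcher(tagsAttribute);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    /** Reads the first row element of the XML text and returns the tags of that post. */
    static List<String> tagsOfFirstRow(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "row".equals(reader.getLocalName())) {
                        // The parser has already decoded &lt; and &gt; into < and >.
                        return splitTags(reader.getAttributeValue(null, "Tags"));
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        String xml = "<posts><row Id=\"7\" PostTypeId=\"1\" "
                + "Tags=\"&lt;education&gt;&lt;open-source&gt;\" /></posts>";
        System.out.println(tagsOfFirstRow(xml)); // prints [education, open-source]
    }
}
```

For the real Posts.xml, the same loop could read from a file stream instead of a string and handle every row rather than only the first one; a streaming parser is used here because the dump may be too large to load comfortably as a DOM tree.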
  
Typical Answer 
 
<row Id="10" PostTypeId="2" ParentId="7" CreationDate="2014-05-14T00:53:43.273" Score="8"
Body="&lt;p&gt;One book that's freely available is &quot;The Elements of Statistical
Learning&quot; by Hastie, Tibshirani, and Friedman (published by Springer): &lt;a
href=&quot;http://statweb.stanford.edu/~tibs/ElemStatLearn/&quot;&gt;see Tibshirani's
website&lt;/a&gt;.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;Another fantastic source, although it isn't a book,
is Andrew Ng's Machine Learning course on Coursera. This has a much more applied-focus
than the above book, and Prof. Ng does a great job of explaining the thinking behind several
different machine learning algorithms/situations.&lt;/p&gt;&#xA;" OwnerUserId="22"
LastActivityDate="2014-05-14T00:53:43.273" CommentCount="1" />
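
An answer row points back to its question through ParentId, so the response time of a question can be taken as the gap between the question's CreationDate and the earliest CreationDate among its answers. The sketch below illustrates this with java.time; the class ResponseTimes and its methods are hypothetical, and returning -1 for unanswered questions is an assumption made here, not something the assignment prescribes.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the assignment hand-out.
final class ResponseTimes {
    // questionId -> CreationDate of the question (PostTypeId="1")
    private final Map<Integer, LocalDateTime> asked = new HashMap<>();
    // questionId -> earliest CreationDate among its answers (PostTypeId="2")
    private final Map<Integer, LocalDateTime> firstAnswer = new HashMap<>();

    void addQuestion(int id, String creationDate) {
        asked.put(id, LocalDateTime.parse(creationDate));
    }

    void addAnswer(int parentId, String creationDate) {
        LocalDateTime t = LocalDateTime.parse(creationDate);
        // Keep only the earliest answer seen so far for each question.
        firstAnswer.merge(parentId, t, (a, b) -> a.isBefore(b) ? a : b);
    }

    /** Response time in milliseconds, or -1 if the question is unknown or unanswered. */
    long responseMillis(int questionId) {
        LocalDateTime q = asked.get(questionId);
        LocalDateTime a = firstAnswer.get(questionId);
        if (q == null || a == null) {
            return -1;
        }
        return Duration.between(q, a).toMillis();
    }

    public static void main(String[] args) {
        ResponseTimes rt = new ResponseTimes();
        rt.addQuestion(7, "2014-05-14T00:11:06.457"); // the sample question
        rt.addAnswer(7, "2014-05-14T00:53:43.273");   // the sample answer
        System.out.println(rt.responseMillis(7));      // prints 2556816 (about 42.6 minutes)
    }
}
```

Because answers may appear in the dump in any order relative to their questions, the two maps are filled independently and only combined when the response time is requested.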
  
Typical Tag 
 
<row Id="3" TagName="bigdata" Count="46" ExcerptPostId="66" WikiPostId="65" /> 
  
Output Deliverables: 
 
A. Feature Calculation 
You should use Java to parse these XML files and, for each question, calculate the
response time and the following tag-based features:
 
1. tag_popularity: We define the popularity of a tag t as its frequency, i.e., the number of
questions that contain t as one of their tags. For each question, you should compute the
average popularity of all its tags.
2. num_pop_tags: We consider a tag to be popular if its frequency is more than 20. Here
you should count the number of popular tags each question contains. There will be at most
6 boxes in the plot, as each question can contain at most 5 tags (i.e., 0 to 5 popular tags).
3. num_subs_ans: We define an “active subscriber” of a tag t to be a user who has posted
“sufficient” answers in the “recent past” to questions containing t. We say that a user has
posted “sufficient” answers when the number of their answers is greater than 5, and by
“recent past” we mean answers posted after 7th Jan 2014. After computing the number
of active subscribers for every tag, you should compute, for each question, the average
number of active subscribers over its tags.
4. percent_subs_ans: For each tag, you should also compute the ratio of the number of
“active subscribers” to the total number of subscribers, where the total number of
subscribers is the number of users who have posted at least one answer to a
question containing that tag. After computing the ratio for every tag, you should
compute, for each question, the average ratio over its tags.
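
The four features above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class TagFeatures and its method names are hypothetical, "sufficient" is read as more than 5 answers posted after 7th Jan 2014 to questions carrying the tag, and a tag with no answerers is given a ratio of 0.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the assignment hand-out.
final class TagFeatures {
    static final int POPULAR_THRESHOLD = 20;                        // popular: frequency > 20
    static final int SUFFICIENT_ANSWERS = 5;                        // sufficient: more than 5 answers
    static final LocalDate RECENT_AFTER = LocalDate.of(2014, 1, 7); // recent: after 7th Jan 2014

    final Map<String, Integer> frequency = new HashMap<>();                   // tag -> #questions
    final Map<String, Map<Integer, Integer>> allAnswers = new HashMap<>();    // tag -> user -> #answers
    final Map<String, Map<Integer, Integer>> recentAnswers = new HashMap<>(); // tag -> user -> #recent answers

    void addQuestion(List<String> tags) {
        for (String t : tags) frequency.merge(t, 1, Integer::sum);
    }

    /** Registers one answer by userId to a question that carries questionTags. */
    void addAnswer(int userId, LocalDate date, List<String> questionTags) {
        for (String t : questionTags) {
            allAnswers.computeIfAbsent(t, k -> new HashMap<>()).merge(userId, 1, Integer::sum);
            if (date.isAfter(RECENT_AFTER)) {
                recentAnswers.computeIfAbsent(t, k -> new HashMap<>()).merge(userId, 1, Integer::sum);
            }
        }
    }

    long activeSubscribers(String tag) {
        return recentAnswers.getOrDefault(tag, Collections.emptyMap()).values().stream()
                .filter(n -> n > SUFFICIENT_ANSWERS).count();
    }

    double activeRatio(String tag) {
        int total = allAnswers.getOrDefault(tag, Collections.emptyMap()).size();
        return total == 0 ? 0.0 : (double) activeSubscribers(tag) / total; // assumption: 0 if nobody answered
    }

    // Per-question features: averages (or counts) over the tags of one question.
    double tagPopularity(List<String> tags) {
        return tags.stream().mapToInt(t -> frequency.getOrDefault(t, 0)).average().orElse(0.0);
    }

    long numPopTags(List<String> tags) {
        return tags.stream().filter(t -> frequency.getOrDefault(t, 0) > POPULAR_THRESHOLD).count();
    }

    double numSubsAns(List<String> tags) {
        return tags.stream().mapToLong(this::activeSubscribers).average().orElse(0.0);
    }

    double percentSubsAns(List<String> tags) {
        return tags.stream().mapToDouble(this::activeRatio).average().orElse(0.0);
    }
}
```

The per-tag counts are built in one pass over the posts, after which each per-question feature is a cheap lookup and average. Since answers may precede their questions in the dump, a first pass that maps question ids to tags is likely needed before addAnswer can be called.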
 
B. Feature Analysis 
To analyze the question features and their correlation with response time, you should                         
construct plots of the response time against the values of different features. You should                           
distribute the feature values into ten equal bins and then use gnuplot to produce the following                               
two plots: 
1. Box plots that capture the median and the 25th and 75th percentiles of the response time
distributions, as well as the minimum and maximum values, and
2. Cumulative distribution function (CDF) plots of the response time. 
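
Before plotting, the response times have to be grouped by feature bin and summarised. The sketch below shows one possible preparation step in Java; it assumes "ten equal bins" means ten equal-width intervals between the smallest and largest feature value, uses linear interpolation for the percentiles, and all names (BinnedBoxStats and its methods) as well as the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of the assignment hand-out.
final class BinnedBoxStats {
    /** Index (0..bins-1) of the equal-width bin over [min, max] that value falls into. */
    static int binIndex(double value, double min, double max, int bins) {
        if (max == min) return 0;
        int i = (int) ((value - min) / (max - min) * bins);
        return Math.min(i, bins - 1); // the maximum value belongs to the last bin
    }

    /** Quantile of a sorted, non-empty list, interpolating linearly between neighbours. */
    static double quantile(List<Double> sorted, double p) {
        double pos = p * (sorted.size() - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted.get(lo) + (pos - lo) * (sorted.get(hi) - sorted.get(lo));
    }

    /** Returns {min, 25th percentile, median, 75th percentile, max} of a non-empty list. */
    static double[] fiveNumberSummary(List<Double> values) {
        List<Double> s = new ArrayList<>(values);
        Collections.sort(s);
        return new double[] {
            s.get(0), quantile(s, 0.25), quantile(s, 0.5), quantile(s, 0.75), s.get(s.size() - 1)
        };
    }

    /** Empirical CDF: fraction of values less than or equal to x. */
    static double cdf(List<Double> values, double x) {
        long count = values.stream().filter(v -> v <= x).count();
        return (double) count / values.size();
    }

    public static void main(String[] args) {
        // Made-up (feature value, response time in seconds) pairs for illustration.
        double[][] samples = {{1, 50}, {2, 40}, {3, 90}, {40, 20}, {95, 10}, {100, 5}};
        int bins = 10;
        double min = Arrays.stream(samples).mapToDouble(s -> s[0]).min().getAsDouble();
        double max = Arrays.stream(samples).mapToDouble(s -> s[0]).max().getAsDouble();
        List<List<Double>> perBin = new ArrayList<>();
        for (int i = 0; i < bins; i++) perBin.add(new ArrayList<>());
        for (double[] s : samples) perBin.get(binIndex(s[0], min, max, bins)).add(s[1]);
        for (int i = 0; i < bins; i++) {
            if (perBin.get(i).isEmpty()) continue; // skip bins without questions
            // One whitespace-separated row per bin: bin min q1 median q3 max
            StringBuilder row = new StringBuilder().append(i);
            for (double v : fiveNumberSummary(perBin.get(i))) row.append(' ').append(v);
            System.out.println(row);
        }
    }
}
```

Rows in this whitespace-separated form can be written to a data file and read by gnuplot, for example with its candlesticks plotting style for the box plots; the cdf method gives the y-value of the CDF plot at any response time x.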

Weitere ähnliche Inhalte

Was ist angesagt?

Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksLucidworks
 
Final Presentation
Final PresentationFinal Presentation
Final PresentationLove Tyagi
 
Rated Ranking Evaluator (FOSDEM 2019)
Rated Ranking Evaluator (FOSDEM 2019)Rated Ranking Evaluator (FOSDEM 2019)
Rated Ranking Evaluator (FOSDEM 2019)Andrea Gazzarini
 
Twitter analysis by Kaify Rais
Twitter analysis by Kaify RaisTwitter analysis by Kaify Rais
Twitter analysis by Kaify RaisAjay Ohri
 
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: An Open Source Approach for Search Quality EvaluationRated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: An Open Source Approach for Search Quality EvaluationAlessandro Benedetti
 
Presentation1.pdf
Presentation1.pdfPresentation1.pdf
Presentation1.pdfZixunZhou
 
Entity Search on Virtual Documents Created with Graph Embeddings
Entity Search on Virtual Documents Created with Graph EmbeddingsEntity Search on Virtual Documents Created with Graph Embeddings
Entity Search on Virtual Documents Created with Graph EmbeddingsSease
 
Search Quality Evaluation to Help Reproducibility: An Open-source Approach
Search Quality Evaluation to Help Reproducibility: An Open-source ApproachSearch Quality Evaluation to Help Reproducibility: An Open-source Approach
Search Quality Evaluation to Help Reproducibility: An Open-source ApproachAlessandro Benedetti
 
Search Quality Evaluation: a Developer Perspective
Search Quality Evaluation: a Developer PerspectiveSearch Quality Evaluation: a Developer Perspective
Search Quality Evaluation: a Developer PerspectiveAndrea Gazzarini
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemIJTET Journal
 
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @ChorusRated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @ChorusSease
 
K anonymity for crowdsourcing database
K anonymity for crowdsourcing databaseK anonymity for crowdsourcing database
K anonymity for crowdsourcing databaseLeMeniz Infotech
 

Was ist angesagt? (20)

Mdb dn 2016_05_index_tuning
Mdb dn 2016_05_index_tuningMdb dn 2016_05_index_tuning
Mdb dn 2016_05_index_tuning
 
Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, Lucidworks
 
BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1BDACA1617s2 - Tutorial 1
BDACA1617s2 - Tutorial 1
 
BDACA1617s2 - Lecture 2
BDACA1617s2 - Lecture 2BDACA1617s2 - Lecture 2
BDACA1617s2 - Lecture 2
 
BDACA1617s2 - Lecture4
BDACA1617s2 - Lecture4BDACA1617s2 - Lecture4
BDACA1617s2 - Lecture4
 
Final Presentation
Final PresentationFinal Presentation
Final Presentation
 
Rated Ranking Evaluator (FOSDEM 2019)
Rated Ranking Evaluator (FOSDEM 2019)Rated Ranking Evaluator (FOSDEM 2019)
Rated Ranking Evaluator (FOSDEM 2019)
 
Twitter analysis by Kaify Rais
Twitter analysis by Kaify RaisTwitter analysis by Kaify Rais
Twitter analysis by Kaify Rais
 
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: An Open Source Approach for Search Quality EvaluationRated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
 
BDACA1516s2 - Lecture3
BDACA1516s2 - Lecture3BDACA1516s2 - Lecture3
BDACA1516s2 - Lecture3
 
Presentation1.pdf
Presentation1.pdfPresentation1.pdf
Presentation1.pdf
 
BDACA1516s2 - Lecture5
BDACA1516s2 - Lecture5BDACA1516s2 - Lecture5
BDACA1516s2 - Lecture5
 
Entity Search on Virtual Documents Created with Graph Embeddings
Entity Search on Virtual Documents Created with Graph EmbeddingsEntity Search on Virtual Documents Created with Graph Embeddings
Entity Search on Virtual Documents Created with Graph Embeddings
 
In3415791583
In3415791583In3415791583
In3415791583
 
Search Quality Evaluation to Help Reproducibility: An Open-source Approach
Search Quality Evaluation to Help Reproducibility: An Open-source ApproachSearch Quality Evaluation to Help Reproducibility: An Open-source Approach
Search Quality Evaluation to Help Reproducibility: An Open-source Approach
 
Search Quality Evaluation: a Developer Perspective
Search Quality Evaluation: a Developer PerspectiveSearch Quality Evaluation: a Developer Perspective
Search Quality Evaluation: a Developer Perspective
 
BDACA1516s2 - Lecture2
BDACA1516s2 - Lecture2BDACA1516s2 - Lecture2
BDACA1516s2 - Lecture2
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval System
 
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @ChorusRated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
Rated Ranking Evaluator (RRE) Hands-on Relevance Testing @Chorus
 
K anonymity for crowdsourcing database
K anonymity for crowdsourcing databaseK anonymity for crowdsourcing database
K anonymity for crowdsourcing database
 

Ähnlich wie Analyzing Stack Overflow - Problem

Prediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksPrediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksMohamed El-Geish
 
IRJET - Social Network Question and Answer System
IRJET - Social Network Question and Answer SystemIRJET - Social Network Question and Answer System
IRJET - Social Network Question and Answer SystemIRJET Journal
 
Narrative Mind Lessons Learned
Narrative Mind Lessons LearnedNarrative Mind Lessons Learned
Narrative Mind Lessons LearnedH4Diadmin
 
Narrative Mind Lessons Learned H4D Stanford 2016
Narrative Mind Lessons Learned H4D Stanford 2016Narrative Mind Lessons Learned H4D Stanford 2016
Narrative Mind Lessons Learned H4D Stanford 2016Stanford University
 
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETSQUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETSkevig
 
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETSQUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETSijnlc
 
WMNST 382 Gender, Science and TechnologySpring 2015Guidelines.docx
WMNST 382 Gender, Science and TechnologySpring 2015Guidelines.docxWMNST 382 Gender, Science and TechnologySpring 2015Guidelines.docx
WMNST 382 Gender, Science and TechnologySpring 2015Guidelines.docxericbrooks84875
 
Online Question and Answers Resources for the Bioinformatics Community
Online Question and Answers Resources for the Bioinformatics CommunityOnline Question and Answers Resources for the Bioinformatics Community
Online Question and Answers Resources for the Bioinformatics CommunityHoffman Lab
 
Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...
Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...
Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...Gabriel Moreira
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamDoug Needham
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLPGVS Chaitanya
 
Focused Question and Answer for Job Portal
Focused Question and Answer for Job PortalFocused Question and Answer for Job Portal
Focused Question and Answer for Job PortalIRJET Journal
 
Reasesrty djhjan S - explanation required.pptx
Reasesrty djhjan S - explanation required.pptxReasesrty djhjan S - explanation required.pptx
Reasesrty djhjan S - explanation required.pptxAnkitaVerma776806
 
quizapplication-190520075937.pdf
quizapplication-190520075937.pdfquizapplication-190520075937.pdf
quizapplication-190520075937.pdfshubham504451
 
Quiz application
Quiz applicationQuiz application
Quiz applicationHarsh Verma
 

Ähnlich wie Analyzing Stack Overflow - Problem (20)

Report
ReportReport
Report
 
Prediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksPrediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social Networks
 
IRJET - Social Network Question and Answer System
IRJET - Social Network Question and Answer SystemIRJET - Social Network Question and Answer System
IRJET - Social Network Question and Answer System
 
C017411728
C017411728C017411728
C017411728
 
Narrative Mind Lessons Learned
Narrative Mind Lessons LearnedNarrative Mind Lessons Learned
Narrative Mind Lessons Learned
 
Narrative Mind Lessons Learned H4D Stanford 2016
Narrative Mind Lessons Learned H4D Stanford 2016Narrative Mind Lessons Learned H4D Stanford 2016
Narrative Mind Lessons Learned H4D Stanford 2016
 
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETSQUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
 
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETSQUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
 
WMNST 382 Gender, Science and TechnologySpring 2015Guidelines.docx
WMNST 382 Gender, Science and TechnologySpring 2015Guidelines.docxWMNST 382 Gender, Science and TechnologySpring 2015Guidelines.docx
WMNST 382 Gender, Science and TechnologySpring 2015Guidelines.docx
 
Online Question and Answers Resources for the Bioinformatics Community
Online Question and Answers Resources for the Bioinformatics CommunityOnline Question and Answers Resources for the Bioinformatics Community
Online Question and Answers Resources for the Bioinformatics Community
 
Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...
Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...
Discovering User's Topics of Interest in Recommender Systems @ Meetup Machine...
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Cloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug NeedhamCloudera Data Science Challenge 3 Solution by Doug Needham
Cloudera Data Science Challenge 3 Solution by Doug Needham
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLP
 
Focused Question and Answer for Job Portal
Focused Question and Answer for Job PortalFocused Question and Answer for Job Portal
Focused Question and Answer for Job Portal
 
Seshadri ML Report
Seshadri ML ReportSeshadri ML Report
Seshadri ML Report
 
Reasesrty djhjan S - explanation required.pptx
Reasesrty djhjan S - explanation required.pptxReasesrty djhjan S - explanation required.pptx
Reasesrty djhjan S - explanation required.pptx
 
quizapplication-190520075937.pdf
quizapplication-190520075937.pdfquizapplication-190520075937.pdf
quizapplication-190520075937.pdf
 
Quiz application
Quiz applicationQuiz application
Quiz application
 
Gabber v1.1
Gabber v1.1Gabber v1.1
Gabber v1.1
 

Mehr von Amrith Krishna

Unsupervised program synthesis
Unsupervised program synthesisUnsupervised program synthesis
Unsupervised program synthesisAmrith Krishna
 
Asterix and the Maagic Potion - Suffix tree problem
Asterix and the Maagic Potion - Suffix tree problemAsterix and the Maagic Potion - Suffix tree problem
Asterix and the Maagic Potion - Suffix tree problemAmrith Krishna
 
Roller Coaster Problem - OS
Roller Coaster Problem  - OSRoller Coaster Problem  - OS
Roller Coaster Problem - OSAmrith Krishna
 
File Watcher - Lab Assignment
File Watcher - Lab AssignmentFile Watcher - Lab Assignment
File Watcher - Lab AssignmentAmrith Krishna
 
R - Eigen vector centrality with product reviews
R - Eigen vector centrality with product reviewsR - Eigen vector centrality with product reviews
R - Eigen vector centrality with product reviewsAmrith Krishna
 
Skipl List implementation - Part 2
Skipl List implementation - Part 2Skipl List implementation - Part 2
Skipl List implementation - Part 2Amrith Krishna
 
Skipl List implementation - Part 1
Skipl List implementation - Part 1Skipl List implementation - Part 1
Skipl List implementation - Part 1Amrith Krishna
 
Maach-Dal-Bhaat Problem
Maach-Dal-Bhaat ProblemMaach-Dal-Bhaat Problem
Maach-Dal-Bhaat ProblemAmrith Krishna
 
Named Entity recognition in Sanskrit
Named Entity recognition in SanskritNamed Entity recognition in Sanskrit
Named Entity recognition in SanskritAmrith Krishna
 
Astra word Segmentation
Astra word SegmentationAstra word Segmentation
Astra word SegmentationAmrith Krishna
 
Segmentation in Sanskrit texts
Segmentation in Sanskrit textsSegmentation in Sanskrit texts
Segmentation in Sanskrit textsAmrith Krishna
 

Mehr von Amrith Krishna (16)

Unsupervised program synthesis
Unsupervised program synthesisUnsupervised program synthesis
Unsupervised program synthesis
 
Asterix and the Maagic Potion - Suffix tree problem
Asterix and the Maagic Potion - Suffix tree problemAsterix and the Maagic Potion - Suffix tree problem
Asterix and the Maagic Potion - Suffix tree problem
 
Roller Coaster Problem - OS
Roller Coaster Problem  - OSRoller Coaster Problem  - OS
Roller Coaster Problem - OS
 
File Watcher - Lab Assignment
File Watcher - Lab AssignmentFile Watcher - Lab Assignment
File Watcher - Lab Assignment
 
R - Eigen vector centrality with product reviews
R - Eigen vector centrality with product reviewsR - Eigen vector centrality with product reviews
R - Eigen vector centrality with product reviews
 
Skipl List implementation - Part 2
Skipl List implementation - Part 2Skipl List implementation - Part 2
Skipl List implementation - Part 2
 
Skipl List implementation - Part 1
Skipl List implementation - Part 1Skipl List implementation - Part 1
Skipl List implementation - Part 1
 
Maach-Dal-Bhaat Problem
Maach-Dal-Bhaat ProblemMaach-Dal-Bhaat Problem
Maach-Dal-Bhaat Problem
 
QGene Quiz 2016
QGene Quiz 2016QGene Quiz 2016
QGene Quiz 2016
 
Named Entity recognition in Sanskrit
Named Entity recognition in SanskritNamed Entity recognition in Sanskrit
Named Entity recognition in Sanskrit
 
Astra word Segmentation
Astra word SegmentationAstra word Segmentation
Astra word Segmentation
 
Segmentation in Sanskrit texts
Segmentation in Sanskrit textsSegmentation in Sanskrit texts
Segmentation in Sanskrit texts
 
ShutApp
ShutAppShutApp
ShutApp
 
Ferosa - Insights
Ferosa - InsightsFerosa - Insights
Ferosa - Insights
 
Taddhita Generation
Taddhita GenerationTaddhita Generation
Taddhita Generation
 
Windows Architecture
Windows ArchitectureWindows Architecture
Windows Architecture
 

Kürzlich hochgeladen

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 

Kürzlich hochgeladen (20)

Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104

Analyzing Stack Overflow - Problem

  • 1. Assignment 6

Analyzing response time in Q&A websites

Question and Answer (Q&A) sites like StackOverflow, Yahoo! Answers, Naver, Quora, LiveQnA, WikiAnswers, etc. are becoming increasingly popular with the growth of the Web. These are large collaborative production and social computing platforms of the Web, aimed at crowd-sourcing knowledge by allowing users to post and answer questions. They not only provide a platform for experts to share their knowledge and get identified but also help novice users solve their problems effectively. StackOverflow is one such community-driven Q&A website, used by more than a million software developers who post and answer questions related to computer programming. It is governed by a reputation system which rewards users with reputation points, badges, extra privileges on the website, etc. according to the usefulness of their posts. The usefulness of a question or an answer is largely determined by the number of votes it receives.

In such a crowd-sourced system driven by a reputation mechanism, the response time of a question, i.e., the time it takes to receive its first answer, plays an important role and would largely determine the popularity of the website. People who post questions would want to know the time by which they can expect a response to their question. In this assignment, we want to investigate whether, besides several other factors, the tags of a question have a strong correlation with response time. Tagging questions involves askers selecting appropriate keywords (e.g., android, jquery, c#) to broadly identify the domains to which their questions are related. There also exist mechanisms by which other users can subscribe to tags, search via tags, mark tags as favorites, etc. As a result, tags should play a crucial role in how the questions are answered and hence in determining their response time.

Input Dataset:

http://gaming.stackexchange.com/ (Dataset: https://archive.org/download/stackexchange/gaming.stackexchange.com.7z) is a sister site of StackOverflow where questions related to Gaming are discussed. We have attached the data dump of the website up to 26th September 2014. Download and unzip the dataset and you will find the following files:
● Badges.xml
● Comments.xml
● PostHistory.xml
● PostLinks.xml
● Posts.xml
● Tags.xml
● Users.xml
● Votes.xml
  • 2. Information about all the posts (questions and answers) and tags can be found in the "Posts.xml" and "Tags.xml" files respectively. Examples from each of the files are given below.

Typical Question

<row Id="7" PostTypeId="1" AcceptedAnswerId="10" CreationDate="2014-05-14T00:11:06.457"
  Score="1" ViewCount="185" Body="&lt;p&gt;As a researcher and instructor, I'm looking for
  open-source books (or similar materials) that provide a relatively thorough overview of data
  science from an applied perspective. To be clear, I'm especially interested in a thorough
  overview that provides material suitable for a college-level course, not particular pieces or
  papers.&lt;/p&gt;&#xA;" OwnerUserId="36" LastEditorUserId="97"
  LastEditDate="2014-05-16T13:45:00.237" LastActivityDate="2014-05-16T13:45:00.237"
  Title="What open-source books (or other materials) provide a relatively thorough overview of
  data science?" Tags="&lt;education&gt;&lt;open-source&gt;" AnswerCount="3"
  CommentCount="4" FavoriteCount="1" ClosedDate="2014-05-14T08:40:54.950" ></row>

Typical Answer

<row Id="10" PostTypeId="2" ParentId="7" CreationDate="2014-05-14T00:53:43.273" Score="8"
  Body="&lt;p&gt;One book that's freely available is &quot;The Elements of Statistical
  Learning&quot; by Hastie, Tibshirani, and Friedman (published by Springer): &lt;a
  href=&quot;http://statweb.stanford.edu/~tibs/ElemStatLearn/&quot;&gt;see Tibshirani's
  website&lt;/a&gt;.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;Another fantastic source, although it isn't a book,
  is Andrew Ng's Machine Learning course on Coursera. This has a much more applied-focus
  than the above book, and Prof. Ng does a great job of explaining the thinking behind several
  different machine learning algorithms/situations.&lt;/p&gt;&#xA;" OwnerUserId="22"
  LastActivityDate="2014-05-14T00:53:43.273" CommentCount="1" />

Typical Tag

<row Id="3" TagName="bigdata" Count="46" ExcerptPostId="66" WikiPostId="65" />

Output Deliverables:

A. Feature Calculation
You should use Java to parse these XML files and, for each question, calculate the response time and the following tag-based features:

1. tag_popularity: We define the popularity of a tag t as its frequency, i.e., the number of questions that contain t as one of their tags. For each question, you should compute the average popularity of all its tags.
  • 3. 2. num_pop_tags: We consider a tag to be popular if its frequency is more than 20. Here you should count the number of popular tags each question contains. There will be at most 6 boxes in the plot, as each question can contain at most 5 tags (so the count ranges from 0 to 5).
3. num_subs_ans: We define an "active subscriber" of a tag t to be a user who has posted "sufficient" answers in the "recent past" to questions containing t. We say that a user has posted "sufficient" answers when the number of their answers is greater than 5, and by "recent past" we mean answers posted after 7th January 2014. After computing the number of active subscribers for every tag, you should compute, for each question, the average number of active subscribers over its tags.
4. percent_subs_ans: For each tag, you should also compute the ratio of the number of "active subscribers" to the total number of subscribers, where the total number of subscribers is the number of users who have posted at least one answer to a question containing that tag. After computing the ratio for every tag, you should compute, for each question, the average ratio over its tags.

B. Feature Analysis
To analyze the question features and their correlation with response time, you should construct plots of the response time against the values of the different features. You should distribute the feature values into ten equal bins and then use gnuplot to produce the following two plots:
1. Box plots that capture the median and the 25th and 75th percentiles of the response time distributions, as well as the minimum and maximum values, and
2. Cumulative distribution function (CDF) plots of the response time.
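
The feature definitions above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not a reference solution: the class and method names (TagFeatureSketch, parseTags, responseSeconds, and so on) are hypothetical, the Posts.xml attributes are assumed to have already been read into strings by an XML parser (so "&lt;" has been decoded to "<"), response time is taken as the gap to the earliest answer, and "ten equal bins" is read as ten equal-width bins. The subscriber features (num_subs_ans, percent_subs_ans) are left out.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; none of these names come from the assignment.
final class TagFeatureSketch {

    // Split a decoded Tags attribute such as "<education><open-source>" into tag names.
    static List<String> parseTags(String tagsAttribute) {
        List<String> tags = new ArrayList<>();
        if (tagsAttribute == null) return tags;
        for (String piece : tagsAttribute.split(">")) {
            String tag = piece.replace("<", "").trim();
            if (!tag.isEmpty()) tags.add(tag);
        }
        return tags;
    }

    // Seconds from the question's CreationDate to its earliest answer; -1 if unanswered.
    static double responseSeconds(String questionCreated, List<String> answerCreated) {
        LocalDateTime asked = LocalDateTime.parse(questionCreated);
        LocalDateTime first = null;
        for (String s : answerCreated) {
            LocalDateTime t = LocalDateTime.parse(s);
            if (first == null || t.isBefore(first)) first = t;
        }
        if (first == null) return -1;
        return Duration.between(asked, first).toMillis() / 1000.0;
    }

    // Tag frequency: number of questions that carry each tag.
    static Map<String, Integer> tagFrequencies(List<List<String>> tagsPerQuestion) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> tags : tagsPerQuestion) {
            for (String t : tags) freq.merge(t, 1, Integer::sum);
        }
        return freq;
    }

    // tag_popularity: average frequency over the question's tags (0 if it has none).
    static double avgTagPopularity(List<String> tags, Map<String, Integer> freq) {
        if (tags.isEmpty()) return 0.0;
        double sum = 0;
        for (String t : tags) sum += freq.getOrDefault(t, 0);
        return sum / tags.size();
    }

    // num_pop_tags: how many of the question's tags have frequency above the threshold.
    static int numPopularTags(List<String> tags, Map<String, Integer> freq, int threshold) {
        int n = 0;
        for (String t : tags) {
            if (freq.getOrDefault(t, 0) > threshold) n++;
        }
        return n;
    }

    // Equal-width bin index in [0, bins - 1]; the maximum value falls into the last bin.
    static int binIndex(double value, double min, double max, int bins) {
        if (max <= min) return 0;
        int idx = (int) ((value - min) / (max - min) * bins);
        return Math.min(Math.max(idx, 0), bins - 1);
    }

    public static void main(String[] args) {
        // Values taken from the "Typical Question" and "Typical Answer" rows.
        List<String> tags = parseTags("<education><open-source>");
        double secs = responseSeconds("2014-05-14T00:11:06.457",
                List.of("2014-05-14T00:53:43.273"));
        System.out.println(tags + " answered after " + secs + " s");
    }
}
```

In a full solution these helpers would be fed from a streaming pass over Posts.xml (questions have PostTypeId="1", answers have PostTypeId="2" and a ParentId), with threshold 20 passed to numPopularTags as the assignment defines popularity.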