Semantic Recommendation Systems for Research 2.0

  1. SEMANTIC RECOMMENDATION SYSTEMS FOR RESEARCH 2.0, or: A Conceptual Prototype for a Twitter-based Recommender System for Research 2.0, by Patrick Thonhauser. Thursday, October 11, 12
  2. OUTLINE
     • Motivation
     • Basics (Semantic Web, Recommender Systems, Natural Language Processing)
     • Conceptual Prototype
     • Test Results and Discussion
     • Questions
  3. MOTIVATION
     • Is Twitter useful for discovering new connections between researchers in similar subject areas (and why Twitter)?
     • How much information can we extract from 140-character strings?
     • Is it possible to separate useful information from noise?
     • Are there any appropriate classifiers and metrics to measure the significance of Twitter users and Tweets?
  4. SEMANTIC WEB
     • An additional layer of information
     • Linked Data (use URIs as names, use HTTP URIs, use standards to provide information, include links to other URIs)
     • RDF (based on triples -> subject, predicate, object) is like HTML for the classic web
     • Nearly all Semantic Web standards are based on RDF (like FOAF - the Friend of a Friend project)
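The triple model above can be sketched in a few lines. This is a minimal illustration only: the user URIs and the tiny query helper are made up for the example, and a real implementation would use an RDF library such as rdflib with the FOAF vocabulary rather than plain tuples.

```python
# RDF-style (subject, predicate, object) triples as plain Python tuples.
# The account URIs below are hypothetical; only foaf:knows is a real term.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

triples = [
    ("http://twitter.com/alice", FOAF_KNOWS, "http://twitter.com/bob"),
    ("http://twitter.com/alice", FOAF_KNOWS, "http://twitter.com/carol"),
    ("http://twitter.com/bob",   FOAF_KNOWS, "http://twitter.com/carol"),
]

def objects(subject, predicate, store):
    """Return all objects matching a given subject/predicate pair."""
    return [o for s, p, o in store if s == subject and p == predicate]

# Who does alice know?
print(objects("http://twitter.com/alice", FOAF_KNOWS, triples))
```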
  5. RECOMMENDER SYSTEMS
     • Collaborative Filtering (user-based/item-based)
     • Content-Based Recommendation
     • Knowledge-Based Recommendation
     • Hybrid Recommendations
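As an illustration of the first family listed, here is a toy user-based collaborative-filtering sketch: find the most similar other user by cosine similarity over co-rated items, then recommend what they liked that the target user has not seen. All users, items and ratings are invented for the example.

```python
from math import sqrt

# Hypothetical rating data: user -> {item: rating}.
ratings = {
    "alice": {"paper_a": 5, "paper_b": 3, "paper_c": 4},
    "bob":   {"paper_a": 4, "paper_b": 3, "paper_c": 5, "paper_d": 4},
    "carol": {"paper_b": 1, "paper_d": 5},
}

def cosine_sim(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(user, data):
    """Recommend the unseen items of the most similar other user."""
    others = [(cosine_sim(data[user], data[o]), o) for o in data if o != user]
    _, nearest = max(others)
    return sorted(set(data[nearest]) - set(data[user]))

print(recommend("alice", ratings))  # -> ['paper_d']
```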
  6. NATURAL LANGUAGE PROCESSING (NLP)
     • Classification of microtext artefacts ("This presentation is killer!")
     • Applying NLP pipelines:
       • End-of-sentence detection
       • Tokenization
       • POS tagging
       • Chunking
       • Extraction
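A stripped-down sketch of three of these pipeline stages (tokenization, POS tagging, extraction); end-of-sentence detection and chunking are omitted for brevity. The tiny lexicon tagger here is a stand-in for a real POS tagger such as NLTK's, so the lexicon and the fallback tags are illustrative only.

```python
import re

# Toy lexicon with Brown-corpus-style tags, standing in for a real tagger.
LEXICON = {"the": "AT", "a": "AT", "grand": "JJ", "jury": "NN",
           "commented": "VBD", "on": "IN", "number": "NN", "of": "IN"}

def tokenize(sentence):
    """Split a sentence into word and sentence-final punctuation tokens."""
    return re.findall(r"\w+|[.!?]", sentence)

def pos_tag(tokens):
    """Tag tokens; unknown words fall back to 'NN' in this sketch."""
    return [(t, LEXICON.get(t.lower(), "NN" if t.isalpha() else "."))
            for t in tokens]

def extract_nouns(tagged):
    """Extraction step: keep only noun tokens."""
    return [t for t, tag in tagged if tag == "NN"]

tagged = pos_tag(tokenize("The grand jury commented on a number of cases."))
print(extract_nouns(tagged))  # -> ['jury', 'number', 'cases']
```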
  7. THE CONCEPT OF THOUGHT BUBBLES Let's imagine every Twitter user belongs to several different topic-related Bubbles
  8. LET'S SUMMARIZE
     • A user is part of topic-related bubbles
     • Twitter users within topic-related bubbles don't necessarily know each other
     • Connections of already existing connections of the service user lead to new information
     • Non-bidirectional connections are preferred
     So how can we find such potentially interesting users?
  9. PROOF OF CONCEPT SYSTEM
     (1) Pre-selection of the user set that will be analyzed in depth
     (2) Apply the NLP pipeline for measuring user similarity
     (3) Categorize the top-n best-scoring users according to the idea of Thought Bubbles
     (4) Recommend the top-n best-scoring users of a category to the user
     (5) Analyze acceptance of the recommendations
     [Architecture diagram: service user -> Twitter API -> pre-filtering -> NLP -> clustering -> categorisation -> recommendations, backed by a DB and an analysis server; a user's Thought Bubbles include e.g. sports, iOS dev, social media]
  10. (1) PRE-SELECTION/FILTERING
      [Filter chain diagram: friends of friends' Twitter accounts -> filter accounts that are already connected to you -> filter accounts where follower_count < 300, status_count < 1000 -> filter non-English-speaking accounts -> identify people by using a simple NLP pipeline -> set of Twitter accounts for further processing]
      • The set of friends of friends' Twitter accounts changes from iteration to iteration
      • Filters are added after analyzing the acceptance of recommendations
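The filter chain above can be sketched as a single predicate applied to candidate accounts. The record layout and field names (`already_connected`, `follower_count`, `status_count`, `lang`) are illustrative stand-ins for what the Twitter API would return; only the two numeric thresholds come from the slide.

```python
def passes_prefilter(account):
    """Keep an account only if no filter predicate rejects it."""
    if account["already_connected"]:
        return False                       # filter existing connections
    if account["follower_count"] < 300:    # threshold from the slide
        return False
    if account["status_count"] < 1000:     # threshold from the slide
        return False
    if account["lang"] != "en":            # filter non-English accounts
        return False
    return True

# Hypothetical candidate accounts.
candidates = [
    {"name": "a", "already_connected": False, "follower_count": 500,
     "status_count": 2000, "lang": "en"},
    {"name": "b", "already_connected": True, "follower_count": 900,
     "status_count": 5000, "lang": "en"},
    {"name": "c", "already_connected": False, "follower_count": 120,
     "status_count": 3000, "lang": "en"},
]
print([a["name"] for a in candidates if passes_prefilter(a)])  # -> ['a']
```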
  11. (2) NLP PIPELINE
      Raw Tweets -> tokenization and stripping of @mentions and URLs -> neglect the 200 most used English words -> POS tagging -> chunking -> mined nouns and phrases -> frequency distribution -> filter top-n words -> DB
      Example input: "@testuser The grand jury commented on a number of…"
      POS-tagged: [('The', 'AT'), ('grand', 'JJ'), ('jury', 'NN'), ('commented', 'VBD'), ('on', 'IN'), ('a', 'AT'), ('number', 'NN'), ..., ('.', '.')]
      Mined nouns and phrases: [('jury', 'NN'), ('number', 'NN'), ('social dayly', 'NP'), ...]
      Frequency-distributed: [('jury', 34), ('social', 23), ('test case', 16), ...]
      The 400 most recent Tweets of a potential recommendation are used for calculating the similarity measure.
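The stripping and frequency-distribution stages can be sketched with the standard library alone. The small stop-word set here stands in for the "200 most used English words" mentioned in the pipeline, and the tweets are invented examples.

```python
import re
from collections import Counter

# Tiny stand-in for the 200-most-used-English-words list.
STOPWORDS = {"the", "a", "an", "on", "of", "and", "to", "in", "is"}

def mine_terms(tweets):
    """Strip @mentions and URLs, tokenize, drop stop words, count terms."""
    counts = Counter()
    for tweet in tweets:
        text = re.sub(r"@\w+|https?://\S+", "", tweet)  # strip mentions/URLs
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

tweets = [
    "@testuser The grand jury commented on a number of cases",
    "The jury met again today http://example.com",
]
print(mine_terms(tweets).most_common(1))  # -> [('jury', 2)]
```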
  12. RECOMMENDATION AND FEEDBACK
      • Calculate top-n users by applying Single-Linkage Clustering
      • Categorize whether a user belongs to user-specific bubbles
      • Present recommendation lists to users
      • Analyze acceptance of recommendations (connect user accounts with FOAF) and add new filter predicates if necessary
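Single-linkage clustering, as named above, merges the two closest clusters repeatedly, where cluster distance is the minimum pairwise distance between members. A minimal sketch, with invented user distances standing in for the tweet-similarity measure:

```python
def single_linkage(points, dist, threshold):
    """Agglomerative single-linkage clustering: merge clusters while the
    closest pair of clusters is nearer than `threshold`."""
    clusters = [{p} for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: minimum distance between any two members.
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= threshold:
            break
        clusters[i] |= clusters.pop(j)
    return [sorted(c) for c in clusters]

# Hypothetical symmetric user-to-user distances (1 - similarity).
dist = {
    "u1": {"u2": 0.1, "u3": 0.9, "u4": 0.8},
    "u2": {"u1": 0.1, "u3": 0.85, "u4": 0.9},
    "u3": {"u1": 0.9, "u2": 0.85, "u4": 0.2},
    "u4": {"u1": 0.8, "u2": 0.9, "u3": 0.2},
}
print(single_linkage(["u1", "u2", "u3", "u4"], dist, threshold=0.5))
# -> [['u1', 'u2'], ['u3', 'u4']]
```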
  13. SUPERVISED TEST RUN
      [Bar chart of similarity scores (scale 0 to 0.300) for roughly 50 Twitter accounts; recommended accounts are framed/starred, e.g. @gargamit100*, @timbuckteeth*, @SebastianThrun*, @elearning*, alongside non-recommended accounts such as @BarackObama, @ladygaga, @charliesheen]
  14. UNSUPERVISED TEST RESULTS The probability that a recommended item is relevant is 64.4%. Standard deviation: 31.5%
  15. DISCUSSION
      Twitter IS useful for discovering new information in the sense of Research 2.0, but:
      • Recommendations reflect the Twitter behavior of the user
      • Automated tweets harm recommendation results (one sentence gets an enormous weight because it occurs very often)
      • Twitter's request limitation is a show stopper
      • Comparison to similar systems (content-based and collaborative filtering)
  16. THANK YOU! ANY QUESTIONS?