Text Mining with R for Social Science Research

Text Mining workshop using R for Project Mosaic, UNC Charlotte's Social Science Research Initiative. The workshop analyzes the Federalist Papers and the Federal Reserve Beige Book with applications in text classification, topic modeling and sentiment analysis.

  1. Internetlivestats.com
  2. Mostly solved:
     • Spam detection (Classification): "Let’s go to Agra!" ✓ / "Buy V1AGRA …" ✗
     • Part-of-speech (POS) tagging: "Colorless green ideas sleep furiously." → ADJ ADJ NOUN VERB ADV
     • Named entity recognition (NER): "Einstein met with UN officials in Princeton" → PERSON, ORG, LOC
     Making good progress:
     • Sentiment analysis: "Best roast chicken in San Francisco!" / "The waiter ignored us for 20 minutes."
     • Coreference resolution: "Carter told Mubarak he shouldn’t run again."
     • Word sense disambiguation (WSD): "I need new batteries for my mouse."
     • Parsing: "I can see Alcatraz from the window!"
     • Machine translation (MT): "The 13th Shanghai International Film Festival…" / "第13届上海国际电影节开幕…"
     • Information extraction (IE): "You’re invited to our dinner party, Friday May 27 at 8:30" → Party, May 27 [add]
     Still really hard:
     • Question answering (QA): "Q. How effective is ibuprofen in reducing fever in patients with acute febrile illness?"
     • Paraphrase: "XYZ acquired ABC yesterday" / "ABC has been taken over by XYZ"
     • Summarization: "The Dow Jones is up", "The S&P500 jumped", "Housing prices rose" → "Economy is good"
     • Dialog: "Where is Citizen Kane playing in SF?" → "Castro Theatre at 7:30. Do you want a ticket?"
     Source: Dan Jurafsky
  3. Non-standard English: "Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥"
     Segmentation issues: "the New York-New Haven Railroad"
     Idioms: dark horse, get cold feet, lose face, throw in the towel
     Neologisms: unfriend, Retweet, bromance
     Tricky entity names: "Where is A Bug’s Life playing …", "Let It Be was recorded …", "… a mutation on the for gene …"
     Sarcasm: A: "I love Justin Bieber. Do you like him too?" B: "Yeah. Sure. I absolutely love him."
     Source: Dan Jurafsky (modified)
  4. http://www.alchemyapi.com/
     https://www.congress.gov/resources/display/content/The+Federalist+Papers#TheFederalistPapers-10
  5. 1:10pm
  6. Non-Stop
  7. Adair; Mosteller & Wallace; Fung; Collins et al.
  8. Corpus, Document, Term
  9. Source: Chris Manning
  10. Tokenize → Clean → Stem → Filter
      Example: "Then a hurricane came, and devastation reigned" → "then a hurricane came and devastation reigned"
  11. GitHub site. 1:20pm. Code Lines: 1-49
  12. Code Lines: 50-79
  13. Federalist Paper 1: Before / Federalist Paper 1: After. Code Lines: 71-88
  14. Federalist Paper 1: After. Code Lines: 89-104
  15. Code Lines: 142-149
  16. Code Lines: 151-165. 1:30pm
  17. Code Lines: 167-171
  18. Code Lines: 173-188
  19. Code Lines: 189-201
  20. Code Lines: 202-207
  21. Uncomment lines 107-139 (Ctrl+Shift+C) and run them, then rerun lines 141-206. Code Lines: 107-139
  22. 1:50pm-2pm
  23. Bayes' theorem (see these slides)
  24. Code Lines: 208-219
  25. Update. Code Lines: 231-241
  26. Code Lines: 242-248
  27. Code Lines: 250-273
  28. Code Lines: 275-290
  29. This step takes about 4 minutes, depending on your computer. Code Lines: 295-308
  30. Source: David Blei (link to article)
  31. Code Lines: 295-308
  32. Open the Index.html file in the “Federalist” folder of your working directory with Firefox; it is not supported by Chrome or Internet Explorer.
  33. Code Lines: 321-349
  34. Code Lines: 350-370
  35. • Naïve Bayes predicts that 9 of the 12 disputed papers were written by Madison.
      • k-NN predicts that only 4 of the 12 papers were written by Madison.
      • Why? How stable are these results?
      Code Lines: 371-373
  36. 2:30pm
  37. Source: Richard Heimann
  38. Source: Richard Heimann
  39. Source: Richard Heimann
  40. The Beige Book (GitHub). Source: Richard Heimann
  41. https://github.com/wesslen/BeigeBookSentimentAnalysis
  42. First six records of BB.sentiment
  43. First six records of BB.sentiment (updated)
  44. Raw Scored Sentiment / Scaled Scored Sentiment
  45. Stanford Deep Learning NLP class materials
  46. https://projectmosaic.uncc.edu/events-list/
      GNIP access
      http://www.r-bloggers.com/setting-up-the-twitter-r-package-for-text-analytics/
  47. AlchemyAPI; Taste Analytics Signals; SAS Enterprise Miner; SAS Sentiment Analysis; Hamilton Soundtrack Amazon Reviews
  48. R tm package; Python nltk package; Python gensim package; Mallet
  49. Introductory Text Mining Class; Coursera Natural Language Processing Class; Coursera Text Mining & Analytics Course; Deep Learning for Natural Language Processing
  50. https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-1-for-beginners-bag-of-words
      http://www.alchemyapi.com/developers/getting-started-guide/twitter-sentiment-analysis
      https://eight2late.wordpress.com/2015/09/29/a-gentle-introduction-to-topic-modeling-using-r/
      http://www.r-bloggers.com/sentiment-analysis-on-donald-trump-using-r-and-tableau/
      Follow this link for all R “text” blogs on the R-bloggers website.
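
Slides 8 and 10 outline the bag-of-words preparation. A corpus is made of documents, which are made of terms, and the text passes through the steps Tokenize, Clean, Stem and Filter. The workshop does this in R. Below is a minimal Python sketch of the same steps; it is not the workshop's code. It assumes whitespace tokenization and a hypothetical eight-word stop-word list. It also uses a crude suffix-stripping stemmer that stands in for, but does not implement, a real stemmer such as Porter's. The last helper counts the resulting terms into a document-term matrix.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "then", "of", "to", "in"}

# Suffixes for a crude stemmer: a stand-in for, NOT an implementation of, Porter stemming.
SUFFIXES = ("ation", "ing", "ed", "es", "s", "e")


def tokenize(text):
    """Split raw text into tokens on whitespace."""
    return text.split()


def clean(tokens):
    """Lowercase tokens and strip punctuation (any non-word character)."""
    cleaned = (re.sub(r"\W", "", token.lower()) for token in tokens)
    return [token for token in cleaned if token]


def stem(tokens, min_stem=4):
    """Remove the first matching suffix if at least `min_stem` characters remain."""
    stemmed = []
    for token in tokens:
        for suffix in SUFFIXES:
            if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
                token = token[: -len(suffix)]
                break
        stemmed.append(token)
    return stemmed


def filter_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def preprocess(text):
    """Tokenize -> Clean -> Stem -> Filter, in the order shown on slide 10."""
    return filter_stop_words(stem(clean(tokenize(text))))


def document_term_matrix(docs):
    """Return (vocabulary, rows): one row of term counts per document."""
    counts = [Counter(preprocess(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


if __name__ == "__main__":
    print(preprocess("Then a hurricane came, and devastation reigned"))
    # ['hurrican', 'came', 'devast', 'reign']
```

Filtering after stemming, in the order the slide gives, works here only because none of the stop words is changed by the stemmer. With a different stemmer, it is safer to filter before stemming.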

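
Slide 23 points to Bayes' theorem. Slide 35 reports that naive Bayes attributes 9 of the 12 disputed papers to Madison, while k-NN attributes only 4. The minimal Python sketch below shows both classifiers on word-count features; it is not the workshop's R code.

- **Naive Bayes** applies Bayes' theorem, P(author | words) ∝ P(author) · ∏ P(word | author). It uses add-one (Laplace) smoothing and log-probabilities to avoid numeric underflow.
- **k-NN** labels a document by majority vote among its k most cosine-similar training documents.

The four training "papers" are hypothetical toy data built around the function words "upon" and "whilst". They are not the Federalist texts, and the output says nothing about the real results.

```python
import math
from collections import Counter

# Hypothetical toy training "papers" (already tokenized); NOT the Federalist texts.
TRAIN_DOCS = [
    "upon the whole the power of the union upon".split(),
    "upon reflection the executive power".split(),
    "whilst the states retain their powers".split(),
    "whilst the people of the states".split(),
]
TRAIN_LABELS = ["Hamilton", "Hamilton", "Madison", "Madison"]


def train_naive_bayes(docs, labels, alpha=1.0):
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing, in log space."""
    vocab = {word for doc in docs for word in doc}
    log_priors, log_likelihoods = {}, {}
    for label in sorted(set(labels)):
        class_docs = [d for d, lab in zip(docs, labels) if lab == label]
        log_priors[label] = math.log(len(class_docs) / len(docs))
        counts = Counter(word for doc in class_docs for word in doc)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_likelihoods[label] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_priors, log_likelihoods


def predict_naive_bayes(model, doc):
    """Pick the label maximizing log P(label) + sum of log P(word | label)."""
    log_priors, log_likelihoods = model

    def log_posterior(label):
        # Words never seen in training are ignored.
        return log_priors[label] + sum(
            log_likelihoods[label][w] for w in doc if w in log_likelihoods[label]
        )

    return max(log_priors, key=log_posterior)


def cosine(a, b):
    """Cosine similarity between two Counter word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_knn(docs, labels, doc, k=3):
    """Majority vote among the k training docs most cosine-similar to `doc`."""
    query = Counter(doc)
    ranked = sorted(
        zip(docs, labels), key=lambda pair: cosine(Counter(pair[0]), query), reverse=True
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    model = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS)
    disputed = "whilst the states".split()
    print(predict_naive_bayes(model, disputed))  # Madison
    print(predict_knn(TRAIN_DOCS, TRAIN_LABELS, disputed, k=3))  # Madison
```

Both predictions depend on modelling choices: the smoothing constant alpha for naive Bayes, and k and the similarity measure for k-NN. Rerunning with different choices or with resampled training papers is one way to probe the stability question slide 35 raises.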
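
Slides 42 to 44 show raw and scaled sentiment scores for the Beige Book records in BB.sentiment. The slides do not say how the scores were computed. The following Python sketch therefore assumes the simplest lexicon approach:

- **Raw score:** a report's count of positive words minus its count of negative words.
- **Scaled score:** a z-score (subtract the mean, divide by the sample standard deviation), which is what R's scale() does by default.

The two word lists and the three report snippets are hypothetical.

```python
import re
import statistics

# Hypothetical mini-lexicons; a real analysis would use a published sentiment word list.
POSITIVE = {"growth", "increased", "strong", "improved", "gains"}
NEGATIVE = {"decline", "weak", "decreased", "slowed", "losses"}


def raw_score(text):
    """Number of positive words minus number of negative words."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def scale(scores):
    """Z-scores: center on the mean, divide by the sample SD (as R's scale() does)."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # n - 1 denominator, like R's sd()
    return [(s - mean) / sd for s in scores]


# Hypothetical report snippets standing in for Beige Book text.
REPORTS = [
    "Activity showed strong growth and employment increased.",
    "Manufacturing slowed and retail sales were weak.",
    "Conditions improved slightly despite losses in agriculture.",
]

if __name__ == "__main__":
    raw = [raw_score(r) for r in REPORTS]
    print(raw)  # [3, -2, 0]
    print([round(z, 2) for z in scale(raw)])  # [1.06, -0.93, -0.13]
```

After scaling, a score reads as above or below the average report in the sample rather than as an absolute tone. This makes reports easier to compare over time, but it also means the scaled values change whenever reports are added or removed.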