Dissertation proposal defense
Xiaoju Zheng, June 9, 2010
Life Cycle of #hashtags: “words” in the #Twittertopia

Road Map of the Presentation
• Background of research questions
• Research questions
• Overview of the data
• Follow-up experiments
• Diffusion models
• Challenges

The Laws of Imitation
“why, given one hundred different innovations conceived of at the same time – innovations in the form of words, in mythological ideas, in industrial processes, etc. – ten will spread abroad, while ninety will be forgotten.”
Gabriel Tarde (1903), “The Laws of Imitation”
Merit of its own? Or something else?

The Laws of Imitation
“... we see that the incessant struggle between minor linguistic inventions which always ends in the imitation of one of them, and in the abortion of the others, finally comes to transform a language in such a way as to adapt it, more or less rapidly and completely, according to the spirit of the community, to external realities and to the social purposes of language ...”
Gabriel Tarde (1903), “The Laws of Imitation”

The Laws of Imitation
Merit is not the only catalyst for the spread of an idea. In situations where “the poorest innovations, from the point of view of logic, are selected because of their place, or even date of birth,” Tarde attributes these irrational occurrences to “extra-logical influences”.

Social Network Analysis
Current research in social network analysis asserts that these “extra-logical” influences can be explained by examining the dynamics of the network through which influence is transmitted between individuals.
In other words, if we view individuals as nodes in a social network, where a directed edge indicates that one node influences another, then some graph configurations make wide adoption of an innovation more likely than others do.

Basic Research Questions
• How is a word created?
• What makes a newly created word better than others?
• How is a newly created word picked up by users at large?
• How does a word gain popularity among the population?
In a word: the life cycle of a word.

Research Question: Data
• Word creation in English: in spoken English, it can take decades, even centuries, for new words to emerge, become part of common parlance, and then fade into disuse.
• Word creation on Twitter: a word in the form of a #hashtag can live its entire life cycle in a very short period of time, e.g. a couple of days.
One possible pattern (only one possibility): a news story breaks, and competing hashtags vie for dominance. Then a few influential people adopt the same one. Suddenly the conversation coalesces around it, the term trends, the spammers start using it, and then the conversation peters out as we move on to the next topic.
Is that the pattern? And how closely does it map onto the ways that words and phrases emerge in spoken language?
#hashtag: a word on Twitter

Twitter
Twitter.com: Twitter is a social networking and micro-blogging service that lets its users send and read messages known as tweets. Tweets are text-based posts of up to 140 characters, displayed on the author's profile page and delivered to the author's subscribers, who are known as followers.

Twitter: some conventions
• @mentions: the word following @ is the name of a Twitter user, and the tweet refers to that user, e.g. “@dave thanks for the help” or “Talking with @paul about twitter”. (Can be used to spot smaller networks.)
• Retweets: “RT” means “I am retweeting (copying) something from elsewhere”, e.g. “RT @john I just saw Madonna” means that I am retweeting the original message from John. (Can be used to spot smaller networks.)
• #hashtags: give contextual relevance to a tweet or mark a keyword, e.g. “Like this demo #acita09” or “Why does #ms-word keep crashing”.

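The three conventions above can be picked out of raw tweet text with simple patterns. The following is a minimal sketch, not the extraction code used in the study: the function name `parse_tweet` and the regular expressions are assumptions made for illustration.

```python
import re

# Hypothetical patterns for the three conventions described on the slide.
MENTION = re.compile(r"@(\w+)")       # @username
HASHTAG = re.compile(r"#(\w+)")       # #keyword
RETWEET = re.compile(r"^RT\s*@(\w+)") # "RT @user ..." at the start of a tweet


def parse_tweet(text):
    """Return the mentions, hashtags, and retweeted user (if any) in a tweet."""
    rt = RETWEET.match(text)
    return {
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
        "retweet_of": rt.group(1) if rt else None,
    }


for tweet in ["@dave thanks for the help",
              "RT @john I just saw Madonna",
              "Like this demo #acita09"]:
    print(tweet, "->", parse_tweet(tweet))
```

Because `\w` does not match a hyphen, this sketch would capture only `ms` from “#ms-word”. Whether that matches how hashtags are delimited in the corpus is a choice the analysis would have to make explicitly.
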
[Figure: Top influential people on Twitter, from the Edinburgh Twitter Corpus (around 2 billion tokens). Annotation: six singers.]
[Figure: Top 10 trending topics from the Edinburgh Twitter Corpus (around 2 billion tokens).]

Research Question
Word creation and its prosperity: what counts as a criterion for a newly coined “word” to be accepted as a good #hashtag, and how does a good #hashtag gain popularity among groups of people?
• Logical: linguistic groundings of a good #hashtag. Linguistic analysis of the #hashtags and behavioral studies.
• Extra-logical: social groundings of a popular #hashtag, e.g. network structure and dynamics.

#Hashtag

I. Linguistic Analysis of #hashtags
More at https://docs.google.com/Doc?docid=0AWbvIzcQLhQXZGdoY256cDJfMTExZjZjbjNxbTM&hl=en
• Linking words into a sentence: e.g. whatsyourbackground, tweetwhatyoueat, LetsMakeATrendingTopic, goodluckjustin
• Part of word + existing word: e.g. animtip, appstore
• Compounding, noun + noun: e.g. sundayhug, pubquiz, waikikilunch
• Compounding, verb + noun: e.g. hashtagme, pickon, killcapscop
• Compounding, adv + verb: e.g. currentlycrushing
• Compounding, adj + noun: e.g. digitalbritan, GoodTimes, morningsickness
• Splinter: e.g. socialmem (SocialCamp Memphis)
• Acronyms and initials: e.g. smlb (St Michael Le Belfrey church), emr (electronic medical records), #eu (European Union), #cah (crimes against humanity)
• Neologism (splinter involved): e.g. twacker (Twitter users who lose their user account), tweetie (Twitter client for Mac and iPhone), twitvorce (to divorce yourself from a Twitter member by unfollowing them), twittertopia, twendsetter
• Misc.: omgfact, #tcot (top conservatives on Twitter)

Preliminary Analysis
• Public timeline: 20 tweets per minute
• 20 days of non-stop crawling
• Total tweets = 567,091
• Total words = 8,495,323
• Average words per tweet = 14.98
• NPS Chat Corpus: 45,010 tokens / 6,066 types
• Webtext corpus in NLTK: 396,736 tokens / 21,537 types

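The average on this slide follows directly from the two totals, and the token and type counts of the comparison corpora can be turned into type/token ratios. The ratio is not reported on the slide; it is added here only as an illustrative measure, and it is sensitive to corpus size, so the two values are not directly comparable.

```python
# Figures copied from the slide.
total_tweets = 567_091
total_words = 8_495_323

avg_words_per_tweet = total_words / total_tweets
print(f"average words per tweet: {avg_words_per_tweet:.2f}")


def type_token_ratio(types, tokens):
    """Distinct word forms divided by running words (illustrative measure)."""
    return types / tokens


nps_ttr = type_token_ratio(6_066, 45_010)        # NPS Chat Corpus
webtext_ttr = type_token_ratio(21_537, 396_736)  # Webtext corpus in NLTK
print(f"NPS Chat type/token ratio: {nps_ttr:.3f}")
print(f"Webtext type/token ratio:  {webtext_ttr:.3f}")
```
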
[Figures: Top 10 frequent words (three slides).]

Twitter presents a different genre of text
• Self-expression: "I" is the top-ranking word that tweets begin with.
• Status update: "Watching", "trying", "listening", "reading" and "eating" are all in the top 100 first words, revealing just how often people use Twitter to report on whatever they are experiencing at the time.
• News broadcast: the abbreviation "RT" (retweet) is extremely common.

Twitter presents a different genre of text
• Popular web addresses (e.g. URL-shortening services) among the top 500: "tinyurl.com", "twitpic.com", "ff.im", "twurl.nl". These all appear because they offer services useful to twitterers.
• Tech vocabulary among the top 500: "Google", "Facebook", "internet", "website", "blog", "Mac", and "app".

Research Question
• Linguistic groundings of a good #hashtag: linguistic analysis of the #hashtags and behavioral studies.
• Social groundings of a popular #hashtag: e.g. network structure and dynamics.
• Would linguistically equally good #hashtags have different degrees of popularity? Is it because of different network structures?
• Behavioral studies to obtain quantitative measurements of the linguistic goodness of #hashtags.

Linguistic Grounding
Question 1: Does the length distribution of adopted #hashtags differ from that of words?
• Does it conform to a power-law distribution or a lognormal distribution?
• Do #hashtags of different lengths receive different goodness judgements (e.g. are extremely short tags better than extremely short words)?

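One way to approach the power-law versus lognormal question is to fit both distributions by maximum likelihood and compare their log-likelihoods. The sketch below is an illustration under stated assumptions, not the proposal's method: it treats integer tag lengths as continuous values, sets the power-law lower bound `xmin` to the shortest observed length, and uses a handful of tags from the linguistic-analysis slide as a toy sample. The function names are hypothetical.

```python
import math


def lognormal_fit(xs):
    """MLE for a lognormal; returns (mu, sigma, log-likelihood)."""
    logs = [math.log(x) for x in xs]
    n = len(xs)
    mu = sum(logs) / n
    var = sum((l - mu) ** 2 for l in logs) / n
    sigma = math.sqrt(var)
    ll = sum(-math.log(x * sigma * math.sqrt(2 * math.pi))
             - (math.log(x) - mu) ** 2 / (2 * var) for x in xs)
    return mu, sigma, ll


def power_law_fit(xs, xmin):
    """MLE for a continuous power law on x >= xmin; returns (alpha, log-likelihood)."""
    n = len(xs)
    s = sum(math.log(x / xmin) for x in xs)  # must be > 0
    alpha = 1 + n / s
    ll = n * math.log((alpha - 1) / xmin) - alpha * s
    return alpha, ll


# Toy sample: a few tags taken from the slides, far too small for real conclusions.
tags = ["animtip", "appstore", "sundayhug", "pubquiz", "hashtagme", "pickon",
        "omgfact", "tcot", "smlb", "emr", "eu", "cah", "twacker", "tweetie",
        "twitvorce", "twittertopia", "twendsetter", "goodluckjustin",
        "currentlycrushing", "whatsyourbackground"]
lengths = [len(t) for t in tags]

mu, sigma, ll_lognormal = lognormal_fit(lengths)
alpha, ll_power = power_law_fit(lengths, xmin=min(lengths))
print(f"lognormal: mu={mu:.2f} sigma={sigma:.2f} loglik={ll_lognormal:.1f}")
print(f"power law: alpha={alpha:.2f} loglik={ll_power:.1f}")
```

A higher log-likelihood only suggests a better fit. A real comparison would need a much larger sample, a discrete formulation, and a proper model-selection test.
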
Linguistic Grounding
Question 2: What are the linguistic processes of creating a #hashtag? What counts as a good #hashtag (morphologically, phonotactically, and semantically)?
A more qualitative analysis of the #hashtags is needed to design a metric of analysis: e.g. compounding, splinter (of what kind).

Linguistic Grounding: behavioral experiments
• Word vs. nonword: subjects will be presented with #hashtags collected from twitter.com and asked to label each as either word or nonword. Come up with specific criteria for word vs. nonword.
• Morpheme identification: based on the results of the word vs. nonword experiment, #hashtags will be presented for subjects to divide into morphemes and identify meaningful subparts.

Linguistic Grounding: behavioral experiments
• Semantic transparency (word association game): for hashtags like “twitvorce”, subjects will be asked to provide free word associations. For instance, subjects are likely to provide “twitter” and “divorce” for “twitvorce”.
• Goodness rating, general: for both #hashtags that are “words” and #hashtags that are “nonwords”, subjects will provide subjective goodness ratings, e.g. on a scale from 1 to 7.
• Goodness rating, phonotactic: subjects rate pronounceability, e.g. for acronyms and initials. For instance, some acronyms are just strings of consonants without vowels, some are strings of vowels, and others are mixtures of consonants and vowels. Would more pronounceable #hashtags be perceived as better #hashtags?

Linguistic Grounding: statistical parser
Phonotactic likelihood: develop a statistical parser (e.g. a finite-state machine) for #hashtags and words, and compare their phonotactic probabilities. Also compare the statistical parser with, e.g., the Vitevitch (2004) model.

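A very simple probabilistic finite-state model of this kind is a bigram model over symbols. The sketch below is an assumption-laden stand-in, not the parser proposed here and not the Vitevitch model: it uses letters as a proxy for phonemes, add-one smoothing, and a tiny hand-picked training lexicon. All names (`train_bigram_model`, `avg_log_prob`) are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
START, END = "^", "$"  # word-boundary symbols


def train_bigram_model(words):
    """Count symbol-to-symbol transitions, including word boundaries."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        symbols = [START] + [c for c in word.lower() if c in ALPHABET] + [END]
        for prev, nxt in zip(symbols, symbols[1:]):
            counts[prev][nxt] += 1
    return counts


def avg_log_prob(model, tag):
    """Average add-one-smoothed log probability per transition (higher = more word-like)."""
    symbols = [START] + [c for c in tag.lower() if c in ALPHABET] + [END]
    n_outcomes = len(ALPHABET) + 1  # any letter, or end of word
    total = 0.0
    for prev, nxt in zip(symbols, symbols[1:]):
        context_total = sum(model[prev].values())
        total += math.log((model[prev][nxt] + 1) / (context_total + n_outcomes))
    return total / (len(symbols) - 1)


# Toy lexicon (assumption): a real study would train on a large word list.
lexicon = ["twitter", "divorce", "morning", "sickness", "digital",
           "sunday", "quiz", "good", "times", "store"]
model = train_bigram_model(lexicon)

for tag in ["twitvorce", "pubquiz", "smlb", "tcot"]:
    print(tag, round(avg_log_prob(model, tag), 3))
```

Averaging over transitions keeps long and short tags on a comparable scale, which matters when acronyms are compared with sentence-like tags.
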
Social Grounding
Diffusion models can be tested against real data from Twitter.
Diffusion models:
• Linear Threshold Model
• Cascade Model

The Threshold Model
People adopt a new behavior because a sufficiently large proportion of their friends have adopted it. For example, early adopters have a very low threshold, say 5% or 10%, while late adopters have a much higher one. Every person, however, has their own individual threshold. The key variable is the initial distribution of thresholds across the social network, which determines the final extent of the behavior. But this model says nothing about how people initially adopt a behavior. That is, it says nothing about innovators or the things being invented, only about the spread of an innovation through a social network.

The Threshold Model
In the threshold model, every person $u$ has a threshold $\theta_u$, and each of their neighbors $v$ is weighted by $w_{u,v}$. If
$$\sum_{v \,\in\, \text{active neighbors of } u} w_{u,v} \ge \theta_u$$
then person $u$ adopts the behavior. The set of thresholds, weights, and initial adopters determines the extent of the behavior in the social network.

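The adoption rule can be simulated by repeatedly checking, for every person who has not yet adopted, whether the summed weight of their adopting neighbors has reached their threshold. This is a minimal sketch under assumed data structures (nested dictionaries and a toy four-person network), not code from the proposal.

```python
def linear_threshold(weights, thresholds, seeds):
    """Run the threshold rule to a fixed point.

    weights[u][v] is the weight of neighbor v's influence on person u;
    thresholds[u] is u's adoption threshold; seeds are the initial adopters.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for u, incoming in weights.items():
            if u in active:
                continue
            influence = sum(w for v, w in incoming.items() if v in active)
            if influence >= thresholds[u]:
                active.add(u)
                changed = True
    return active


# Toy network (assumption): "a" is the only initial adopter.
weights = {
    "b": {"a": 0.6},
    "c": {"a": 0.3, "b": 0.3},
    "d": {"c": 0.2},
}
thresholds = {"b": 0.5, "c": 0.5, "d": 0.5}

print(sorted(linear_threshold(weights, thresholds, seeds={"a"})))
```

In this toy run, "c" adopts only after "b" has adopted, and "d" never adopts because its single neighbor carries too little weight. The outcome is fixed entirely by the thresholds, weights, and seed set.
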
The Cascade Model
Every person has a chance of adopting a new behavior whenever one of their neighbors adopts it. The probability that a person adopts the new behavior is the conversion rate for the notification. This probability is a function of both the sender and the recipient, so more influential people are more likely to convince others to adopt a behavior.

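A common formalization of this idea is the independent cascade process, in which each newly adopting person gets one chance to convert each neighbor. The sketch below assumes that variant. The per-edge probabilities, the toy network, and the function name are hypothetical.

```python
import random


def independent_cascade(prob, seeds, rng):
    """Simulate one cascade.

    prob[v][u] is the chance that v, on adopting, converts neighbor u
    (it depends on both sender v and recipient u).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for u, p in prob.get(v, {}).items():
                # Each new adopter gets a single attempt per neighbor.
                if u not in active and rng.random() < p:
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active


# Toy network (assumption): "a" is influential, "b" much less so.
prob = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"d": 0.1},
    "c": {"d": 0.5},
}
rng = random.Random(2010)  # fixed seed so a run is repeatable
runs = [len(independent_cascade(prob, ["a"], rng)) for _ in range(1000)]
print("mean cascade size:", sum(runs) / len(runs))
```

Unlike the threshold model, the outcome is random, so the extent of adoption is estimated by averaging over many simulated cascades.
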

Proposal defense

  • 1. Dissertation proposal defense. Xiaoju Zheng, June 9, 2010. Life Cycle of #hashtags: “words” in the #Twittertopia
  • 2. Road Map of the Presentation: Background of research questions; Research Questions; Overview of the data; Follow-up Experiments; Diffusion models; Challenges
  • 3. The Laws of Imitation “why, given one hundred different innovations conceived of at the same time – innovations in the form of words, in mythological ideas, in industrial processes, etc. – ten will spread abroad, while ninety will be forgotten.” --- Gabriel Tarde (1903) “The Laws of Imitation” Merit of its own? Or something else?
  • 4. The Laws of Imitation “… we see that the incessant struggle between minor linguistic inventions which always ends in the imitation of one of them, and in the abortion of the others, finally comes to transform a language in such a way as to adapt it, more or less rapidly and completely, according to the spirit of the community, to external realities and to the social purposes of language …” --- Gabriel Tarde (1903) “The Laws of Imitation”
  • 5. The Laws of Imitation. Merit is not the only catalyst of the spread of an idea. In situations where “the poorest innovations, from the point of view of logic, are selected because of their place, or even date of birth,” Tarde attributes these irrational occurrences to “extra-logical influences.”
  • 6. Social Network Analysis. Current research in social network analysis asserts that these “extra-logical” influences can be explained by examining the dynamics of the network through which influence is transmitted between individuals. In other words, if we view individuals as nodes in a social network, where a directed edge indicates that one node influences another, then some graph configurations make it more likely than others that an innovation will be widely adopted.
  • 7. Basic Research Questions. How is a word created? What makes a newly-created word better than others? How is a newly-created word picked up by users at large? How does a word gain popularity among the population? In a word, the life cycle of a word.
  • 8. Research Question --- Data. Word creation in English: in spoken English, it can take decades, even centuries, for new words to emerge, become part of common parlance, and then fade into disuse. Word creation on Twitter: a word in the form of a #hashtag can live its entire life cycle in a very short period of time, e.g. a couple of days. A news story breaks, and competing hashtags vie for dominance. Then a few influential people adopt the same one. Suddenly the conversation coalesces around it, the term trends, the spammers start using it, and then the conversation peters out as we move on to the next topic (only one possibility). Is that the pattern? And how closely does it map onto the ways that words and phrases emerge in spoken language? #hashtag: a word on Twitter.
  • 9. Twitter. Twitter.com: Twitter is a social networking and micro-blogging service that enables its users to send and read messages known as tweets. Tweets are text-based posts of up to 140 characters displayed on the author's profile page and delivered to the author's subscribers, who are known as followers.
  • 10. Twitter: some conventions. @mentions: the following word is the name of a Twitter user, and the tweet refers to that user, e.g. “@dave thanks for the help” or “Talking with @paul about twitter” (can be used to spot a smaller network). Retweets: “RT” means “I am retweeting (copying) something from elsewhere”, e.g. “RT@john I just saw Madonna” means that I am retweeting the original message from John (can be used to spot a smaller network). #hashtags: give contextual relevance to a tweet or mark a keyword, e.g. “Like this demo #acita09” or “Why does #ms-word keep crashing”.
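The three conventions on slide 10 can be pulled out of raw tweet text with regular expressions. The following is only a minimal sketch: the pattern names and the `parse_tweet` helper are hypothetical, and the patterns are simplified assumptions rather than Twitter's actual tokenization rules (for instance, the hashtag pattern allows hyphens only so that the slide's `#ms-word` example is captured whole).

```python
import re

# Simplified, assumed patterns; real Twitter entity rules are more involved.
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#([\w-]+)")
RETWEET_RE = re.compile(r"\bRT\s*@(\w+)")


def parse_tweet(text):
    """Return the mentions, hashtags, and retweeted users found in a tweet.

    A retweeted user also appears under "mentions", because "RT@john"
    contains an @mention.
    """
    return {
        "mentions": MENTION_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
        "retweet_of": RETWEET_RE.findall(text),
    }


if __name__ == "__main__":
    for tweet in [
        "Talking with @paul about twitter",
        "RT@john I just saw Madonna",
        "Why does #ms-word keep crashing",
    ]:
        print(tweet, "->", parse_tweet(tweet))
```

Edges built from the `mentions` and `retweet_of` fields would give the "smaller network" the slide refers to, as opposed to the full follower graph.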
  • 11. Top Influential People on Twitter: the Edinburgh Twitter Corpus (around 2 billion tokens). Six singers.
  • 12. Top 10 trending topics from the Edinburgh Twitter Corpus (around 2 billion tokens)
  • 14. Research Question. Word creation and its prosperity: what counts as criteria for a newly-coined “word” to be accepted as a good #hashtag, and how a good #hashtag gains popularity among groups of people. Logical: linguistic groundings of a good #hashtag (linguistic analysis of the #hashtags and behavioral studies). Extra-logical: social groundings of a popular #hashtag, e.g. network structure and dynamics.
  • 16. I. Linguistic Analysis of #hashtags (more at https://docs.google.com/Doc?docid=0AWbvIzcQLhQXZGdoY256cDJfMTExZjZjbjNxbTM&hl=en). Linking words into a sentence: e.g. whatsyourbackground, tweetwhatyoueat, LetsMakeATrendingTopic, goodluckjustin. Part of word + existing word: e.g. animtip, appstore. Compounding, noun + noun: e.g. sundayhug, pubquiz, waikikilunch. Compounding, verb + noun: e.g. hashtagme, pickon, killcapscop. Compounding, adv + verb: e.g. currentlycrushing. Compounding, adj + noun: e.g. digitalbritan, GoodTimes, morningsickness. Splinter: e.g. socialmem (SocialCamp Memphis). Acronyms & initials: e.g. smlb (St Michael Le Belfrey church), emr (electronic medical records), #eu (European Union), #cah (crimes against humanity). Neologism (splinter involved): e.g. twacker (Twitter users who lose their user account), tweetie (Twitter client for Mac and iPhone), twitvorce (to divorce yourself from a Twitter member by unfollowing them), twittertopia, twendsetter. Misc: omgfact, #tcot (top conservatives on Twitter).
  • 17. Preliminary Analysis. Public timeline: 20 tweets per minute; 20 days of non-stop crawling. Total tweets = 567,091. Total words = 8,495,323. Average words per tweet = 14.98. For comparison, NPS Chat Corpus: 45,010 tokens / 6,066 types; Webtext corpus in NLTK: 396,736 tokens / 21,537 types.
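The figures on slide 17 can be cross-checked with a few lines of arithmetic. This is a small sketch using only the numbers quoted on the slide; the function names are hypothetical, and the "expected" tweet count simply assumes the crawl ran at exactly 20 tweets per minute for the full 20 days.

```python
def words_per_tweet(total_words, total_tweets):
    """Mean number of words per tweet."""
    return total_words / total_tweets


def type_token_ratio(types, tokens):
    """Share of distinct word forms (types) among all running words (tokens)."""
    return types / tokens


def expected_tweets(per_minute, days):
    """Tweets an uninterrupted crawl would collect at a fixed rate."""
    return per_minute * 60 * 24 * days


if __name__ == "__main__":
    print(round(words_per_tweet(8_495_323, 567_091), 2))  # 14.98, as on the slide
    print(expected_tweets(20, 20))                         # 576000
    print(round(567_091 / expected_tweets(20, 20), 3))     # share actually collected
    print(round(type_token_ratio(6_066, 45_010), 3))       # NPS Chat Corpus
    print(round(type_token_ratio(21_537, 396_736), 3))     # NLTK Webtext corpus
```

Type/token ratios depend strongly on corpus size, so the two reference corpora are not directly comparable with each other or with the Twitter sample without length normalization.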
  • 18. Top 10 Frequent words
  • 19. Top 10 Frequent words
  • 20. Top 10 Frequent words
  • 21. Twitter presents a different genre of text. Self expression: "I" is the top-ranking word that tweets begin with. Status update: "watching", "trying", "listening", "reading" and "eating" are all in the top 100 first words, revealing just how often people use Twitter to report on whatever they are experiencing at the time. News broadcast: the abbreviation "RT" (retweet) is extremely common.
  • 22. Twitter presents a different genre of text. Popular web addresses (e.g. URL shortening services) among the top 500: "tinyurl.com", "twitpic.com", "ff.im", "twurl.nl". These all appear because they offer services useful to twitterers. Tech vocabulary among the top 500: "Google", "Facebook", "internet", "website", "blog", "Mac", and "app".
  • 23. Research Question. Linguistic groundings of a good #hashtag: linguistic analysis of the #hashtags and behavioral studies. Social groundings of a popular #hashtag: e.g. network structure and dynamics. Would linguistically equally good #hashtags have different degrees of popularity? Is it because of the different network structure? Behavioral studies are needed to get quantitative measurements of the linguistic goodness of #hashtags.
  • 24. Linguistic Grounding. Question 1: Does the length distribution of adopted #hashtags differ from that of words? Does it conform to a power law distribution or a lognormal distribution? Do #hashtags of different lengths receive different goodness judgements (e.g. are extremely short tags better than extremely short words)?
  • 25. Linguistic Grounding. Question 2: What are the linguistic processes of creating a #hashtag? What counts as a good #hashtag (morphologically, phonotactically, and semantically)? A more qualitative analysis of the #hashtags needs to be done to design a metric of analysis: e.g. compounding, splinter (of what kind).
  • 26. Linguistic Grounding – behavioral experiments. Word vs. nonword: subjects will be presented with #hashtags collected from twitter.com and asked to label them as either word or nonword. The aim is to come up with specific criteria for word vs. nonword. Morpheme identification: based on the results obtained from the word vs. nonword experiment, #hashtags will be presented for subjects to divide into morphemes and identify meaningful subparts.
  • 27. Linguistic Grounding – behavioral experiments. Semantic transparency (word association game): for hashtags like “twitvorce”, subjects will be asked to provide free word associations. For instance, subjects are likely to provide “twitter” and “divorce” for “twitvorce”.
  • 28. Goodness rating. General: for both #hashtags that are “words” and #hashtags that are “nonwords”, subjects will provide subjective goodness ratings, e.g. on a scale from 1 to 7. Phonotactic: subjects rate the pronounceability, e.g. for acronyms and initials. For instance, some acronyms are just strings of consonants without vowels, some are strings of vowels, and others are mixtures of consonants and vowels. Would more pronounceable #hashtags be perceived as better #hashtags?
  • 29. Linguistic Grounding – statistical parser. Phonotactic likelihood: develop a statistical parser (e.g. a finite state machine) for #hashtags and words, and compare their phonotactic probabilities. Also compare the statistical parser with e.g. the Vitevitch (2004) model.
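One very simple way to approximate the phonotactic likelihood comparison on slide 29 is a smoothed bigram model. The sketch below is an illustration under loud assumptions, not the parser proposed in the slides and not the Vitevitch model: it works on letters as a stand-in for phonemes, uses add-one smoothing, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram_model(words):
    """Count letter bigrams, using ^ and $ as word-boundary markers."""
    bigrams, contexts = Counter(), Counter()
    alphabet = {"$"}
    for word in words:
        padded = "^" + word.lower() + "$"
        alphabet.update(padded[1:])
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts, len(alphabet)


def log_likelihood(word, model):
    """Average log probability per transition, with add-one smoothing.

    Letters never seen in training are not added to the alphabet size,
    which is a simplification: scores are comparable, not normalized.
    """
    bigrams, contexts, vocab_size = model
    padded = "^" + word.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    total = sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in pairs
    )
    return total / len(pairs)


if __name__ == "__main__":
    # Toy training list; a real study would train on a pronunciation lexicon.
    model = train_bigram_model(["twitter", "divorce", "tweet", "trend", "setter"])
    for tag in ["twitvorce", "xqzk"]:
        print(tag, round(log_likelihood(tag, model), 3))
```

Averaging over transitions keeps long and short tags on a comparable scale, which matters here because tag length is itself one of the variables under study.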
  • 30. Social Grounding. Based on the realistic data from Twitter, diffusion models can be tested. Diffusion models: the Linear Threshold Model and the Cascade Model.
  • 31. The Threshold Model. It says that people adopt a new behavior because a sufficiently large proportion of their friends have adopted that behavior. E.g. early adopters have a very low threshold, say 5% or 10%, while late adopters have a much higher threshold. Every person, however, has their own individual threshold. The key variable here is the initial distribution of thresholds across a social network, which determines the final extent of the behavior. But this model says nothing about how people initially adopt behavior. That is, it says nothing about innovators or the things that are being invented, only about the spread of innovation through a social network.
  • 32. The Threshold Model. In the threshold model every person $u$ has a threshold $\theta_u$, and each of their neighbors $v$ is weighted according to $w_{u,v}$. If $\sum_{v \in A(u)} w_{u,v} \ge \theta_u$, where $A(u)$ is the set of neighbors of $u$ who have already adopted, then person $u$ adopts the behavior. The set of thresholds, weights, and initial adopters determines the extent of the behavior in the social network.
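The threshold rule on slide 32 can be simulated directly: a person adopts once the summed weight of their adopting neighbors reaches their personal threshold. The following is a minimal sketch; the function name, the dictionary layout, and the toy network are all assumptions made for illustration, not data from the study.

```python
def linear_threshold(weights, thresholds, seeds):
    """Run the threshold model until no further person adopts.

    weights[u][v] is the influence that neighbor v has on person u;
    thresholds[u] is u's personal adoption threshold;
    seeds is the set of initial adopters.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for u, neighbors in weights.items():
            if u in active:
                continue
            # Total weight of u's neighbors who have already adopted.
            influence = sum(w for v, w in neighbors.items() if v in active)
            if influence >= thresholds[u]:
                active.add(u)
                changed = True
    return active


if __name__ == "__main__":
    # Hypothetical four-person network.
    weights = {
        "a": {},
        "b": {"a": 0.6},
        "c": {"a": 0.3, "b": 0.3},
        "d": {"c": 0.2},
    }
    thresholds = {"a": 0.9, "b": 0.5, "c": 0.5, "d": 0.5}
    print(sorted(linear_threshold(weights, thresholds, {"a"})))  # ['a', 'b', 'c']
```

Because adoption is never reversed, the order in which people are checked does not change the final set of adopters, only the number of passes needed to reach it.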
  • 33. The Cascade Model. Every person has a chance of adopting a new behavior whenever one of their neighbors adopts it. The probability that a person adopts the new behavior is the conversion rate for the notification. This probability is a function of both the sender and the recipient, so more influential people are more likely to convince others to adopt a behavior.
  • 34. The Cascade Model. In the cascade model, for every person $u$ and neighbor $v$ there is a random variable $X_{u,v}$ which describes the likelihood of $u$ adopting the behavior if $v$ has adopted it.
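The cascade model on slides 33 and 34 can be sketched as a randomized spreading process. The version below assumes the common "independent cascade" formulation, in which each new adopter gets exactly one chance to convert each neighbor; the slides do not state that detail, and the function name and data layout are hypothetical.

```python
import random


def independent_cascade(influence_prob, seeds, rng=None):
    """Simulate one run of an independent cascade.

    influence_prob[v][u] is the probability that v, on adopting,
    converts neighbor u (the likelihood described by X_{u,v}).
    Each newly adopting person tries each neighbor exactly once.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for u, p in influence_prob.get(v, {}).items():
                if u not in active and rng.random() < p:
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    # Hypothetical network: "a" is more persuasive toward "b" than toward "c".
    graph = {"a": {"b": 0.8, "c": 0.1}, "b": {"d": 0.5}}
    runs = [len(independent_cascade(graph, ["a"], random.Random(i))) for i in range(1000)]
    print("mean cascade size:", sum(runs) / len(runs))
```

A single run is random, so the extent of adoption is usually reported as an average over many runs, as in the usage lines above.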
  • 35. Diffusion Model. Threshold model (neighborhood density): adopt if enough friends do so. Cascade model (a function of the sender and receiver): people have a chance of doing something if one of their friends is doing it.
  • 36. Several Challenges at this step. Design a metric for #hashtag classification (e.g. slide 16): position of the #hashtag, functions, word structure. Different #hashtags may have different adoption patterns and diffusion patterns. Quantitative measurement of the “success” of a #hashtag: by frequency of mention and longevity (within a short or long time frame). Design a way to find competing, equally good #hashtags. Obtain a representative sample.
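The two "success" measures named on slide 36, frequency of mention and longevity, are easy to compute once each use of a tag is dated. This is only a sketch of one possible operationalization: the function name is hypothetical, and defining longevity as the inclusive span between first and last use is an assumption, not the definition adopted in the study.

```python
from datetime import date


def hashtag_success(usage_dates):
    """Return (frequency, longevity_in_days) for one hashtag.

    frequency is the number of dated uses; longevity is the inclusive
    number of days between the first and the last use.
    """
    if not usage_dates:
        return 0, 0
    longevity = (max(usage_dates) - min(usage_dates)).days + 1
    return len(usage_dates), longevity


if __name__ == "__main__":
    # Hypothetical usage dates for a single tag.
    uses = [date(2009, 11, 11), date(2009, 11, 11), date(2009, 11, 13)]
    print(hashtag_success(uses))  # (3, 3)
```

A span-based longevity is sensitive to a single late revival of a tag, so a stricter variant might count only days on which the tag was actually used.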
  • 37. Twitter Network: spot the right network. Despite having large networks, users maintain a smaller circle: users with a high number of followers actually communicate with only a small subset of users. Where’s the value? Within the hidden network: find the true influence model of whom people really trust above all other users by looking at actual “@” behavior and follow behavior.

Editor's notes

  1. Individualism vs. collectivism
  2. November 11th 2009 until February 1st 2010, 14G. #tcot: top conservatives on Twitter. #mm: Music Monday.
  3. D. Zhao and M. B. Rosson. How and why people Twitter: the role that micro-blogging plays in informal communication at work. In Proceedings of the ACM 2009 International Conference on Supporting Group Work. ACM, 2009. C. Wilson, B. Boe, A. Sala, K. P. Puttaswamy, and B. Y. Zhao. User interactions in social networks and their implications. In Proc. of the 4th ACM European Conference on Computer Systems. ACM, 2009. J. Weng, E.-P. Lim, J. Jiang, and Q. He. TwitterRank: finding topic-sensitive influential twitterers. In Proc. of the Third ACM International Conference on Web Search and Data Mining. ACM, 2010.
  4. Asur, S., and Huberman, B. A. (…) Predicting the Future with Social Media. http://www.hpl.hp.com/research/scl/papers/socialmedia/socialmedia.pdf Boyd, D., Golder, S., and Lotan, G. (2010) Tweet, Tweet, Retweet: Conversational aspects of retweeting on Twitter. HICSS-43. IEEE: Kauai, HI, January 6. Honeycutt, C., and Herring, S. C. (2009) Beyond Microblogging: Conversation and collaboration via Twitter. Proceedings of the Forty-Second Hawai'i International Conference on System Sciences (HICSS-42). Los Alamitos, CA: IEEE Press. Focal Point (game theory): http://en.wikipedia.org/wiki/Focal_point_%28game_theory%29 Steels, L., and Kaplan, F. (1999) Collective learning and semiotic dynamics. In D. Floreano, J.-D. Nicoud, and F. Mondada, editors, Advances in Artificial Life: 5th European Conference (ECAL 99), Lecture Notes in Artificial Intelligence, 1674, pp. 679-688, Berlin. J. Ke, J. W. Minett, A. Ching-Pong, W. S.-Y. Wang. Self-organization and selection in the emergence of vocabulary. Complexity 7, 41-54 (2002).
  5. The network: simply the network of your followers/followings. Those are the people whose updates you might be reading and who might be receiving your updates. This is the reach of your Twitter stream. The FOAF-network: the network of your followers'/followings' networks. Those are the people you could potentially reach via retweeted messages. This is the extended reach of your Twitter stream. Asur, S., and Huberman, B. A. (…) Predicting the Future with Social Media. http://www.hpl.hp.com/research/scl/papers/socialmedia/socialmedia.pdf Friends and followers are not the real network. @mention and RT indicate a closer network. @-conversation: data from a one-hour period indicated that about 31% of tweets with @ received a response (Honeycutt and Herring, 2009).
  6. Physicists in Germany claim to have developed a new computer model that can describe how human languages evolve over time. Dietrich Stauffer and Christian Schulze of Cologne University have taken techniques used by biologists to describe evolution and applied them to the rise and fall of languages. In particular they find that the size distribution of languages, a measure of the relative popularity of different languages, can be described by a nearly "log-normal" curve (arXiv.org/abs/cond-mat/0411162). All languages change over time, with some languages disappearing because they are not spoken by enough people. Stauffer and Schulze describe a particular language by a string of 8 or 16 bits, where each bit can equal 0 or 1, and start their simulations with one person speaking language zero (all bits equal to zero). Two languages are different from each other if they differ by at least one bit. The model works as follows. After a given time, this person produces one offspring who speaks a language that might differ from that spoken by their parent by one bit: the possibility of such a mutation occurring is governed by a probability p. The model also allows for the possibility of a person dying during any iteration: this is governed by a factor called the "carrying capacity" in biology. Lastly, it is also possible that the parent decides to start speaking a different language: this is determined by several factors including the carrying capacity and the fraction of the population who already speak that language. The Cologne physicists found that, for a sample of 10 million people, high mutation rates are needed to ensure that no single language dominates. This finding agrees with data on real languages, as does the prediction that the size distribution of languages is close to a "log-normal" distribution (see figure).
"Our model is more realistic than other similar models we know of since it allows for numerous languages, instead of only two," say Stauffer and Schulze. "In these models, only one language survived because it was assumed to be superior to the other. We, on the other hand, have regarded all languages as being equally fit." However, it remains to be seen how the work will be received in the linguistics community. "Linguistics is a relatively new topic for physics and complex systems theory and any tentative way to understand and quantify it is useful and welcome," says Marco Patriarca of the Helsinki University of Technology. "However, while the model of Stauffer and Christian Schulze is interesting and worth investigating, it also seems preliminary."