Czech Twitter as a data mining source

Josef Šlerka, WebExpo 2009
Twitter.com
Twitter is a free social networking and microblogging
service that enables its users to send and read messages
known as tweets.

Tweets are text-based posts of up to 140 characters
displayed on the author's profile page and delivered
to the author's subscribers, who are known as
followers.
                                         (Wikipedia)
What is data
mining and how is
it connected with
Twitter?
Data mining is the process of extracting
patterns from data. As more data are gathered,
data mining is becoming an increasingly
important tool to transform these data into
information.
                                   (Wikipedia)

Variations include text mining and web mining,
including semantic analysis.
Twitter Data mining


- makes it easy to use all data mining methods

- adds "time" & "space"

- provides a real-time picture

- connects easily with other social media (about 30% of
users have the same nickname on all platforms)
Data mining - different methods

- various measures of semantic distance or
similarity (e.g. the Jaccard index)

- frequency analysis based on time (are people
happier in the morning or in the evening?)

- frequency analysis based on location

- one of the results -> identification of opinion
makers in the social networks
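
The Jaccard index named above can be illustrated with a minimal sketch. It compares two tweets as sets of words and returns the size of the intersection divided by the size of the union. The whitespace tokenizer and the two example tweets are assumptions made for illustration; the talk does not say how tweets were tokenized.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |a & b| / |a | b|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tokens(text: str) -> set:
    """Lowercase a tweet and split it on whitespace (a deliberately naive tokenizer)."""
    return set(text.lower().split())


# Made-up example tweets, not taken from the talk's data set.
tweet_a = "madonna plays in prague tonight"
tweet_b = "madonna concert in prague"

similarity = jaccard_index(tokens(tweet_a), tokens(tweet_b))
print(similarity)  # 3 shared words out of 6 distinct words
```

A value of 1.0 means the two word sets are identical and 0.0 means they share nothing; the Jaccard distance is one minus this value.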
Transmission News
using different APIs to
get more information
Transmission News = 5 APIs in one
                  www.transnews.tw

•   5x Twitter News Service accounts
•   1x Yahoo Geo
•   1x Google Search AJAX
•   1x Google Maps
•   1x Open Calais
•   and a little bit of Wikipedia
www.transnews.tw
This brings us to the
downside of Twitter API
API searches are limited
in the number of queries
allowed
Even worse, the data
goes back no further than
1.5 weeks
Hence the development
of Sparrow 1.0
Czech Twitter by the
numbers
Sparrow 1.0
                          application methodology

- archives all tweets located in the Czech Republic at
  hourly intervals via the Twitter API (starting June 2009)

- automatically detects language

- identifies Czech tweets using a word-count dictionary

- compares Czech Twitter statistics with foreign
  countries' statistics
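
One way to read "identifies Czech tweets using a word-count dictionary" is sketched below: count how many words of a tweet appear in a list of common Czech words and compare that share with a threshold. This is an assumption about the approach, not Sparrow's actual code; the word list, the 0.3 threshold, and the function names are hypothetical.

```python
import re

# Tiny illustrative list of common Czech words (hypothetical; a real
# dictionary would be much larger and built from a corpus).
CZECH_WORDS = {
    "je", "se", "na", "že", "jsem", "jak", "tak",
    "už", "ještě", "když", "nebo", "jsou", "není",
}


def czech_word_share(text: str) -> float:
    """Return the fraction of words in `text` that are in CZECH_WORDS."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of Unicode letters
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in CZECH_WORDS)
    return hits / len(words)


def looks_czech(text: str, threshold: float = 0.3) -> bool:
    """Classify a tweet as Czech when enough of its words are in the list."""
    return czech_word_share(text) >= threshold


print(looks_czech("Dnes je venku hezky, ale už se těším na víkend"))
print(looks_czech("Looking forward to the weekend in Prague"))
```

Short tweets contain few words, so a dictionary approach like this is noisy for them; the threshold trades missed Czech tweets against false positives.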
Sparrow 1.0 - June 2009 stats
- about 700,000 tweets

- created by 10,628 unique users who enabled their
  geo-location (CZ) or tweeted in Czech
- 5,880 users tweeted at least once in Czech

- 2,424 Czech-writing users revealed their geo-location
  (usually about 30% of users do that)
How many Twitter users are in the Czech Republic?

    Between 6,000 and 8,000 users write in Czech

      1,000 to 2,000 users prefer English

                  There are about
   10,000 active Twitter users in the Czech Republic
What are the dynamics of Czech Twitter?

 Every four weeks the number of users with at
      least one tweet rises by about 25%


The number of active users rises by 3-5% each week


The absolute number of tweets rises by about 25% too
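
The weekly and four-week figures above can be put on a common scale by compounding. The sketch below assumes growth compounds at a constant weekly rate, which the slides do not state; note also that "users with at least one tweet" and "active users" are different measures, so the two rates need not match.

```python
def four_week_growth(weekly_rate: float) -> float:
    """Growth over four weeks if the same weekly rate compounds each week."""
    return (1 + weekly_rate) ** 4 - 1


def weekly_equivalent(four_week_rate: float) -> float:
    """Constant weekly rate that compounds to the given four-week rate."""
    return (1 + four_week_rate) ** 0.25 - 1


for rate in (0.03, 0.05):
    print(f"{rate:.0%} per week -> {four_week_growth(rate):.1%} per four weeks")

print(f"25% per four weeks -> {weekly_equivalent(0.25):.2%} per week")
```

Under this assumption, 3-5% per week corresponds to roughly 12.6-21.6% per four weeks, and 25% per four weeks corresponds to a little under 6% per week.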
What characteristics do Czech tweets have?



2% are retweets (RT)
4% use a hashtag (#)
21.5% are replies or part of a conversation
34.6% include a link
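
Shares like these can be computed with simple text rules. The sketch below is an assumption about how such a classification might look, not the method used in the talk: a tweet starting with "RT @" counts as a retweet, one starting with "@" as a reply, and hashtags and links are found with regular expressions. The sample tweets are made up.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
FEATURES = ("retweet", "hashtag", "reply", "link")


def tweet_features(text: str) -> set:
    """Return the set of features found in one tweet (rules are assumptions)."""
    features = set()
    stripped = text.lstrip()
    if stripped.startswith("RT @"):
        features.add("retweet")
    elif stripped.startswith("@"):
        features.add("reply")
    if HASHTAG_RE.search(text):
        features.add("hashtag")
    if URL_RE.search(text):
        features.add("link")
    return features


def feature_shares(tweets: list) -> dict:
    """Return the fraction of tweets that show each feature."""
    if not tweets:
        return {name: 0.0 for name in FEATURES}
    counts = Counter(f for tweet in tweets for f in tweet_features(tweet))
    return {name: counts[name] / len(tweets) for name in FEATURES}


# Made-up sample tweets for illustration only.
sample = [
    "RT @alice: WebExpo starts today",
    "@bob see you there",
    "Slides are up http://example.com #webexpo",
    "Good morning Prague",
]
shares = feature_shares(sample)
print(shares)
```

A tweet can carry several features at once, so the shares do not have to add up to 100%.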
What languages do
people in the CR use for
tweeting?
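Let's see that graph

[Pie chart: share of tweets by language. Legend: English, Czech, Slovak, German, others. Slice labels: 44%, 33%, 13%, 7%, 4%; the extracted text does not show which slice belongs to which language.]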
Geo-location breakdown of tweets among big cities in the Czech Republic
(July-August 2009), with the three most frequent languages per city

[Pie chart: Prague 45%, the 9 other cities 25%, others 30%; slice assignment inferred from the tweet counts below.]

 1. Praha               247685 tweets   en 116580 (47.07%)   cs 79957 (32.28%)   sk 16449 (6.64%)
 2. Brno                 37021 tweets   en 16104 (43.50%)    cs 14753 (39.85%)   sk 3360 (9.08%)
 3. Ostrava              23836 tweets   en 13885 (58.25%)    cs 5306 (22.26%)    pl 1638 (6.87%)
 4. Plzeň                13681 tweets   en 9160 (66.95%)     cs 2206 (16.12%)    fr 417 (3.05%)
 5. Olomouc              10754 tweets   en 4619 (42.95%)     cs 3062 (28.47%)    pt 999 (9.29%)
 6. Liberec              14178 tweets   en 9561 (67.44%)     cs 2864 (20.20%)    sk 462 (3.26%)
 7. České Budějovice      6219 tweets   cs 2589 (41.63%)     en 1386 (22.29%)    es 551 (8.86%)
 8. Hradec Králové       11888 tweets   cs 4696 (39.50%)     en 4400 (37.01%)    de 1113 (9.36%)
 9. Ústí nad Labem       12016 tweets   en 4266 (35.50%)     de 2882 (23.98%)    cs 2570 (21.39%)
10. Pardubice             5576 tweets   cs 2718 (48.74%)     en 1831 (32.84%)    sk 414 (7.42%)
And what about
"when?"
And why does it
matter?
This is what we've learned in a few months:

- Czechs tweet most often on Tuesday or Thursday, and
least often on Saturday
 Worldwide, the most popular day is Tuesday and the
least popular is Sunday

- The number of tweets rises steadily from the beginning to
the end of the month, then falls and begins rising again.
That means people tweet more at the end of the month
than at the beginning.
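
A day-of-week pattern like this comes from a plain frequency count over tweet timestamps. The following is a minimal sketch with made-up timestamps; the function name and the data are assumptions for illustration, and real input would be the archived tweets' creation times.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def tweets_per_weekday(timestamps) -> Counter:
    """Count tweets per weekday name (datetime.weekday(): Monday is 0)."""
    return Counter(WEEKDAYS[ts.weekday()] for ts in timestamps)


# Made-up timestamps, not data from the talk.
sample = [
    datetime(2009, 8, 11, 9, 15),   # Tuesday
    datetime(2009, 8, 11, 18, 40),  # Tuesday
    datetime(2009, 8, 18, 12, 5),   # Tuesday
    datetime(2009, 8, 13, 8, 30),   # Thursday
    datetime(2009, 8, 13, 21, 10),  # Thursday
    datetime(2009, 8, 15, 14, 0),   # Saturday
]

counts = tweets_per_weekday(sample)
busiest = counts.most_common(1)[0][0]
quietest = min(counts, key=counts.get)
print(busiest, quietest)
```

The same counter keyed by `ts.day` or `ts.hour` would give the day-of-month and time-of-day profiles.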
Predicting the present
Google vs. Twitter
MADONNA
IN PRAGUE
 13. 8. 2009
Madonna - August 2009 - Google search
Madonna - August 2009 - Czech Twitter
Sometimes Twitter is quicker & can predict future
                   searches
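
One way to test whether one daily series runs ahead of another is a lagged correlation: shift the Twitter counts forward by 0, 1, 2, ... days and see which shift best matches the search counts. The sketch below uses synthetic toy numbers, not the Madonna or Rammstein data from the slides, and the function names are hypothetical. It assumes both series cover the same days.

```python
from math import sqrt


def pearson(x, y) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def best_lead(leader, follower, max_lag):
    """Return (lag, scores): the lag in days at which `leader` best matches `follower`."""
    scores = {}
    for lag in range(max_lag + 1):
        # Compare leader on day t with follower on day t + lag.
        scores[lag] = pearson(leader[:len(leader) - lag], follower[lag:])
    return max(scores, key=scores.get), scores


# Synthetic daily counts: the second series is the first one delayed by two days.
twitter_mentions = [1, 1, 2, 5, 9, 4, 2, 1, 1, 1]
google_searches = [1, 1, 1, 1, 2, 5, 9, 4, 2, 1]

lag, scores = best_lead(twitter_mentions, google_searches, max_lag=3)
print(lag)
```

A high correlation at a positive lag only shows that the shapes line up with a delay; with a single event and a short series it is suggestive rather than conclusive.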
September 17th,
    Ostrava
Rammstein - August 2009 - Google search
Rammstein - August 2009 - Czech Twitter




                         17.9.2009
Thanks for your attention.
   Questions? Ideas?
   slerka@ataxo.com
