This document summarizes a study analyzing social media influence and credibility. A team of students and professors extracted different types of data from Twitter, including mentions of users, tweets by authorities, and keywords. They developed a semantic parser to analyze tweet content using ontological semantic technology. An initial linear score was formulated to measure user influence, and network analysis identified pivotal users between communities. Validation will compare content analysis to real-world events to supplement credibility assessment. The study has potential applications in public policy, business, psychology, and traffic monitoring.
BUS 625 Week 4 Response to Discussion 2Guided Response Your.docx
Social media influence scoring and semantic analysis
1. Social Media Influence and Credibility
By Students and Professors of Texas A&M University
Acknowledgements
Our team would like to thank, Drs. Yates and
Hempelmann for their guidance and contributions. In
addition, The Bush School members and the State
department helped in good faith to analyze past research,
areas for further examination, and diverse methodologies
for working with large data sets.
Citations
*Authors listed in alphabetical order.
Future Considerations
While our team focused on a very unique and non-
traditional social media application, the progress made in
developing user metrics has great potential in other
areas. The combination of OST with statistical and
network concepts is particularly unique and has
significant promise as electronics communication
continues to increase. This exploratory research suggests
that there are many directions which future research and
applications might take. For instance:
• Public Policy: Twitter has been used recently during
tumultuous political periods to coordinate protests or
even violent acts. Predicting and preparing for such
events can increase awareness and safety.
• Business Tools: The ability to classify social media
users is an invaluable tool for businesses, as it offers
an accurate, cost effective way of reaching their
target consumers.
• Psychological : Determine meaning, and even
credibility, of text content. Large digital data sets can
help to understand and refine perceptions and trust.
Social behavior can be mapped on a scale never
before imagined.
• Real Time Traffic: Accident detection, Crime
detection, and geographic information can be tracking
in real time.
Abstract
Background
Twitter Data Extraction
Our Computer Science Team extracted various kinds of data from Twitter using a Python library called Tweepy. Four
major kinds of data were extracted:
Searching Twitter for “mentions” of a particular user
• Target: What the network is saying about a user
• Application: Rumors and News about player injuries
Pulling all of the tweets issued by a given set of Twitter authorities
• Target: What the user is saying
• Application: What a player, team , or an interested party is saying regarding injuries
Searching Twitter for combinations of keywords and returning the resulting tweets
• Target: Identify noteworthy word combinations and usages
• Application: Eliminate false positives and refine word ontology
Pulled entire follower data for a given set of users so as to help form a follower network
• Target: Identify communities of users to augment user influence and credibility scoring
• Application: Identify communities and find pivotal users who bridge groups
This methodology has been encoded and documented to allow quick implementation of command line prompts to
initiate quick data extraction.
Statistical Analysis
•
•
•
•
Semantic Parser
In order to analyze the content of tweets, to more fully gauge influence and in particular credibility of a user, a semantic parser was developed to quickly and
accurately track the most likely meanings of words in relation to the studied area. The methodology is based on Ontological Semantic Technology (OST) and can
establish the relevance of a tweet’s content. It works by following the computational steps below (Figure 5):
OST Knowledge Resources
OnSe Lexicon,
Proper Name DB
OnSe
InfoStore
Figure 1: Past Growth of Social Media
Scoring
This assessment will help to identify verified users, and highly active/effective users. The elements above were
used to formulate an initial linear score to measure a users influence (Figure 3). This figure shows that among
109 authoritative users selected by an informed user, a small few were vastly more influential than the majority.
This score will be augmented with network analysis that will help to identify pivotal users who bridge different
communities (Figure 4). This figure shows a user who merits further investigation.
Validation and verification to come. Predictive identification of verified users, pivotal users, or influential users
through third party statistics will be considered successful and influence future considerations. Influence is a
difficult quality to quantify but many third party measures are available to us; such as Klout, Kred, and Google
trends. In addition, injuries were tracked for four basketball teams during the end of the 2014 season: The
Chicago Bulls, The Dallas Mavericks, The Utah Jazz, and The New Orleans Pelicans. This ground truth will be
compared to content to supplement both the content and the non-content credibility verification.
Figure 3: Normalized Non-Content Related Influence Score
Figure 4: Pivotal Network Interconnections
Accessing the meaning of text in terms of
underlying concepts
• Goes beyond simple keywords on the
surface
For each content word in a sentence it:
• Grabs the underlying concepts for all
senses of the word
• Sets up a matrix for all sense
combinations
• Finds the shortest path in the
ontology graph for each sense
combination
• Declares overall shortest path the
correct meaning
If correct meaning uses a concept from
the sports injury subdomain of ontology
the tweet is declared a true positive and
can be evaluated for further credibility
Double-pass corpus-driven acquisition
(desired and undesired senses of all
relevant vocabulary) ontology of
basketball and injuries are in place
Language independent ontology
capturing world knowledge
• For quick ramp-up, the lexicon is
integrated into ontology Figure 5: Flow Representation of the Steps for our Ontological Semantic Technology Parser
Figure 2: Parallels Between the NBA and States
Industrial and Systems Engineering
Thomas Gray tomegray77@neo.tamu.edu
Tucker Truesdale tuckertruesdale92@gmail.com
Dr. Justin Yates (ISEN) jtyates@iemail.tamu.edu
Bush School
Alex Bitter bittermeister@gmail.com
Cheryl Landry clandry24@tamu.edu
John Mellusi johnmellusi@tamu.edu
Computer Science
Zachary Habersang zachhabersang@gmail.com
Nishaanth Narayanan nishaanthn@gmail.com
Commerce
Dr. Chris Hempelmann ontology@tamuc.edu
Todd Morris tmorris5@leomail.tamuc.edu