SlideShare a Scribd company logo
1 of 48
16 Decision Support and Business Intelligence Systems (9th
Edition) Instructor’s Manual
Chapter 7:
Text Analytics, Text Mining, and Sentiment Analysis
Learning Objectives for Chapter 7
1. Describe text mining and understand the need for text mining
2. Differentiate among text analytics, text mining, and data
mining
3. Understand the different application areas for text mining
4. Know the process of carrying out a text mining project
5. Appreciate the different methods to introduce structure to
text-based data
6. Describe sentiment analysis
7. Develop familiarity with popular applications of sentiment
analysis
8. Learn the common methods for sentiment analysis
9. Become familiar with speech analytics as it relates to
sentiment analysis
10. Learn three facets of Web analytics—content, structure, and
usage mining
11. Know social analytics including social media and social
network analyses
CHAPTER OVERVIEW
This chapter provides a comprehensive overview of text
analytics/mining and Web analytics/mining along with their
popular application areas such as search engines, sentiment
analysis, and social network/media analytics. As we have been
witnessing in recent years, the unstructured data generated over
the Internet of Things (IoT) (Web, sensor networks, radio-
frequency identification [RFID]–enabled supply chain systems,
surveillance networks, etc.) are increasing at an exponential
pace, and there is no indication of its slowing down. This
changing nature of data is forcing organizations to make text
and Web analytics a critical part of their business
intelligence/analytics infrastructure.
CHAPTER OUTLINE
7.1 Opening Vignette: Amadori Group Converts Consumer
Sentiments into
Near-Real-Time Sales
7.2 Text Analytics and Text Mining Overview
7.3 Natural Language Processing (NLP)
7.4 Text Mining Applications
7.5 Text Mining Process
7.6 Sentiment Analysis
7.7 Web Mining Overview
7.8 Search Engines
7.9 Web Usage Mining
7.10 Social Analytics
ANSWERS TO END OF SECTION REVIEW QUESTIONS( ( (
( ( (
Section 7.1 Review Questions
1. According to the vignette and based on your opinion, what
are the challenges that the food industry is facing today?
Student perceptions may vary, but some common themes related
to the challenges faced by the food industry could include the
changing nature and role of food in people’s lifestyles, the shift
towards pre-prepared or easily prepared food, and the growing
importance of marketing to keep customers interested in brands.
2. How can analytics help businesses in the food industry to
survive and thrive in this competitive marketplace?
Analytics can serve dual purposes by both tracking customer
interest in the brand as well as providing valuable feedback on
customer preferences. An analytics system can be used to
evaluate the traffic to various brand marketing campaigns
(website or social) that play a pivotal role in ensuring that
products are being shown to new potential buyers and reminding
existing customers of their value. An analytics system can also
be used to help gather customer feedback and perception
information on a brand in general or products in particular. This
valuable information can be used as a part of both marketing
and product design.
3. What were and still are the main objectives for Amadori to
embark into analytics? What were the results?
The company’s main objectives were to market more effectively
to potential customers and create direct communications
through social media and other channels with current customers
to start a dialogue. The case illustrates how an analytics system
integrated with thoughtful website design can help a company
meet these goals.
4. Can you think of other businesses in the food industry that
utilize analytics to become more competitive and customer
focused? If not, an Internet search could help find relevant
information to answer this question.
Student opinions and Web searches will vary, but will show
similar strategies for packaged foods as well as fast foods in the
US.
Section 7.2 Review Questions
1. What is text analytics? How does it differ from text mining?
Text analytics is a concept that includes information retrieval
(e.g., searching and identifying relevant documents for a given
set of key terms) as well as information extraction, data mining,
and Web mining. By contrast, text mining is primarily focused
on discovering new and useful knowledge from textual data
sources. The overarching goal for both text analytics and text
mining is to turn unstructured textual data into actionable
information through the application of natural language
processing (NLP) and analytics. However, text analytics is a
broader term because of its inclusion of information retrieval.
You can think of text analytics as a combination of information
retrieval plus text mining.
2. What is text mining? How does it differ from data mining?
Text mining is the application of data mining to unstructured, or
less structured, text files. As the names indicate, text mining
analyzes words; and data mining analyzes numeric data.
3. Why is the popularity of text mining as an analytics tool
increasing?
Text mining as a BI is increasing because of the rapid growth in
text data and availability of sophisticated BI tools. The benefits
of text mining are obvious in the areas where very large
amounts of textual data are being generated, such as law (court
orders), academic research (research articles), finance
(quarterly reports), medicine (discharge summaries), biology
(molecular interactions), technology (patent files), and
marketing (customer comments).
4. What are some popular application areas of text mining?
· Information extraction. Identification of key phrases and
relationships within text by looking for predefined sequences in
text via pattern matching.
· Topic tracking. Based on a user profile and documents that a
user views, text mining can predict other documents of interest
to the user.
· Summarization. Summarizing a document to save time on the
part of the reader.
· Categorization. Identifying the main themes of a document and
then placing the document into a predefined set of categories
based on those themes.
· Clustering. Grouping similar documents without having a
predefined set of categories.
· Concept linking. Connects related documents by identifying
their shared concepts and, by doing so, helps users find
information that they perhaps would not have found using
traditional search methods.
· Question answering. Finding the best answer to a given
question through knowledge-driven pattern matching.
Section 7.3 Review Questions
1.
What is NLP?
Natural language processing (NLP) is an important component
of text mining and is a subfield of artificial intelligence and
computational linguistics. It studies the problem of
“understanding” the natural human language, with the view of
converting depictions of human language (such as textual
documents) into more formal representations (in the form of
numeric and symbolic data) that are easier for computer
programs to manipulate.
2.
How does NLP relate to text mining?
Text mining uses natural language processing to induce
structure into the text collection and then uses data mining
algorithms such as classification, clustering, association, and
sequence discovery to extract knowledge from it.
3.
What are some of the benefits and challenges of NLP?
NLP moves beyond syntax-driven text manipulation (which is
often called “word counting”) to a true understanding and
processing of natural language that considers grammatical and
semantic constraints as well as the context. The challenges
include:
· Part-of-speech tagging. It is difficult to mark up terms in a
text as corresponding to a particular part of speech because the
part of speech depends not only on the definition of the term but
also on the context within which it is used.
· Text segmentation. Some written languages, such as Chinese,
Japanese, and Thai, do not have single-word boundaries.
· Word sense disambiguation. Many words have more than one
meaning. Selecting the meaning that makes the most sense can
only be accomplished by taking into account the context within
which the word is used.
· Syntactic ambiguity. The grammar for natural languages is
ambiguous; that is, multiple possible sentence structures often
need to be considered. Choosing the most appropriate structure
usually requires a fusion of semantic and contextual
information.
· Imperfect or irregular input. Foreign or regional accents and
vocal impediments in speech and typographical or grammatical
errors in texts make the processing of the language an even
more difficult task.
· Speech acts. A sentence can often be considered an action by
the speaker. The sentence structure alone may not contain
enough information to define this action.
4.
What are the most common tasks addressed by NLP?
Following are among the most popular tasks:
• Question answering.
• Automatic summarization.
• Natural language generation.
• Natural language understanding.
• Machine translation.
• Foreign language reading.
• Foreign language writing.
• Speech recognition.
• Text-to-speech.
• Text proofing.
• Optical character recognition.
Section 7.4 Review Questions
5. List and briefly discuss some of the text mining applications
in marketing.
Text mining can be used to increase cross-selling and up-selling
by analyzing the unstructured data generated by call centers.
Text mining has become invaluable for customer relationship
management. Companies can use text mining to analyze rich
sets of unstructured text data, combined with the relevant
structured data extracted from organizational databases, to
predict customer perceptions and subsequent purchasing
behavior.
6. How can text mining be used in security and
counterterrorism?
Students may use the introductory case in this answer.
In 2007, EUROPOL developed an integrated system capable of
accessing, storing, and analyzing vast amounts of structured and
unstructured data sources in order to track transnational
organized crime.
Another security-related application of text mining is in the area
of deception detection.
7. What are some promising text mining applications in
biomedicine?
As in any other experimental approach, it is necessary to
analyze this vast amount of data in the context of previously
known information about the biological entities under study.
The literature is a particularly valuable source of information
for experiment validation and interpretation. Therefore, the
development of automated text mining tools to assist in such
interpretation is one of the main challenges in current
bioinformatics research.
Section 7.5 Review Questions
8. What are the main steps in the text mining process?
See Figure 7.6 (p. 309). Text mining entails three tasks:
· Establish the Corpus: Collect and organize the domain-
specific unstructured data
· Create the Term–Document Matrix: Introduce structure to the
corpus
· Extract Knowledge: Discover novel patterns from the T-D
matrix
9. What is the reason for normalizing word frequencies? What
are the common methods for normalizing word frequencies?
The raw indices need to be normalized in order to have a more
consistent TDM for further analysis. Common methods are log
frequencies, binary frequencies, and inverse document
frequencies.
10. What is SVD? How is it used in text mining?
Singular value decomposition (SVD), which is closely related to
principal components analysis, reduces the overall
dimensionality of the input matrix (number of input documents
by number of extracted terms) to a lower dimensional space,
where each consecutive dimension represents the largest degree
of variability (between words and documents) possible
11. What are the main knowledge extraction methods from
corpus?
The main categories of knowledge extraction methods are
classification, clustering, association, and trend analysis.
Section 7.6 Review Questions
12. What is sentiment analysis? How does it relate to text
mining?
Sentiment analysis tries to answer the question, “What do
people feel about a certain topic?” by digging into opinions of
many using a variety of automated tools. It is also known as
opinion mining, subjectivity analysis, and appraisal extraction
Sentiment analysis shares many characteristics and techniques
with text mining. However, unlike text mining, which
categorizes text by conceptual taxonomies of topics, sentiment
classification generally deals with two classes (positive versus
negative), a range of polarity (e.g., star ratings for movies), or a
range in strength of opinion.
13. What are the most popular application areas for sentiment
analysis? Why?
Customer relationship management (CRM) and customer
experience management are popular “voice of the customer
(VOC)” applications. Other application areas include “voice of
the market (VOM)” and “voice of the employee (VOE).”
14. What would be the expected benefits and beneficiaries of
sentiment analysis in politics?
Opinions matter a great deal in politics. Because political
discussions are dominated by quotes, sarcasm, and complex
references to persons, organizations, and ideas, politics is one
of the most difficult, and potentially fruitful, areas for
sentiment analysis. By analyzing the sentiment on election
forums, one may predict who is more likely to win or lose.
Sentiment analysis can help understand what voters are thinking
and can clarify a candidate’s position on issues. Sentiment
analysis can help political organizations, campaigns, and news
analysts to better understand which issues and positions matter
the most to voters. The technology was successfully applied by
both parties to the 2008 and 2012 American presidential
election campaigns.
15. What are the main steps in carrying out sentiment analysis
projects?
The first step when performing sentiment analysis of a text
document is called sentiment detection, during which text data
is differentiated between fact and opinion (objective vs.
subjective). This is followed by negative-positive (N-P) polarity
classification, where a subjective text item is classified on a
bipolar range. Following this comes target identification
(identifying the person, product, event, etc. that the sentiment is
about). Finally come collection and aggregation, in which the
overall sentiment for the document is calculated based on the
calculations of sentiments of individual phrases and words from
the first three steps.
16. What are the two common methods for polarity
identification? What is the main difference between the two?
Polarity identification can be done via a lexicon (as a reference
library) or by using a collection of training documents and
inductive machine learning algorithms. The lexicon approach
uses a catalog of words, their synonyms, and their meanings,
combined with numerical ratings indicating the position on the
N-P polarity associated with these words. In this way, affective,
emotional, and attitudinal phrases can be classified according to
their degree of positivity or negativity. By contrast, the
training-document approach uses statistical analysis and
machine learning algorithms, such as neural networks,
clustering approaches, and decision trees to ascertain the
sentiment for a new text document based on patterns from
previous “training” documents with assigned sentiment scores.
Section 7.7 Review Questions
17. What are some of the main challenges the Web poses for
knowledge discovery?
• The Web is too big for effective data mining.
• The Web is too complex.
• The Web is too dynamic.
• The Web is not specific to a domain.
• The Web has everything.
18. What is Web mining? How does it differ from regular data
mining or text mining?
Web mining is the discovery and analysis of interesting and
useful information from the Web and about the Web, usually
through Web-based tools. Text mining is less structured because
it’s based on words instead of numeric data.
19. What are the three main areas of Web mining?
The three main areas of Web mining are Web content mining,
Web structure mining, and Web usage (or activity) mining.
20. Identify three application areas for Web mining (at the
bottom of Figure 8.1). Based on your own experiences,
comment on their use cases in business settings.
(Since there are several application areas, this answer will vary
for different students. Following is one possible answer.)
Three possible application areas for Web mining include
sentiment analysis, clickstream analysis, and customer
analytics. Clickstream analysis helps to better understand user
behavior on a website. Sentiment analysis helps us understand
the opinions and affective state of users on a system. Customer
analytics helps to provide solutions for sales, service,
marketing, and product teams, and optimize the customer life
cycles. The use cases for these applications center on user
experience, and primarily affect customer service and customer
relationship management functions of an organization.
21. What is Web content mining? How can it be used for
competitive advantage?
Web content mining refers to the extraction of useful
information from Web pages. The documents may be extracted
in some machine-readable format so that automated techniques
can generate some information about the Web pages. Collecting
and mining Web content can be used for competitive
intelligence (collecting intelligence about competitors’
products, services, and customers), which can give your
organization a competitive advantage.
22. What is Web structure mining? How does it differ from Web
content mining?
Web structure mining is the process of extracting useful
information from the links embedded in Web documents. By
contrast, Web content mining involves analysis of the specific
textual content of web pages. So, Web structure mining is more
related to navigation through a website, whereas Web content
mining is more related to text mining and the document
hierarchy of a particular web page.
Section 7.8 Review Questions
1. What is a search engine? Why are search engines critically
important for today’s businesses?
A search engine is a software program that searches for
documents (Internet sites or files) based on the keywords
(individual words, multi-word terms, or a complete sentence)
that users have provided that have to do with the subject of their
inquiry. This is the most prominent type of information retrieval
system for finding relevant content on the Web. Search engines
have become the centerpiece of most Internet-based transactions
and other activities. Because people use them extensively to
learn about products and services, it is very important for
companies to have prominent visibility on the Web; hence the
major effort of companies to enhance their search engine
optimization (SEO).
2. What is a Web crawler? What is it used for? How does it
work?
A Web crawler (also called a spider or a Web spider) is a piece
of software that systematically browses (crawls through) the
World Wide Web for the purpose of finding and fetching Web
pages. It starts with a list of “seed” URLs, goes to the pages of
those URLs, and then follows each page’s hyperlinks, adding
them to the search engine’s database. Thus, the Web crawler
navigates through the Web in order to construct the database of
websites.
3. What is “search engine optimization”? Who benefits from it?
Search engine optimization (SEO) is the intentional activity of
affecting the visibility of an e-commerce site or a website in a
search engine’s natural (unpaid or organic) search results. It
involves editing a page’s content, HTML, metadata, and
associated coding to both increase its relevance to specific
keywords and to remove barriers to the indexing activities of
search engines. In addition, SEO efforts include promoting a
site to increase its number of inbound links. SEO primarily
benefits companies with e-commerce sites by making their
pages appear toward the top of search engine lists when users
query.
4. What things can help Web pages rank higher in search engine
results?
Cross-linking between pages of the same website to provide
more links to the most important pages may improve its
visibility. Writing content that includes frequently searched
keyword phrases, so as to be relevant to a wide variety of search
queries, will tend to increase traffic. Updating content so as to
keep search engines crawling back frequently can give
additional weight to a site. Adding relevant keywords to a Web
page’s metadata, including the title tag and metadescription,
will tend to improve the relevancy of a site’s search listings,
thus increasing traffic. URL normalization of Web pages so that
they are accessible via multiple URLs. Using canonical link
elements and redirects can help make sure links to different
versions of the URL all count toward the page’s link popularity
score.
Section 7.9 Review Questions
1. What are the three types of data generated through Web page
visits?
· Automatically generated data stored in server access logs,
referrer logs, agent logs, and client-side cookies
· User profiles
· Metadata, such as page attributes, content attributes, and
usage data.
2. What is clickstream analysis? What is it used for?
Analysis of the information collected by Web servers can help
us better understand user behavior. Analysis of this data is often
called clickstream analysis. By using the data and text mining
techniques, a company might be able to discern interesting
patterns from the clickstreams.
3. What are the main applications of Web mining?
· Determine the lifetime value of clients.
· Design cross-marketing strategies across products.
· Evaluate promotional campaigns.
· Target electronic ads and coupons at user groups based on user
access patterns.
· Predict user behavior based on previously learned rules and
users’ profiles.
· Present dynamic information to users based on their interests
and profiles.
4. What are commonly used Web analytics metrics? What is the
importance of metrics?
There are four main categories of Web analytic metrics:
· Website usability: How were they using my website? These
involve page views, time on site, downloads, click map, and
click paths.
· Traffic sources: Where did they come from? These include
referral websites, search engines, direct, offline campaigns, and
online campaigns.
· Visitor profiles: What do my visitors look like? These include
keywords, content groupings, geography, time of day, and
landing page profiles.
· Conversion statistics: What does all this mean for the
business? Metrics include new visitors, returning visitors, leads,
sales/conversions, and abandonments.
These metrics are important because they provide access to a lot
of valuable marketing data, which can be leveraged for better
insights to grow your business and better document your ROI.
The insight and intelligence gained from Web analytics can be
used to effectively manage the marketing efforts of an
organization and its various products or services.
Section 7.10 Review Questions
1. What is meant by social analytics? Why is it an important
business topic?
From a philosophical perspective, social analytics focuses on a
theoretical object called a “socius,” a kind of “commonness”
that is neither a universal account nor a communality shared by
every member of a body. Thus, social analytics in this sense
attempts to articulate the differences between philosophy and
sociology. From a BI perspective, social analytics involves
“monitoring, analyzing, measuring and interpreting digital
interactions and relationships of people, topics, ideas and
content.” In this perspective, social analytics involves mining
the textual content created in social media (e.g., sentiment
analysis, natural language processing) and analyzing socially
established networks (e.g., influencer identification, profiling,
prediction). This is an important business topic because it helps
companies gain insight about existing and potential customers’
current and future behaviors, and about the likes and dislikes
toward a firm’s products and services.
2. What is a social network? What is the need for SNA?
A social network is a social structure composed of
individuals/people (or groups of individuals or organizations)
linked to one another with some type of
connections/relationships. Social network analysis (SNA) is the
systematic examination of social networks. Dating back to the
1950s, social network analysis is an interdisciplinary field that
emerged from social psychology, sociology, statistics, and
graph (network) theory.
3. What is social media? How does it relate to Web 2.0?
Social media refers to the enabling technologies of social
interactions among people in which they create, share, and
exchange information, ideas, and opinions in virtual
communities and networks. It is a group of Internet-based
software applications that build on the ideological and
technological foundations of Web 2.0, and that allow the
creation and exchange of user-generated content.
4. What is social media analytics? What are the reasons behind
its increasing popularity?
Social media analytics refers to the systematic and scientific
ways to consume the vast amount of content created by Web-
based social media outlets, tools, and techniques for the
betterment of an organization’s competitiveness. Data includes
anything posted in a social media site.
5. How can you measure the impact of social media analytics?
First, determine what your social media goals are. From here,
you can use analysis tools such as descriptive analytics, social
network analysis, and advanced (predictive, text examining
content in online conversations), and ultimately prescriptive
analytics tools.
ANSWERS TO APPLICATION CASE QUESTIONS FOR
DISCUSSION( (
Application Case 7.1: Netflix: Using Big Data to Drive Big
Engagement: Unlocking the Power of Analytics to Drive
Content and Consumer Insight
1. What does Netflix do? How did they evolve into this current
business model?
Netflix is a provider and creator of streaming digital content for
end-users. The company began as a subscription-based DVD
rental company and expanded into digital streaming. Changes in
the digital streaming market made it more economically
beneficial for the company to also expand into content creation.
2. In the case of Netflix, what was it meant to be data-driven
and customer-focused?
Four Netflix, being data-driven means that decisions on new
content being created or licensed (as well as maintaining
existing licenses) should be based on data directly related to
customer preferences (based on actual use as well as preferred
genres). The decision on how to spend these limited licensing
dollars must be made to create the greatest benefit for the
greatest number of users in order to maintain current
subscriptions as well as drive new sign-ups. This focus on
customer demands also meets the aspect of customer focus.
3. How did Netflix use Teradata technologies in its analytics
endeavors?
The Teradata solution provides two distinct advantages for
Netflix. The first is the ability to use an existing, robust cloud-
based system to perform analytics functions. This provides
security both in terms of redundancy as well as confidence in
the technology itself. The second is the ability to use Teradata’s
robust discovery engine to perform analytics functions to create
a better understanding of customer types and their preferences.
Application Case 7.2: AMC Networks Is Using Analytics to
Capture New Viewers, Predict Ratings, and Add Value for
Advertisers in a Multichannel World
1. What are the common challenges that broadcasting companies
are facing today? How can analytics help to alleviate these
challenges?
Broadcasters are faced with the need to maintain the attention
of viewers by providing quality program that meets their current
desires, desires that are always in flux. An analytics system can
help to alleviate these challenges by evaluating viewers current
tastes and desires for future programming.
2. How did AMC leverage analytics to enhance its business
performance?
AMC has been able to successfully leverage analytics to better
understand their customers and use this understanding to market
effectively to them as well as ensure that they provide content
that meets viewers desires. These analytics systems allow the
broadcaster to better understand what different customers want
in specific groupings, but also how to market their services
effectively to individuals within these groupings.
3. What were the types of text analytics and text minisolutions
developed by AMC networks? Can you think of other potential
uses of text mining applications in the broadcasting industry?
These details are not discussed in the case, but students may be
able to identify some of this information through independent
research. In general, the analytics being used incorporate data
from internal and external sources to determine viewership of
individual programs as well as purchases of digital programs.
Application Case 7.3: Mining for Lies
1.
Why is it difficult to detect deception?
Humans tend to perform poorly at deception-detection tasks.
This phenomenon is exacerbated in text-based communications.
Although some people believe that they can readily identify
those who are not being truthful, a summary of deception
research showed that, on average, people are only 54 percent
accurate in making veracity determinations.
2.
How can text/data mining be used to detect deception in text?
Through a process known as message feature mining, statements
are transcribed for processing, then cues are extracted and
selected. Text processing software identifies cues in statements
and generates quantified cues. Classification models are trained
and tested on quantified cues, and based on this, statements are
labeled as truthful or deceptive (e.g., by law enforcement
personnel). The feature-selection methods along with 10-fold
cross-validation allow researchers to compare the prediction
accuracy of different data mining methods (for example, neural
networks).
3.
What do you think are the main challenges for such an
automated system?
One challenge is that the training the system depends on humans
to ascertain the truthfulness of statements in the training data
itself. You can’t know for sure whether these statements are
true or false, so you may be using incorrect training samples
when “teaching” the machine learning system to predict lies in
new text data. (This answer will vary by student.)
Application Case 7.4: The Magic Behind the Magic: Instant
Access to Information Helps the Orlando Magic Up their Game
and the Fan’s Experience
1. According to the application case, what were the main
challenges the Orlando Magic was facing?
The primary challenges are both financial, with a need to
maximize both ticket sales and concessions/merchandising.
These challenges can be addressed by tailoring services to meet
the needs of customers.
2. How did analytics help the Orlando Magic to overcome some
of its most significant challenges on and off the court?
In order to meet the challenge of ensuring that season ticket
purchases continue, the team used a predictive analysis decision
tree model to identify customers that may not renew, and those
individuals could receive additional sales and marketing
attention. Additionally, coaching staff use analytic tools to
better understand all aspects of the game itself including player
strengths and weaknesses and opponent strategies.
3. Can you think of other uses of analytics in sports and
especially in the case of the Orlando Magic? You can search the
Web to find some answers to this question.
Student research and responses will vary but may include the
idea to use analytics when selecting new players or determining
if players ready for contract renewal should be retained.
Application Case 7.5: Research Literature Survey with Text
Mining
1.
How can text mining be used to ease the task of literature
review?
Text mining enables a semiautomated analysis of large volumes
of published literature. Clustering was used in this study to
identify the natural groupings of the articles and list the most
descriptive terms that characterized those clusters. This led to
discovery and exploration of interesting patterns using tabular
and graphical representation of their findings. Use of text and
data mining can thus speed up and simplify the literature review
process for academic researchers.
2.
What are the common outcomes of a text mining project on a
specific collection of journal articles? Can you think of other
potential outcomes not mentioned in this case?
Common outcomes include identifying natural clusters of
similar articles, helping to identify the optimal number of
cluster classifications. Using text mining, you can answer
questions such as “Are there clusters that represent different
research themes specific to a single journal?” and “Is there a
time-varying characterization of those clusters?” Text mining
also has other possible applications in literature reviews. For
example, sentiment analysis can help to identify positive and
negative judgments. Text mining can be used to build
taxonomies of concepts and terms within and between research
articles. You can find common themes by author as well as by
journal.
Application Case 7.6: Creating a Unique Digital Experience to
Capture Moments That Matter at Wimbledon
1. How did Wimbledon use analytics capabilities to enhance
viewers’ experience?
Wimbledon undertook a number of analytics projects in order to
enhance the viewer experience for the tournament. One project
was a redesign of the website for the tournament. Based on their
understanding of current users they were able to determine that
the majority of users were still accessing the site using desktop
browsers, and so were able to work on optimizing the
experience for that platform (as well as accommodating the
growth in mobile users). Next they were able to tap in to a huge
amount of data that was being collected or had been collected
over time. Data and analysis from the match came in in real
time and was analyzed and displayed as information to viewers.
Additionally, NLP systems allowed the tournament to search its
vast archives of facts and details that could be retrieved and
presented in real time. Finally, Wimbledon used their
understanding of the potential demand for this service to make
appropriate decisions about housing both the broadcasting and
analytics portions of the service in the cloud to provide the
necessary bandwidth and security.
2. What were the challenges, proposed solution, and obtained
results?
Many of the challenges were technical, but those were overcome
through the use of existing, trusted IT partners to provide
services and create a unique digital presence. The results were
very positive, with a significant number of online visitors and a
98% growth in total visits over the previous year.
Application Case 7.7: Delivering Individualized Content and
Driving Digital Engagement: How Barbour Collected More
Than 49,000 New Leads in One Month with Teradata Interactive
1. What does Barbour do? What was the challenge Barbour was
facing?
Barbour is an English heritage and lifestyle brand renowned for
waterproof outerwear that has recently expanded into other
luxury fashion goods. The company’s challenge was
establishing a direct relationship with its customers because in
the past all sales had been made through distributors. While in
e-commerce site was launched in 2013, connecting with
customers in a crowded digital market space was difficult.
2. What was the proposed analytics solution?
The company worked with partner Teradata to create a lead
nurturing program. The goal was both to drive immediate sales
as well as to gather information that could create more
meaningful long-term relationships with customers. The
company had access to historical customer information that was
generally seen as incomplete. Teradata was able to begin with
this information and create marketing campaigns that also
collected additional data that allowed for future, more
personalized content to be driven to customers through a
customer lifecycle program. These activities were integrated
with existing marketing and social media campaigns.
3. What were the results?
The results were very positive and over a one-month period the
company was able to collect more than 49,700 leads within their
primary European regions. In addition to very positive
responses (60% click through rates) the company was also able
to collect and begin analysis on additional customer data that
can be used in the lifecycle program.
Application Case 7.8: Tito’s Vodka Establishes Brand Loyalty
with an Authentic Social Strategy
1. How can social media analytics be used in the consumer
products industry?
Social media analytics can be incredibly useful for consumer
products because it allows the producers/retailers to better
understand their customers and connect with them individually
based on their preferences.
2. What do you think are the key challenges, potential solutions,
and probable results in applying social media analytics in
consumer products and services firms?
It’s
Student opinions will vary, but responses will focus on:
· Challenges - crowded market spaces with data in multiple
formats from multiple sources
· potential solutions - focus on analytics and the ability to better
understand customer needs at both the aggregate and individual
level
· probable results - can be very positive if analytic efforts are
successful in identifying customer preferences and targeting
individual customers or unique groups with marketing or social
media content that drives interest in the brand
ANSWERS TO END OF CHAPTER QUESTIONS FOR
DISCUSSION( ( (
1. Explain the relationship among data mining, text mining, and
sentiment analysis.
Technically speaking, data mining is a process that uses
statistical, mathematical, and artificial intelligence techniques
to extract and identify useful information and subsequent
knowledge (or patterns) from large sets of data. Data mining is
the general concept. Text mining is a specific application of
data mining: applying it to unstructured text files. Sentiment
analysis is a specialized form of text and data mining that
identifies and classifies terms in text sources according to
sentiment (e.g. judgment, opinion, and emotional content).
2. In your own words, define text mining, and discuss its most
popular applications.
Students’ answers will vary.
3. What does it mean to induce structure into text-based data?
Discuss the alternative ways of inducing structure into them.
Text mining, like other data mining approaches, are inductive
approaches for finding patterns and trends in data. One
difference between text mining and other data mining
approaches is the use of natural language processing.
Text-based data is inherently unstructured and must be
converted to a structured format for predictive modeling or
other type of analysis. The third step of the 3-step text mining
process utilizes inductive algorithms to classify and structure
the corpus of text sources so that knowledge can be extracted.
Four possible approaches for inducing structure in text in order
to extract knowledge are (a) classification (grouping terms into
predefined categories), (b) clustering (coming up with “natural”
groupings), (c) association rule learning (finding frequent
combinations of terms), and (d) trend analysis (recognizing
concept distributions based on specific collections of
documents).
4. What is the role of NLP in text mining? Discuss the
capabilities and limitations of NLP in the context of text
mining.
NLP is an important component of text mining. It studies the
problem of “understanding” the natural human language, with
the view of converting depictions of human language (such as
textual documents) into more formal representations (in the
form of numeric and symbolic data) that are easier for computer
programs to manipulate. The goal of NLP is to move beyond
syntax-driven text manipulation (which is often called “word
counting”) to a true understanding and processing of natural
language that considers grammatical and semantic constraints as
well as the context.
5. List and discuss three prominent application areas for text
mining. What is the common theme among the three application
areas you chose?
National security and counterterrorism: The FBI’s system is
expected to create a gigantic data warehouse along with a
variety of data and text mining modules to meet the knowledge
discovery needs of federal, state, and local law enforcement
agencies.
Biomedical: A system extracts disease–gene relationships from
literature accessed via MEDLINE.
Marketing: Text mining can be used to increase cross-selling
and up-selling by analyzing the unstructured data generated by
call centers.
The value of all of these applications is significantly increased
by extracting knowledge from huge volumes of text-data and
documents through the use of text mining tools.
6. What is sentiment analysis? How does it relate to text
mining?
Sentiment analysis tries to answer the question, “What do
people feel about a certain topic?” by digging into opinions of
many using a variety of automated tools. It is also known as
opinion mining, subjectivity analysis, and appraisal extraction.
Sentiment analysis shares many characteristics and techniques
with text mining. However, unlike text mining, which
categorizes text by conceptual taxonomies of topics, sentiment
classification generally deals with two classes (positive versus
negative), a range of polarity (e.g., star ratings for movies), or a
range in strength of opinion.
7. What are the common challenges with which sentiment
analysis deals?
Sentiment that appears in text comes in two flavors: explicit,
where the subjective sentence directly expresses an opinion
(“It’s a wonderful day”), and implicit, where the text implies an
opinion (“The handle breaks too easily”). Implicit sentiment
analysis is harder to analyze because it may not include words
that are obviously evaluations or judgments. Another challenge
involves the timeliness of collection/analysis of textual data
coming from a wide variety of data sources. A third challenge is
the difficulty of identifying whether a piece of text involves
sentiment or not, especially with implicit sentiment analysis.
The same sorts of issues involving text mining in natural
language settings also apply to sentiment analysis.
8. What are the most popular application areas for sentiment
analysis? Why?
Customer relationship management (CRM) and customer
experience management are popular “voice of the customer
(VOC)” applications. Other application areas include “voice of
the market (VOM)” and “voice of the employee (VOE).”
9. What are the main steps in carrying out sentiment analysis
projects?
The first step when performing sentiment analysis of a text
document is called sentiment detection, during which text data
is differentiated between fact and opinion (objective vs.
subjective). This is followed by negative-positive (N-P) polarity
classification, where a subjective text item is classified on a
bipolar range. Following this comes target identification
(identifying the person, product, event, etc. that the sentiment is
about). Finally come collection and aggregation, in which the
overall sentiment for the document is calculated based on the
calculations of sentiments of individual phrases and words from
the first three steps.
10. What are the two common methods for polarity
identification? Explain.
Polarity identification can be done via a lexicon (as a reference
library) or by using a collection of training documents and
inductive machine learning algorithms. The lexicon approach
uses a catalog of words, their synonyms, and their meanings,
combined with numerical ratings indicating the position on the
N-P polarity associated with these words. In this way, affective,
emotional, and attitudinal phrases can be classified according to
their degree of positivity or negativity. By contrast, the
training-document approach uses statistical analysis and
machine learning algorithms, such as neural networks,
clustering approaches, and decision trees to ascertain the
sentiment for a new text document based on patterns from
previous “training” documents with assigned sentiment scores.
11. Discuss the differences and commonalities between text
mining and Web mining.
Text mining is a specific application of data mining: applying it
to unstructured text files. Web mining is a specific application
of data mining: applying it to information on and about the Web
(content, structure, and usage).
12. In your own words, define Web mining, and discuss its
importance.
Students’ answers will vary.
13. What are the three main areas of Web mining? Discuss the
differences and commonalities among these three areas.
Web mining consists of three areas: Web content mining, Web
structure mining, and Web usage mining.
Web content mining refers to the automatic extraction of useful
information from Web pages. It may be used to enhance search
results produced by search engines.
Web structure mining refers to generating interesting
information from the links included in Web pages. Web
structure mining can also be used to identify the members of a
specific community and perhaps even the roles of the members
in the community.
Web usage mining refers to developing useful information
through analysis of Web server logs, user profiles, and
transaction information.
14. What is a search engine? Why is it important for
businesses?
A search engine is a software program that searches for
documents (Internet sites or files) based on the keywords
(individual words, multi-word terms, or a complete sentence)
that users have provided that have to do with the subject of their
inquiry. This is the most prominent type of information retrieval
system for finding relevant content on the Web. Search engines
have become the centerpiece of most Internet-based transactions
and other activities. Because people use them extensively to
learn about products and services, it is very important for
companies to have prominent visibility on the Web; hence the
major effort of companies to enhance their search engine
optimization (SEO).
15. What is SEO? Who benefits from it? How?
Search engine optimization (SEO) is the intentional activity of
affecting the visibility of an e-commerce site or a website in a
search engine’s natural (unpaid or organic) search results. It
involves editing a page’s content, HTML, metadata, and
associated coding to both increase its relevance to specific
keywords and to remove barriers to the indexing activities of
search engines. In addition, SEO efforts include promoting a
site to increase its number of inbound links. SEO primarily
benefits companies with e-commerce sites by making their
pages appear toward the top of search engine lists when users
query.
16.What is Web analytics? What are the metrics used in Web
analytics?
Web analytics, also called Web usage mining, can be considered
a part of Web mining, and aims to describe what has happened
on the website (employing a predefined, metrics-driven
descriptive analytics methodology). There are four main
categories of Web analytic metrics:
· Website usability: How were they using my website? These
involve page views, time on site, downloads, click map, and
click paths.
· Traffic sources: Where did they come from? These include
referral websites, search engines, direct, offline campaigns, and
online campaigns.
· Visitor profiles: What do my visitors look like? These include
keywords, content groupings, geography, time of day, and
landing page profiles.
· Conversion statistics: What does all this mean for the
business? Metrics include new visitors, returning visitors, leads,
sales/conversions, and abandonments.
These metrics are important because they provide access to a lot
of valuable marketing data, which can be leveraged for better
insights to grow your business and better document your ROI.
The insight and intelligence gained from Web analytics can be
used to effectively manage the marketing efforts of an
organization and its various products or services.
17. Define social analytics, social network, and social network
analysis. What are the relationships among them?
Social analytics involves monitoring, analyzing, measuring, and
interpreting digital interactions and relationships of people,
topics, ideas, and content. It involves mining the textual content
created in social media (e.g., sentiment analysis, natural
language processing) and analyzing socially established
networks (e.g., influencer identification, profiling, prediction).
A social network is a social structure composed of
individuals/people (or groups of individuals or organizations)
linked to one another with some type of
connections/relationships. Social network analysis (SNA) is the
systematic examination of social networks, and is an
interdisciplinary field that emerged from social psychology,
sociology, statistics, and graph (network) theory. Social
analytics therefore combines text analysis for content and
sentiment in online communications with social network
analysis to identify and analyze relationships between
individuals in a community.
18. What is social media analytics? How is it done? Who does
it? What comes out of it?
Social media analytics refers to the systematic and scientific
ways to consume the vast amount of content created by Web-
based social media outlets, tools, and techniques for the
betterment of an organization’s competitiveness. It is done
using many analytic methods, including text mining, sentiment
analysis, and social network analysis. Companies use it to get a
better understanding of their customer base, and can gain
financial and competitive advantages from doing so.
Governments use it to track potential terrorist threats, which
can lead to enhanced national security. Social scientists use it to
get a better understanding of how communities and societies
work, which can provide guidance on how to best manage these
societies.
ANSWERS TO END OF CHAPTER Exercises( (
Teradata University Network (TUN) and Other Hands-on
Exercises
1. Visit teradatauniversitynetwork.com. Identify cases about
text mining. Describe recent developments in the field. If you
cannot find enough cases at the Teradata University Network
Web site, broaden your search to other Web-based resources.
Student selection of cases will vary and create differences in
reports.
2. Go to teradatauniversitynetwork.com to locate white papers,
Web seminars, and other materials related to text mining.
Synthesize your findings into a short written report.
Student selection and perspectives on different reports will
generate a variety of findings.
3. Go to teradatauniversitynetwork.com and find the case study
named “eBay Analytics.” Read the case carefully and extend
your understanding of it by searching the Internet for additional
information, and answer the case questions.
Student interpretation and analysis of the information will vary.
4. Go to teradatauniversitynetwork.com and find the sentiment
analysis case named “How Do We Fix an App Like That?” Read
the description, and follow the directions to download the data
and the tool to carry out the exercise.
Work on this exercise will vary.
5. Visit teradatauniversitynetwork.com. Identify cases about
Web mining. Describe recent developments in the field. If you
cannot find enough cases at the Teradata University Network
Web site, broaden your search to other Web-based resources. 6.
Browse the Web and your library’s digital databases to identify
articles that make the linkage between text/Web mining and
contemporary business intelligence systems.
Student selection of cases and their analysis will differ.
Team Assignments and Role-Playing Projects
1. Examine how textual data can be captured automatically
using Web-based technologies. Once captured, what are the
potential patterns that you can extract from these unstructured
data sources?
Team selections of technology will create variations in their
reports.
2. Interview administrators at your college or executives in your
organization to determine how text mining and Web mining
could assist them in their work. Write a proposal describing
your findings. Include a preliminary cost–benefit analysis in
your report.
Team interviews of administrators will vary and create
differences in their findings.
3. Go to your library’s online resources. Learn how to download
attributes of a collection of literature (journal articles) in a
specific topic. Download and process the data using a
methodology similar to the one explained in Application Case
7.5.
Processes at individual institutions will vary as well the articles
downloaded.
4. Find a readily available sentiment text data set (see
Technology Insights 7.2 for a list of popular data sets) and
download it onto your computer. If you have an analytics tool
that is capable of text mining, use that. If not, download
RapidMiner (http://rapid-i.com) and install it. Also install the
Text Analytics add-on for RapidMiner. Process the downloaded
data using your text mining tool (i.e., convert the data into a
structured form). Build models and assess the sentiment
detection accuracy of several classification models (e.g.,
support vector machines, decision trees, neural networks,
logistic regression). Write a detailed report in which you
explain your findings and your experiences.
Selection of different data sets will result in different models
and variances in the final report.
5. Examine how Web-based data can be captured automatically
using the latest technologies. Once captured, what are the
potential patterns that you can extract from these content-rich,
mostly unstructured data sources?
Selection of technologies will vary based on student preferences
in the data search.
Internet Exercises
1. Find recent cases of successful text mining and Web mining
applications. Try text and Web mining software vendors and
consultancy firms and look for cases or success stories. Prepare
a report summarizing five new case studies.
Student research and reports will vary.
2. Go to statsoft.com. Select Downloads, and download at least
three white papers on applications. Which of these applications
might have used the data/text/Web mining techniques discussed
in this chapter?
Student research and reports will vary.
3. Go to sas.com. Download at least three white papers on
applications. Which of these applications might have used the
data/text/Web mining techniques discussed in this chapter?
Student research and reports will vary.
4. Go to ibm.com. Download at least three white papers on
applications. Which of these applications might have used the
data/text/Web mining techniques discussed in this chapter?
Student research and reports will vary.
5. Go to teradata.com. Download at least three white papers on
applications. Which of these applications might have used the
data/text/Web mining techniques discussed in this chapter?
Student research and reports will vary.
6. Go to clarabridge.com. Download at least three white papers
on applications. Which of these applications might have used
text mining in a creative way?
Student research and reports will vary.
7. Go to kdnuggets.com. Explore the sections on applications as
well as software. Find names of at least three additional
packages for data mining and text mining.
Student research and reports will vary.
8. Survey some Web mining tools and vendors. Identify some
Web mining products and service providers that are not
mentioned in this chapter.
Student research and reports will vary.
9. Go to attensity.com. Download at least three white papers on
Web analytics applications. Which of these applications might
have used a combination of data/ text/Web mining techniques?
Student research and reports will vary.
Copyright © 2011 Pearson Education, Inc. Publishing as
Prentice Hall
Copyright © 2011 Pearson Education, Inc. Publishing as
Prentice Hall
24
Copyright © 2019 Pearson Education, Inc.
A study into peer-to-peer network structure for online games
The classic model for online game’s network is client-server
(CS) model. The problems that the model faces are bottlenecks,
single points of failure and poor flexibility (Ramakrishna,2006).
Peer-to-peer (P2P) model brings another way to structure
networks. The problems and shortage of CS model can be
solved by some of the P2P models. However, P2P models have
shortages like security loopholes that is not researched
thoroughly. Also, P2P models are not put into practical use
since researches are not enough to support a well-designed P2P
structure for online games. This essay focuses on the merits and
expectations of using P2P model for online games, including
introduction of practical CS models and experimental P2P
structures. Furthermore, the limitations and expectations of
these P2P structures are discussed.
Although classic CS network structure is popular in online game
design, it is facing serious issues of complexity and scalability.
A CS network is a network that consists of two parts: clients
and servers. For online games, Yahyavi and Kemme (2013)
describes the server as machines that contain the master copies
of all variable objects and control the game world, and clients
as machines to receive information about the game world. Also,
clients need to send updates to servers. Most of the existing
online games are based on CS network, for example, the famous
Massively Multi-player Online Role-Play Game (MMORPG)
World of Warcraft is using multiple CS network to support its
12 million subscribed accounts (Liu, et al., 2015). Basic CS
models are popular for their simplicity and control efficiency
(Yahyavi and Kemme, 2013), although suffering from
scalability, error tolerance and cost. Advanced CS model, taking
multi-server model as an example, improved its scalability and
fault tolerance, but at the price of isolating players in different
regions (the players using European servers of World of
Warcraft cannot interact with players using Asian servers),
complexity and cost. It is clear that the CS model cannot avoid
the high cost of servers, as also claimed by Liu, et al. (2015),
and cannot fulfill simplicity and scalability at the same time.
Also, as Gibson and Vasconcelos (2018) suggested, the CS
structure faces the problem of single point of failure, which
means the whole network would collapse if the server goes
down. Ramakrishna, et al. (2006) also mentioned network
throughput bottlenecks and poor flexibility, but the problems
are now seen as trivial matters due to high-speed internet and
improved network structures. In conclusion, CS networks are
well-developed models, yet are facing development bottlenecks.
Researchers are using P2P network to break through the
bottlenecks CS networks are facing, yet there exist many
challenges for putting P2P network into practical use. P2P
network has P2P network is a network in which every machine
participates in providing information and calculating data for
the whole network (Cassavia, 2018). The role ‘server’ which is
played by servers in client-server model is taken by the whole
community. Each peer (end device) takes charge of a part of the
game world (Yahyavi and Kemme, 2013). The machines are
clients and servers at the same time. With the highest potential
for scalability, P2P structure can allow any number of players
into the game without causing any extra load for the game
provider. However, Yahyavi and Kemme (2013) claims that P2P
networks face challenges like difficulty in version control, low
defense level against malware and cheating. The latter problem
is also highlighted by Meng (2018), who designed a model to
solve the issue. P2P models in online games are feasible
according to various researches with models and relevant tests
(Yahyavi and Kemme; Cassavia; Barri, et al., 2016; Gibson,
2018; Liu, et al., 2015), yet are not well-established as runnable
game structures.
P2P game architectures have many sorting methods that allow
systematic analysis. Yahyavi and Kemme (2013) sorted P2P
game architectures into three kinds: structured P2P architecture,
unstructured P2P architecture and hybrid P2P architectures.
Except for the three kinds of P2P architectures, there exist
another kind of P2P structure: hybrid CS/P2P network.
The hybrid CS/P2P model has the most similar structure in P2P
networks to CS models, such that the application of this model
is highly feasible. Barri, et al. (2014) developed a CS/P2P
network structure for MMORPGs. The MMORPG is divided into
two parts: main game and auxiliary game. As defined, the main
game is a constantly existing game world with lower player
interaction, while the auxiliary game is where a few players
temporarily gather as a group with dense player interaction.
Barri, et al. decided to use CS model for the main game and P2P
for the auxiliary game. The two models are combined by a
mapping mechanism, which moves the P2P structure among
servers in CS model, reducing the redundancy of
communications. This model has high flexibility, scalability and
efficiency by test, but fails to examine the problems of churns
and cheating. Also, this model is only estimated in a single
game genre, so its performance in other types of games is
unknown.
Pure P2P structures are less familiar, but provide new ways to
improve online game architectures. Structured P2P architectures
are supported by specific graph structures and mechanisms such
as Distributed Hash Table (DHT) that allows every node to send
a message to or find other nodes (Yahyavi and Kemme, 2013).
Each node is bind with a unique key, such that it could be
found, created or deleted with low overhead. Yahyavi and
Kemme selected SimMud as an example. Players are distributed
into different regions and assigned NodeID. The regions are
connected by using ‘the peer with the NodeID closest to the
RegionID’ as the leader (Yahyavi and Kemme, pp.18). When a
new player joins in, the coordinator who holds the master copy
sends its data to the player. When updating the game, the game
provider updates the coordinator’s data. The example
demonstrates that structured P2P architectures have high
efficiency due to their specially optimized connections between
nodes, yet fails to examine the problem of malware attack and
cheating.
Unstructured P2P architectures provide examples of structures
without global mechanisms. Mutual notification system is used
instead of global mechanisms in unstructured P2P architectures
(Yahyavi and Kemme, 2013). Liu, et al. (2015) described a
protocol QuOn based on quad-tree structure. In this model,
‘binding neighbors’ is the key of the mutual notification system.
Messages are unicast from one node to another, and nodes are
connected through the ‘binding neighbors’ method between
neighbors. Backups are needed in case the current quad-tree is
lost. Two sets of ‘variable control’ tests have been applied to
the model, with results demonstrating high scalability.
However, the efficiency is suboptimal and should be further
researched. In addition, Liu, et al. did not analyze safety issues,
such that this protocol cannot be applied to game applications.
Few P2P models focused on safety issues, because dealing with
safety loopholes decrease the efficiency of the models (Meng,
2018). As a solution to the dilemma, Meng created a trust model
called speedTrust inspired by the trust relationship between
humans. By Yahyavi and kemme’s (2013) definition, speedTrust
is also a hybrid (structured combined with unstructured) P2P
model. In this hybrid P2P model, super peers are employed to
supervise the communication between two nodes, rewarding the
regulated peers and punishing malicious transactions using
specific formulas. Also, the model abandoned iterative
operations and instead use feedback reputations to calculate a
peer’s trust value, decreasing the time complexity of the model.
This model offers the best solution to the challenges of P2P
model considering both efficiency and safety. However, it did
not focus on scalability or game adaptability. If able to be
combined with the models above properly, this hybrid model
could be the perfect solution for peer-to-peer structures for
online games.
In conclusion, although research for peer-to-peer network
online games exists, there is not a perfect solution for the
challenges that peer-to-peer structures face. This essay
mentioned four P2P structures: a hybrid CS/P2P method for
MMORPG (Barri, et al., 2014); structured P2P architecture
SimMud (Yahyavi and Kemme, 2013); unstructured P2P
architecture QuOn (Liu, et al., 2015); trust model (also a hybrid
P2P model) speedTrust (Meng, 2018). All of the methods solved
the single point of failure that client-server model has, and most
of the models can scale high and achieve high efficiency. Only
one method focused on the safety issues, which are vital for
providing safe and fair gaming experience. This method,
however, is not applied to games and its scalability is not
examined. Without providing an integrated solution to these
challenges, the development of P2P online games could be hard
and slow.
Reference
Barri I., Roig C. and Giné F. (2016) ‘Distributing Game
Instances in A Hybrid
Client-Server/P2P System to Support MMORPG Playability’,
Multimed
Tools Appl, 75, pp.2005-2029.
Cassavia N., et al. (2018) ‘Distributed computing by leveraging
and rewarding
idling user resources from P2P networks’, J. Parallel Distrib.
Comput.,
122, pp.81-94.
Gibson M. and Vasconcelos W. (2018) ‘A Knowledge-Based
Approach to
Multiplayer Games in Peer-to-Peer Networks’, Knowledge and
Information
Systems. Available at: https://doi.org/10.1007/s10115-018-
1295-6
(Accessed: 12 May 2019).
Liu L. et al. (2015) ‘Performance Evaluation and Simulation of
Peer-to-Peer
Protocols for Massively Multiplayer Online Games’, Multimed
Tools Appl,
74, pp.2763-2780.
Meng X. (2018) ‘SpeedTrust: A Super Peer-guaranteed Trust
Model in Hybrid
P2P Networks’, The Journal of Supercomputing, 74(6), pp.
2553-2580.
Ramakrishna, V. et al. (2006) ‘An Active Self-Optimizing
Multiplayer Gaming
Architecture’, Cluster Computing, 9(2), pp. 201-215.
Yahyavi, A. and Kemme, B. (2013) ‘Peer-to-Peer Architectures
for Massively
Multiplayer Online Games: A Survey’, ACM Computing
Surveys (CSUR),
46(1), pp. 1-51.
1
Final Research Essay – Guidelines for students
Overview
Your final research essay will test your ability to write about
the importance of a concept, process or practice within your
subject area.
You will be required to:
· give an extended definition of the concept, process or practice,
considering different points of view
· explain the main principles of the concept, process or practice
· explain the significance of the concept, process or practice to
the general public (or how it can be used in industry,
manufacturing etc.)
· demonstrate analytical and/or critical thinking skills
Academic sources are peer-reviewed journal articles or passages
from books/textbooks published by reputable university presses.
Semi-academic articles include articles from publications such
as New Scientist, The Economist or Financial Times.
You should not cite unofficial websites or Wikipedia: these
have not been peer-reviewed. You should only cite sources that
have been written in English – for this assessment, you should
not cite any sources written in Chinese that you have
Assessment Criteria
In order to pass this assessment, you need to demonstrate that
you can:
· understand a wide range of demanding, longer texts
· use language flexibly and effectively for academic purposes
· produce clear, well-structured, detailed writing on complex
subjects
· show controlled use of organisational patterns, connectors and
cohesive devices
Your teacher will review the assessment criteria with you. You
will be assessed in the following areas:
Argument/Task Fulfilment
Language Use: Vocabulary
Language Use: Grammar
Structure and Coherence
Academic Style and Conventions
Guidelines
Please note the following guidelines.
1. You should submit both the printed and electronic copies of
your essay with an official cover page. Please include your
question on the first page.
2. Your essay should be 1,500 words. Essays below the required
length will lose marks. Please indicate the word count on the
cover sheet.
3. You should refer to at least seven sources. At least four of
these should be fully academic sources from peer-reviewed
articles or passages from textbooks. However, up to three
appropriate semi-academic sources articles from publications
such as New Scientist, The Economist or Financial Times are
also acceptable. You should not cite unofficial websites or
Wikipedia. All sources should be in English. You should not
translate sources from other languages.
4. Plagiarism will be heavily penalized. All essays will be
checked by the originality checker Urkund. You must ensure
that you fully acknowledge all sources, whether you paraphrase
them or quote directly.
5. You should follow the assigned style guide throughout. For
Finance students this is APA, for others it is Harvard. The most
important thing is that you are consistent throughout.
6. References: Please ensure that you provide a list of
references at the end of the paper. Make sure that all references
cited in the text are included in the list of references, and that
the list does not include any entries that have not been cited in
the text. References should be listed alphabetically by author. In
the case of more than one publication by the same author(s), list
them chronologically.
7. Your assignment should be typed in Calibri, Times New
Roman, or Arial font. Font size should be 12. Font colour
should be black. Please either double-space your assignment or
use 1.5 spacing.
8. Your title/question should be in bold. If you use section
headings, these should also be in bold.
9. Please either indent your paragraphs (typically 1.25cm) or
leave spaces between them. You should be consistent
throughout the essay.
10. Please include page numbers at the bottom of each page.
11. You do not have to include an abstract or table of contents.
Essay checklist
Read through this essay and check for these possible problems.
1st Highlight anything that looks bad in red in the text and
make some notes if you want.
2nd In the table below mark which areas the student needs to
pay attention to specifically.
Answer the questions
No
Yes
Any notes or advice to help:
Is there an introduction which introduces the topic and states
why this topic is important and some background information?
Is there a thesis statement?
Are there key definitions given with some referencing?
Are there clear paragraphs with topic sentences?
Does the essay follow a logical order and is it easy to follow
and understand?
Is there a clear conclusion which relates back to the thesis
statement and covers any limitations and solutions?
Is the essay formatted correctly with coversheet, essay question,
double spacing, 12 size font and correct style?
Is there a reference list in alphabetical order and following the
style guide?
Do you think the writer has spell-checked and grammar checked
using Grammarly.com?
Are there in-text references that match all the references in the
reference list?
Has the writer used personal language?
Can you see any phrasal verbs or evidence of informal style?
Check the subject-verb-agreement – is this okay?
Check the plural nouns and singular nouns – are they okay?
Have you noticed any other issues? If so, give some examples.

More Related Content

Similar to 16     Decision Support and Business Intelligence Systems (9th E.docx

NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISrathnaarul
 
Framework for Product Recommandation for Review Dataset
Framework for Product Recommandation for Review DatasetFramework for Product Recommandation for Review Dataset
Framework for Product Recommandation for Review Datasetrahulmonikasharma
 
TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOM
TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOMTEXT MINING-TAPPING HIDDEN KERNELS OF WISDOM
TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOMITC Infotech
 
Natural Language Processing Use Cases for Business Optimization
Natural Language Processing Use Cases for Business OptimizationNatural Language Processing Use Cases for Business Optimization
Natural Language Processing Use Cases for Business OptimizationTakayuki Yamazaki
 
Opinion Mining Techniques for Non-English Languages: An Overview
Opinion Mining Techniques for Non-English Languages: An OverviewOpinion Mining Techniques for Non-English Languages: An Overview
Opinion Mining Techniques for Non-English Languages: An OverviewCSCJournals
 
Query formulation process
Query formulation processQuery formulation process
Query formulation processmalathimurugan
 
Lexicon Based Emotion Analysis on Twitter Data
Lexicon Based Emotion Analysis on Twitter DataLexicon Based Emotion Analysis on Twitter Data
Lexicon Based Emotion Analysis on Twitter Dataijtsrd
 
Improving search result via search keywords and data classification similarity
Improving search result via search keywords and data classification similarityImproving search result via search keywords and data classification similarity
Improving search result via search keywords and data classification similarityConference Papers
 
A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...ijcnes
 
A Review: Text Classification on Social Media Data
A Review: Text Classification on Social Media DataA Review: Text Classification on Social Media Data
A Review: Text Classification on Social Media DataIOSR Journals
 
Web_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibWeb_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibEl Habib NFAOUI
 
Information Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept SimilarityInformation Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept Similarityrahulmonikasharma
 
Inverted files for text search engines
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search enginesunyil96
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...ijcsity
 
Competitive Intelligence Made easy
Competitive Intelligence Made easyCompetitive Intelligence Made easy
Competitive Intelligence Made easyRaghav Shaligram
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningIOSR Journals
 

Similar to 16     Decision Support and Business Intelligence Systems (9th E.docx (20)

NE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSISNE7012- SOCIAL NETWORK ANALYSIS
NE7012- SOCIAL NETWORK ANALYSIS
 
Ieee format 5th nccci_a study on factors influencing as a best practice for...
Ieee format 5th nccci_a study on factors influencing as  a  best practice for...Ieee format 5th nccci_a study on factors influencing as  a  best practice for...
Ieee format 5th nccci_a study on factors influencing as a best practice for...
 
Framework for Product Recommandation for Review Dataset
Framework for Product Recommandation for Review DatasetFramework for Product Recommandation for Review Dataset
Framework for Product Recommandation for Review Dataset
 
TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOM
TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOMTEXT MINING-TAPPING HIDDEN KERNELS OF WISDOM
TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOM
 
Natural Language Processing Use Cases for Business Optimization
Natural Language Processing Use Cases for Business OptimizationNatural Language Processing Use Cases for Business Optimization
Natural Language Processing Use Cases for Business Optimization
 
Opinion Mining Techniques for Non-English Languages: An Overview
Opinion Mining Techniques for Non-English Languages: An OverviewOpinion Mining Techniques for Non-English Languages: An Overview
Opinion Mining Techniques for Non-English Languages: An Overview
 
Query formulation process
Query formulation processQuery formulation process
Query formulation process
 
Lexicon Based Emotion Analysis on Twitter Data
Lexicon Based Emotion Analysis on Twitter DataLexicon Based Emotion Analysis on Twitter Data
Lexicon Based Emotion Analysis on Twitter Data
 
Improving search result via search keywords and data classification similarity
Improving search result via search keywords and data classification similarityImproving search result via search keywords and data classification similarity
Improving search result via search keywords and data classification similarity
 
A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...
 
A Review: Text Classification on Social Media Data
A Review: Text Classification on Social Media DataA Review: Text Classification on Social Media Data
A Review: Text Classification on Social Media Data
 
O017148084
O017148084O017148084
O017148084
 
Web_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibWeb_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_Habib
 
Information Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept SimilarityInformation Retrieval on Text using Concept Similarity
Information Retrieval on Text using Concept Similarity
 
Inverted files for text search engines
Inverted files for text search enginesInverted files for text search engines
Inverted files for text search engines
 
Estimating the overall sentiment score by inferring modus ponens law
Estimating the overall sentiment score by inferring modus ponens lawEstimating the overall sentiment score by inferring modus ponens law
Estimating the overall sentiment score by inferring modus ponens law
 
Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...Great model a model for the automatic generation of semantic relations betwee...
Great model a model for the automatic generation of semantic relations betwee...
 
NLP Ecosystem
NLP EcosystemNLP Ecosystem
NLP Ecosystem
 
Competitive Intelligence Made easy
Competitive Intelligence Made easyCompetitive Intelligence Made easy
Competitive Intelligence Made easy
 
Classification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern MiningClassification of News and Research Articles Using Text Pattern Mining
Classification of News and Research Articles Using Text Pattern Mining
 

More from RAJU852744

2222020 Report Pagehttpsww3.capsim.comcgi-bindispla.docx
2222020 Report Pagehttpsww3.capsim.comcgi-bindispla.docx2222020 Report Pagehttpsww3.capsim.comcgi-bindispla.docx
2222020 Report Pagehttpsww3.capsim.comcgi-bindispla.docxRAJU852744
 
2212020 Soil Colloids (Chapter 8) Notes - AGRI1050R50 Intro.docx
2212020 Soil Colloids (Chapter 8) Notes - AGRI1050R50 Intro.docx2212020 Soil Colloids (Chapter 8) Notes - AGRI1050R50 Intro.docx
2212020 Soil Colloids (Chapter 8) Notes - AGRI1050R50 Intro.docxRAJU852744
 
20 Other Conditions That May Be a Focus of Clinical AttentionV-c.docx
20 Other Conditions That May Be a Focus of Clinical AttentionV-c.docx20 Other Conditions That May Be a Focus of Clinical AttentionV-c.docx
20 Other Conditions That May Be a Focus of Clinical AttentionV-c.docxRAJU852744
 
223 Case 53 Problems in Pasta Land by Andres Sous.docx
223 Case 53 Problems in Pasta Land by  Andres Sous.docx223 Case 53 Problems in Pasta Land by  Andres Sous.docx
223 Case 53 Problems in Pasta Land by Andres Sous.docxRAJU852744
 
222111Organization N.docx
222111Organization N.docx222111Organization N.docx
222111Organization N.docxRAJU852744
 
22-6  Reporting the Plight of Depression FamiliesMARTHA GELLHOR.docx
22-6  Reporting the Plight of Depression FamiliesMARTHA GELLHOR.docx22-6  Reporting the Plight of Depression FamiliesMARTHA GELLHOR.docx
22-6  Reporting the Plight of Depression FamiliesMARTHA GELLHOR.docxRAJU852744
 
2012 © Laureate Education, Inc. ASSIGNMENT AND FINAL P.docx
2012 © Laureate Education, Inc. ASSIGNMENT AND FINAL P.docx2012 © Laureate Education, Inc. ASSIGNMENT AND FINAL P.docx
2012 © Laureate Education, Inc. ASSIGNMENT AND FINAL P.docxRAJU852744
 
216Author’s Note I would like to thank the Division of Wo.docx
216Author’s Note I would like to thank the Division of Wo.docx216Author’s Note I would like to thank the Division of Wo.docx
216Author’s Note I would like to thank the Division of Wo.docxRAJU852744
 
2019 International Conference on Machine Learning, Big Data, C.docx
2019 International Conference on Machine Learning, Big Data, C.docx2019 International Conference on Machine Learning, Big Data, C.docx
2019 International Conference on Machine Learning, Big Data, C.docxRAJU852744
 
2018 4th International Conference on Green Technology and Sust.docx
2018 4th International Conference on Green Technology and Sust.docx2018 4th International Conference on Green Technology and Sust.docx
2018 4th International Conference on Green Technology and Sust.docxRAJU852744
 
202 S.W.3d 811Court of Appeals of Texas,San Antonio.PROG.docx
202 S.W.3d 811Court of Appeals of Texas,San Antonio.PROG.docx202 S.W.3d 811Court of Appeals of Texas,San Antonio.PROG.docx
202 S.W.3d 811Court of Appeals of Texas,San Antonio.PROG.docxRAJU852744
 
200 wordsResearch Interest Lack of minorities in top level ma.docx
200 wordsResearch Interest Lack of minorities in top level ma.docx200 wordsResearch Interest Lack of minorities in top level ma.docx
200 wordsResearch Interest Lack of minorities in top level ma.docxRAJU852744
 
2019 14th Iberian Conference on Information Systems and Tech.docx
2019 14th Iberian Conference on Information Systems and Tech.docx2019 14th Iberian Conference on Information Systems and Tech.docx
2019 14th Iberian Conference on Information Systems and Tech.docxRAJU852744
 
200520201ORG30002 – Leadership Practice and Skills.docx
200520201ORG30002 – Leadership Practice and Skills.docx200520201ORG30002 – Leadership Practice and Skills.docx
200520201ORG30002 – Leadership Practice and Skills.docxRAJU852744
 
2182020 Sample Content Topichttpspurdueglobal.brights.docx
2182020 Sample Content Topichttpspurdueglobal.brights.docx2182020 Sample Content Topichttpspurdueglobal.brights.docx
2182020 Sample Content Topichttpspurdueglobal.brights.docxRAJU852744
 
21 hours agoMercy Eke Week 2 Discussion Hamilton Depression.docx
21 hours agoMercy Eke Week 2 Discussion Hamilton Depression.docx21 hours agoMercy Eke Week 2 Discussion Hamilton Depression.docx
21 hours agoMercy Eke Week 2 Discussion Hamilton Depression.docxRAJU852744
 
2192020 Originality Reporthttpsucumberlands.blackboar.docx
2192020 Originality Reporthttpsucumberlands.blackboar.docx2192020 Originality Reporthttpsucumberlands.blackboar.docx
2192020 Originality Reporthttpsucumberlands.blackboar.docxRAJU852744
 
20810chapter Information Systems Sourcing .docx
20810chapter       Information Systems Sourcing    .docx20810chapter       Information Systems Sourcing    .docx
20810chapter Information Systems Sourcing .docxRAJU852744
 
21720201Chapter 14Eating and WeightHealth Ps.docx
21720201Chapter 14Eating and WeightHealth Ps.docx21720201Chapter 14Eating and WeightHealth Ps.docx
21720201Chapter 14Eating and WeightHealth Ps.docxRAJU852744
 
2020221 Critical Review #2 - WebCOM™ 2.0httpssmc.grte.docx
2020221 Critical Review #2 - WebCOM™ 2.0httpssmc.grte.docx2020221 Critical Review #2 - WebCOM™ 2.0httpssmc.grte.docx
2020221 Critical Review #2 - WebCOM™ 2.0httpssmc.grte.docxRAJU852744
 

More from RAJU852744 (20)

2222020 Report Pagehttpsww3.capsim.comcgi-bindispla.docx
2222020 Report Pagehttpsww3.capsim.comcgi-bindispla.docx2222020 Report Pagehttpsww3.capsim.comcgi-bindispla.docx
2222020 Report Pagehttpsww3.capsim.comcgi-bindispla.docx
 
2212020 Soil Colloids (Chapter 8) Notes - AGRI1050R50 Intro.docx
2212020 Soil Colloids (Chapter 8) Notes - AGRI1050R50 Intro.docx2212020 Soil Colloids (Chapter 8) Notes - AGRI1050R50 Intro.docx
2212020 Soil Colloids (Chapter 8) Notes - AGRI1050R50 Intro.docx
 
20 Other Conditions That May Be a Focus of Clinical AttentionV-c.docx
20 Other Conditions That May Be a Focus of Clinical AttentionV-c.docx20 Other Conditions That May Be a Focus of Clinical AttentionV-c.docx
20 Other Conditions That May Be a Focus of Clinical AttentionV-c.docx
 
223 Case 53 Problems in Pasta Land by Andres Sous.docx
223 Case 53 Problems in Pasta Land by  Andres Sous.docx223 Case 53 Problems in Pasta Land by  Andres Sous.docx
223 Case 53 Problems in Pasta Land by Andres Sous.docx
 
222111Organization N.docx
222111Organization N.docx222111Organization N.docx
222111Organization N.docx
 
22-6  Reporting the Plight of Depression FamiliesMARTHA GELLHOR.docx
22-6  Reporting the Plight of Depression FamiliesMARTHA GELLHOR.docx22-6  Reporting the Plight of Depression FamiliesMARTHA GELLHOR.docx
22-6  Reporting the Plight of Depression FamiliesMARTHA GELLHOR.docx
 
2012 © Laureate Education, Inc. ASSIGNMENT AND FINAL P.docx
2012 © Laureate Education, Inc. ASSIGNMENT AND FINAL P.docx2012 © Laureate Education, Inc. ASSIGNMENT AND FINAL P.docx
2012 © Laureate Education, Inc. ASSIGNMENT AND FINAL P.docx
 
216Author’s Note I would like to thank the Division of Wo.docx
216Author’s Note I would like to thank the Division of Wo.docx216Author’s Note I would like to thank the Division of Wo.docx
216Author’s Note I would like to thank the Division of Wo.docx
 
2019 International Conference on Machine Learning, Big Data, C.docx
2019 International Conference on Machine Learning, Big Data, C.docx2019 International Conference on Machine Learning, Big Data, C.docx
2019 International Conference on Machine Learning, Big Data, C.docx
 
2018 4th International Conference on Green Technology and Sust.docx
2018 4th International Conference on Green Technology and Sust.docx2018 4th International Conference on Green Technology and Sust.docx
2018 4th International Conference on Green Technology and Sust.docx
 
202 S.W.3d 811Court of Appeals of Texas,San Antonio.PROG.docx
202 S.W.3d 811Court of Appeals of Texas,San Antonio.PROG.docx202 S.W.3d 811Court of Appeals of Texas,San Antonio.PROG.docx
202 S.W.3d 811Court of Appeals of Texas,San Antonio.PROG.docx
 
200 wordsResearch Interest Lack of minorities in top level ma.docx
200 wordsResearch Interest Lack of minorities in top level ma.docx200 wordsResearch Interest Lack of minorities in top level ma.docx
200 wordsResearch Interest Lack of minorities in top level ma.docx
 
2019 14th Iberian Conference on Information Systems and Tech.docx
2019 14th Iberian Conference on Information Systems and Tech.docx2019 14th Iberian Conference on Information Systems and Tech.docx
2019 14th Iberian Conference on Information Systems and Tech.docx
 
200520201ORG30002 – Leadership Practice and Skills.docx
200520201ORG30002 – Leadership Practice and Skills.docx200520201ORG30002 – Leadership Practice and Skills.docx
200520201ORG30002 – Leadership Practice and Skills.docx
 
2182020 Sample Content Topichttpspurdueglobal.brights.docx
2182020 Sample Content Topichttpspurdueglobal.brights.docx2182020 Sample Content Topichttpspurdueglobal.brights.docx
2182020 Sample Content Topichttpspurdueglobal.brights.docx
 
21 hours agoMercy Eke Week 2 Discussion Hamilton Depression.docx
21 hours agoMercy Eke Week 2 Discussion Hamilton Depression.docx21 hours agoMercy Eke Week 2 Discussion Hamilton Depression.docx
21 hours agoMercy Eke Week 2 Discussion Hamilton Depression.docx
 
2192020 Originality Reporthttpsucumberlands.blackboar.docx
2192020 Originality Reporthttpsucumberlands.blackboar.docx2192020 Originality Reporthttpsucumberlands.blackboar.docx
2192020 Originality Reporthttpsucumberlands.blackboar.docx
 
20810chapter Information Systems Sourcing .docx
20810chapter       Information Systems Sourcing    .docx20810chapter       Information Systems Sourcing    .docx
20810chapter Information Systems Sourcing .docx
 
21720201Chapter 14Eating and WeightHealth Ps.docx
21720201Chapter 14Eating and WeightHealth Ps.docx21720201Chapter 14Eating and WeightHealth Ps.docx
21720201Chapter 14Eating and WeightHealth Ps.docx
 
2020221 Critical Review #2 - WebCOM™ 2.0httpssmc.grte.docx
2020221 Critical Review #2 - WebCOM™ 2.0httpssmc.grte.docx2020221 Critical Review #2 - WebCOM™ 2.0httpssmc.grte.docx
2020221 Critical Review #2 - WebCOM™ 2.0httpssmc.grte.docx
 

Recently uploaded

Basic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationBasic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationNeilDeclaro1
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxPooja Bhuva
 
Philosophy of china and it's charactistics
Philosophy of china and it's charactisticsPhilosophy of china and it's charactistics
Philosophy of china and it's charactisticshameyhk98
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsSandeep D Chaudhary
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
latest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answerslatest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answersdalebeck957
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 

Recently uploaded (20)

Basic Intentional Injuries Health Education
Basic Intentional Injuries Health EducationBasic Intentional Injuries Health Education
Basic Intentional Injuries Health Education
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
Philosophy of china and it's charactistics
Philosophy of china and it's charactisticsPhilosophy of china and it's charactistics
Philosophy of china and it's charactistics
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
latest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answerslatest AZ-104 Exam Questions and Answers
latest AZ-104 Exam Questions and Answers
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 

16     Decision Support and Business Intelligence Systems (9th E.docx

  • 1. 16 Decision Support and Business Intelligence Systems (9th Edition) Instructor’s Manual Chapter 7: Text Analytics, Text Mining, and Sentiment Analysis Learning Objectives for Chapter 7 1. Describe text mining and understand the need for text mining 2. Differentiate among text analytics, text mining, and data mining 3. Understand the different application areas for text mining 4. Know the process of carrying out a text mining project 5. Appreciate the different methods to introduce structure to text-based data 6. Describe sentiment analysis 7. Develop familiarity with popular applications of sentiment analysis 8. Learn the common methods for sentiment analysis 9. Become familiar with speech analytics as it relates to sentiment analysis 10. Learn three facets of Web analytics—content, structure, and usage mining 11. Know social analytics including social media and social network analyses CHAPTER OVERVIEW This chapter provides a comprehensive overview of text analytics/mining and Web analytics/mining along with their
  • 2. popular application areas such as search engines, sentiment analysis, and social network/media analytics. As we have been witnessing in recent years, the unstructured data generated over the Internet of Things (IoT) (Web, sensor networks, radio- frequency identification [RFID]–enabled supply chain systems, surveillance networks, etc.) are increasing at an exponential pace, and there is no indication of its slowing down. This changing nature of data is forcing organizations to make text and Web analytics a critical part of their business intelligence/analytics infrastructure. CHAPTER OUTLINE 7.1 Opening Vignette: Amadori Group Converts Consumer Sentiments into Near-Real-Time Sales 7.2 Text Analytics and Text Mining Overview 7.3 Natural Language Processing (NLP) 7.4 Text Mining Applications 7.5 Text Mining Process 7.6 Sentiment Analysis 7.7 Web Mining Overview 7.8 Search Engines 7.9 Web Usage Mining 7.10 Social Analytics ANSWERS TO END OF SECTION REVIEW QUESTIONS( ( ( ( ( ( Section 7.1 Review Questions 1. According to the vignette and based on your opinion, what
  • 3. are the challenges that the food industry is facing today? Student perceptions may vary, but some common themes related to the challenges faced by the food industry could include the changing nature and role of food in people’s lifestyles, the shift towards pre-prepared or easily prepared food, and the growing importance of marketing to keep customers interested in brands. 2. How can analytics help businesses in the food industry to survive and thrive in this competitive marketplace? Analytics can serve dual purposes by both tracking customer interest in the brand as well as providing valuable feedback on customer preferences. An analytics system can be used to evaluate the traffic to various brand marketing campaigns (website or social) that play a pivotal role in ensuring that products are being shown to new potential buyers and reminding existing customers of their value. An analytics system can also be used to help gather customer feedback and perception information on a brand in general or products in particular. This valuable information can be used as a part of both marketing and product design. 3. What were and still are the main objectives for Amadori to embark into analytics? What were the results? The company’s main objectives were to market more effectively to potential customers and create direct communications through social media and other channels with current customers to start a dialogue. The case illustrates how an analytics system integrated with thoughtful website design can help a company meet these goals. 4. Can you think of other businesses in the food industry that utilize analytics to become more competitive and customer focused? If not, an Internet search could help find relevant
  • 4. information to answer this question. Student opinions and Web searches will vary, but will show similar strategies for packaged foods as well as fast foods in the US. Section 7.2 Review Questions 1. What is text analytics? How does it differ from text mining? Text analytics is a concept that includes information retrieval (e.g., searching and identifying relevant documents for a given set of key terms) as well as information extraction, data mining, and Web mining. By contrast, text mining is primarily focused on discovering new and useful knowledge from textual data sources. The overarching goal for both text analytics and text mining is to turn unstructured textual data into actionable information through the application of natural language processing (NLP) and analytics. However, text analytics is a broader term because of its inclusion of information retrieval. You can think of text analytics as a combination of information retrieval plus text mining. 2. What is text mining? How does it differ from data mining? Text mining is the application of data mining to unstructured, or less structured, text files. As the names indicate, text mining analyzes words; and data mining analyzes numeric data. 3. Why is the popularity of text mining as an analytics tool increasing? Text mining as a BI is increasing because of the rapid growth in text data and availability of sophisticated BI tools. The benefits of text mining are obvious in the areas where very large amounts of textual data are being generated, such as law (court orders), academic research (research articles), finance (quarterly reports), medicine (discharge summaries), biology
  • 5. (molecular interactions), technology (patent files), and marketing (customer comments). 4. What are some popular application areas of text mining? · Information extraction. Identification of key phrases and relationships within text by looking for predefined sequences in text via pattern matching. · Topic tracking. Based on a user profile and documents that a user views, text mining can predict other documents of interest to the user. · Summarization. Summarizing a document to save time on the part of the reader. · Categorization. Identifying the main themes of a document and then placing the document into a predefined set of categories based on those themes. · Clustering. Grouping similar documents without having a predefined set of categories. · Concept linking. Connects related documents by identifying their shared concepts and, by doing so, helps users find information that they perhaps would not have found using traditional search methods. · Question answering. Finding the best answer to a given question through knowledge-driven pattern matching. Section 7.3 Review Questions 1. What is NLP? Natural language processing (NLP) is an important component
  • 6. of text mining and is a subfield of artificial intelligence and computational linguistics. It studies the problem of “understanding” the natural human language, with the view of converting depictions of human language (such as textual documents) into more formal representations (in the form of numeric and symbolic data) that are easier for computer programs to manipulate. 2. How does NLP relate to text mining? Text mining uses natural language processing to induce structure into the text collection and then uses data mining algorithms such as classification, clustering, association, and sequence discovery to extract knowledge from it. 3. What are some of the benefits and challenges of NLP? NLP moves beyond syntax-driven text manipulation (which is often called “word counting”) to a true understanding and processing of natural language that considers grammatical and semantic constraints as well as the context. The challenges include: · Part-of-speech tagging. It is difficult to mark up terms in a text as corresponding to a particular part of speech because the part of speech depends not only on the definition of the term but also on the context within which it is used. · Text segmentation. Some written languages, such as Chinese, Japanese, and Thai, do not have single-word boundaries. · Word sense disambiguation. Many words have more than one meaning. Selecting the meaning that makes the most sense can only be accomplished by taking into account the context within
  • 7. which the word is used. · Syntactic ambiguity. The grammar for natural languages is ambiguous; that is, multiple possible sentence structures often need to be considered. Choosing the most appropriate structure usually requires a fusion of semantic and contextual information. · Imperfect or irregular input. Foreign or regional accents and vocal impediments in speech and typographical or grammatical errors in texts make the processing of the language an even more difficult task. · Speech acts. A sentence can often be considered an action by the speaker. The sentence structure alone may not contain enough information to define this action. 4. What are the most common tasks addressed by NLP? Following are among the most popular tasks: • Question answering. • Automatic summarization. • Natural language generation. • Natural language understanding. • Machine translation. • Foreign language reading. • Foreign language writing. • Speech recognition. • Text-to-speech. • Text proofing. • Optical character recognition. Section 7.4 Review Questions
  • 8. 5. List and briefly discuss some of the text mining applications in marketing. Text mining can be used to increase cross-selling and up-selling by analyzing the unstructured data generated by call centers. Text mining has become invaluable for customer relationship management. Companies can use text mining to analyze rich sets of unstructured text data, combined with the relevant structured data extracted from organizational databases, to predict customer perceptions and subsequent purchasing behavior. 6. How can text mining be used in security and counterterrorism? Students may use the introductory case in this answer. In 2007, EUROPOL developed an integrated system capable of accessing, storing, and analyzing vast amounts of structured and unstructured data sources in order to track transnational organized crime. Another security-related application of text mining is in the area of deception detection. 7. What are some promising text mining applications in biomedicine? As in any other experimental approach, it is necessary to analyze this vast amount of data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research.
  • 9. Section 7.5 Review Questions 8. What are the main steps in the text mining process? See Figure 7.6 (p. 309). Text mining entails three tasks: · Establish the Corpus: Collect and organize the domain- specific unstructured data · Create the Term–Document Matrix: Introduce structure to the corpus · Extract Knowledge: Discover novel patterns from the T-D matrix 9. What is the reason for normalizing word frequencies? What are the common methods for normalizing word frequencies? The raw indices need to be normalized in order to have a more consistent TDM for further analysis. Common methods are log frequencies, binary frequencies, and inverse document frequencies. 10. What is SVD? How is it used in text mining? Singular value decomposition (SVD), which is closely related to principal components analysis, reduces the overall dimensionality of the input matrix (number of input documents by number of extracted terms) to a lower dimensional space, where each consecutive dimension represents the largest degree of variability (between words and documents) possible 11. What are the main knowledge extraction methods from corpus? The main categories of knowledge extraction methods are classification, clustering, association, and trend analysis. Section 7.6 Review Questions
  • 10. 12. What is sentiment analysis? How does it relate to text mining? Sentiment analysis tries to answer the question, “What do people feel about a certain topic?” by digging into opinions of many using a variety of automated tools. It is also known as opinion mining, subjectivity analysis, and appraisal extraction Sentiment analysis shares many characteristics and techniques with text mining. However, unlike text mining, which categorizes text by conceptual taxonomies of topics, sentiment classification generally deals with two classes (positive versus negative), a range of polarity (e.g., star ratings for movies), or a range in strength of opinion. 13. What are the most popular application areas for sentiment analysis? Why? Customer relationship management (CRM) and customer experience management are popular “voice of the customer (VOC)” applications. Other application areas include “voice of the market (VOM)” and “voice of the employee (VOE).” 14. What would be the expected benefits and beneficiaries of sentiment analysis in politics? Opinions matter a great deal in politics. Because political discussions are dominated by quotes, sarcasm, and complex references to persons, organizations, and ideas, politics is one of the most difficult, and potentially fruitful, areas for sentiment analysis. By analyzing the sentiment on election forums, one may predict who is more likely to win or lose. Sentiment analysis can help understand what voters are thinking and can clarify a candidate’s position on issues. Sentiment analysis can help political organizations, campaigns, and news analysts to better understand which issues and positions matter the most to voters. The technology was successfully applied by
  • 11. both parties to the 2008 and 2012 American presidential election campaigns. 15. What are the main steps in carrying out sentiment analysis projects? The first step when performing sentiment analysis of a text document is called sentiment detection, during which text data is differentiated between fact and opinion (objective vs. subjective). This is followed by negative-positive (N-P) polarity classification, where a subjective text item is classified on a bipolar range. Following this comes target identification (identifying the person, product, event, etc. that the sentiment is about). Finally come collection and aggregation, in which the overall sentiment for the document is calculated based on the calculations of sentiments of individual phrases and words from the first three steps. 16. What are the two common methods for polarity identification? What is the main difference between the two? Polarity identification can be done via a lexicon (as a reference library) or by using a collection of training documents and inductive machine learning algorithms. The lexicon approach uses a catalog of words, their synonyms, and their meanings, combined with numerical ratings indicating the position on the N-P polarity associated with these words. In this way, affective, emotional, and attitudinal phrases can be classified according to their degree of positivity or negativity. By contrast, the training-document approach uses statistical analysis and machine learning algorithms, such as neural networks, clustering approaches, and decision trees to ascertain the sentiment for a new text document based on patterns from previous “training” documents with assigned sentiment scores. Section 7.7 Review Questions
  • 12. 17. What are some of the main challenges the Web poses for knowledge discovery? • The Web is too big for effective data mining. • The Web is too complex. • The Web is too dynamic. • The Web is not specific to a domain. • The Web has everything. 18. What is Web mining? How does it differ from regular data mining or text mining? Web mining is the discovery and analysis of interesting and useful information from the Web and about the Web, usually through Web-based tools. Text mining is less structured because it’s based on words instead of numeric data. 19. What are the three main areas of Web mining? The three main areas of Web mining are Web content mining, Web structure mining, and Web usage (or activity) mining. 20. Identify three application areas for Web mining (at the bottom of Figure 8.1). Based on your own experiences, comment on their use cases in business settings. (Since there are several application areas, this answer will vary for different students. Following is one possible answer.) Three possible application areas for Web mining include sentiment analysis, clickstream analysis, and customer analytics. Clickstream analysis helps to better understand user
  • 13. behavior on a website. Sentiment analysis helps us understand the opinions and affective state of users on a system. Customer analytics helps to provide solutions for sales, service, marketing, and product teams, and optimize the customer life cycles. The use cases for these applications center on user experience, and primarily affect customer service and customer relationship management functions of an organization. 21. What is Web content mining? How can it be used for competitive advantage? Web content mining refers to the extraction of useful information from Web pages. The documents may be extracted in some machine-readable format so that automated techniques can generate some information about the Web pages. Collecting and mining Web content can be used for competitive intelligence (collecting intelligence about competitors’ products, services, and customers), which can give your organization a competitive advantage. 22. What is Web structure mining? How does it differ from Web content mining? Web structure mining is the process of extracting useful information from the links embedded in Web documents. By contrast, Web content mining involves analysis of the specific textual content of web pages. So, Web structure mining is more related to navigation through a website, whereas Web content mining is more related to text mining and the document hierarchy of a particular web page. Section 7.8 Review Questions 1. What is a search engine? Why are search engines critically important for today’s businesses?
  • 14. A search engine is a software program that searches for documents (Internet sites or files) based on the keywords (individual words, multi-word terms, or a complete sentence) that users have provided that have to do with the subject of their inquiry. This is the most prominent type of information retrieval system for finding relevant content on the Web. Search engines have become the centerpiece of most Internet-based transactions and other activities. Because people use them extensively to learn about products and services, it is very important for companies to have prominent visibility on the Web; hence the major effort of companies to enhance their search engine optimization (SEO). 2. What is a Web crawler? What is it used for? How does it work? A Web crawler (also called a spider or a Web spider) is a piece of software that systematically browses (crawls through) the World Wide Web for the purpose of finding and fetching Web pages. It starts with a list of “seed” URLs, goes to the pages of those URLs, and then follows each page’s hyperlinks, adding them to the search engine’s database. Thus, the Web crawler navigates through the Web in order to construct the database of websites. 3. What is “search engine optimization”? Who benefits from it? Search engine optimization (SEO) is the intentional activity of affecting the visibility of an e-commerce site or a website in a search engine’s natural (unpaid or organic) search results. It involves editing a page’s content, HTML, metadata, and associated coding to both increase its relevance to specific keywords and to remove barriers to the indexing activities of search engines. In addition, SEO efforts include promoting a site to increase its number of inbound links. SEO primarily benefits companies with e-commerce sites by making their
  • 15. pages appear toward the top of search engine lists when users query. 4. What things can help Web pages rank higher in search engine results? Cross-linking between pages of the same website to provide more links to the most important pages may improve its visibility. Writing content that includes frequently searched keyword phrases, so as to be relevant to a wide variety of search queries, will tend to increase traffic. Updating content so as to keep search engines crawling back frequently can give additional weight to a site. Adding relevant keywords to a Web page’s metadata, including the title tag and metadescription, will tend to improve the relevancy of a site’s search listings, thus increasing traffic. URL normalization of Web pages so that they are accessible via multiple URLs. Using canonical link elements and redirects can help make sure links to different versions of the URL all count toward the page’s link popularity score. Section 7.9 Review Questions 1. What are the three types of data generated through Web page visits? · Automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies · User profiles · Metadata, such as page attributes, content attributes, and usage data. 2. What is clickstream analysis? What is it used for?
  • 16. Analysis of the information collected by Web servers can help us better understand user behavior. Analysis of this data is often called clickstream analysis. By using the data and text mining techniques, a company might be able to discern interesting patterns from the clickstreams. 3. What are the main applications of Web mining? · Determine the lifetime value of clients. · Design cross-marketing strategies across products. · Evaluate promotional campaigns. · Target electronic ads and coupons at user groups based on user access patterns. · Predict user behavior based on previously learned rules and users’ profiles. · Present dynamic information to users based on their interests and profiles. 4. What are commonly used Web analytics metrics? What is the importance of metrics? There are four main categories of Web analytic metrics: · Website usability: How were they using my website? These involve page views, time on site, downloads, click map, and click paths. · Traffic sources: Where did they come from? These include referral websites, search engines, direct, offline campaigns, and online campaigns.
  • 17. · Visitor profiles: What do my visitors look like? These include keywords, content groupings, geography, time of day, and landing page profiles. · Conversion statistics: What does all this mean for the business? Metrics include new visitors, returning visitors, leads, sales/conversions, and abandonments. These metrics are important because they provide access to a lot of valuable marketing data, which can be leveraged for better insights to grow your business and better document your ROI. The insight and intelligence gained from Web analytics can be used to effectively manage the marketing efforts of an organization and its various products or services. Section 7.10 Review Questions 1. What is meant by social analytics? Why is it an important business topic? From a philosophical perspective, social analytics focuses on a theoretical object called a “socius,” a kind of “commonness” that is neither a universal account nor a communality shared by every member of a body. Thus, social analytics in this sense attempts to articulate the differences between philosophy and sociology. From a BI perspective, social analytics involves “monitoring, analyzing, measuring and interpreting digital interactions and relationships of people, topics, ideas and content.” In this perspective, social analytics involves mining the textual content created in social media (e.g., sentiment analysis, natural language processing) and analyzing socially established networks (e.g., influencer identification, profiling, prediction). This is an important business topic because it helps companies gain insight about existing and potential customers’ current and future behaviors, and about the likes and dislikes toward a firm’s products and services.
  • 18. 2. What is a social network? What is the need for SNA? A social network is a social structure composed of individuals/people (or groups of individuals or organizations) linked to one another with some type of connections/relationships. Social network analysis (SNA) is the systematic examination of social networks. Dating back to the 1950s, social network analysis is an interdisciplinary field that emerged from social psychology, sociology, statistics, and graph (network) theory. 3. What is social media? How does it relate to Web 2.0? Social media refers to the enabling technologies of social interactions among people in which they create, share, and exchange information, ideas, and opinions in virtual communities and networks. It is a group of Internet-based software applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content. 4. What is social media analytics? What are the reasons behind its increasing popularity? Social media analytics refers to the systematic and scientific ways to consume the vast amount of content created by Web- based social media outlets, tools, and techniques for the betterment of an organization’s competitiveness. Data includes anything posted in a social media site. 5. How can you measure the impact of social media analytics? First, determine what your social media goals are. From here, you can use analysis tools such as descriptive analytics, social network analysis, and advanced (predictive, text examining content in online conversations), and ultimately prescriptive
  • 19. analytics tools. ANSWERS TO APPLICATION CASE QUESTIONS FOR DISCUSSION( ( Application Case 7.1: Netflix: Using Big Data to Drive Big Engagement: Unlocking the Power of Analytics to Drive Content and Consumer Insight 1. What does Netflix do? How did they evolve into this current business model? Netflix is a provider and creator of streaming digital content for end-users. The company began as a subscription-based DVD rental company and expanded into digital streaming. Changes in the digital streaming market made it more economically beneficial for the company to also expand into content creation. 2. In the case of Netflix, what was it meant to be data-driven and customer-focused? Four Netflix, being data-driven means that decisions on new content being created or licensed (as well as maintaining existing licenses) should be based on data directly related to customer preferences (based on actual use as well as preferred genres). The decision on how to spend these limited licensing dollars must be made to create the greatest benefit for the greatest number of users in order to maintain current subscriptions as well as drive new sign-ups. This focus on customer demands also meets the aspect of customer focus. 3. How did Netflix use Teradata technologies in its analytics endeavors? The Teradata solution provides two distinct advantages for Netflix. The first is the ability to use an existing, robust cloud- based system to perform analytics functions. This provides
  • 20. security both in terms of redundancy as well as confidence in the technology itself. The second is the ability to use Teradata’s robust discovery engine to perform analytics functions to create a better understanding of customer types and their preferences. Application Case 7.2: AMC Networks Is Using Analytics to Capture New Viewers, Predict Ratings, and Add Value for Advertisers in a Multichannel World 1. What are the common challenges that broadcasting companies are facing today? How can analytics help to alleviate these challenges? Broadcasters are faced with the need to maintain the attention of viewers by providing quality program that meets their current desires, desires that are always in flux. An analytics system can help to alleviate these challenges by evaluating viewers current tastes and desires for future programming. 2. How did AMC leverage analytics to enhance its business performance? AMC has been able to successfully leverage analytics to better understand their customers and use this understanding to market effectively to them as well as ensure that they provide content that meets viewers desires. These analytics systems allow the broadcaster to better understand what different customers want in specific groupings, but also how to market their services effectively to individuals within these groupings. 3. What were the types of text analytics and text minisolutions developed by AMC networks? Can you think of other potential uses of text mining applications in the broadcasting industry? These details are not discussed in the case, but students may be able to identify some of this information through independent research. In general, the analytics being used incorporate data
  • 21. from internal and external sources to determine viewership of individual programs as well as purchases of digital programs. Application Case 7.3: Mining for Lies 1. Why is it difficult to detect deception? Humans tend to perform poorly at deception-detection tasks. This phenomenon is exacerbated in text-based communications. Although some people believe that they can readily identify those who are not being truthful, a summary of deception research showed that, on average, people are only 54 percent accurate in making veracity determinations. 2. How can text/data mining be used to detect deception in text? Through a process known as message feature mining, statements are transcribed for processing, then cues are extracted and selected. Text processing software identifies cues in statements and generates quantified cues. Classification models are trained and tested on quantified cues, and based on this, statements are labeled as truthful or deceptive (e.g., by law enforcement personnel). The feature-selection methods along with 10-fold cross-validation allow researchers to compare the prediction accuracy of different data mining methods (for example, neural networks). 3. What do you think are the main challenges for such an automated system? One challenge is that the training the system depends on humans to ascertain the truthfulness of statements in the training data itself. You can’t know for sure whether these statements are true or false, so you may be using incorrect training samples when “teaching” the machine learning system to predict lies in new text data. (This answer will vary by student.) Application Case 7.4: The Magic Behind the Magic: Instant Access to Information Helps the Orlando Magic Up their Game and the Fan’s Experience 1. According to the application case, what were the main
  • 22. challenges the Orlando Magic was facing? The primary challenges are both financial, with a need to maximize both ticket sales and concessions/merchandising. These challenges can be addressed by tailoring services to meet the needs of customers. 2. How did analytics help the Orlando Magic to overcome some of its most significant challenges on and off the court? In order to meet the challenge of ensuring that season ticket purchases continue, the team used a predictive analysis decision tree model to identify customers that may not renew, and those individuals could receive additional sales and marketing attention. Additionally, coaching staff use analytic tools to better understand all aspects of the game itself including player strengths and weaknesses and opponent strategies. 3. Can you think of other uses of analytics in sports and especially in the case of the Orlando Magic? You can search the Web to find some answers to this question. Student research and responses will vary but may include the idea to use analytics when selecting new players or determining if players ready for contract renewal should be retained. Application Case 7.5: Research Literature Survey with Text Mining 1. How can text mining be used to ease the task of literature review? Text mining enables a semiautomated analysis of large volumes of published literature. Clustering was used in this study to identify the natural groupings of the articles and list the most descriptive terms that characterized those clusters. This led to discovery and exploration of interesting patterns using tabular
  • 23. and graphical representation of their findings. Use of text and data mining can thus speed up and simplify the literature review process for academic researchers. 2. What are the common outcomes of a text mining project on a specific collection of journal articles? Can you think of other potential outcomes not mentioned in this case? Common outcomes include identifying natural clusters of similar articles, helping to identify the optimal number of cluster classifications. Using text mining, you can answer questions such as “Are there clusters that represent different research themes specific to a single journal?” and “Is there a time-varying characterization of those clusters?” Text mining also has other possible applications in literature reviews. For example, sentiment analysis can help to identify positive and negative judgments. Text mining can be used to build taxonomies of concepts and terms within and between research articles. You can find common themes by author as well as by journal. Application Case 7.6: Creating a Unique Digital Experience to Capture Moments That Matter at Wimbledon 1. How did Wimbledon use analytics capabilities to enhance viewers’ experience? Wimbledon undertook a number of analytics projects in order to enhance the viewer experience for the tournament. One project was a redesign of the website for the tournament. Based on their understanding of current users they were able to determine that the majority of users were still accessing the site using desktop browsers, and so were able to work on optimizing the experience for that platform (as well as accommodating the growth in mobile users). Next they were able to tap in to a huge
  • 24. amount of data that was being collected or had been collected over time. Data and analysis from the match came in in real time and was analyzed and displayed as information to viewers. Additionally, NLP systems allowed the tournament to search its vast archives of facts and details that could be retrieved and presented in real time. Finally, Wimbledon used their understanding of the potential demand for this service to make appropriate decisions about housing both the broadcasting and analytics portions of the service in the cloud to provide the necessary bandwidth and security. 2. What were the challenges, proposed solution, and obtained results? Many of the challenges were technical, but those were overcome through the use of existing, trusted IT partners to provide services and create a unique digital presence. The results were very positive, with a significant number of online visitors and a 98% growth in total visits over the previous year. Application Case 7.7: Delivering Individualized Content and Driving Digital Engagement: How Barbour Collected More Than 49,000 New Leads in One Month with Teradata Interactive 1. What does Barbour do? What was the challenge Barbour was facing? Barbour is an English heritage and lifestyle brand renowned for waterproof outerwear that has recently expanded into other luxury fashion goods. The company’s challenge was establishing a direct relationship with its customers because in the past all sales had been made through distributors. While in e-commerce site was launched in 2013, connecting with customers in a crowded digital market space was difficult. 2. What was the proposed analytics solution? The company worked with partner Teradata to create a lead
  • 25. nurturing program. The goal was both to drive immediate sales as well as to gather information that could create more meaningful long-term relationships with customers. The company had access to historical customer information that was generally seen as incomplete. Teradata was able to begin with this information and create marketing campaigns that also collected additional data that allowed for future, more personalized content to be driven to customers through a customer lifecycle program. These activities were integrated with existing marketing and social media campaigns. 3. What were the results? The results were very positive and over a one-month period the company was able to collect more than 49,700 leads within their primary European regions. In addition to very positive responses (60% click through rates) the company was also able to collect and begin analysis on additional customer data that can be used in the lifecycle program. Application Case 7.8: Tito’s Vodka Establishes Brand Loyalty with an Authentic Social Strategy 1. How can social media analytics be used in the consumer products industry? Social media analytics can be incredibly useful for consumer products because it allows the producers/retailers to better understand their customers and connect with them individually based on their preferences. 2. What do you think are the key challenges, potential solutions, and probable results in applying social media analytics in consumer products and services firms? It’s Student opinions will vary, but responses will focus on: · Challenges - crowded market spaces with data in multiple
  • 26. formats from multiple sources · potential solutions - focus on analytics and the ability to better understand customer needs at both the aggregate and individual level · probable results - can be very positive if analytic efforts are successful in identifying customer preferences and targeting individual customers or unique groups with marketing or social media content that drives interest in the brand ANSWERS TO END OF CHAPTER QUESTIONS FOR DISCUSSION( ( ( 1. Explain the relationship among data mining, text mining, and sentiment analysis. Technically speaking, data mining is a process that uses statistical, mathematical, and artificial intelligence techniques to extract and identify useful information and subsequent knowledge (or patterns) from large sets of data. Data mining is the general concept. Text mining is a specific application of data mining: applying it to unstructured text files. Sentiment analysis is a specialized form of text and data mining that identifies and classifies terms in text sources according to sentiment (e.g. judgment, opinion, and emotional content). 2. In your own words, define text mining, and discuss its most popular applications. Students’ answers will vary. 3. What does it mean to induce structure into text-based data? Discuss the alternative ways of inducing structure into them. Text mining, like other data mining approaches, are inductive
  • 27. approaches for finding patterns and trends in data. One difference between text mining and other data mining approaches is the use of natural language processing. Text-based data is inherently unstructured and must be converted to a structured format for predictive modeling or other type of analysis. The third step of the 3-step text mining process utilizes inductive algorithms to classify and structure the corpus of text sources so that knowledge can be extracted. Four possible approaches for inducing structure in text in order to extract knowledge are (a) classification (grouping terms into predefined categories), (b) clustering (coming up with “natural” groupings), (c) association rule learning (finding frequent combinations of terms), and (d) trend analysis (recognizing concept distributions based on specific collections of documents). 4. What is the role of NLP in text mining? Discuss the capabilities and limitations of NLP in the context of text mining. NLP is an important component of text mining. It studies the problem of “understanding” the natural human language, with the view of converting depictions of human language (such as textual documents) into more formal representations (in the form of numeric and symbolic data) that are easier for computer programs to manipulate. The goal of NLP is to move beyond syntax-driven text manipulation (which is often called “word counting”) to a true understanding and processing of natural language that considers grammatical and semantic constraints as well as the context. 5. List and discuss three prominent application areas for text mining. What is the common theme among the three application areas you chose? National security and counterterrorism: The FBI’s system is
  • 28. expected to create a gigantic data warehouse along with a variety of data and text mining modules to meet the knowledge discovery needs of federal, state, and local law enforcement agencies. Biomedical: A system extracts disease–gene relationships from literature accessed via MEDLINE. Marketing: Text mining can be used to increase cross-selling and up-selling by analyzing the unstructured data generated by call centers. The value of all of these applications is significantly increased by extracting knowledge from huge volumes of text-data and documents through the use of text mining tools. 6. What is sentiment analysis? How does it relate to text mining? Sentiment analysis tries to answer the question, “What do people feel about a certain topic?” by digging into opinions of many using a variety of automated tools. It is also known as opinion mining, subjectivity analysis, and appraisal extraction. Sentiment analysis shares many characteristics and techniques with text mining. However, unlike text mining, which categorizes text by conceptual taxonomies of topics, sentiment classification generally deals with two classes (positive versus negative), a range of polarity (e.g., star ratings for movies), or a range in strength of opinion. 7. What are the common challenges with which sentiment analysis deals? Sentiment that appears in text comes in two flavors: explicit, where the subjective sentence directly expresses an opinion
  • 29. (“It’s a wonderful day”), and implicit, where the text implies an opinion (“The handle breaks too easily”). Implicit sentiment analysis is harder to analyze because it may not include words that are obviously evaluations or judgments. Another challenge involves the timeliness of collection/analysis of textual data coming from a wide variety of data sources. A third challenge is the difficulty of identifying whether a piece of text involves sentiment or not, especially with implicit sentiment analysis. The same sorts of issues involving text mining in natural language settings also apply to sentiment analysis. 8. What are the most popular application areas for sentiment analysis? Why? Customer relationship management (CRM) and customer experience management are popular “voice of the customer (VOC)” applications. Other application areas include “voice of the market (VOM)” and “voice of the employee (VOE).” 9. What are the main steps in carrying out sentiment analysis projects? The first step when performing sentiment analysis of a text document is called sentiment detection, during which text data is differentiated between fact and opinion (objective vs. subjective). This is followed by negative-positive (N-P) polarity classification, where a subjective text item is classified on a bipolar range. Following this comes target identification (identifying the person, product, event, etc. that the sentiment is about). Finally come collection and aggregation, in which the overall sentiment for the document is calculated based on the calculations of sentiments of individual phrases and words from the first three steps. 10. What are the two common methods for polarity identification? Explain.
  • 30. Polarity identification can be done via a lexicon (as a reference library) or by using a collection of training documents and inductive machine learning algorithms. The lexicon approach uses a catalog of words, their synonyms, and their meanings, combined with numerical ratings indicating the position on the N-P polarity associated with these words. In this way, affective, emotional, and attitudinal phrases can be classified according to their degree of positivity or negativity. By contrast, the training-document approach uses statistical analysis and machine learning algorithms, such as neural networks, clustering approaches, and decision trees to ascertain the sentiment for a new text document based on patterns from previous “training” documents with assigned sentiment scores. 11. Discuss the differences and commonalities between text mining and Web mining. Text mining is a specific application of data mining: applying it to unstructured text files. Web mining is a specific application of data mining: applying it to information on and about the Web (content, structure, and usage). 12. In your own words, define Web mining, and discuss its importance. Students’ answers will vary. 13. What are the three main areas of Web mining? Discuss the differences and commonalities among these three areas. Web mining consists of three areas: Web content mining, Web structure mining, and Web usage mining. Web content mining refers to the automatic extraction of useful information from Web pages. It may be used to enhance search results produced by search engines.
  • 31. Web structure mining refers to generating interesting information from the links included in Web pages. Web structure mining can also be used to identify the members of a specific community and perhaps even the roles of the members in the community. Web usage mining refers to developing useful information through analysis of Web server logs, user profiles, and transaction information. 14. What is a search engine? Why is it important for businesses? A search engine is a software program that searches for documents (Internet sites or files) based on the keywords (individual words, multi-word terms, or a complete sentence) that users have provided that have to do with the subject of their inquiry. This is the most prominent type of information retrieval system for finding relevant content on the Web. Search engines have become the centerpiece of most Internet-based transactions and other activities. Because people use them extensively to learn about products and services, it is very important for companies to have prominent visibility on the Web; hence the major effort of companies to enhance their search engine optimization (SEO). 15. What is SEO? Who benefits from it? How? Search engine optimization (SEO) is the intentional activity of affecting the visibility of an e-commerce site or a website in a search engine’s natural (unpaid or organic) search results. It involves editing a page’s content, HTML, metadata, and associated coding to both increase its relevance to specific keywords and to remove barriers to the indexing activities of search engines. In addition, SEO efforts include promoting a
  • 32. site to increase its number of inbound links. SEO primarily benefits companies with e-commerce sites by making their pages appear toward the top of search engine lists when users query. 16.What is Web analytics? What are the metrics used in Web analytics? Web analytics, also called Web usage mining, can be considered a part of Web mining, and aims to describe what has happened on the website (employing a predefined, metrics-driven descriptive analytics methodology). There are four main categories of Web analytic metrics: · Website usability: How were they using my website? These involve page views, time on site, downloads, click map, and click paths. · Traffic sources: Where did they come from? These include referral websites, search engines, direct, offline campaigns, and online campaigns. · Visitor profiles: What do my visitors look like? These include keywords, content groupings, geography, time of day, and landing page profiles. · Conversion statistics: What does all this mean for the business? Metrics include new visitors, returning visitors, leads, sales/conversions, and abandonments. These metrics are important because they provide access to a lot of valuable marketing data, which can be leveraged for better insights to grow your business and better document your ROI. The insight and intelligence gained from Web analytics can be used to effectively manage the marketing efforts of an organization and its various products or services.
  • 33. 17. Define social analytics, social network, and social network analysis. What are the relationships among them? Social analytics involves monitoring, analyzing, measuring, and interpreting digital interactions and relationships of people, topics, ideas, and content. It involves mining the textual content created in social media (e.g., sentiment analysis, natural language processing) and analyzing socially established networks (e.g., influencer identification, profiling, prediction). A social network is a social structure composed of individuals/people (or groups of individuals or organizations) linked to one another with some type of connections/relationships. Social network analysis (SNA) is the systematic examination of social networks, and is an interdisciplinary field that emerged from social psychology, sociology, statistics, and graph (network) theory. Social analytics therefore combines text analysis for content and sentiment in online communications with social network analysis to identify and analyze relationships between individuals in a community. 18. What is social media analytics? How is it done? Who does it? What comes out of it? Social media analytics refers to the systematic and scientific ways to consume the vast amount of content created by Web- based social media outlets, tools, and techniques for the betterment of an organization’s competitiveness. It is done using many analytic methods, including text mining, sentiment analysis, and social network analysis. Companies use it to get a better understanding of their customer base, and can gain financial and competitive advantages from doing so. Governments use it to track potential terrorist threats, which can lead to enhanced national security. Social scientists use it to get a better understanding of how communities and societies
  • 34. work, which can provide guidance on how to best manage these societies. ANSWERS TO END OF CHAPTER Exercises( ( Teradata University Network (TUN) and Other Hands-on Exercises 1. Visit teradatauniversitynetwork.com. Identify cases about text mining. Describe recent developments in the field. If you cannot find enough cases at the Teradata University Network Web site, broaden your search to other Web-based resources. Student selection of cases will vary and create differences in reports. 2. Go to teradatauniversitynetwork.com to locate white papers, Web seminars, and other materials related to text mining. Synthesize your findings into a short written report. Student selection and perspectives on different reports will generate a variety of findings. 3. Go to teradatauniversitynetwork.com and find the case study named “eBay Analytics.” Read the case carefully and extend your understanding of it by searching the Internet for additional information, and answer the case questions. Student interpretation and analysis of the information will vary. 4. Go to teradatauniversitynetwork.com and find the sentiment analysis case named “How Do We Fix an App Like That?” Read the description, and follow the directions to download the data and the tool to carry out the exercise. Work on this exercise will vary. 5. Visit teradatauniversitynetwork.com. Identify cases about Web mining. Describe recent developments in the field. If you cannot find enough cases at the Teradata University Network Web site, broaden your search to other Web-based resources. 6.
  • 35. Browse the Web and your library’s digital databases to identify articles that make the linkage between text/Web mining and contemporary business intelligence systems. Student selection of cases and their analysis will differ. Team Assignments and Role-Playing Projects 1. Examine how textual data can be captured automatically using Web-based technologies. Once captured, what are the potential patterns that you can extract from these unstructured data sources? Team selections of technology will create variations in their reports. 2. Interview administrators at your college or executives in your organization to determine how text mining and Web mining could assist them in their work. Write a proposal describing your findings. Include a preliminary cost–benefit analysis in your report. Team interviews of administrators will vary and create differences in their findings. 3. Go to your library’s online resources. Learn how to download attributes of a collection of literature (journal articles) in a specific topic. Download and process the data using a methodology similar to the one explained in Application Case 7.5. Processes at individual institutions will vary as well the articles downloaded. 4. Find a readily available sentiment text data set (see Technology Insights 7.2 for a list of popular data sets) and download it onto your computer. If you have an analytics tool that is capable of text mining, use that. If not, download RapidMiner (http://rapid-i.com) and install it. Also install the Text Analytics add-on for RapidMiner. Process the downloaded
  • 36. data using your text mining tool (i.e., convert the data into a structured form). Build models and assess the sentiment detection accuracy of several classification models (e.g., support vector machines, decision trees, neural networks, logistic regression). Write a detailed report in which you explain your findings and your experiences. Selection of different data sets will result in different models and variances in the final report. 5. Examine how Web-based data can be captured automatically using the latest technologies. Once captured, what are the potential patterns that you can extract from these content-rich, mostly unstructured data sources? Selection of technologies will vary based on student preferences in the data search. Internet Exercises 1. Find recent cases of successful text mining and Web mining applications. Try text and Web mining software vendors and consultancy firms and look for cases or success stories. Prepare a report summarizing five new case studies. Student research and reports will vary. 2. Go to statsoft.com. Select Downloads, and download at least three white papers on applications. Which of these applications might have used the data/text/Web mining techniques discussed in this chapter? Student research and reports will vary. 3. Go to sas.com. Download at least three white papers on applications. Which of these applications might have used the data/text/Web mining techniques discussed in this chapter? Student research and reports will vary. 4. Go to ibm.com. Download at least three white papers on
  • 37. applications. Which of these applications might have used the data/text/Web mining techniques discussed in this chapter? Student research and reports will vary. 5. Go to teradata.com. Download at least three white papers on applications. Which of these applications might have used the data/text/Web mining techniques discussed in this chapter? Student research and reports will vary. 6. Go to clarabridge.com. Download at least three white papers on applications. Which of these applications might have used text mining in a creative way? Student research and reports will vary. 7. Go to kdnuggets.com. Explore the sections on applications as well as software. Find names of at least three additional packages for data mining and text mining. Student research and reports will vary. 8. Survey some Web mining tools and vendors. Identify some Web mining products and service providers that are not mentioned in this chapter. Student research and reports will vary. 9. Go to attensity.com. Download at least three white papers on Web analytics applications. Which of these applications might have used a combination of data/ text/Web mining techniques? Student research and reports will vary. Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall 24 Copyright © 2019 Pearson Education, Inc.
  • 38. A study into peer-to-peer network structure for online games The classic model for online game’s network is client-server (CS) model. The problems that the model faces are bottlenecks, single points of failure and poor flexibility (Ramakrishna,2006). Peer-to-peer (P2P) model brings another way to structure networks. The problems and shortage of CS model can be solved by some of the P2P models. However, P2P models have shortages like security loopholes that is not researched thoroughly. Also, P2P models are not put into practical use since researches are not enough to support a well-designed P2P structure for online games. This essay focuses on the merits and expectations of using P2P model for online games, including introduction of practical CS models and experimental P2P structures. Furthermore, the limitations and expectations of these P2P structures are discussed. Although classic CS network structure is popular in online game design, it is facing serious issues of complexity and scalability. A CS network is a network that consists of two parts: clients and servers. For online games, Yahyavi and Kemme (2013) describes the server as machines that contain the master copies of all variable objects and control the game world, and clients as machines to receive information about the game world. Also, clients need to send updates to servers. Most of the existing online games are based on CS network, for example, the famous Massively Multi-player Online Role-Play Game (MMORPG) World of Warcraft is using multiple CS network to support its 12 million subscribed accounts (Liu, et al., 2015). Basic CS models are popular for their simplicity and control efficiency (Yahyavi and Kemme, 2013), although suffering from scalability, error tolerance and cost. Advanced CS model, taking multi-server model as an example, improved its scalability and fault tolerance, but at the price of isolating players in different regions (the players using European servers of World of Warcraft cannot interact with players using Asian servers), complexity and cost. It is clear that the CS model cannot avoid
  • 39. the high cost of servers, as also claimed by Liu, et al. (2015), and cannot fulfill simplicity and scalability at the same time. Also, as Gibson and Vasconcelos (2018) suggested, the CS structure faces the problem of single point of failure, which means the whole network would collapse if the server goes down. Ramakrishna, et al. (2006) also mentioned network throughput bottlenecks and poor flexibility, but the problems are now seen as trivial matters due to high-speed internet and improved network structures. In conclusion, CS networks are well-developed models, yet are facing development bottlenecks. Researchers are using P2P network to break through the bottlenecks CS networks are facing, yet there exist many challenges for putting P2P network into practical use. P2P network has P2P network is a network in which every machine participates in providing information and calculating data for the whole network (Cassavia, 2018). The role ‘server’ which is played by servers in client-server model is taken by the whole community. Each peer (end device) takes charge of a part of the game world (Yahyavi and Kemme, 2013). The machines are clients and servers at the same time. With the highest potential for scalability, P2P structure can allow any number of players into the game without causing any extra load for the game provider. However, Yahyavi and Kemme (2013) claims that P2P networks face challenges like difficulty in version control, low defense level against malware and cheating. The latter problem is also highlighted by Meng (2018), who designed a model to solve the issue. P2P models in online games are feasible according to various researches with models and relevant tests (Yahyavi and Kemme; Cassavia; Barri, et al., 2016; Gibson, 2018; Liu, et al., 2015), yet are not well-established as runnable game structures. P2P game architectures have many sorting methods that allow systematic analysis. Yahyavi and Kemme (2013) sorted P2P game architectures into three kinds: structured P2P architecture, unstructured P2P architecture and hybrid P2P architectures. Except for the three kinds of P2P architectures, there exist
  • 40. another kind of P2P structure: hybrid CS/P2P network. The hybrid CS/P2P model has the most similar structure in P2P networks to CS models, such that the application of this model is highly feasible. Barri, et al. (2014) developed a CS/P2P network structure for MMORPGs. The MMORPG is divided into two parts: main game and auxiliary game. As defined, the main game is a constantly existing game world with lower player interaction, while the auxiliary game is where a few players temporarily gather as a group with dense player interaction. Barri, et al. decided to use CS model for the main game and P2P for the auxiliary game. The two models are combined by a mapping mechanism, which moves the P2P structure among servers in CS model, reducing the redundancy of communications. This model has high flexibility, scalability and efficiency by test, but fails to examine the problems of churns and cheating. Also, this model is only estimated in a single game genre, so its performance in other types of games is unknown. Pure P2P structures are less familiar, but provide new ways to improve online game architectures. Structured P2P architectures are supported by specific graph structures and mechanisms such as Distributed Hash Table (DHT) that allows every node to send a message to or find other nodes (Yahyavi and Kemme, 2013). Each node is bind with a unique key, such that it could be found, created or deleted with low overhead. Yahyavi and Kemme selected SimMud as an example. Players are distributed into different regions and assigned NodeID. The regions are connected by using ‘the peer with the NodeID closest to the RegionID’ as the leader (Yahyavi and Kemme, pp.18). When a new player joins in, the coordinator who holds the master copy sends its data to the player. When updating the game, the game provider updates the coordinator’s data. The example demonstrates that structured P2P architectures have high efficiency due to their specially optimized connections between nodes, yet fails to examine the problem of malware attack and cheating.
  • 41. Unstructured P2P architectures provide examples of structures without global mechanisms. Mutual notification system is used instead of global mechanisms in unstructured P2P architectures (Yahyavi and Kemme, 2013). Liu, et al. (2015) described a protocol QuOn based on quad-tree structure. In this model, ‘binding neighbors’ is the key of the mutual notification system. Messages are unicast from one node to another, and nodes are connected through the ‘binding neighbors’ method between neighbors. Backups are needed in case the current quad-tree is lost. Two sets of ‘variable control’ tests have been applied to the model, with results demonstrating high scalability. However, the efficiency is suboptimal and should be further researched. In addition, Liu, et al. did not analyze safety issues, such that this protocol cannot be applied to game applications. Few P2P models focused on safety issues, because dealing with safety loopholes decrease the efficiency of the models (Meng, 2018). As a solution to the dilemma, Meng created a trust model called speedTrust inspired by the trust relationship between humans. By Yahyavi and kemme’s (2013) definition, speedTrust is also a hybrid (structured combined with unstructured) P2P model. In this hybrid P2P model, super peers are employed to supervise the communication between two nodes, rewarding the regulated peers and punishing malicious transactions using specific formulas. Also, the model abandoned iterative operations and instead use feedback reputations to calculate a peer’s trust value, decreasing the time complexity of the model. This model offers the best solution to the challenges of P2P model considering both efficiency and safety. However, it did not focus on scalability or game adaptability. If able to be combined with the models above properly, this hybrid model could be the perfect solution for peer-to-peer structures for online games. In conclusion, although research for peer-to-peer network online games exists, there is not a perfect solution for the challenges that peer-to-peer structures face. This essay mentioned four P2P structures: a hybrid CS/P2P method for
  • 42. MMORPG (Barri, et al., 2014); structured P2P architecture SimMud (Yahyavi and Kemme, 2013); unstructured P2P architecture QuOn (Liu, et al., 2015); trust model (also a hybrid P2P model) speedTrust (Meng, 2018). All of the methods solved the single point of failure that client-server model has, and most of the models can scale high and achieve high efficiency. Only one method focused on the safety issues, which are vital for providing safe and fair gaming experience. This method, however, is not applied to games and its scalability is not examined. Without providing an integrated solution to these challenges, the development of P2P online games could be hard and slow. Reference Barri I., Roig C. and Giné F. (2016) ‘Distributing Game Instances in A Hybrid Client-Server/P2P System to Support MMORPG Playability’, Multimed Tools Appl, 75, pp.2005-2029. Cassavia N., et al. (2018) ‘Distributed computing by leveraging and rewarding idling user resources from P2P networks’, J. Parallel Distrib. Comput., 122, pp.81-94. Gibson M. and Vasconcelos W. (2018) ‘A Knowledge-Based Approach to Multiplayer Games in Peer-to-Peer Networks’, Knowledge and Information Systems. Available at: https://doi.org/10.1007/s10115-018- 1295-6 (Accessed: 12 May 2019).
  • 43. Liu L. et al. (2015) ‘Performance Evaluation and Simulation of Peer-to-Peer Protocols for Massively Multiplayer Online Games’, Multimed Tools Appl, 74, pp.2763-2780. Meng X. (2018) ‘SpeedTrust: A Super Peer-guaranteed Trust Model in Hybrid P2P Networks’, The Journal of Supercomputing, 74(6), pp. 2553-2580. Ramakrishna, V. et al. (2006) ‘An Active Self-Optimizing Multiplayer Gaming Architecture’, Cluster Computing, 9(2), pp. 201-215. Yahyavi, A. and Kemme, B. (2013) ‘Peer-to-Peer Architectures for Massively Multiplayer Online Games: A Survey’, ACM Computing Surveys (CSUR), 46(1), pp. 1-51. 1 Final Research Essay – Guidelines for students Overview Your final research essay will test your ability to write about the importance of a concept, process or practice within your subject area. You will be required to: · give an extended definition of the concept, process or practice, considering different points of view · explain the main principles of the concept, process or practice · explain the significance of the concept, process or practice to the general public (or how it can be used in industry, manufacturing etc.)
  • 44. · demonstrate analytical and/or critical thinking skills Academic sources are peer-reviewed journal articles or passages from books/textbooks published by reputable university presses. Semi-academic articles include articles from publications such as New Scientist, The Economist or Financial Times. You should not cite unofficial websites or Wikipedia: these have not been peer-reviewed. You should only cite sources that have been written in English – for this assessment, you should not cite any sources written in Chinese that you have Assessment Criteria In order to pass this assessment, you need to demonstrate that you can: · understand a wide range of demanding, longer texts · use language flexibly and effectively for academic purposes · produce clear, well-structured, detailed writing on complex subjects · show controlled use of organisational patterns, connectors and cohesive devices Your teacher will review the assessment criteria with you. You will be assessed in the following areas: Argument/Task Fulfilment Language Use: Vocabulary Language Use: Grammar Structure and Coherence Academic Style and Conventions
  • 45. Guidelines Please note the following guidelines. 1. You should submit both the printed and electronic copies of your essay with an official cover page. Please include your question on the first page. 2. Your essay should be 1,500 words. Essays below the required length will lose marks. Please indicate the word count on the cover sheet. 3. You should refer to at least seven sources. At least four of these should be fully academic sources from peer-reviewed articles or passages from textbooks. However, up to three appropriate semi-academic sources articles from publications such as New Scientist, The Economist or Financial Times are also acceptable. You should not cite unofficial websites or Wikipedia. All sources should be in English. You should not translate sources from other languages. 4. Plagiarism will be heavily penalized. All essays will be checked by the originality checker Urkund. You must ensure that you fully acknowledge all sources, whether you paraphrase them or quote directly. 5. You should follow the assigned style guide throughout. For Finance students this is APA, for others it is Harvard. The most important thing is that you are consistent throughout. 6. References: Please ensure that you provide a list of references at the end of the paper. Make sure that all references cited in the text are included in the list of references, and that the list does not include any entries that have not been cited in the text. References should be listed alphabetically by author. In the case of more than one publication by the same author(s), list
  • 46. them chronologically. 7. Your assignment should be typed in Calibri, Times New Roman, or Arial font. Font size should be 12. Font colour should be black. Please either double-space your assignment or use 1.5 spacing. 8. Your title/question should be in bold. If you use section headings, these should also be in bold. 9. Please either indent your paragraphs (typically 1.25cm) or leave spaces between them. You should be consistent throughout the essay. 10. Please include page numbers at the bottom of each page. 11. You do not have to include an abstract or table of contents. Essay checklist Read through this essay and check for these possible problems. 1st Highlight anything that looks bad in red in the text and make some notes if you want. 2nd In the table below mark which areas the student needs to pay attention to specifically. Answer the questions No Yes Any notes or advice to help: Is there an introduction which introduces the topic and states why this topic is important and some background information?
  • 47. Is there a thesis statement? Are there key definitions given with some referencing? Are there clear paragraphs with topic sentences? Does the essay follow a logical order and is it easy to follow and understand? Is there a clear conclusion which relates back to the thesis statement and covers any limitations and solutions? Is the essay formatted correctly with coversheet, essay question, double spacing, 12 size font and correct style? Is there a reference list in alphabetical order and following the style guide? Do you think the writer has spell-checked and grammar checked
  • 48. using Grammarly.com? Are there in-text references that match all the references in the reference list? Has the writer used personal language? Can you see any phrasal verbs or evidence of informal style? Check the subject-verb-agreement – is this okay? Check the plural nouns and singular nouns – are they okay? Have you noticed any other issues? If so, give some examples.