2. Businesses and customers are connected by a click
• The Web has shortened the distance between a
business and its customers. It is just a click away.
• These clicks drive the economic models that support
our Web search engines and provide the economic
fuel for an increasing number of businesses.
• The click is at the heart of an economic engine that is
changing the nature of commerce with the near
instantaneous, real-time recording of customer
decisions to buy or not to buy.
12/03/18 Professor V. Nagadevara
4. User Behaviors during a Searching Process
• VIEW RESULTS: Behavior in which the user viewed or scrolled one or more
pages from the results listing. If a results page was present
and the user did not scroll, we counted this as a View
Results Page.
– With Scrolling: User scrolled the results page.
– Without Scrolling: User did not scroll the results page.
• SELECTION: Behavior in which the user makes a selection in the results
listing.
– Click URL (in results listing): Interaction in which the user clicked on
a URL of one of the results in the results page.
– Next in Set of Results List: User moved to the Next results page.
– Previous in Set of Results List: User moved to the Previous results page.
5. The Basic Premise
• The user-generated Internet data can provide
insight to understanding these users better or
point to needed changes or improvements to
existing Web systems
• It tells us the “what.”
• It does not give insight into the
motivations or decision processes of the user.
6. Internet or Web Data
• Web Traffic Data
– Traditionally mined out of web server logs. Page
tagging is a more recent phenomenon.
• Web Transactional Data
– Information with respect to various transactions
– No. of customers, no. of orders, average order size,
etc. It may also include customer demographics.
7. Internet or Web Data
• Web Server Performance Data
– Web pages have many parts: text, scripts, images,
multimedia…
– These parts are reassembled at the user end
– The higher the “weight” of the page, the longer it takes
to download and reassemble.
– The 10-second rule becomes important
8. The 10-second rule
• One tenth of a second is the limit for a user to feel
that the system is responding “immediately”.
• One second is the limit for the user’s flow to
remain uninterrupted, although he/she will
notice the delay.
• Ten seconds is the upper limit for keeping users
focused on a single task.
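The three limits on this slide can be read as cut-offs for classifying a measured response time. The following is only a minimal sketch in Python; the function name and the band labels are illustrative assumptions, not terms from the slides.

```python
def classify_response_time(seconds: float) -> str:
    """Map a response time (in seconds) to the band it falls in.

    The cut-offs 0.1 s, 1 s and 10 s come from the slide; the labels
    returned here are hypothetical names chosen for illustration.
    """
    if seconds <= 0.1:
        return "immediate"           # system feels instantaneous
    if seconds <= 1.0:
        return "flow uninterrupted"  # delay noticed, flow kept
    if seconds <= 10.0:
        return "attention held"      # user still focused on the task
    return "attention lost"          # beyond the 10-second limit


if __name__ == "__main__":
    for t in (0.05, 0.8, 4.0, 12.0):
        print(t, classify_response_time(t))
```

A page-weight audit could apply such a function to measured download times to flag pages that exceed the 10-second limit.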
9. Table 1: Actions Taken After Abandoning Online Search for Products
Group         Did not buy  Bought at brand store  Bought at different website  Bought at discount store  Bought from paper catalogue
All           34%          24%                    14%                          13%                       7%
Age < 25      27%          40%                    13%                          13%                       7%
Age 25 to 34  43%          20%                    15%                          10%                       2%
Age 35 to 44  27%          30%                    14%                          13%                       11%
Age 45 to 49  39%          18%                    7%                           25%                       11%
Age 50 to 54  31%          23%                    15%                          12%                       4%
Age 55+       33%          7%                     20%                          13%                       13%
Male          29%          24%                    15%                          19%                       5%
Female        41%          21%                    13%                          7%                        9%
10. Internet or Web Data
• User-submitted data
– User entry forms (registration forms)
– Obtained from CRM, ERP systems (Customer
loyalty programs)
– Survey data
– Opinions and feedback
– Reviews
11. Data Collection
• Collect behavioral data using an application
that logs user behavior on the Website, along
with other associated measures
• But this data is inaccurate, because of:
– Anonymous logging
– Shared (common-use) computers
– Bots/Spiders
– Cookies, internal visitors, caching servers, and
incorrect page tagging
– Error rates range from 5 to 10 percent
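One way to reduce the Bots/Spiders share of this error is to drop hits whose user-agent string looks automated. The sketch below assumes each logged hit is a dict with a `user_agent` field and uses a small, hypothetical list of marker substrings; crawlers that do not identify themselves would still slip through.

```python
# Hypothetical marker substrings; a real deployment would maintain a longer list.
BOT_MARKERS = ("bot", "spider", "crawler")


def is_probable_bot(user_agent: str) -> bool:
    """Return True if the user-agent string contains a known bot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def drop_bot_hits(hits):
    """Keep only the hits that do not look automated.

    `hits` is assumed to be a list of dicts with a "user_agent" key.
    """
    return [h for h in hits if not is_probable_bot(h["user_agent"])]


if __name__ == "__main__":
    sample = [
        {"url": "/", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
        {"url": "/", "user_agent": "ExampleBot/1.0 (+http://example.com/bot)"},
    ]
    print(drop_bot_hits(sample))
```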
12. Web Log Data
• Trace Data: Internet customer interaction data from
Web systems (traces left behind that indicate human
behaviors)
• Unobtrusive: collection of the data does not interfere
with the natural flow of behavior and events in the
given context
• Nonreactive: there is no observer present where the
behaviors occur to affect the participants’ actions
• Inexpensive to collect (with transaction logging
software)
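As an illustration of how such trace data is read, the sketch below parses one line of a web server log. It assumes the server writes the widely used Common Log Format, which the slides do not specify; the function name and the sample line are hypothetical.

```python
import re

# Common Log Format: host ident user [time] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line: str):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


if __name__ == "__main__":
    sample = '192.0.2.1 - - [12/Mar/2018:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512'
    print(parse_log_line(sample))
```

Each parsed record is one trace of user behavior: who requested which page, and when.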
13. Criteria
• Credibility
– How trustworthy or believable the data collection
method is.
– The Analyst must ensure that the data collection
approach records the data needed to address the
underlying business questions.
14. Criteria
• Validity
– Internal validity: the extent to which the contents of the test,
method, analysis, or procedure measure what they are
supposed to measure.
– Content or construct validity: the extent to which the content
of the test, method, analysis, or procedure adequately
represents all that is required for validation (i.e., are you
collecting and accounting for all that you should collect and
account for).
– External validity: the extent to which one can generalize the
results across populations, situations, environments, and
contexts of the test, method, analysis, or procedure.
15. Criteria
• Reliability
– A term used to describe the stability of the
measurement.
– Essentially, reliability addresses whether the
measurement assesses the same thing, in the
same way, in repeated tests.
16. Credibility, validity, and reliability – Six Questions (Holst)
1. Which data is analyzed? The format and
content of recorded trace data.
– With transaction log software, this is much easier
than in other forms of trace data, as logging
applications can be reverse engineered to
articulate exactly what behavioral data is
recorded.
17. Credibility, validity, and reliability – Six Questions (Holst)
2. How is this data defined? Define each
trace measure in a manner that permits
replication on other systems and with other
users.
– As transaction log analysis (TLA) has proliferated in a
variety of venues, more precise definitions of measures
are developing.
18. Credibility, validity, and reliability – Six Questions (Holst)
3. What is the population from which the data
was drawn? Identify the actors, both people and
systems, that created the trace data.
– With transaction logs on the Web, this is sometimes
a difficult issue to address, unless the system
requires some type of logon and these profiles are
then available.
– In the absence of these profiles, the analyst must
rely on demographic surveys, studies of the system’s
user population, or general Web demographics.
19. Credibility, validity, and reliability – Six Questions (Holst)
4. What is the context in which data is
analyzed? Explain the environmental, situational, and
contextual factors.
– Include information about the temporal factors of the data
collection (i.e., the date and time the data was recorded) and
the make-up of the system at the time of the data recording
– Transaction logs have the significant advantage of time
sampling. Record observations at predefined points of time
and then record the action that is taking place, using the
classification of action defined in the ethogram
20. Credibility, validity, and reliability – Six Questions (Holst)
5. What are the boundaries of the analysis?
Do not overreach with the business questions
and findings.
– The implications of the research are confined by
the data and the method of data collection.
– Transaction log data can clearly state whether or
not a user clicked on a link.
– It will not tell us why the user clicked on a
link. Was it intentional? Was it a mistake? Did the
user become sidetracked?
21. Credibility, validity, and reliability – Six Questions (Holst)
6. What is the target of the inferences?
Articulate the relationship among the separate
measures either to inform or to make inferences.
• Trace data can be used for both descriptive and predictive
purposes in terms of making inferences. These descriptions and
inferences can be at any level of granularity (i.e., individual,
collection of individuals, organization, etc.).
• Transaction log data is best used for aggregate level analysis. But,
with enough data at the individual level, one can tell a lot from log
data.
22. Analysis
• Generate proper metrics and KPIs
– Commercial sites: overall purchase conversions,
average order size, and items per order.
– Lead generation sites: Overall conversions, conversion
by campaigns, dropouts, and conversions of leads to
actual customers.
– Customer service sites: reducing expenses and
improving customer experiences.
– Advertising on content sites: visits per week, pages
viewed per visit, visit length, advertising click ratio,
and ratio of new to returning visitors.
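The commercial-site KPIs listed above reduce to simple ratios. A minimal sketch, assuming the analyst already has a visit count and a list of orders as (order value, item count) pairs; the function name and input layout are illustrative assumptions.

```python
def commerce_kpis(visits: int, orders):
    """Compute purchase conversion, average order size and items per order.

    `orders` is assumed to be a list of (order_value, item_count) tuples.
    """
    n = len(orders)
    return {
        # Share of visits that ended in a purchase.
        "conversion_rate": n / visits if visits else 0.0,
        # Mean monetary value of an order.
        "average_order_size": sum(v for v, _ in orders) / n if n else 0.0,
        # Mean number of items in an order.
        "items_per_order": sum(i for _, i in orders) / n if n else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(commerce_kpis(visits=200, orders=[(50.0, 2), (150.0, 4)]))
```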
23. Analysis
• Indirect analysis: The analyst is able to collect
the data without introducing any formal
measurement procedure.
• TLA typically focuses on the interaction
behaviors occurring among the users, system,
and information. There are several examples
of utilizing transaction analysis as an indirect
approach.
24. Analysis
• Content analysis: Analysis of text documents.
It can be quantitative, qualitative, or
mixed-methods.
– The purpose is to identify patterns in text. It is
unobtrusive and can be a relatively rapid method
for analyzing large amounts of text. In Web
analytics, it typically focuses on search queries or
analysis of retrieved results. A variety of examples
are available in this area of transaction log
research.
25. Analysis
• Secondary analysis: Makes use of already
existing sources of data.
– Refers to the re-analysis of quantitative data rather
than text analysis. Uses data that was collected by
others to address different research questions or
uses different methods of analysis
– Websites collect transaction log data for system
performance analysis. This can be used to address
other questions.
26. Actionable
• Action driven by the data that is in line with
the established KPI. (actionable outcomes)
– Publications that shed insight on user behavior, or
changes to some methods or system.
– In a business, calculated change to improve the
Website or business process that is directly
dependent on the KPI selected.
– Generating additional revenue, reducing costs, or
improving the user experience.
27. Web Mining
• Web Structure Mining
– Discovers useful knowledge from hyperlinks.
– Discovers important web pages
– Discovers communities that share common
interests
– Traditional data mining cannot perform this task!
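The slide does not name a technique for finding important pages. PageRank-style link analysis is one well-known option, sketched here under the assumption that the link graph is available as a dict mapping each page to the pages it links to; the page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by link structure using simple power iteration.

    `links` maps a page to the list of pages it links to. Pages with no
    outgoing links spread their score evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"home": ["about", "news"], "about": ["home"], "news": ["home"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph every other page links back to "home", so "home" receives the highest score. The score depends only on hyperlinks, not on page content.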
28. Web Mining
• Web Content Mining
– Mining useful knowledge from web page contents
– Classifies or clusters similar web pages based on
content
– Extracts information about products, forum
postings, and customer reviews; discovers customer
sentiment
– Traditional data mining can do this well
29. Web Mining
• Web Usage Mining
– Discovers user access patterns
– Uses web log data, click stream data, page tags
– Requires a large amount of pre-processing
– Uses many traditional data mining techniques
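A typical pre-processing step in web usage mining is grouping raw hits into visits (sessions). The sketch below assumes hits arrive as (visitor id, Unix timestamp, URL) tuples and uses a 30-minute inactivity timeout, which is a common convention rather than something stated in the slides.

```python
from collections import defaultdict


def sessionize(hits, timeout=1800):
    """Group hits into per-visitor sessions.

    `hits` is an iterable of (visitor_id, unix_timestamp, url). A new
    session starts when the gap between two consecutive hits from the
    same visitor exceeds `timeout` seconds (assumed: 30 minutes).
    Returns {visitor_id: [[url, ...], ...]}.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, url in sorted(hits, key=lambda h: (h[0], h[1])):
        by_visitor[visitor].append((ts, url))

    sessions = {}
    for visitor, events in by_visitor.items():
        current = [events[0][1]]
        visitor_sessions = [current]
        for (prev_ts, _), (ts, url) in zip(events, events[1:]):
            if ts - prev_ts > timeout:
                current = []
                visitor_sessions.append(current)
            current.append(url)
        sessions[visitor] = visitor_sessions
    return sessions


if __name__ == "__main__":
    sample = [("a", 0, "/"), ("a", 60, "/news"), ("a", 4000, "/"), ("b", 10, "/contact")]
    print(sessionize(sample))
```

Access patterns such as pages per visit or common click paths can then be computed per session rather than per raw hit.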
30. Case Study 1 – Institute for Policy Studies
www.ips-dc.org
• Twelve months of data are used (2011)
• During that period, the IPS received 292,000
visits, 202,000 (69%) of them from new visitors.
• 16,000 (5.5%) of the visitors were repeat
visitors.
• Visitors came from around the world: 201
countries and territories. The United States
contributed the most traffic, accounting for
78% of the visitors.
31. Issues
• IPS conducts “ad campaigns” nearly every
week with limited success
• Devotes little attention to its loyal visitors
• Too much emphasis on social websites
– On an average day, of the 4711 Facebook users
who “like” the IPS’s site, only 4 people have
checked in. That same day, 16,000 people visited
the organization’s website on their own
32. Time Spent
• Returning visitors averaged 3.38 pages per visit,
against new visitors’ 2.33 pages per visit.
• The average length of a visit is 8 min. Returning
visitors spent 15 min on average, against new visitors’
average of 5 min.
• 140,000 new visitors leave immediately.
33. Implications
• The visitor loyalty data suggest that more than
10% (12.16%) of the site’s 292,000 visitors
return more than once per month, and 5.47%
of the site’s visitors (16,025 visitors) return
weekly or daily.
• From a strategic communication standpoint,
the number of loyal visitors suggests that
specific message content addressing the
interests of this segment should be developed
34. Implications
• Campaign traffic associated with AdWords on
Google accounts for only 1.5% of the total traffic.
• Overall, the weekly campaigns have boosted
traffic from “new visitors.” But the bounce rate of
the “new visitors” generated by these
campaigns approaches 80–100% most weeks.
• Campaigns or the strategic messages need to
be reconfigured.
35. Implications
• The bounce rate of AdWords visitors is 7.4%
higher than the site average and the visitor
time on site is 27% lower. These visitors are not
the IPS’s target audience.
• Other referrals: Wordpress.org (19.48% bounce
rate), Hotsalsa.org (27.19% bounce rate),
Netvibes (34.51% bounce rate), and Wikipedia
(39.73% bounce rate)
36. Top Landing Pages
• Top landing pages were the front page (108,000 visits),
about/join us (19,000), reports/executive (13,000),
staff/Phyllis (5000), and staff/Bob (3000).
• Both “join us” and “executive” had bounce rates
exceeding 80%.
• The home page had only a 44% bounce rate, while
staff/Phyllis and staff/Bob had 56% and 59%. The site average
is 62.5%.
• Phyllis, Bob, and the home page are highly desirable
places to visit.
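Bounce rate is conventionally the share of visits that view only one page. The sketch below shows that calculation on hypothetical data, then compares the landing-page rates quoted on this slide with the 62.5% site average. The function and variable names are illustrative assumptions; the percentages are the slide's own.

```python
def bounce_rate(pages_per_visit):
    """Share of visits that viewed exactly one page (0.0 if no visits)."""
    if not pages_per_visit:
        return 0.0
    bounces = sum(1 for n in pages_per_visit if n == 1)
    return bounces / len(pages_per_visit)


# Figures quoted on the slide, in percent.
SITE_AVERAGE = 62.5
LANDING_BOUNCE = {"front page": 44.0, "staff/Phyllis": 56.0, "staff/Bob": 59.0}

# Percentage points by which each page beats the site average.
below_average = {
    page: round(SITE_AVERAGE - rate, 1)
    for page, rate in LANDING_BOUNCE.items()
    if rate < SITE_AVERAGE
}

if __name__ == "__main__":
    print(bounce_rate([1, 3, 1, 5]))  # hypothetical visits: 2 of 4 bounced
    print(below_average)
```

All three pages sit below the site average, which is the basis for calling them desirable places to visit.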
37. Case Study 2 - City of Prague (Oklahoma)
www.CityofPragueOK.org
• A small town of approximately 2400 people
• Immigrants from the Czech Republic (former
Czechoslovakia) settled the town.
• Avatar Meher Baba Heartland Center
http://www.ambhc.org/
• Provides space for multiple story stubs, along
with photos
38. Visitors
• Data for six months
• Received 2559 visits; 2123 were unique visitors, i.e.,
17% (436) returned to the site more than once.
• Averaged 12.01 visits per day, with a high of 32 and a low of just 2.
• Overall bounce rate is 45.37%. For Oklahoma viewers, the
bounce rate was only 39.71%.
• The bounce rate from the Czech Republic is 60.87%.
• More than half of the site’s visitors spend some time on
the site. 20% of the town’s residents come back more
than once.
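The returning-visitor figures on this slide follow from simple arithmetic on the visit counts. A short check in Python, following the slide's own reading that visits minus unique visitors gives the returns; the variable names are illustrative.

```python
# Counts quoted on the slide.
total_visits = 2559
unique_visitors = 2123

# The slide treats the difference as visitors who came back.
returning = total_visits - unique_visitors

# Share of all visits, in percent, rounded to one decimal place.
returning_share = round(100 * returning / total_visits, 1)

if __name__ == "__main__":
    print(returning, returning_share)  # 436 and 17.0, matching the slide
```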
39. Visitors
• Half of the site’s traffic comes from within the state of
Oklahoma (50.68%). An additional 13.13% visited the
site from California, Texas, and Kansas. Ninety percent
were from the US.
• The Prague site also had visits from 36 countries.
• Oklahoma visitors spent an average of more than 2
min and viewed an average of 3.3 pages per visit.
• Prague is a small enough city that the IP addresses of
nearby residents do not map to their town. Thus,
it is not possible to say how many Oklahoma visitors
visited the site from in or near Prague.
40. Key Words
• Some combination of “Prague” and “Oklahoma”
(or “OK”) directed 479 (16%) of the visitors to
the site.
• Visitors also searched specifically for the city of
Prague Oklahoma (332 visitors), Prague lake
(102 visitors), and the Prague police department
(113 visitors). Thus, 40% of the key word
searches were by people who were interested in
information related to Prague, OK.
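The 40% figure can be reproduced by summing the four keyword groups and dividing by the site's visits. The sketch assumes the denominator is the 2559 total visits reported for the six-month period, which the slide does not state explicitly; the dictionary keys are shorthand labels.

```python
# Visitor counts per keyword group, as quoted on the slide.
keyword_visitors = {
    "Prague + Oklahoma/OK": 479,
    "city of Prague Oklahoma": 332,
    "Prague lake": 102,
    "Prague police department": 113,
}

# Assumption: the share is taken over the 2559 visits in the period.
TOTAL_VISITS = 2559

prague_related = sum(keyword_visitors.values())
prague_share = round(100 * prague_related / TOTAL_VISITS, 1)

if __name__ == "__main__":
    print(prague_related, prague_share)  # 1026 visitors, about 40%
```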
41. Top Pages
• The most popular page is the home page (2331
visits).
• It is followed by the PragueOK news page (506 visits), the directory
page (346 visits), the city’s contact information
page (296 visits), the police department’s page
(287 visits), the library’s page (262 visits), and
the calendar page (198 visits).
• The Prague site clearly serves a role in providing
information about a majority of city services.
42. Suggestions
• Creating a system whereby local agencies can
contribute content each week on their own to increase
content in general without increasing the workload of
the webmaster.
• Publicizing the website: adding the URL to city stationery,
business cards, other agencies’ materials, the local paper, etc.
• Adding links on the home page to all city departments
and services.
• Adding links to local attractions (not many!).
• Clarifying the names of some of the site’s pages to better
reflect the content.