Presentation given by Fidel Rebón, Gloria Ocáriz, Jon Argandoña, Jon Kepa Gerrikagoitia and Aurkene Alzua-Sorzabal at the "Intelligence & analytics" workshop of the ENTER2015 eTourism conference.
Behaviour of virtual visitor based on eShop and DMO websites: A comparative study by means of data mining techniques
1. ENTER 2015 Research Track Slide Number 1
Fidel Rebón, Gloria Ocáriz, Jon Argandoña, Jon Kepa Gerrikagoitia and
Aurkene Alzua-Sorzabal
CICtourGUNE
Donostia - San Sebastián, Spain
fidelrebon@tourgune.org
http://www.tourgune.org
Agenda
1. Introduction
2. Related Work
3. Methodology
4. Results and Discussion
5. Conclusion and future work
Introduction
• The Internet provides a global channel for communicating, informing and purchasing; e-marketing and e-commerce activities must therefore be adapted to it.
• With the increasing diffusion of the Internet, DMOs need to understand web users’ behaviour in order to adapt their online marketing strategies to prospective customers’ preferences and requirements and, by doing so, influence the traveller’s decision-making process and engagement (Wang & Fesenmaier, 2006; Xiang & Gretzel, 2010).
• Most DMOs now include products and experiences for sale on their web portals, putting the final consumer in contact with private operators that want a presence on the DMO portal, e.g. Spain.info, Turismo de Tenerife, Barcelona Turisme, I Amsterdam, Visit London, …
Introduction
•Destination Web Monitor (DWM) is defined as “a system to measure, analyse, and
model the behaviour of visitors in different virtual areas in which a destination is
promoted and with the objective of providing benchmarking ratios that facilitate strategic
surveillance and intelligent marketing policies” (Alzua-Sorzabal, Gerrikagoitia, & Rebón,
2014:6).
•The primary objective of an e-shop is purely commercial, whereas a DMO website is rather informative and persuasive.
•This research carries out a comparative study of virtual user behaviour on an e-shop and on a DMO web portal in order to determine whether the users of the two websites behave differently.
Related Work
•Web personalisation through clustering techniques has been addressed in previous research using Web Usage Mining to discover patterns that help categorise users with similar interests (Castellano & Torsello, 2009; Zhang, Xu, & Zhou, 2005).
•The relationship between the behaviour of e-shop users and that of informational website users has been mentioned in some studies (Lee, Qu, & Kim, 2007). That research reports a positive relation between the consumer’s intention to search for online information in depth and the making of online purchases.
•Other studies focus on discovering users’ interests on an e-commerce site using clickstream data, weighting the navigation variables: category visiting path, browsing frequency and relative length of access time (Chen & Su, 2013).
•However, clustering has not been used to understand the differences between users of different types of websites. It is therefore pertinent to conduct research that analyses user behaviour on different websites in order to understand these differences and achieve better engagement.
Methodology
3.1. Assumptions
3.2. Criteria for variable selection
3.3. Consolidated process for obtaining variables
3.4. Clustering model and software selection
Methodology
3.1. Assumptions
• The frequency with which a user visits a web page and the time he/she spends on it are positively related to his/her interest.
• Each user shows characteristic preferences every time he/she visits a website.
• The sequence of the web pages visited by the user is related to his/her
interest.
• Users with similar interest should have similar browsing patterns (Chen &
Su, 2013).
Methodology
3.2. Criteria for variable selection:
The selected variables of virtual user behaviour on a website are: connection time, number of visits at a given connection time, length of connection, and number of pages viewed or actions taken.
Other variables, not selected:
•Location: highly dependent on the site’s international projection
•Device or browser: dependent on technology and on the platform for which the portal is developed
•Type of traffic: closely linked to marketing campaigns carried out on social networks
•Bounce rate: a direct consequence of distorted visits
Methodology
3.3. Consolidated process for obtaining variables:
ETL process → Web usage mining → Clustering
Destination Web Monitor: web behaviour measurement and analysis framework
Data collection period: 01/01/2014 - 01/04/2014
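The ETL step that feeds web usage mining can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the Destination Web Monitor implementation: the `Visit` record, its fields (`site`, `start`, `pageviews`, `seconds`) and the `aggregate_by_day` function are hypothetical, and the daily grain follows the editor's note that the variables were aggregated by day.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Visit:
    """Hypothetical shape of one raw visit record after extraction."""
    site: str          # origin website, e.g. "DMO" or "eShop"
    start: datetime    # connection time of the visit
    pageviews: int     # pages viewed (or actions taken) during the visit
    seconds: float     # length of the connection, in seconds


def aggregate_by_day(visits):
    """Transform step: sum visits, pageviews and seconds per (site, day).

    Returns a dict keyed by (site, date) whose values hold the daily totals.
    """
    totals = defaultdict(lambda: {"visits": 0, "pageviews": 0, "seconds": 0.0})
    for v in visits:
        row = totals[(v.site, v.start.date())]
        row["visits"] += 1
        row["pageviews"] += v.pageviews
        row["seconds"] += v.seconds
    return dict(totals)
```

For example, two DMO visits on 1 January 2014 with 5 and 8 pageviews and 300 s and 420 s of connection collapse into one daily observation with 2 visits, 13 pageviews and 720 s.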
Methodology
3.4. Clustering model and software selection:
• The statistical language R was chosen to apply the clustering algorithm. The language offers strong features in portability, computational efficiency and memory management (Ihaka, 2009).
• R supports several kinds of clustering techniques.
• R can be used together with an IDE (Integrated Development Environment) such as RStudio.
• The selected clustering technique is K-means, and the number of groups is selected with the pseudo-F value (Calinski & Harabasz, 1974).
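The clustering step can be illustrated with a dependency-free Python sketch of K-means combined with pseudo-F (Calinski-Harabasz) selection of the number of groups. The study itself used R (base R provides `kmeans()` in the stats package); this Python version only illustrates the technique. The function names are hypothetical, and the deterministic farthest-point seeding is an assumption chosen to make the example reproducible, not necessarily the initialisation the authors used. For each candidate k the pseudo-F is (B/(k-1)) / (W/(n-k)), where B and W are the between- and within-cluster sums of squares; the k with the largest value is kept.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of numbers)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def farthest_point_init(points, k):
    """Deterministic maximin seeding (an assumption for reproducibility): start at
    the first point, then keep adding the point farthest from its nearest centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids


def calinski_harabasz(points, labels):
    """Pseudo-F = (B / (k - 1)) / (W / (n - k))."""
    n, clusters = len(points), set(labels)
    k = len(clusters)
    overall = mean(points)
    between = within = 0.0
    for j in clusters:
        members = [p for p, lab in zip(points, labels) if lab == j]
        centre = mean(members)
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def select_k(points, k_values):
    """Run K-means for each candidate k and keep the k with the highest pseudo-F."""
    scores = {}
    for k in k_values:
        labels, _ = kmeans(points, k)
        if 2 <= len(set(labels)) < len(points):
            scores[k] = calinski_harabasz(points, labels)
    best = max(scores, key=scores.get)
    return best, scores
```

On three well-separated groups of four points each, `select_k(points, range(2, 6))` picks k = 3. In practice, library implementations such as scikit-learn's `KMeans` and `calinski_harabasz_score` would replace these hand-written functions.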
Results and Discussion
The pseudo-F value (Calinski & Harabasz, 1974) suggests a 19-cluster partition for K-means clustering.
The analysis is based on per-visit ratios:
•Page ratio: pages viewed per visit
•Time ratio: time spent on the site per visit
Using ratios means that the observations are characterised by an underlying behaviour rather than by each website's volume of observations.
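A minimal sketch of the per-visit ratios, assuming each observation is a daily aggregate of visits, pageviews and connection seconds (the exact formulas are not given on the slide, so `per_visit_ratios` and its inputs are hypothetical). It shows why ratios remove the volume effect: a day with ten times the traffic but the same behaviour maps to the same point.

```python
def per_visit_ratios(visits, pageviews, seconds):
    """Turn one aggregated observation into (page ratio, time ratio).

    page ratio = pages viewed per visit
    time ratio = minutes spent on the site per visit
    """
    if visits <= 0:
        raise ValueError("an observation needs at least one visit")
    return pageviews / visits, seconds / visits / 60.0
```

`per_visit_ratios(200, 1000, 36000)` and `per_visit_ratios(2000, 10000, 360000)` both return `(5.0, 3.0)`, so K-means cannot tell the two observations apart even though one has ten times the traffic.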
Results and Discussion
The contingency table provided by the model, crossing each cluster with the origin website, shows that:
• There is a clear separation of the origins in the clusters
• 4 of the 19 clusters account for 66% of the observations
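The contingency table and the share of observations held by the largest clusters can be computed from the cluster labels and each observation's origin website along these lines. The helper names are hypothetical and the sketch assumes exactly one label and one origin per observation; `clusters_without` corresponds to checks such as "clusters with no DMO observations".

```python
from collections import Counter


def contingency_table(labels, origins):
    """Cross-tabulate cluster labels against the origin website of each observation."""
    counts = Counter(zip(labels, origins))
    sites = sorted(set(origins))
    return {c: {s: counts[(c, s)] for s in sites} for c in sorted(set(labels))}


def top_cluster_share(labels, m):
    """Fraction of all observations that fall in the m largest clusters."""
    sizes = sorted(Counter(labels).values(), reverse=True)
    return sum(sizes[:m]) / len(labels)


def clusters_without(table, site):
    """Clusters that contain no observations from the given website."""
    return [c for c, row in table.items() if row[site] == 0]
```

With the 19 labels from the model, `top_cluster_share(labels, 4)` gives the proportion reported above for the 4 largest clusters.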
Results and Discussion
DMO
• DMO visitors are highly concentrated in 4 clusters.
• 3 of the clusters contain no DMO observations at all: DMO and e-shop users behave differently from each other.
• Page ratio: 6.6 views per visit
• Time ratio: 6.2 min per visit
eShop
• E-shop observations are more dispersed over the 19 clusters.
• 60% of the group falls in 12 of the 19 clusters, which suggests a wider variety of users in an e-shop.
• Page ratio: 11.1 views per visit
• Time ratio: 8.1 min per visit
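The dispersion comparison (e.g. 60% of the e-shop group falling in 12 of the 19 clusters) can be quantified as the smallest number of clusters needed to cover a given share of one site's observations: a more concentrated site needs fewer clusters to reach the same share. A minimal sketch with a hypothetical helper, assuming per-observation cluster labels and origin websites as inputs:

```python
from collections import Counter


def clusters_to_cover(labels, origins, site, share):
    """Smallest number of clusters that together hold at least `share`
    (a fraction between 0 and 1) of one website's observations."""
    sizes = sorted(Counter(lab for lab, org in zip(labels, origins) if org == site).values(),
                   reverse=True)
    if not sizes:
        raise ValueError(f"no observations for site {site!r}")
    target = share * sum(sizes)
    covered = 0
    # Add clusters from largest to smallest until the target share is reached.
    for count, size in enumerate(sizes, start=1):
        covered += size
        if covered >= target:
            return count
    return len(sizes)
```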
Results and Discussion
• DMO users (triangle markers) tend to visit more pages than e-shop users and show a balanced time usage per page visited
• e-Shop users (cross markers) show less balanced time usage behaviour
Conclusion and future work
• The visitors’ behaviour in both platforms is different.
• This implies that appropriate e-marketing policies should be developed targeting each of the detected behaviours, differentiating by type of website.
• The user base of an e-shop does not undergo pronounced turnover, so its audience is more loyal than that of a DMO.
• Future work should use the detected user typologies to categorise users according to their behaviour and the type of website they are browsing.
• With this knowledge, website owners would learn whom, what, how and when to address the visitor, that is, understand the “consumer decision journey” and strengthen engagement.
Editor's Notes
Per-visit ratios were used instead of the raw variable values because otherwise the observations would have been characterised by each website's volume of observations rather than by an underlying behaviour.
In this article the most common method, k-means, was tried (we had little time for testing; remember that the article was put together in three days and Jon had only recently joined us), and it showed such a clear difference in behaviour between the two sites that we did not look for other solutions.
We learned about the pseudo-F from Jon. Until then, when clustering we had to try forming groups manually (forcing them and keeping whichever gave the best result). I remember asking the German (that rather odd one) how he selected the number of groups for clustering, and he told us it was by trial and error.
Strengths:
The process shows that the observations from the different websites can largely be separated in an unsupervised way.
The technique used can be applied in spaces of higher dimensionality and complexity.
Clustering requires no human intervention beyond selecting the method; that is, there is no subjectivity once the decision to use k-means has been made.
Weaknesses + justification/future research:
The variables are aggregated by day; future studies will try to work at a more micro level (the level of individual user visits).
The process uses only two variables for clustering; at a micro level more variables could be used. This would make it possible to explain why the websites differ from each other, and not only that they can be distinguished.
It is not clear what the clusters found are useful for or how they can be used. This is because the study is very superficial and was carried out at a very high level.