This document presents a framework for semantically modeling and analyzing user web browsing behavior across multiple sites. It describes a Web Browsing Activity Model (WAM) ontology that formally represents browsing events and sessions. User logs are semantically enriched using this ontology and domain ontologies before being stored in a knowledge base. Temporal logic queries can then be used to find patterns in browsing behavior, such as sessions in which a publication page is eventually followed by a visit to a search engine. An evaluation shows that logs of up to 1000 sessions can be queried in under 1.4 seconds.
Semantic Analysis of User Browsing Patterns in the Web of Data @USEWOD, WWW2012
1. Enabling Semantic Analysis of User Browsing Patterns
in the Web of Data
M.Sc. Julia Hoxha
Institute of Applied Informatics and Formal Description Methods (AIFB)
Karlsruhe Institute of Technology
USEWOD Workshop @WWW2012
Lyon, France
KIT – University of the State of Baden-Württemberg and
National Laboratory of the Helmholtz Association
www.kit.edu
2. Paper
Hoxha, J., Junghans, M., and Agarwal, S. (2012).
Enabling Semantic Analysis of User Browsing
Patterns in the Web of Data. In 2nd International
Workshop on Usage Analysis and the Web of Data
(USEWOD), 21st International World Wide Web
Conference (WWW2012), Lyon, France, vol. CoRR,
abs/1204.2713.
http://arxiv.org/abs/1204.2713
3. Outline
Introduction
Framework for Behavior Analysis
Semantic Modeling of Cross-site Browsing Behavior
Web Browsing Activity Model (WAM)
Formalization Approach
Querying Behavioral Patterns
Evaluation
Conclusions
J. Hoxha – USEWOD Workshop, Lyon, 2012
4. Introduction
Understanding user behavior in accessing Web resources helps site providers/domain experts:
• Discover user preferences or detect bottlenecks
• Build adaptive Web sites
• Make appropriate recommendations to users, etc.
HTTP Requests of Usage Logs
ID  Time           User Action
1   [17:11:49:21]  http://www.google.de/search?q=Lyon+www2012
1   [17:11:49:33]  http://dbpedia.org/page/Lyon
1   [17:11:49:39]  http://data.semanticweb.org/conference/www/2011/demo/a-demo-search-engine-for-products
How to facilitate the analysis of usage patterns?
• Provide formal, semantic description of usage logs
• Offer techniques to expressively query patterns
[Figure: SWDF Domain Ontology, with classes swrc:Publication, swrc:Proceedings, InProceedings, swrc:ConferenceEvent, foaf:Person, dbpedia:PopulatedPlace and a literal, linked by isA, ns2:relatedToEvent, dc:creator, ns3:based_near and ns1:name]
5. Modeling and Analysis Framework
[Figure: framework pipeline. Monitoring (Web Browsing Behavior Monitoring System recording the cross-site browsing activities of User 1 ... User n) -> Selection (Target Data) -> Preprocessing (Preprocessed Data) -> Transformation (Transformed Data) -> Semantic Formalization (annotation of Event A, Event B, Event C, ..., Event K, Event N with Domain Ontology; Semantic Activity Models and Domain Ontologies kept in a Repository) -> Analysis (Querying Capabilities, Pattern Mining)]
User Session of browsing Events
s: <l1, l2, l3, ..., ln>
• Event e1 = (A1, I1, t1): Type A1, Input I1 = {i1,...,ik}, URL l1, Time t1
• Event en = (An, In, tn): Type An, Input In = {i1,...,ik}, URL ln, Time tn
• Type Ai = {content, function}
6. Definitions
Event
• l full URL invoked, T types, P parameters, t timestamp
Event types
• Tc content type of an event
• Tf function type of an event
Session
• s is an ordered sequence of events, s.t. i is the event order in s
• Ts start time and Te end time of s
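The definitions on this slide can be mirrored in a small data structure. The following is a minimal sketch in Python, not the authors' implementation: the names `Event`, `Session`, `full_url`, `content_types`, `function_types` and `params` are assumptions made for illustration, and sorting by timestamp is one assumed way to obtain the event order i. The timestamps and type labels in the example are placeholders.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Event:
    full_url: str                                   # l: full URL invoked
    timestamp: float                                # t: timestamp (unit assumed: seconds)
    content_types: FrozenSet[str] = frozenset()     # Tc: content types
    function_types: FrozenSet[str] = frozenset()    # Tf: function types
    params: Tuple[Tuple[str, str], ...] = ()        # P: (name, value) pairs


@dataclass
class Session:
    events: List[Event] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Sort by timestamp so that list position i is the event order in s.
        self.events = sorted(self.events, key=lambda e: e.timestamp)

    @property
    def start_time(self) -> float:                  # Ts (raises IndexError if empty)
        return self.events[0].timestamp

    @property
    def end_time(self) -> float:                    # Te (raises IndexError if empty)
        return self.events[-1].timestamp


# Two events given out of order; the session restores the order.
session = Session([
    Event("http://dbpedia.org/page/Lyon", 33.0,
          content_types=frozenset({"PopulatedPlace"})),
    Event("http://www.google.de/search?q=Lyon+www2012", 21.0,
          function_types=frozenset({"SearchEngine"}),
          params=(("q", "Lyon+www2012"),)),
])
```

Keeping events immutable and ordering them once at construction means later pattern checks can rely on list position as the abstract time of an event.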
7. Web browsing Activity Model (WAM)
[Figure: WAM ontology graph]
Example URLs:
http://www.avis.com/car-rental/reservation/start-reservation.ac?resForm.pickUpLocation=Lyon
http://data.semanticweb.org/person/julia-hoxha
Classes: wam:Session, wam:Event, wam:StartEvent, wam:EndEvent, wam:EventType, wam:FunctionType, wam:ContentType, wam:EventURL, wam:BaseURL, wam:Parameter, owa:ParameterName, wam:InputVariable, wam:OutputVariable, wam:User, time:TemporalEntity, time:Instant, time:Interval, event:Event
Properties: wam:hasEvent, wam:hasStartEvent, wam:hasEndEvent, wam:order, wam:hasTime, wam:inInterval, wam:hasUser, wam:userID, wam:userIP, wam:hasParameter, wam:hasInput, wam:hasName, wam:hasValue, wam:eventURL, wam:fullURL, wam:baseURL, wam:functionType, wam:contentType, rdfs:subClassOf
Annotations:
• Event types are based on function and content
• Domain Ontology used for semantic enrichment
Namespaces:
wam: <http://greenlinkeddata.org/wam.owl#>
time: <http://www.w3.org/2006/time#>
event: <http://purl.org/NET/c4dm/event.owl#>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs: <http://www.w3.org/2000/01/rdf-schema#>
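To make the model more concrete, a session and its events could be written out as triples that use the WAM vocabulary. The sketch below emits N-Triples lines with plain Python. It is an illustration under assumptions: the direction and range of each property (for example that wam:hasEvent points from a session to an event, or that wam:contentType points to a class IRI) is inferred from the figure labels, the `http://example.org/log#` namespace for instances is hypothetical, and the function name `session_to_ntriples` is invented here.

```python
WAM = "http://greenlinkeddata.org/wam.owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/log#"  # hypothetical namespace for log instances


def iri(value):
    """Wrap an IRI string in angle brackets."""
    return f"<{value}>"


def lit(value):
    """Render a plain literal, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def session_to_ntriples(session_id, events):
    """events: list of (full_url, time_string, content_type_iri) tuples,
    already in browsing order."""
    s = iri(EX + session_id)
    lines = [f"{s} {iri(RDF_TYPE)} {iri(WAM + 'Session')} ."]
    for order, (url, time, content_type) in enumerate(events, start=1):
        e = iri(f"{EX}{session_id}_e{order}")
        lines += [
            f"{s} {iri(WAM + 'hasEvent')} {e} .",
            f"{e} {iri(RDF_TYPE)} {iri(WAM + 'Event')} .",
            f"{e} {iri(WAM + 'order')} {lit(order)} .",
            f"{e} {iri(WAM + 'hasTime')} {lit(time)} .",
            f"{e} {iri(WAM + 'fullURL')} {lit(url)} .",
            f"{e} {iri(WAM + 'contentType')} {iri(content_type)} .",
        ]
    return "\n".join(lines)


triples = session_to_ntriples(
    "s1",
    [("http://data.semanticweb.org/person/julia-hoxha",
      "17:11:49:39",
      "http://xmlns.com/foaf/0.1/Person")],
)
```

Printing `triples` gives one line per assertion, which can be loaded into any triple store that accepts N-Triples.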
8. Formalization Approach
Formalization based on WAM ontology
• Step 1. Semantic Enrichment
• Step 2. Extend Knowledge Base (ABox assertions for events & domain ontology)
• Step 3. RDF Serialization
[Figure: pipeline excerpt. Selection (Target Data) -> Preprocessing (Preprocessed Data) -> Transformation (Transformed Data) -> Semantic Formalization (annotation of Event A, Event B, Event C, ..., Event K, Event N with Domain Ontology; Semantic Activity Models)]
Semantic Enrichment
• For each link in logs, find URI of Web resource
• Find RDF representation of the resource (via a Mapping Template), e.g. SWDF:
  http://data.semanticweb.org/person/julia-hoxha/html - HTML
  http://data.semanticweb.org/person/julia-hoxha - URI
  http://data.semanticweb.org/person/julia-hoxha/rdf - RDF/XML
• Extract ontology classes to which it belongs, used as ContentType of event (Person, ResearchGroup, Publication, MusicGroup, etc.)
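The SWDF example on this slide suggests what a mapping template might look like: a logged page URL is reduced to the resource URI, from which the address of the RDF document follows. The sketch below covers only that pattern. The function name `swdf_mapping` is hypothetical, and the rule that only the `/html` and `/rdf` suffixes need handling is an assumption drawn from the three URLs shown, not a description of the authors' template.

```python
SWDF_BASE = "http://data.semanticweb.org/"


def swdf_mapping(logged_url):
    """Map a logged SWDF URL to (resource URI, RDF document URL).

    Returns None for URLs outside the SWDF domain.
    """
    if not logged_url.startswith(SWDF_BASE):
        return None
    # Drop any query string and trailing slash before matching suffixes.
    path = logged_url.split("?", 1)[0].rstrip("/")
    for suffix in ("/html", "/rdf"):
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    return path, path + "/rdf"


uri, rdf_doc = swdf_mapping("http://data.semanticweb.org/person/julia-hoxha/html")
```

The classes found in the RDF document at `rdf_doc` would then supply the ContentType of the event. Fetching and parsing that document is left out of the sketch.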
9. Semantic Analysis
Querying with semantic constraints
Example:
- In how many sessions within Mar-Apr 2011 did users search in Google and afterwards visit a page in SWDF?
Various levels of abstraction:
e.g. instead of Google -> any search engine
or instead of any page -> WWW2011 page
or even higher abstraction -> Conference page
[Figure: "WWW2011" isA "Conference"; session s: <e1, ..., e2, ef> with event attributes e1.time, e1.urlBase, e1.type]
Address also temporal constraints regarding the dynamics of user browsing behavior
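The effect of moving between abstraction levels can be illustrated with a toy isA hierarchy. In the sketch below a session is reduced to a list of type labels in browsing order. The `IS_A` table, the labels and the helper names are all assumptions made for the example; in the approach presented here this kind of subsumption would come from ontology reasoning rather than a hand-written dictionary.

```python
# Toy hierarchy: child label -> parent label (assumed for illustration).
IS_A = {
    "WWW2011": "Conference",
    "Google": "SearchEngine",
    "Bing": "SearchEngine",
}


def is_a(term, target):
    """True if `term` equals `target` or reaches it by following IS_A links."""
    while term is not None:
        if term == target:
            return True
        term = IS_A.get(term)
    return False


def matches(session, first, later):
    """True if an event of type `first` is followed by one of type `later`.

    Checking only the earliest `first` event is enough: anything that
    follows a later `first` event also follows the earliest one.
    """
    for i, label in enumerate(session):
        if is_a(label, first):
            return any(is_a(other, later) for other in session[i + 1:])
    return False


def count_sessions(sessions, first, later):
    return sum(1 for s in sessions if matches(s, first, later))


sessions = [
    ["Google", "WWW2011"],
    ["Bing", "WWW2011"],
    ["WWW2011", "Google"],
]
concrete = count_sessions(sessions, "Google", "WWW2011")
abstract = count_sessions(sessions, "SearchEngine", "Conference")
```

The same query shape returns more sessions at the abstract level, because the Bing session now qualifies as well.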
10. Temporal Constraints
Consider real time (timestamps) and abstract time (order of events) to query usage patterns
Q: find sessions with start time Ts and end time Te containing an event e1 with URL www.ex1.org, eventually succeeded by another event e2 in the session with URL www.ex2.org
We address temporal logics capable of ontological reasoning
• apply temporal operators, e.g. next, eventually, always (based on Linear Temporal Logic - LTL)
• query formulated as LTL formula extended with DL axioms
LTL + DL: proposition A as a set of ABox assertions
[Figure: LTL Formula in a State Transition System. next (X): A is true at the next state after the initial state s1; eventually: A is true at some state along the path; always: A is true at all states on the path]
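The three operators named on this slide can be evaluated directly over a finite sequence of events. The sketch below is a small recursive evaluator, written under assumptions: formulas are nested tuples in a format invented here, propositions are plain Python predicates rather than sets of ABox assertions, and "next" is taken to be false at the last state of a session (a finite-trace reading that the slide does not spell out).

```python
def holds(formula, trace, i=0):
    """Evaluate `formula` on `trace` starting at position i."""
    op = formula[0]
    if op == "prop":        # ("prop", predicate)
        return i < len(trace) and formula[1](trace[i])
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "next":        # true at the next state, if there is one
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "eventually":  # true at some state from i onwards
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "always":      # true at all states from i onwards
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op}")


def url_is(url):
    return ("prop", lambda state: state == url)


# Q from the slide: an event at www.ex1.org, eventually succeeded by
# another event at www.ex2.org.
Q = ("eventually",
     ("and", url_is("www.ex1.org"),
             ("next", ("eventually", url_is("www.ex2.org")))))

forward = holds(Q, ["www.ex1.org", "www.other.org", "www.ex2.org"])
backward = holds(Q, ["www.ex2.org", "www.other.org", "www.ex1.org"])
```

Wrapping the second condition in "next" ensures that the two URLs are matched by different events, so a single event cannot satisfy both parts of Q.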
11. DL-LTL Query Formulation
Queries formulate
• 1) certain conditions on the session itself
• 2) temporal patterns in the events within the session
Query:
Q (s):
find sessions with start time Ts and end time Te
containing an event e1 with content type “publication”,
eventually succeeded by another e2 with function type “search engine”
expressed as a DL-LTL formula with two parts:
1) Conditions on the session itself
2) Temporal patterns within a session
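One possible way to write the query Q(s) as a formula is sketched below. It is a hypothetical rendering, not the formula shown on the original slide: the predicate names (startTime, endTime, contentType, functionType) are assumptions, F stands for "eventually" and X for "next".

```latex
\[
Q(s) \;=\;
\underbrace{\mathit{startTime}(s) = T_s \;\wedge\; \mathit{endTime}(s) = T_e}_{\text{1) conditions on the session}}
\;\wedge\;
\underbrace{\mathsf{F}\bigl(\mathit{contentType}(e_1, \mathit{Publication})
  \;\wedge\; \mathsf{X}\,\mathsf{F}\,\mathit{functionType}(e_2, \mathit{SearchEngine})\bigr)}_{\text{2) temporal pattern within the session}}
\]
```

The first conjunct is checked against the session as a whole, while the second is checked state by state along the sequence of events.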
12. Query Answering Approach
Step 1. Check constraints on the session itself
Step 2. Verify temporal constraints applying model
checking technique
Iterate over sessions S={S1, S2,…,Sn}
(a) build a finite state automaton (FSA) for each Si
(b) verification of DL-LTL formula
iterate over the states of FSA to determine whether a
condition holds in the respective state
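The two steps can be sketched for the query pattern from the previous slide. This is a simplified illustration rather than the authors' procedure: each state carries a set of string labels that stand in for the ABox assertions holding in that state, the session constraint is read as "the session lies inside the window [t_start, t_end]", and the pattern check is hard-wired to "`first` eventually followed by `second`" instead of a general DL-LTL formula. All function names are hypothetical.

```python
def build_fsa(events):
    """Step 2 (a): a linear automaton with one state per event.

    State i holds the labels of event i and has a single transition to i + 1.
    """
    states = [frozenset(labels) for labels in events]
    transitions = {i: i + 1 for i in range(len(states) - 1)}
    return states, transitions


def eventually_then(states, transitions, first, second):
    """Step 2 (b): iterate over the states and check, in each one,
    whether `second` holds after `first` has been seen earlier."""
    seen_first = False
    state = 0 if states else None
    while state is not None:
        # Test `second` before recording `first`, so that the two labels
        # must come from different events.
        if seen_first and second in states[state]:
            return True
        if first in states[state]:
            seen_first = True
        state = transitions.get(state)
    return False


def answer(sessions, t_start, t_end, first, second):
    """sessions: {id: (start_time, end_time, [label sets in event order])}."""
    result = []
    for sid, (start, end, events) in sessions.items():
        if not (t_start <= start and end <= t_end):   # Step 1
            continue
        states, transitions = build_fsa(events)
        if eventually_then(states, transitions, first, second):
            result.append(sid)
    return result


sessions = {
    "s1": (10, 20, [{"Publication"}, {"Other"}, {"SearchEngine"}]),
    "s2": (10, 20, [{"SearchEngine"}, {"Publication"}]),
    "s3": (5, 20, [{"Publication"}, {"SearchEngine"}]),
}
hits = answer(sessions, 10, 30, "Publication", "SearchEngine")
```

Running the cheap session filter first means an automaton is only built for sessions that can still match, which matters when the per-state check involves reasoning.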
13. Evaluation
Validate feasibility of the formalization approach
Show feasibility of the query answering approach
• Query sessions with different patterns
• Measure performance
Formalization
Dataset       Monitoring Period       avg. #sessions/day   #sessions
SWDF 2009     01.Jul.09 - 12.Jul.09   235.9                2831
DBPedia 3-3   01.Jul.09 - 12.Jul.09   2899                 31893
• Only 1.46% of daily sessions contain SPARQL queries
[Figures: SWDF 2009 and DBPedia 2009, % of sessions initiated in the domain; surviving chart labels: Google 97%, Bing 2.7%]
14. Evaluation (II)
Querying
• answering time varies slightly for the queries (~0.15 seconds)
• For up to 1000 sessions below 1.4 seconds
• model checking time is small
• OWL reasoning takes ~94% of the overall answering time
[Figure: answering time (sec) against nr. sessions, query Q1]
15. Conclusions
Propose a framework for behavior modeling and
analysis:
• Approach for semantic formalization of logs
• Techniques of querying patterns with temporal and
semantic constraints
Challenges and Future Work
• Find datasets of client-side navigation logs at multiple sites
• Domain Ontology acquisition
• Classification Techniques to find FunctionType
• Optimization of Query Answering