Extracting Task Info from Logs

Extracting Task Information
from Past Process Execution Logs

Identify User Tasks from Past Usage Logs

ISTI-CNR (CNR)

Franco Maria Nardini, Gabriele Tolomei, CNR

Learning Package Categorization

S-Cube

Monitoring and Analysis of SBA

Task Modeling

Extracting Task Information
from Past Process Execution Logs

Connections to the S-Cube IRF

  Conceptual Research Framework:
–  Service Composition and Coordination
–  Service Infrastructure
–  Adaptation and Monitoring

  Logical Run-Time Architecture:
–  Monitoring Engine
–  Adaptation Engine
–  Negotiation Engine
–  Runtime QA Engine
–  Resource Broker

Overview

  Introduction
  Goal
  Methodology
  Experiments
  Conclusions

Background Concepts: Usage Logs

  Most complex software systems collect their lifecycle
usage data in log files:
–  Web search engines store a tremendous amount of data about
their users in query logs:
-  e.g., issued queries, timestamps, clicked results, etc.
–  SBS event logs contain a variety of information about service
components exchanging messages
-  e.g., service invocation, service failure, registry querying, etc.

  Usage logs represent a huge source of “hidden”
information (i.e., knowledge)

Knowledge Discovery from Usage Logs

  Data Mining algorithms and techniques allow extracting
valuable knowledge from usage logs
  Extracted knowledge may refer to several aspects:
–  e.g., finding usage patterns, modeling user behavior, etc.

  If properly exploited, such knowledge might help
improve the overall quality of the system

The Web as a Task-Execution Platform

  Activities people perform are usually compositions of
atomic tasks
  The accomplishment of those activities is moving
towards the Web platform
  Examples:
–  planning a trip (overused!)
–  organizing a birthday party
–  getting a U.S. visa
–  etc.

Goal

  Re-construct tasks/processes that users perform on the
Web by means of issued queries to search engines:
–  i.e., mining Web-mediated tasks from past issued user queries

  Extracting tasks/processes from historical search data
(i.e., query logs) collected by Web search engines
  Task-based Session Discovery Problem: approached
using clustering-based techniques

Query Log Mining

  Idea: cluster queries in a way that queries in the same cluster
are likely to be task-related
  Input: stream of queries issued by one user
  Output: set of clusters of queries representing search tasks
for that user
  Key points:
–  features (e.g., lexical content, time, semantic, etc.)
–  clustering algorithm (e.g., centroid-based, density-based, novel
heuristics)
–  distance metrics (e.g., Jaccard, Levenshtein, cosine, etc.)
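
As a rough illustration of the lexical distance metrics named above, the following sketch computes a token-based Jaccard distance and a normalized Levenshtein distance between two query strings. The function names and the equal weighting in `lexical_distance` are assumptions made for this example; the slides do not say how the features are actually combined.

```python
def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the lowercase word sets of two queries."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty queries are treated as identical
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def lexical_distance(a: str, b: str, weight: float = 0.5) -> float:
    """Hypothetical combination: weighted average of the two distances."""
    return weight * jaccard_distance(a, b) + (1 - weight) * normalized_levenshtein(a, b)
```

Time-based or semantic features could be folded into `lexical_distance` in the same way, each with its own weight.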

Our Solution

  A graph-based heuristic for discovering queries that are
related to the same search task
  Our technique has proven to outperform state-of-the-art
approaches
  Results were presented in a research paper published at the
4th ACM Conference on Web Search and Data Mining
(WSDM 2011)
–  Identifying Task-based Sessions in Search Engine Query Logs
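
To give a feel for what a graph-based grouping of queries can look like, here is a generic sketch: queries are nodes, an edge links two queries whose similarity reaches a threshold, and each connected component is read as one candidate task. This is not the algorithm from the paper, whose details are not given in these slides; `token_similarity`, the threshold value, and the sample queries are all assumptions for illustration.

```python
from itertools import combinations


def token_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (an assumed similarity measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def connected_component_clusters(queries, threshold=0.3):
    """Return clusters as sorted lists of query indices."""
    n = len(queries)
    # Build the similarity graph: keep only edges at or above the threshold.
    adjacency = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if token_similarity(queries[i], queries[j]) >= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)

    # Each connected component (found by depth-first search) is one cluster.
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


sample_queries = [
    "us visa form",
    "us visa fee",
    "birthday party ideas",
    "birthday party cake",
    "visa fee payment",
]
clusters = connected_component_clusters(sample_queries)
```

Note that "us visa form" and "visa fee payment" end up in the same cluster only through "us visa fee": connected components chain related queries together even when the two endpoints share few words.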

Data Set: 2006 AOL Query Log

Evaluation

  We manually extract a set of tasks from a portion of our
testing query log (i.e., ground-truth)
  We run our proposed algorithm and evaluate its accuracy in
discovering the manually-labeled tasks of the ground-truth
  Evaluation is expressed in terms of popular IR-based metrics:
–  Precision
–  Recall
–  F-measure (i.e., harmonic mean of Precision and Recall)
–  Rand
–  Jaccard
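
One common way to compute these metrics for a clustering is to count pairs of queries: a pair is a true positive when both the algorithm and the ground truth put the two queries in the same task. The sketch below follows that pairwise convention, which is an assumption here since the slides do not state how the counts are obtained; the function name and the label dictionaries are likewise hypothetical.

```python
from itertools import combinations


def pairwise_scores(predicted, truth):
    """Compare two clusterings given as dicts mapping item -> cluster label."""
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(truth), 2):
        same_predicted = predicted[a] == predicted[b]
        same_truth = truth[a] == truth[b]
        if same_predicted and same_truth:
            tp += 1  # pair correctly grouped together
        elif same_predicted:
            fp += 1  # grouped together, but belongs to different tasks
        elif same_truth:
            fn += 1  # split apart, but belongs to the same task
        else:
            tn += 1  # pair correctly kept apart

    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "rand": (tp + tn) / total if total else 0.0,
        "jaccard": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


truth = {"q1": "A", "q2": "A", "q3": "B", "q4": "B"}
predicted = {"q1": "X", "q2": "X", "q3": "X", "q4": "Y"}
scores = pairwise_scores(predicted, truth)
```

In this toy example one of the six pairs is a true positive, two are false positives, one is a false negative, and two are true negatives.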

Implications for SBS domain: Why?

  Our technique was designed for, but is not limited to, the Web
search context
  Service-based Systems could be another suitable context of
application
  Tasks might be single service instances
  Processes might be workflows of orchestrated services
  Query/Task clustering can be considered as a special case of
more general “activity clustering”

Implications for SBS domain: How?

  Past usage log data are the key point for applying our
technique
  Once we have logs of performed activities (e.g., service
invocations) we can figure out features
  Then we can cluster activities according to those features from
a task/process-based perspective
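
The two steps above (derive features from logged activities, then cluster on them) could look roughly like the following for service invocation logs. Everything specific here is hypothetical: the `Invocation` record and its fields, the choice of time gap and caller identity as features, the 60-second window, and the greedy grouping rule are assumptions for illustration, not part of the presented technique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """Hypothetical log record for one service invocation."""
    timestamp: float  # seconds
    caller: str
    service: str


def distance(a: Invocation, b: Invocation, max_gap: float = 60.0) -> float:
    """Average of two assumed features: capped time gap and caller mismatch."""
    time_part = min(abs(a.timestamp - b.timestamp) / max_gap, 1.0)
    caller_part = 0.0 if a.caller == b.caller else 1.0
    return (time_part + caller_part) / 2


def group_activities(events, threshold=0.5):
    """Greedy grouping: scan events in time order and put each one into the
    first existing group that holds a member closer than the threshold;
    otherwise start a new group."""
    groups = []
    for event in sorted(events, key=lambda e: e.timestamp):
        for group in groups:
            if any(distance(event, member) < threshold for member in group):
                group.append(event)
                break
        else:
            groups.append([event])
    return groups


log = [
    Invocation(0.0, "client-a", "flight.search"),
    Invocation(10.0, "client-a", "flight.book"),
    Invocation(15.0, "client-b", "hotel.search"),
    Invocation(500.0, "client-a", "visa.check"),
]
groups = group_activities(log)
```

With these sample records the two flight invocations from `client-a` fall into one group, while the `client-b` call and the much later `visa.check` call each form their own.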

Conclusions

  We developed a technique for mining tasks/processes from
Web search logs
  Our technique is based on clustering historical search data
according to some features
  This approach might be generalized and applied to several
other contexts (e.g., software-based services)
  We need usage logs from which we can extract suitable
features and common interfaces!
