1. Extracting Task Information
from Past Process Execution Logs
Identify User Tasks from Past Usage Logs
ISTI-CNR (CNR)
Franco Maria Nardini, Gabriele Tolomei, CNR
2. Learning Package Categorization
S-Cube
Monitoring and Analysis of SBA
Task Modeling
Extracting Task Information
from Past Process Execution Logs
3. Connections to the S-Cube IRF
Conceptual Research Framework:
– Service Composition and Coordination
– Service Infrastructure
– Adaptation and Monitoring
Logical Run-Time Architecture:
– Monitoring Engine
– Adaptation Engine
– Negotiation Engine
– Runtime QA Engine
– Resource Broker
5. Background Concepts: Usage Logs
Most complex software systems collect their lifecycle
usage data in log files:
– Web search engines store a tremendous amount of data about
their users in query logs:
- e.g., issued queries, timestamps, clicked results, etc.
– SBS event logs contain a wide range of information about service
components exchanging messages
- e.g., service invocation, service failure, registry querying, etc.
Usage logs represent a huge source of “hidden”
information (i.e., knowledge)
6. Knowledge Discovery from Usage Logs
Data Mining algorithms and techniques allow extracting
valuable knowledge from usage logs
Extracted knowledge may refer to several aspects:
– e.g., finding usage patterns, modeling user behavior, etc.
If properly exploited, such knowledge might help
improve the overall quality of the system
7. The Web as a Task-Execution Platform
Activities people perform are usually compositions of
atomic tasks
The accomplishment of those activities is moving
towards the Web platform
Examples:
– planning a trip (overused!)
– organizing a birthday party
– getting a U.S. visa
– etc.
9. Goal
Re-construct tasks/processes that users perform on the
Web by means of issued queries to search engines:
– i.e., mining Web-mediated tasks from past issued user queries
Extracting tasks/processes from historical search data
(i.e., query logs) collected by Web search engines
Task-based Session Discovery Problem: approached
using clustering-based techniques
11. Query Log Mining
Idea: cluster queries in a way that queries in the same cluster
are likely to be task-related
Input: stream of queries issued by one user
Output: set of clusters of queries representing search tasks
for that user
Key points:
– features (e.g., lexical content, time, semantic, etc.)
– clustering algorithm (e.g., centroid-based, density-based, novel
heuristics)
– distance metrics (e.g., Jaccard, Levenshtein, cosine, etc.)
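To make two of the distance metrics named above concrete, here is a minimal sketch. It assumes queries are plain strings, that Jaccard is computed over lowercase word sets, and that Levenshtein distance is normalized by the longer string; the slides do not specify these choices, and the function names are illustrative.

```python
def jaccard_distance(q1: str, q2: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the lowercase word sets of two queries."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    if not a and not b:
        return 0.0  # two empty queries are treated as identical (assumption)
    return 1.0 - len(a & b) / len(a | b)


def levenshtein(s: str, t: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_levenshtein(s: str, t: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0
```

Word-level Jaccard captures shared terms, while character-level Levenshtein tolerates typos and small reformulations; a combined distance could weight the two.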
12. Our Solution
A graph-based heuristic for discovering queries that are
related to the same search task
Our technique has proven to outperform state-of-the-art
approaches
Results were presented in a research paper published at the
4th ACM Conference on Web Search and Data Mining
(WSDM 2011)
– Identifying Task-based Sessions in Search Engine Query Logs
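As an illustration of what a graph-based grouping can look like, the sketch below links two queries whenever their word overlap reaches a threshold and returns the connected components of the resulting graph as candidate tasks. This is a generic approach under stated assumptions, not necessarily the heuristic of the paper: the similarity function, the threshold value, and all names are hypothetical.

```python
from itertools import combinations


def word_jaccard(q1, q2):
    """Jaccard similarity over lowercase word sets (assumed feature)."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def cluster_by_components(queries, threshold=0.3):
    """Link queries with similarity >= threshold; return connected components."""
    parent = list(range(len(queries)))  # union-find forest over query indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(queries)), 2):
        if word_jaccard(queries[i], queries[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i, q in enumerate(queries):
        clusters.setdefault(find(i), []).append(q)
    return list(clusters.values())
```

Calling `cluster_by_components(user_queries, threshold=0.15)` on one user's query stream yields a list of query groups; the threshold controls how aggressively queries are merged into the same task.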
15. Evaluation
We manually extract a set of tasks from a portion of our
testing query log (i.e., ground-truth)
We run our proposed algorithm and evaluate its accuracy in
discovering the manually-labeled tasks of the ground-truth
Evaluation is expressed in terms of popular IR-based metrics:
– Precision
– Recall
– F-measure (i.e., harmonic mean of Precision and Recall)
– Rand
– Jaccard
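One common way to compute these metrics for a clustering is to count pairs of queries. The sketch below assumes that pair-counting formulation, which the slides do not confirm: a pair is a true positive when both the algorithm and the ground truth place the two queries in the same task. The function name and input layout are illustrative.

```python
from itertools import combinations


def pairwise_scores(predicted, truth):
    """predicted/truth: dicts mapping each query id to a task label."""
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(truth), 2):
        same_pred = predicted[a] == predicted[b]
        same_true = truth[a] == truth[b]
        if same_pred and same_true:
            tp += 1  # pair correctly placed together
        elif same_pred:
            fp += 1  # pair wrongly placed together
        elif same_true:
            fn += 1  # pair wrongly separated
        else:
            tn += 1  # pair correctly separated
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    rand = (tp + tn) / total if total else 1.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "rand": rand, "jaccard": jaccard}
```

Rand rewards both correctly joined and correctly separated pairs, whereas Jaccard ignores the correctly separated ones, which matters when most pairs belong to different tasks.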
17. Implications for SBS domain: Why?
Our technique was designed for, but is not limited to, the Web
search context
Service-based Systems could be another suitable context of
application
Tasks might be single service instances
Processes might be workflows of orchestrated services
Query/Task clustering can be considered as a special case of
more general “activity clustering”
18. Implications for SBS domain: How?
Past usage log data are the key to applying our
technique
Once we have logs of performed activities (e.g., service
invocations) we can figure out features
Then we can cluster activities according to those features on
a task/process-based perspective
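A minimal sketch of how features could be derived from service invocation logs follows. The `Invocation` record and its fields are hypothetical, not an S-Cube log format, and the distance simply averages a capped time-gap term with a service-mismatch term as one possible starting point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """Hypothetical log record for one service invocation."""
    timestamp: float  # seconds since some reference point
    service: str      # name of the invoked service


def activity_distance(a: Invocation, b: Invocation,
                      max_gap: float = 300.0) -> float:
    """Average of a time term and a content term, each in [0, 1]."""
    # Time feature: gap between invocations, capped at max_gap seconds.
    time_term = min(abs(a.timestamp - b.timestamp) / max_gap, 1.0)
    # Content feature: 0 if both events hit the same service, else 1.
    content_term = 0.0 if a.service == b.service else 1.0
    return (time_term + content_term) / 2
```

A distance of this kind can then feed any clustering step, so that invocations that are close in time and content end up in the same task or process.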
20. Conclusions
We developed a technique for mining tasks/processes from
Web search logs
Our technique is based on clustering historical search data
according to some features
This approach might be generalized and applied to several
other contexts (e.g., software-based services)
We need usage logs from which we can extract suitable
features and common interfaces!