Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale Systems
1. Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale Systems
Weiyi Shang
Supervisor: Dr. Ahmed E. Hassan
2. Automated profiling & instrumentation are
widely used in software engineering
Overhead, No domain knowledge, Large scale
3. Logs are a valuable source of information about
system execution
Field information, Developer experience
foo() {
  …
  Log_statement("operation started");
  …
}
4. Overview of log mining
[Figure: Software System → Log collection → Log transformation → Log analysis → Goal]
5. Finding 1. Little research focuses on logging
statements that reside in the source code.
Finding 2. Little research focuses on logs
generated during the development of the system.
• Types of Logs:
• Platform logs: Hadoop logs [Tan et al.]
• Application logs: Dell DVD store logs [Jiang et al.]
• Sources of Logs:
• Logs from the field: [Kavulya et al.]
• Logs during development: [Jiang et al.]
6. Finding 3. Prior research primarily uses ad hoc
log transformation techniques.
• Abstracted logs: Log events [Jiang et al.]
• Vectors or sets: Pairs [Jiang et al.], Sequence [Jiang et al.], Suffix arrays [Nagappan et al.], Time series [Bitincka et al.]
• Graphs: State machines [Tan et al.], Directed graph [Nagappan et al.]
• Matrices: [Lou et al.]
7. Finding 4. Prior log mining research does not
address the scalability challenges.
• Simple calculation: filtering [Salfner et al.]
• Directed graph-based algorithms: [Nagappan et al.]
• Static analysis: [Yuan et al.]
• Model checking: [Beschastnikh et al.]
• Visualization: [De Pauw et al.]
• Statistical methods: PCA [Xu et al.]
• Data mining techniques: Co-occurrence analysis [Lou et al.]
• Machine learning techniques: Prediction [Salfner et al.]
• Other analysis techniques: Compression [Hassan et al.]
8. Finding 5. There exists limited software log
mining research to support software
development activities
• Log mining platforms: [Bitincka et al.]
• Log improvements: [Yuan et al.]
• Log mining for system administration
  • Anomaly detection [Xu et al.]
  • System monitoring [Rabkin et al.]
  • Workload recovery and capacity planning [Kavulya et al.]
• Log mining for software engineering
  • Program comprehension: [Beschastnikh et al.]
  • Software testing: [Jiang et al.]
  • Empirical studies: [Yuan et al.]
9. Thesis statement
Logs are a valuable yet rarely explored source of knowledge
about a software system and its operation. There is little
research regarding the understanding and evolution of logs.
Systematic and scalable log mining approaches are needed
to support various software development activities (e.g.,
code quality improvement, large scale testing and
deployment of ultra-large scale applications).
10. Part 1: Study the challenges associated with understanding and evolving logging statements
• What are the challenges in understanding logging statements? [Submitted to ICSM 2014]
• How do logging statements evolve? [WCRE 2011, JSEP]
Part 2: Log engineering approaches to support software development activities
• Prioritizing code review and testing efforts using logs and their churn. [EMSE]
• Verifying deployment of Big Data Analytics applications using logs. [ICSE 2013]
12. Motivation: Log understanding is challenging
User mailing lists: Hadoop, Cassandra, Zookeeper
14 inquiries asked about 5 types of information
[Figure: bar chart of # inquiries per information type (Meaning, Cause, Context, Solution, Impact); bar values as listed: 2, 11, 1, 6, 1]
[ICSM 2014 in submission]
13. Approach: Attaching development knowledge to logs
[Figure: development knowledge attached to a log line: code commit, issue reports, source code, code comments, call graph]
[ICSM 2014 in submission]
14. Development knowledge can resolve real-life inquiries
Development knowledge can help resolve 9 out of 14 real-life inquiries from the user mailing list.
[Figure: bar chart of # answered and # not answered inquiries per information type (Meaning, Cause, Context, Solution, Impact)]
[ICSM 2014 in submission]
16. Motivation: How to keep Log Processing Apps in sync with logs?
[Figure: Release 1, Release 2, Release 3]
[WCRE 2011 best paper, JSEP]
17. Approach: Studying log evolution at the execution level
[Figure: System Deployment → Data Collection → Log Abstraction → Log Events, applied to an Enterprise Application (EA)]
Log abstraction example:
time=1, Trying to launch, TaskID=01A
time=$t, Trying to launch, TaskID=$id
[WCRE 2011 best paper, JSEP]
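The abstraction step shown on this slide can be illustrated with a minimal Python sketch. It is not the tool used in the study; it only assumes that dynamic values appear as `key=value` pairs, and the placeholder table (`$t`, `$id`, with `$v` as a fallback) is a hypothetical choice modeled on the slide's example.

```python
import re

# Hypothetical placeholder names; the slide only shows $t and $id.
PLACEHOLDERS = {"time": "$t", "TaskID": "$id"}

# Assumption: dynamic values are written as key=value, ending at a comma or space.
KEY_VALUE = re.compile(r"(\w+)=([^,\s]+)")


def abstract_log_line(line):
    """Replace each dynamic value with a placeholder, keeping the static text."""
    def replace(match):
        key = match.group(1)
        return f"{key}={PLACEHOLDERS.get(key, '$v')}"
    return KEY_VALUE.sub(replace, line)


event = abstract_log_line("time=1, Trying to launch, TaskID=01A")
# event == "time=$t, Trying to launch, TaskID=$id"
```

Two log lines that differ only in their dynamic values map to the same log event, which is what makes events comparable across releases.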
19. Results
• Quantity: How do logs evolve over time? Growing & changing.
• Type: What types of modifications happen to logs? 8 types, mostly avoidable.
• Content: What information is conveyed by the short-lived logs? Implementation-level details.
Implications: Document & track; Fragile; Maintenance effort
[WCRE 2011 best paper, JSEP]
21. Approach: Building statistical models for post-release defects
[Figure: Traditional metrics → Logistic Regression Model; Traditional metrics + Log-related metrics → Logistic Regression Model]
• Are log-related metrics significant in the models?
• How much explanatory power improvement can log-related metrics provide over traditional metrics?
[EMSE]
22. Approach: Defining log-related metrics
Log-related metrics
• Product: Log density, Average logging level
• Process: Log add density, Log delete density, Co-change of log and bug fix
Traditional metrics
• Product: Lines of code
• Process: Pre-release defects, Total prior commits
[EMSE]
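As a rough illustration of a product metric such as log density, the sketch below counts logging statements per non-blank source line. The slide does not give the formula, so the ratio used here is an assumption, and the `LOG.<level>(` call pattern is a hypothetical convention that would need adjusting to the logging library actually in use.

```python
import re

# Hypothetical pattern for logging calls such as LOG.info(...) or LOG.error(...).
LOG_CALL = re.compile(r"\bLOG\.(trace|debug|info|warn|error|fatal)\s*\(", re.IGNORECASE)


def log_density(source):
    """Assumed definition: lines containing a logging call / non-blank lines."""
    lines = [line for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    log_lines = sum(1 for line in lines if LOG_CALL.search(line))
    return log_lines / len(lines)


example = 'void foo() {\n  LOG.info("operation started");\n  int x = 1;\n}\n'
density = log_density(example)  # 1 logging line out of 4 non-blank lines
```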
23. Results
There is a relationship between logging characteristics and software quality.
• In 7 out of 8 studied releases, at least one log-related metric is statistically significant when added to the model built with only traditional metrics.
• The log-related metrics provide up to 40% improvement over the explanatory power of the traditional metrics.
Studied releases: 0.16.0 to 0.19.0 and 3.0 to 4.0
[EMSE]
25. How to verify the deployment of Big Data Analytics Apps?
[Figure: Small sample data and pseudo cloud → Big data and real-life cloud: how to verify?]
[ICSE 2013 distinguished paper]
26. Traditional approach for verifying BDA apps
Keyword scan
• Many false positives!!
• Large results, too much effort to manually examine
[ICSE 2013 distinguished paper]
27. Overview of our approach
[Figure: Small sample data and pseudo cloud → Underlying platform → Execution sequences; Big data and real-life cloud → Underlying platform → Execution sequences; the two sets of execution sequences are compared to produce the execution sequence delta]
[ICSE 2013 distinguished paper]
28. Comparing small and large runs
Logs from testing run with small data
Execution sequence: E1, E2, E3, E5, E6
Logs from run with large data
Execution sequences: E1, E2, E3, E5, E6
E1, E2, E3, E7, E5, E6
Execution sequence delta: E1, E2, E3, E7, E5, E6
[ICSE 2013 distinguished paper]
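The comparison on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation from the paper: it assumes each execution sequence is a list of event names and treats the delta as the sequences that appear only in the large-data run.

```python
def sequence_delta(small_run, large_run):
    """Return sequences seen in the large-data run but not in the small-data run."""
    seen = {tuple(seq) for seq in small_run}
    delta = []
    for seq in large_run:
        key = tuple(seq)
        if key not in seen:
            seen.add(key)  # report each new sequence only once
            delta.append(list(seq))
    return delta


small = [["E1", "E2", "E3", "E5", "E6"]]
large = [
    ["E1", "E2", "E3", "E5", "E6"],
    ["E1", "E2", "E3", "E7", "E5", "E6"],
]
delta = sequence_delta(small, large)
# delta == [["E1", "E2", "E3", "E7", "E5", "E6"]]
```

Only the sequences in the delta need manual inspection, which is where the effort reduction comes from.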
29. Precision: How precise is our approach?
Fewer false positives
Effort reduction: How much effort reduction does our approach provide?
Reduces logs for manual inspection by over 86%
[ICSE 2013 distinguished paper]
30. Thesis contribution
• We demonstrate the challenges of understanding
logs.
• We show that logging statements continually
evolve.
• We show that there is a relationship between
logging characteristics and software defects.
• We propose approaches that leverage logs to
verify the deployment of Big Data Analytics
applications.
33. Where else can we find the requested information?
Log line: fetch failure
Source code: from method checkAndInformJobTracker of file ShuffleScheduler.java
34. Where else can we find the requested information?
Log line: fetch failure
Code comments: Notify the JobTracker after every read error, if `reportReadErrorImmediately' is true or after every `maxFetchFailuresBeforeReporting' failures
35. Where else can we find the requested information?
Log line: fetch failure
Call graph: called by method copyFailed in class ShuffleScheduler
36. Where else can we find the requested information?
Log line: fetch failure
Code commit: Allow shuffle retries and read-error reporting to be configurable. Contributed by Amareshwari Sriramadasu.
37. Where else can we find the requested information?
Log line: fetch failure
Issue reports: MAPREDUCE-1171.
… This is caused by a behavioral change in hadoop 0.20.1. …
… One solution I could see is "Provide a config option..." …
38. Where else can we find the requested information?
Log line: fetch failure
Meaning: There is a data reading error.
Cause: One of the possible reasons is a configuration.
Context: The event happens during the shuffle period, while copying data.
Impact: The event impacts the jobtracker.
Solution: Changing a configuration option would solve the issue. Amareshwari Sriramadasu is the expert to go to.
39. Step 1: Log abstraction reduces the size of logs
[Figure: Log abstraction → Log linking → Simplifying sequences; panels: example of log lines, execution events]
Jiang et al. JSME 2008
40. Step 2: Log linking provides context for logs
[Figure: Log abstraction → Log linking → Simplifying sequences; panels: example of log lines, execution events]
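One way to picture log linking is to group log lines that share an identifier, so each line is read in the context of the other lines for the same task. The sketch below is only an assumption about how such linking could work: the `TaskID=` key is borrowed from the earlier slide example, and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: lines are linked through a shared TaskID=<value> field.
TASK_ID = re.compile(r"TaskID=([^,\s]+)")


def link_by_task(lines):
    """Group log lines by task identifier, preserving their original order."""
    linked = defaultdict(list)
    for line in lines:
        match = TASK_ID.search(line)
        if match:
            linked[match.group(1)].append(line)
    return dict(linked)


# Hypothetical log lines for illustration only.
logs = [
    "time=1, Trying to launch, TaskID=01A",
    "time=2, Trying to launch, TaskID=02B",
    "time=3, Task finished, TaskID=01A",
]
sequences = link_by_task(logs)
# sequences["01A"] holds the two lines that belong to task 01A, in order.
```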
41. Step 3: Sequence simplification deals with repeated logs
[Figure: Log abstraction → Log linking → Simplifying sequences]
Repeated logs:
task t1 read file A.
task t1 read file A.
task t1 read file A.
Remove repetition and order of events
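A minimal sketch of this simplification, assuming that "remove repetition and order" means reducing a sequence to its set of distinct events in a canonical order. The function name and the choice of a sorted tuple are illustrative, not taken from the paper.

```python
def simplify_sequence(events):
    """Drop repeated events and ignore ordering by returning a sorted tuple."""
    return tuple(sorted(set(events)))


repeated = [
    "task t1 read file A.",
    "task t1 read file A.",
    "task t1 read file A.",
]
simplified = simplify_sequence(repeated)
# simplified == ("task t1 read file A.",)
```

With this canonical form, two runs that differ only in how often an event repeats, or in the interleaving of events, compare as equal.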
Editor's Notes
Introduce myself and topic
Title
large
To understand system behavior
Example of why we need it
No-domain knowledge*
Entire system rather than an important part
Logs record important events of system, developers put logs there
Title 5 to 9
Title 5 to 9
Priority change figure
Churn=> how logs change
Disconnect between dev and system admin
neon
Emphasize this slide more for take-home
Priority change figure
Explain LPA
Example of the workload.
Hadoop wordcount.
Dell DVD store.
By knowing the logging method, I get the logging statements
e.g., log4j
Example of the log types: rephrase
Short-lived logs: what’s that
Fragile=> break the LPA
Priority change figure
What traditional metrics are: well studied in practice and accepted by community
Logging level: what’s that
Priority change figure
BDA
Underlying platform: hadoop
3 applications, real ones
First part, we study logging in practice, second part, we propose log mining techniques