Difficulties of comparing code analyzers, or don't forget about usability

Authors: Evgeniy Ryzhkov, Andrey Karpov

Date: 31.03.2011

Abstract
Users' desire to compare different code analyzers is natural and understandable. However, fulfilling
it is not as easy as it may seem at first sight. The difficulty is that it is unclear which factors
should actually be compared.

Introduction
Even if we set aside plainly ridiculous ideas such as "let's compare the number of diagnosable errors"
or "let's compare the number of tool-generated messages", a seemingly reasonable parameter like the
"signal-to-noise ratio" still does not look like an ideal criterion for evaluating code analyzers.

Do you doubt that comparing the parameters mentioned above is unreasonable? Here are some examples.

Which parameters are unreasonable to compare
Let's take a characteristic that looks simple at first sight: the number of diagnostics. It seems that
the more diagnostics, the better. But the total number of rules does not matter to an end user who works
with a particular set of operating systems and compilers. Diagnostic rules relevant to systems,
libraries and compilers he doesn't use give him nothing useful. They even get in his way: they overload
the settings and the documentation and complicate the use and integration of the tool.

Here is an analogy: say, a man comes to a store to buy a heater. He is interested in the home
appliances department, and it's good if this department has a wide range of goods. But the customer
doesn't need the other departments. It's fine that he can also buy an inflatable boat, a cell phone or
a chair in this store, but the inflatable boats department doesn't widen the range of heaters in any way.

Take, for instance, the Klocwork tool, which supports many different systems, including exotic ones.
One of them has a compiler that readily "swallows" this code:

inline int x;

The Klocwork analyzer has a special diagnostic to detect this anomaly: "The 'inline' keyword is applied
to something other than a function or method". It seems good to have such a diagnostic. But developers
using Microsoft Visual C++ or any other adequate compiler gain nothing from it, because Visual C++
simply refuses to compile this code: "error C2433: 'x' : 'inline' not permitted on data declarations".

Another example: some compilers provide poor support for the bool type. So Klocwork may warn you
when a class has a member of the bool type: "PORTING.STRUCT.BOOL: This checker detects
situations in which a struct/class has a bool member".

"They wrote bool in a class! How awful..." Clearly, only a few developers will benefit from this
diagnostic message.

There are plenty of such examples. So the number of diagnostic rules is in no way related to the
number of errors an analyzer can detect in a particular project. An analyzer that implements
100 diagnostics and targets Windows applications can find many more errors in a project built with
Microsoft Visual Studio than a cross-platform analyzer that implements 1000 diagnostics.

The conclusion is that the number of diagnostic rules says nothing about how useful an analyzer is
in practice, so it cannot serve as a basis for comparison.

You may say: "OK, then let's compare the number of diagnostics relevant to a particular system. For
instance, let's single out all the rules that search for errors in Windows applications". But this
approach doesn't work either, for two reasons:

First, one analyzer may implement a certain check as a single diagnostic rule while another splits it
into several rules. If you compare them by the number of diagnostics, the latter analyzer looks better
although both have the same functionality for detecting this type of error.

Second, the same diagnostic may be implemented with different quality. For instance, nearly all
analyzers search for "magic numbers". But one analyzer may detect only the magic numbers that are
dangerous when migrating code to 64-bit systems (4, 8, 32, etc.), while another simply reports every
magic number (1, 2, 3, etc.). So simply putting a plus mark for each analyzer in the comparison table
won't do.
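To make the difference concrete, here is a small hypothetical illustration (the function names are invented for this sketch and do not come from any analyzer's documentation). The literal 4 in the first function silently assumes a 32-bit pointer size, so it matters for 64-bit migration. The literal 3 in the last function is at most a style issue.

```cpp
#include <cstddef>

// Bytes needed for a table of n pointers. The literal 4 assumes that a
// pointer occupies 4 bytes, which is true on 32-bit targets only.
inline std::size_t table_bytes_magic(std::size_t n) {
    return n * 4;
}

// Portable variant: the pointer size is supplied by the compiler.
inline std::size_t table_bytes_portable(std::size_t n) {
    return n * sizeof(void*);
}

// A magic number that is harmless for 64-bit migration: a retry limit.
inline bool may_retry(int attempt) {
    return attempt < 3;
}
```

An analyzer that reports both literals equally and an analyzer that reports only the first one would each get the same plus mark for "magic numbers" in a comparison table, although the lists of messages they produce differ greatly.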

People also like to compare the tool's speed, or the number of code lines processed per second.
But this is unreasonable from a practical viewpoint as well. There is no relation between the speed of
a code analyzer and the speed at which a person works through its results! First, code analysis is
often launched automatically during night builds, so it only has to finish by the morning. And second,
such comparisons often forget about usability. Let's study this issue in detail.

A tool's usability is very important for an adequate comparison
The point is that a tool's usability strongly influences how code analyzers are used in real practice.

We recently checked the eMule project with two code analyzers and evaluated how convenient the work
was in each case. One of the tools was the static analyzer integrated into some Visual Studio
editions. The second was our PVS-Studio. We immediately ran into several issues with the analyzer
integrated into Visual Studio, and those issues had nothing to do with analysis quality or speed.

The first issue is that you cannot save the list of analyzer-generated messages for further
examination. For instance, while checking eMule with the integrated analyzer, I got two thousand
messages. No one can thoroughly investigate them all in one sitting, so the review takes several days.
But since the analysis results cannot be saved, I had to re-analyze the project each time, which is
very tiresome. PVS-Studio lets you save analysis results and continue examining them later.

The second issue is how duplicate analyzer messages are handled. I mean diagnostics for problems in
header files (.h files). Say the analyzer has detected an issue in an .h file included into ten
.cpp files. While analyzing each of these ten .cpp files, the analyzer integrated into Visual Studio
produces the same message about the issue in the .h file ten times! Here is a real sample.
The following message was generated more than ten times while checking eMule:

c:\users\evg\documents\emuleplus\dialogmintraybtn.hpp(450):
warning C6054: String 'szwThemeColor' might not be zero-terminated:
Lines: 434, 437, 438, 443, 445, 448, 450

Because of this, the analysis results get cluttered and you have to review virtually identical
messages over and over. PVS-Studio, I should say, has been filtering out duplicate messages instead of
showing them to the user since the very beginning.
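The idea of filtering duplicates can be sketched as follows. This is only an illustration under assumptions, not a description of how PVS-Studio implements it: the Message structure, its fields and the drop_duplicates function are hypothetical, and two messages are treated as duplicates when file, line and warning code all coincide.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical message record; the field names are assumptions.
struct Message {
    std::string file;
    int line;
    std::string code;
};

// Keep the first occurrence of each (file, line, code) triple and
// preserve the original order of the remaining messages.
inline std::vector<Message> drop_duplicates(const std::vector<Message>& all) {
    std::set<std::tuple<std::string, int, std::string>> seen;
    std::vector<Message> unique;
    for (const Message& m : all) {
        // insert() reports through .second whether the key was new.
        if (seen.insert(std::make_tuple(m.file, m.line, m.code)).second) {
            unique.push_back(m);
        }
    }
    return unique;
}
```

With such a filter, a warning in a header included into ten .cpp files appears in the list once, no matter how many translation units reported it.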

The third issue is the generation of messages about issues in included system files (from folders
like C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include). The analyzer built into Visual
Studio does not hesitate to blame system header files although there is little sense in it. Again,
here is an example. While checking eMule, we got one and the same message about system files several
times:

1>c:\program files (x86)\microsoft sdks\windows\v7.0a\include\ws2tcpip.h(729):
warning C6386: Buffer overrun: accessing 'argument 1',
the writable size is '1*4' bytes,
but '4294967272' bytes might be written:
Lines: 703, 704, 705, 707, 713, 714, 715, 720,
721, 722, 724, 727, 728, 729

Nobody is ever going to edit system files, so why "scold" them? PVS-Studio has never done that.

In the same category falls the inability to tell the analyzer to skip files matching a mask, for
instance all the "*_generated.cpp" files or everything under "c:\libs". In PVS-Studio you can specify
such exclusions.
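Excluding files by mask can be illustrated with a minimal wildcard matcher. This is a sketch under assumptions and not the matching logic of any particular tool: the function name matches_mask is hypothetical, only '*' is treated as a wildcard, and characters (including letter case and path separators) are compared literally.

```cpp
#include <cstddef>
#include <string>

// Returns true if `text` matches `mask`, where '*' stands for any run of
// characters (possibly empty). All other characters must match literally.
inline bool matches_mask(const std::string& mask, const std::string& text) {
    std::size_t m = 0, t = 0;
    std::size_t star = std::string::npos;  // position of the last '*' seen
    std::size_t retry = 0;                 // text position that '*' covers up to
    while (t < text.size()) {
        if (m < mask.size() && mask[m] == '*') {
            star = m++;   // remember the wildcard, first try an empty run
            retry = t;
        } else if (m < mask.size() && mask[m] == text[t]) {
            ++m;
            ++t;
        } else if (star != std::string::npos) {
            m = star + 1;  // backtrack: let the last '*' absorb one more char
            t = ++retry;
        } else {
            return false;
        }
    }
    // Any wildcards left at the end of the mask can match an empty run.
    while (m < mask.size() && mask[m] == '*') ++m;
    return m == mask.size();
}
```

A file whose path matches any exclusion mask, for example matches_mask("*_generated.cpp", path), would simply be skipped before analysis or have its messages dropped afterwards.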

The fourth issue relates to the very process of handling the list of analyzer-generated messages. Of
course, any code analyzer lets you disable diagnostic messages by their codes. But the convenience of
doing so differs. To be more exact, the question is whether the analysis must be relaunched to hide
unneeded messages by code. In the analyzer integrated into Visual Studio, you must enter the codes
of the messages to be disabled in the project settings and relaunch the analysis. Naturally, you can
hardly list all the "unnecessary" diagnostics at once, so you will have to relaunch the analysis
several times. In PVS-Studio, you can hide and reveal messages by code without relaunching the
analysis, which is much more convenient.

The fifth issue is filtering messages not only by code but also by text. For instance, it might be
useful to hide all the messages containing "printf". The analyzer integrated into Visual Studio doesn't
have this feature, while PVS-Studio does.
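Hiding messages by code and by text without relaunching the analysis amounts to filtering a stored list of results. The following sketch only illustrates that idea. The Diagnostic and ViewFilter structures and the visible_messages function are hypothetical names and do not describe the internals of either tool.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stored diagnostic; the field names are assumptions.
struct Diagnostic {
    std::string code;  // warning code, e.g. "C6054"
    std::string text;  // message text
};

// Hypothetical view settings: what the user currently wants hidden.
struct ViewFilter {
    std::set<std::string> hidden_codes;
    std::vector<std::string> hidden_substrings;
};

// Build the visible list from the stored results. The analysis itself is
// not repeated; after the filter changes, only this function is called again.
inline std::vector<Diagnostic> visible_messages(
        const std::vector<Diagnostic>& stored, const ViewFilter& filter) {
    std::vector<Diagnostic> shown;
    for (const Diagnostic& d : stored) {
        if (filter.hidden_codes.count(d.code) != 0) continue;
        bool hidden_by_text = false;
        for (const std::string& s : filter.hidden_substrings) {
            // Empty substrings are ignored; otherwise they would hide everything.
            if (!s.empty() && d.text.find(s) != std::string::npos) {
                hidden_by_text = true;
                break;
            }
        }
        if (!hidden_by_text) shown.push_back(d);
    }
    return shown;
}
```

Because the stored list is never modified, a hidden message can be revealed again simply by removing its code or substring from the filter.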

Finally, the sixth issue is how convenient it is to tell the tool about false alarms. The #pragma
warning disable mechanism employed in Visual Studio hides a message only after the analysis is
relaunched. The mechanism in PVS-Studio lets you mark messages as "False Alarm" and hide them without
relaunching the analysis.
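For reference, suppressing a warning with the pragma mechanism looks roughly like this. The function copy_terminated is a hypothetical example, and the assumption that C6054 would be reported for it as a false alarm is made purely for illustration. The pragma lines are guarded so that other compilers ignore them.

```cpp
#include <cstddef>
#include <cstring>

#ifdef _MSC_VER
#pragma warning(push)
// Assumption for this sketch: C6054 is reported below as a false alarm.
#pragma warning(disable : 6054)
#endif

// Copies src into dst, truncating if needed, and always terminates dst.
inline void copy_terminated(char* dst, std::size_t dst_size, const char* src) {
    if (dst_size == 0) return;
    std::strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

#ifdef _MSC_VER
#pragma warning(pop)  // restore the previous warning state
#endif
```

Since the suppression lives in the source code, the message list changes only after the code is analyzed again, which is exactly the inconvenience described above.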

None of the six issues above relates to code analysis itself, yet they are very important: a tool's
usability is the integral characteristic that determines whether the user will ever get as far as
evaluating analysis quality at all.

Let's see what we've got. The static analyzer integrated into Visual Studio checks the eMule project
several times faster than PVS-Studio. But it took us 3 days to finish working with the Visual Studio
analyzer (the actual time was less, but we had to switch to other tasks to take a rest). With
PVS-Studio the same work took only 4 hours.

Note. As for the number of errors found, both analyzers showed almost the same results and found
the same errors.

Summary
Comparing two static analyzers is a difficult and complex task, and there is no answer to the
question of which tool is the best IN GENERAL. You can only say which tool is better for a particular
project and a particular user.
