How to add a new diagnostic rule into PVS-Studio?

How to add a new diagnostic rule into
PVS-Studio? Days from developers' life...
Author: Evgeniy Ryzhkov

Date: 14.09.2011

We are often asked the question how one can add one's own diagnostic rule into our static analyzer
PVS-Studio. And we always answer that it is very simple: "You just need to write us a letter with your
request and we will add this rule into the analyzer". This interface of adding new rules is convenient to
users. This is the best and most convenient interface actually. It's not so easy to do it on your own as it
may seem. In this post, I will show you the bottom of the iceberg implied in the words "we have added
this simple rule".

OK, let's be honest. PVS-Studio doesn't have a mechanism to enable users to write their own rules. This
is, first of all, an architectural restriction. Some users may get upset because of it. But actually such
people don't understand what they want because the mechanism of adding a new rule is rather
complicated in practice.

For instance, there is rule V536: Be advised that the utilized constant value is represented by an octal
form. It is intended for detecting errors like the following one found in Miranda IM:

static const struct _tag_cpltbl

{

unsigned cp;

const char* mimecp;

} cptbl[] =

{

{ 037, "IBM037" }, // IBM EBCDIC US-Canada

{ 437, "IBM437" }, // OEM United States

{ 500, "IBM500" }, // IBM EBCDIC International

{ 708, "ASMO-708" }, // Arabic (ASMO 708)

...

}

Value 037 is written here instead of 37 just for the sake of alignment, although it is really number 31 in
decimal form because 037 is an octal number according to the C++ rules. The error is rather simple for
diagnosis, isn't it? Indeed: search for a constant, check if it begins with 0 and generate a diagnostic
message. This diagnostic rule can be implemented in an hour. And a static analyzer's user (a skilled
programmer but novice regarding development of such tools) is theoretically ready to make this rule
himself/herself.

But he/she will fail, or rather he/she will succeed but the result won't satisfy him/her because of too
many false reports.

We (PVS-Studio's developers) do it in the following way. Before starting to implement a rule, we at first
write unit-tests for it where we add special marks for those lines that should trigger the rule and leave
those lines that shouldn't. Then we write the first version of the code for diagnostics. As this new rule
passes successfully all the unit-tests written particularly for it, we run full unit-tests to make sure that
the new rule hasn't broken anything. The full unit-tests are run for several tens of minutes. Surely, you
rarely manage "not to break anything" at the first time, so iterations are run again and again until the
unit-tests are passed completely.

The unit-tests are followed by the next step when we run the rule on real projects. We have about 70
projects in our test base (for 2011); these are famous and not very famous open source projects we test
our analyzer on. We have a self-made launcher that opens projects in Visual Studio, launches the
analyzer, saves the log, compares it to the master copy and shows the differences (messages
disappeared, messages added and messages changed). One cycle of running all the projects for Visual
Studio 2005 takes 4 hours on a computer with four cores and 8 Gbytes of memory, the analysis process
being parallelized. Judging by the analysis results we can see how well the new rule is integrated.
Usually we start to study the differences without waiting for analysis to complete because sometimes
it's clear if something has gone wrong. But we may stop running tests even if everything works well and
nothing is broken. It happens when there are too many false reports. If we see that a diagnostic rule is
triggered too often, we make exceptions from this rule.

Let's return to the search of incorrect octal numbers, the V536 rule. It has quite a number of exceptions.
These are the situations when the warning is not generated:

1. The constant's value is below 8.
2. The number is defined through #define.
3. There are more than two octal numbers in one block (the rule was triggered more than twice),
except for 0 and char type.
4. This number is used as an argument in functions: _open, mknod, open, wxMkdir,
wxFileName::Mkdir, wxFileName::AppendDir, chmod.

5. This is a nested block of numbers. We check only the top one. For example: { {090}, {091}, {092},
{093} }.
6. The number is <= 0777 and acts as a part of a statement that contains words: file, File.
7. The number is an argument of a function one parameter of which contains the string "%o".

Sometimes you cannot predict what exceptions there will be before running the rule on real projects.
That's why the iteration "making an exception - running tests - analyzing results" may be repeated many
times.

Then, when all the exceptions are implemented and the number of false reports is tolerable, the tests'
results (saved log-files of detected errors) are defined as a master copy. After that we run the same tests
for other versions of Visual Studio. At present (in 2011), we have support for the three versions of Visual
Studio: Visual Studio 2005/2008/2010. Tests in them run for 4/4.5/5 hours correspondingly, which
makes the total time of 14-15 hours. It may appear that one and the same code causes a different
number of diagnostic messages depending on Visual Studio's version. Usually it is determined by
differences in header files of different SDKs. We try to eliminate the differences so that the tests run in
all the versions of Visual Studio give identical results.

Why is it so important to run all those tiresome tests? Is it only because of false reports? No, it's not only
because of them. The point is that you can't take account of all the possible types of constructs while
making a rule and write it so that it works correctly and moreover doesn't cause the analyzer to crash.
This is what we need all those numerous tests for!

What strange constructs do we mean? Here you are a couple of examples.

Do you know that you may define a variable, for instance, in a way like this:

int const unsigned static a22 = 0;

Do you know that __int64 can be a variable's name?

int __identifier(__int64);

Do you know that braces are not necessary after switch?

switch (0)

if (X) Foo();

Even if you do not use such constructs in code yourself, there can be some in libraries you use.

Finally, having passed all the tests, we may start working on documentation. For each diagnostic rule we
write a help entry in Russian and English. Translation to English also takes some time. Now that the code
is debugged, the tests are passed and documentation is written, the rule appears in the next release of
PVS-Studio.

So, the development cycle of a new diagnostic rule can be briefly represented as follows:

1. An idea of a new rule: either we invent it ourselves, or our users tell us, or we manage to spot it
somewhere.
2. Formulating the rule.
3. Implementation of the rule.

4. Testing of every kind.
5. Analyzing the results. Variants are possible:
a. Improving the rule's formulation, adding exceptions, going to step 2.
b. Rule's implementation is considered established; we may write and translate the
documentation.

The total time of implementing one rule is several days. Usually it is from 2 to 5 days. Naturally, we have
enough experience now and can predict what exceptions from a rule we may need and implement them
right away. We also manage to reduce the number of test runs. But anyway, the key idea is that only
running tests you can formulate the exceptions and eliminate most of the false reports.

Now let's return to the question of how users can add rules themselves. As you can see, there are two
reasons why it's difficult to do: they don't have enough skill to make out good exceptions and a good
test base to run the rule on it.

A question arises: "Then why are there means of creating user-own rules in some famous and
respectable static analyzers?" Perhaps, we can't do it? Partly because of this, yes. But the mechanism of
creating user-own rules serves only one purpose - to compose "rival-comparison matrixes":

Well, users will ask, won't we be enabled to create our rules in PVS-Studio some day? Some day we will
manage to implement this mechanism and not to lose in comparison matrixes :-). But for now, all who
want to create a new rule will receive a link to this post with the words: "We have the best interface for
adding user rules - just write us a letter!"

How to add a new diagnostic rule into PVS-Studio?

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Mehr von Andrey Karpov

Mehr von Andrey Karpov (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

How to add a new diagnostic rule into PVS-Studio?