Difference of code analysis approaches in compilers and specialized tools

Author: Andrey Karpov

Date: 01.11.2010

Compilers and third-party static code analyzers have one common task: to detect dangerous code
fragments. However, there is a great difference in the types of analysis performed by each kind of these
tools. I will try to show you the differences between these two approaches (and explain their source)
using the Intel C++ compiler and the PVS-Studio analyzer as examples.

This time, we chose the Notepad++ 5.8.2 project for the test.

Notepad++
First, a couple of words about the project we have chosen. Notepad++ is a free, open-source
source code editor that supports many languages and serves as a substitute for the standard Notepad. It
works in the Microsoft Windows environment and is released under the GPL license. What I liked about
this project is that it is written in C++ and is small: just 73000 lines of code. Most importantly,
it is a rather tidy project: it is compiled with the /W4 switch in the project's settings and the /WX
switch, which makes the compiler treat each warning as an error.

Static analysis by compiler
Now let's study the analysis procedure from the viewpoints of a compiler and a separate specialized
tool. The compiler tends to generate warnings after processing only very small, local code
fragments. This preference is a consequence of the very strict performance requirements imposed on the
compiler. It is no coincidence that distributed build tools exist. The time needed to
compile medium and large projects is a significant factor influencing the choice of development
methodology. So if developers can get a 5% performance gain out of the compiler, they will do it.

Such optimization makes the compiler more monolithic: steps such as preprocessing, building the AST and
code generation are not so distinct in practice. For instance, judging by some indirect signs, I would
say that Visual C++ uses different preprocessor algorithms when compiling projects and when generating
preprocessed "*.i" files. The compiler also does not need to store the whole AST (it is even harmful for
it). Once the code for particular nodes is generated and they are no longer needed, they are destroyed
right away. During compilation, the AST may never exist in its full form. There is simply no need for
that: we parse a small code fragment, generate the code and move on. This saves memory and cache and
therefore increases speed.
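
To make the "parse a fragment, generate the code, move on" idea concrete, here is a minimal sketch of a single-pass translator. It is not taken from any real compiler: the class name SinglePassTranslator, the toy grammar (single digits, '+' and '*') and the PUSH/ADD/MUL instruction names are all assumptions made for illustration. The point is only that code is emitted during parsing and no tree is ever stored.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical single-pass translator for expressions such as "1+2*3".
// It emits stack-machine instructions while parsing, so no AST node
// outlives the moment its code is generated.
class SinglePassTranslator {
public:
    explicit SinglePassTranslator(std::string source) : src_(std::move(source)) {}

    std::string translate() {
        out_.clear();
        pos_ = 0;
        parseSum();
        return out_;
    }

private:
    // sum := product ('+' product)*
    void parseSum() {
        parseProduct();
        while (pos_ < src_.size() && src_[pos_] == '+') {
            ++pos_;
            parseProduct();
            out_ += "ADD ";  // emitted as soon as both operands are done
        }
    }

    // product := digit ('*' digit)*
    void parseProduct() {
        parseDigit();
        while (pos_ < src_.size() && src_[pos_] == '*') {
            ++pos_;
            parseDigit();
            out_ += "MUL ";
        }
    }

    void parseDigit() {
        if (pos_ < src_.size() &&
            std::isdigit(static_cast<unsigned char>(src_[pos_]))) {
            out_ += "PUSH ";
            out_ += src_[pos_++];
            out_ += ' ';
        }
    }

    std::string src_;
    std::string out_;
    std::size_t pos_ = 0;
};
```

A translator built this way is fast and frugal, but once it has moved past a fragment it has nothing left to compare later code against, which is exactly the limitation discussed next.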

The result of this approach is the "locality" of warnings. The compiler deliberately saves on various
structures that could help it detect higher-level errors. Let's see in practice what local warnings Intel
C++ generates for the Notepad++ project. Let me remind you that Notepad++ builds with the
Visual C++ compiler without any warnings with the /W4 switch enabled. But the Intel C++ compiler
certainly has a different set of warnings, and I also set the specific switch /W5 [Intel C++]. Moreover, I
would like to have a look at what the Intel C++ compiler calls a "remark".

Let's see what kinds of messages we get from Intel C++. Here it found four similar errors where the
CharUpper function is called (see the note at the end). Note the "locality" of the diagnosis: the
compiler simply found a very dangerous type conversion. Let's study the corresponding code fragment:

wchar_t *destStr = new wchar_t[len+1];
...
for (int j = 0 ; j < nbChar ; j++)
{
  if (Case == UPPERCASE)
    destStr[j] =
      (wchar_t)::CharUpperW((LPWSTR)destStr[j]);
  else
    destStr[j] =
      (wchar_t)::CharLowerW((LPWSTR)destStr[j]);
}

Here we see strange type conversions. The Intel C++ compiler warns us: "#810: conversion from
"LPWSTR={WCHAR={__wchar_t} *}" to "__wchar_t" may lose significant bits". Let's look at the
CharUpper function's prototype.

LPTSTR WINAPI CharUpper(
  __inout LPTSTR lpsz
);

The function handles a string and not separate characters at all. But here a character is cast to a pointer
and some memory area is modified by this pointer. How horrible.

Well, actually this is the only horrible issue detected by Intel C++. All the rest are much more boring
and point to sloppy code rather than error-prone code. But let's study some other warnings too.

The compiler generated a lot of #1125 warnings:

"#1125: function "Window::init(HINSTANCE, HWND)" is hidden by "TabBarPlus::init" -- virtual function
override intended?"

These are not errors but just poor naming of functions. We are interested in this message for a different
reason: although the check seems to involve several classes, the compiler does not keep any special
data for it. It must store diverse information about base classes anyway, which is why this diagnostic
could be implemented.
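
A minimal sketch of what this kind of hiding looks like. The classes BaseWindow and TabBar and their parameter lists are made up for illustration and only echo the roles of Window and TabBarPlus; they are not the Notepad++ declarations.

```cpp
#include <string>

// Hypothetical base class with a virtual init().
struct BaseWindow {
    virtual ~BaseWindow() = default;
    virtual std::string init(int instance, int parent) {
        (void)instance;
        (void)parent;
        return "BaseWindow::init(int, int)";
    }
};

// Hypothetical derived class. Its init() has a different parameter list,
// so it hides BaseWindow::init instead of overriding it. Marking it
// `override` would turn the mistake into a compile error.
struct TabBar : BaseWindow {
    virtual std::string init(int instance, int parent, bool vertical) {
        (void)instance;
        (void)parent;
        (void)vertical;
        return "TabBar::init(int, int, bool)";
    }
};

// A call through the base interface still reaches the base version.
std::string initThroughBase(BaseWindow& window) {
    return window.init(1, 2);
}
```

To issue the warning, the compiler only needs the list of base-class members it already holds while processing the derived class declaration.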

The next sample. The message "#186: pointless comparison of unsigned integer with zero" is generated
for the meaningless comparisons:

static LRESULT CALLBACK hookProcMouse(
  UINT nCode, WPARAM wParam, LPARAM lParam)
{
  if(nCode < 0)
  {
    ...
    return 0;
  }
  ...
}

The "nCode < 0" condition is always false because nCode is unsigned. It is a good example of a useful
local diagnostic: you may easily find an error this way.
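
The effect is easy to reproduce in isolation. The two helper functions below are hypothetical and exist only to contrast the parameter types; the Windows documentation declares the first parameter of a hook procedure as int nCode, so switching UINT to int is presumably what was intended in the original code.

```cpp
// Mirrors the flagged check: an unsigned value is never below zero,
// so this function returns false for every argument.
bool isNegativeAsUnsigned(unsigned int code) {
    return code < 0;  // compilers may warn about this comparison
}

// With a signed parameter the same check behaves as intended.
bool isNegativeAsSigned(int code) {
    return code < 0;
}
```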

Let's consider one last warning from Intel C++ and be done with it. I think you have understood the
concept of "locality" by now.

void ScintillaKeyMap::showCurrentSettings() {
  int i = ::SendDlgItemMessage(...);
  ...
  for (size_t i = 0 ; i < nrKeys ; i++)
  {
    ...
  }
}

Again we have no error here, just poor naming of variables. The "i" variable has the "int" type at
first. Then a new "i" variable of the "size_t" type is defined in the "for" statement and is used for
a different purpose. At the moment "size_t i" is defined, the compiler knows that a variable with the
same name already exists, and it generates the warning. Again, this did not require the compiler
to store any additional data: it must remember anyway that the "int i" variable is available until the
end of the function's body.
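
A small self-contained illustration of the same shadowing pattern. The function outerValueAfterLoop and its values are hypothetical; it only shows that the loop variable and the outer variable are two separate objects.

```cpp
#include <cstddef>

// Hypothetical example: the loop declares its own `i`, which hides the
// outer `i` for the duration of the loop and leaves it untouched.
int outerValueAfterLoop(std::size_t count) {
    int i = 42;                                 // outer variable
    std::size_t sum = 0;
    for (std::size_t i = 0; i < count; i++) {   // inner `i` shadows the outer one
        sum += i;
    }
    (void)sum;
    return i;                                   // still the outer variable
}
```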

Third-party static code analyzers
Now let's consider specialized static code analyzers. They do not have such severe speed restrictions
since they are launched ten times less frequently than compilers. They may work tens of times slower
than code compilation, but this is not crucial: for instance, the programmer may work with
the compiler during the day and launch a static code analyzer at night to get a report about suspicious
fragments in the morning. It is quite a reasonable approach.

At the cost of this slowdown, static code analyzers can store the whole code tree,
traverse it several times and store a lot of additional information. This lets them find "spread-out"
and high-level errors.

Let's see what the PVS-Studio static analyzer can find in Notepad++. Note that I am using a pilot version
that is not available for download yet. We will present the new free general-purpose rule set in 1-2
months within the scope of PVS-Studio 4.00.

Of course, the PVS-Studio analyzer also finds errors that may be called "local", as in the case of
Intel C++. This is the first sample:

bool _isPointXValid;
bool _isPointYValid;
bool isPointValid() {
  return _isPointXValid && _isPointXValid;
};

The PVS-Studio analyzer informs us: "V501: There are identical sub-expressions to the left and to the
right of the '&&' operator: _isPointXValid && _isPointXValid".

I think the error is clear to you and we will not dwell upon it. The diagnosis is "local" because it is
enough to analyze one expression to perform the check.
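
To show why one expression is enough, here is a rough sketch of such a check over a tiny expression tree. It is not PVS-Studio's implementation: the Expr structure and the helpers leaf, binary, sameTree and hasIdenticalOperands are hypothetical names chosen for this example.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression node: an operator with two children or a leaf.
struct Expr {
    std::string text;            // operator such as "&&", or an identifier
    std::unique_ptr<Expr> lhs;   // null for leaves
    std::unique_ptr<Expr> rhs;   // null for leaves
};

std::unique_ptr<Expr> leaf(std::string name) {
    auto node = std::make_unique<Expr>();
    node->text = std::move(name);
    return node;
}

std::unique_ptr<Expr> binary(std::string op,
                             std::unique_ptr<Expr> lhs,
                             std::unique_ptr<Expr> rhs) {
    auto node = std::make_unique<Expr>();
    node->text = std::move(op);
    node->lhs = std::move(lhs);
    node->rhs = std::move(rhs);
    return node;
}

// Structural comparison of two subtrees.
bool sameTree(const Expr* a, const Expr* b) {
    if (!a || !b) return a == b;
    return a->text == b->text &&
           sameTree(a->lhs.get(), b->lhs.get()) &&
           sameTree(a->rhs.get(), b->rhs.get());
}

// The "local" check: only the node itself and its two operands are inspected.
bool hasIdenticalOperands(const Expr& node) {
    return node.lhs && node.rhs && sameTree(node.lhs.get(), node.rhs.get());
}
```

A real rule would also need to decide which operators it applies to and to skip operands with side effects; the sketch ignores both.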

Here is one more local error causing incomplete clearing of the _iContMap array:

#define CONT_MAP_MAX 50
int _iContMap[CONT_MAP_MAX];
...
DockingManager::DockingManager()
{
  ...
  memset(_iContMap, -1, CONT_MAP_MAX);
  ...
}

Here we have the warning "V512: A call of the memset function will lead to a buffer overflow or
underflow". This is the correct code:

memset(_iContMap, -1, CONT_MAP_MAX * sizeof(int));
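
The third argument of memset is a byte count, not an element count, so the original call fills only part of the array. The sketch below makes the difference measurable; kContMapMax and countFilled are hypothetical names, and the caller is assumed to pass no more than the array's size in bytes.

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t kContMapMax = 50;  // stands in for CONT_MAP_MAX

// Fills the first `byteCount` bytes of a zeroed int array with 0xFF and
// returns how many elements ended up equal to -1.
// Assumption: byteCount <= kContMapMax * sizeof(int).
std::size_t countFilled(std::size_t byteCount) {
    int map[kContMapMax] = {};           // start from all zeros
    std::memset(map, -1, byteCount);
    std::size_t filled = 0;
    for (int value : map) {
        if (value == -1) ++filled;
    }
    return filled;
}
```

Passing kContMapMax fills only the leading part of the array when sizeof(int) is larger than 1, while passing kContMapMax * sizeof(int) fills all 50 elements.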

And now let's go over to more interesting issues. This is the code where we must analyze two branches
simultaneously to see that there is something wrong:

void TabBarPlus::drawItem(
  DRAWITEMSTRUCT *pDrawItemStruct)
{
  ...
  if (!_isVertical)
    Flags |= DT_BOTTOM;
  else
    Flags |= DT_BOTTOM;
  ...
}

PVS-Studio generates the message "V523: The 'then' statement is equivalent to the 'else' statement". If
we review the code nearby, we may conclude that the author intended to write this text:

if (!_isVertical)
  Flags |= DT_VCENTER;
else
  Flags |= DT_BOTTOM;

And now brace yourself for a trial in the form of the following code fragment:

void KeyWordsStyleDialog::updateDlg()

{

...

Style & w1Style =

_pUserLang->_styleArray.getStyler(STYLE_WORD1_INDEX);

styleUpdate(w1Style, _pFgColour[0], _pBgColour[0],

IDC_KEYWORD1_FONT_COMBO, IDC_KEYWORD1_FONTSIZE_COMBO,

IDC_KEYWORD1_BOLD_CHECK, IDC_KEYWORD1_ITALIC_CHECK,

IDC_KEYWORD1_UNDERLINE_CHECK);

Style & w2Style =






Style & w3Style =




IDC_KEYWORD3_BOLD_CHECK, IDC_KEYWORD3_BOLD_CHECK,


Style & w4Style =






...

}

I can say that I am proud of our analyzer PVS-Studio, which managed to find an error here. I think you
either did not notice it or just skipped the whole fragment to see the explanation. Code review is almost
helpless against this code. But the static analyzer is patient and pedantic: "V525: The code containing the
collection of similar blocks. Check items '7', '7', '6', '7' in lines 576, 580, 584, 588".

I will abridge the text to point out the most interesting fragment:

styleUpdate(...


...);

styleUpdate(...


...);

styleUpdate(...

IDC_KEYWORD3_BOLD_CHECK, !!! IDC_KEYWORD3_BOLD_CHECK !!!,

...);

styleUpdate(...


...);

This code was most likely written by the Copy-Paste method. As a result,
IDC_KEYWORD3_BOLD_CHECK is used instead of IDC_KEYWORD3_ITALIC_CHECK. The warning
looks a bit strange, reporting the numbers '7', '7', '6', '7'. Unfortunately, the analyzer cannot
generate a clearer message. These numbers arise from macros like these:

#define IDC_KEYWORD1_ITALIC_CHECK (IDC_KEYWORD1 + 7)

#define IDC_KEYWORD3_BOLD_CHECK (IDC_KEYWORD3 + 6)

The last cited sample is especially significant because it demonstrates that the PVS-Studio analyzer
processed a whole large code fragment at once, detected repetitive structures in it and managed
to suspect something wrong relying on a heuristic method. This is a very significant difference in the
levels of information processing performed by compilers and static analyzers.
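
One way to picture such a heuristic: take the value found at the same position in each of the similar blocks and look for a single value that breaks the pattern. The sketch below is only a guess at the general idea, not PVS-Studio's algorithm, and findOddOneOut is a hypothetical name.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Returns the index of the one value that differs from all the others,
// provided the remaining values are identical; otherwise returns nothing.
std::optional<std::size_t> findOddOneOut(const std::vector<int>& values) {
    if (values.size() < 3) return std::nullopt;     // too few blocks to judge

    std::map<int, std::size_t> counts;
    for (int value : values) ++counts[value];
    if (counts.size() != 2) return std::nullopt;    // need exactly two distinct values

    for (std::size_t i = 0; i < values.size(); ++i) {
        if (counts[values[i]] == 1) return i;       // the lone outlier
    }
    return std::nullopt;                            // e.g. 6, 7, 6, 7
}
```

For the sequence 7, 7, 6, 7 from the warning, the function points at index 2, which is the third styleUpdate call. Collecting such sequences requires seeing all four blocks together, something a tool that discards each fragment after processing it cannot do.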

Some figures
Let's touch upon one more consequence of the "local" analysis performed by compilers and the more global
analysis of specialized tools. With "local" analysis, it is difficult to determine whether an issue is
really dangerous or not. As a result, there are ten times more false alarms. Let me explain this by example.

When we analyzed the Notepad++ project, PVS-Studio generated only 10 warnings. 4 of them
indicated real errors. The result is modest, but general-purpose analysis in PVS-Studio is only
beginning to develop. It will become one of the best in time.

When I analyzed the Notepad++ project with the Intel C++ compiler, it generated 439 warnings and 3139
remarks. I do not know how many of them point to real errors, but I found the strength to review part
of these warnings and saw only 4 real issues, all related to CharUpper (see the description above).

3578 messages are too many for a close investigation of each of them. It turns out that the compiler
asks me to look at every 20th line of the program (73000 / 3578 ≈ 20). Come on, that is not serious.
When you are dealing with a general-purpose analyzer, you must cut off as much unnecessary stuff as
possible.
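
The arithmetic behind these figures, written out as a small sketch. The constant and function names are made up; the numbers are the ones quoted above.

```cpp
// Figures quoted in the text for the Intel C++ run on Notepad++.
constexpr int kWarnings = 439;
constexpr int kRemarks = 3139;
constexpr int kLinesOfCode = 73000;

// Total number of messages a reviewer would have to read.
constexpr int totalMessages() {
    return kWarnings + kRemarks;
}

// Average number of source lines per message.
constexpr double linesPerMessage() {
    return static_cast<double>(kLinesOfCode) / totalMessages();
}
```

The total is 3578 messages, and the ratio comes out slightly above 20 lines per message.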

Those who tried the Viva64 rule set (included in PVS-Studio) may notice that it produces the same
huge number of false alarms. But that is a different case: there we must detect all suspicious type
conversions. It is more important not to miss an error than to avoid a false alarm. Besides, the
tool's settings provide flexible filtering of false alarms.

UPDATE: Note
It turned out that I wrote something wrong here. There is no error in the sample with CharUpperW,
but nobody corrected me. I noticed it myself when I decided to implement a similar rule in PVS-Studio.

The point is that CharUpperW can handle both strings and individual characters. If the high-order part of
the pointer is zero, the value is treated as a character rather than a pointer. Of course, the WinAPI
design disappointed me here, but the code in Notepad++ is correct.
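
A portable sketch of this calling convention, for readers without the Windows headers at hand. It is not the real CharUpperW: the function charUpperSketch is hypothetical, it uses std::towupper instead of the system's case mapping, and it assumes that "high-order part is zero" means every bit above the low 16 is zero.

```cpp
#include <cstdint>
#include <cwctype>

// Hypothetical imitation of the CharUpperW convention: the argument is
// either a pointer to a null-terminated string or a single character
// smuggled in the low 16 bits of a pointer-sized value.
wchar_t* charUpperSketch(wchar_t* arg) {
    const auto bits = reinterpret_cast<std::uintptr_t>(arg);

    if ((bits >> 16) == 0) {
        // Character mode: convert the value itself and return it the same way.
        const auto upper = std::towupper(static_cast<std::wint_t>(bits));
        return reinterpret_cast<wchar_t*>(static_cast<std::uintptr_t>(upper));
    }

    // String mode: convert the buffer in place and return the same pointer.
    for (wchar_t* p = arg; *p != L'\0'; ++p) {
        *p = static_cast<wchar_t>(std::towupper(static_cast<std::wint_t>(*p)));
    }
    return arg;
}
```

Under this convention, casting a character to a pointer type and casting the result back is the intended way to convert a single character, which is what the Notepad++ code does.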

By the way, it turns out now that Intel C++ has not found any errors at all.
