Interview with Anatoliy Kuznetsov, the
author of BitMagic C++ library
Author: Andrey Karpov

Date: 08.11.2009


Abstract
In this article, Anatoliy Kuznetsov answers our questions and tells us about the open-source BitMagic C++
Library.


Introduction
While regularly browsing Internet resources on 64-bit programming, I often came across mentions of the
BitMagic C++ Library and of the significant benefits it had gained from moving to 64 bits. I decided to
contact the library's author and invite him to tell us about his research and development in an interview.

The questions are asked by Andrey Karpov, a developer at the company "Program Verification Systems" who
works on PVS-Studio, a tool for verification of modern C++ applications.

The answers are given by Anatoliy Kuznetsov, chief software engineer at NCBI and developer of the
open-source BitMagic C++ Library.


Hello, Anatoliy. Please, tell us about yourself. What projects are you
involved in?
Hello Andrey,

I am a chief software engineer. At present I work at NCBI (National Center for Biotechnology Information)
in the team responsible for searching and visualizing biomolecular information. Besides my main job, I am
the chief developer and architect of the open-source BitMagic C++ Library.

By education I am a planning engineer, a graduate of the Lobachevskiy University in Nizhniy Novgorod.


What is BitMagic?
BitMagic was developed as a universal template library for working with compressed bit vectors. The
library solves several tasks:

    •   It provides a bit container that is genuinely compatible with the STL in spirit. This means
        that the container must support iterators and memory allocators and interact with algorithms
        and other STL containers.
    •   It can operate efficiently on very long and sparse vectors.
    •   It allows vectors to be serialized so that they can be written to databases or sent over a
        network.
    •   It gives the developer a set of algorithms for implementing set-theory operations and
        calculating distances and similarity metrics in multidimensional binary spaces.
    •   Much attention is paid to optimization for popular computation acceleration technologies,
        such as SSE.
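
To make the set-theory and distance operations from the list above more concrete, here is a minimal sketch that does not use BitMagic at all. It assumes a plain, uncompressed bit set stored as 64-bit words; the type `Words` and the functions `popcount64`, `hamming_distance` and `jaccard_similarity` are hypothetical names chosen for illustration and are not part of the library's API.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: an uncompressed bit set stored as 64-bit words.
using Words = std::vector<std::uint64_t>;

// Number of 1 bits in a single word.
inline std::size_t popcount64(std::uint64_t w) {
    return std::bitset<64>(w).count();
}

// Hamming distance: number of positions where the two sets differ, |A XOR B|.
std::size_t hamming_distance(const Words& a, const Words& b) {
    assert(a.size() == b.size());
    std::size_t d = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        d += popcount64(a[i] ^ b[i]);
    }
    return d;
}

// Jaccard similarity: |A AND B| / |A OR B|.
// Two empty sets are treated as identical (similarity 1.0).
double jaccard_similarity(const Words& a, const Words& b) {
    assert(a.size() == b.size());
    std::size_t inter = 0;
    std::size_t uni = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        inter += popcount64(a[i] & b[i]);
        uni += popcount64(a[i] | b[i]);
    }
    return uni == 0 ? 1.0 : static_cast<double>(inter) / static_cast<double>(uni);
}
```

Both metrics reduce to a logical operation followed by a bit count, which is why such algorithms benefit directly from wide machine words and fast bit counting.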


For what kinds of tasks can BitMagic be of most interest to
developers?
The library turned out to be rather universal, and it would not be easy to list all the possible ways
to use it. At present, the library is of most interest in the following areas:

    •   Building of bit and inverted indexes for full-text search systems, acceleration of relational
        algebra operations (AND, OR, JOIN etc).
    •   Development of non-standard extensions and indexes for existing databases (Oracle Cartridges,
        MS SQL extended stored procedures). As a rule, such extensions help integrate scientific,
        geographic and other non-standard data into the database.
    •   Development of data mining algorithms.
    •   Development of in-memory indexes and databases.
    •   Development of fine-grained access control systems with a large number of objects
        (security-enhanced databases with access control for separate fields and columns).
    •   Task management systems (on a computation cluster), systems for real-time tracking of task
        states, storage of task states described as Finite State Machines.
    •   Representation and storage of strongly connected graphs.


What can you tell us about the history of the BitMagic library? What
prompted you to create it?
For a long time, my colleagues and I had been working on tasks related to large databases and to analysis
and visualization systems. The very first working version demonstrating the abilities of bit vectors was
shown by Maxim Shemanaryov (the developer of the wonderful 2D vector graphics library Antigrain
Geometry: http://www.antigrain.com). Then, some ideas of equivalent representation of sets were
described by Koen Van Damm, an engineer from Europe who was working on parsers of
programming languages for verifying complex systems. There were other sources as well. I decided to
systematize all of this and present it as a library suitable for reuse in various
projects.


What are the distribution terms of the BitMagic library? Where can
one download it?
The library is free for commercial and non-commercial use and is available as source code.
The only restriction is the requirement to mention the library and its authors when it is used in a final
product.

You can see the materials here: http://bmagic.sourceforge.net.


Am I right in supposing that BitMagic gains significant advantages when
compiled as a 64-bit version?
Indeed, the library uses a number of optimization methods that speed up work on 64-bit systems or on
systems with SIMD instructions (128-bit SSE2).

Here are the factors that speed up execution of the algorithms:

    •   a wide machine word (logical operations are performed on a wide word);
    •   the programmer (and the compiler) has access to additional registers, so the shortage of
        registers is less critical (this shortage is a known disadvantage of the x86 architecture);
    •   memory alignment often speeds up operations (128-bit alignment of addresses gives a good
        result);
    •   and, of course, the ability to place more objects and data being processed in the memory of
        one program. This is a great advantage of the 64-bit version that is clear to everyone.

At present, the fastest operation is achieved with 128-bit SSE2 optimization in a 64-bit program.
This mode combines the doubled number of x86 registers with the wide machine word for logical
operations.
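
The effect of a wide machine word can be illustrated with a small sketch that is independent of BitMagic. The function template `and_in_place` below is a hypothetical helper, not library code: it ANDs two byte buffers one word at a time and returns the number of loop iterations, so the same buffer needs half as many steps with 64-bit words as with 32-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// AND the buffer `src` into `dst`, processing sizeof(WordT) bytes per step.
// Returns the number of loop iterations that were needed.
// std::memcpy is used for loads and stores to avoid aliasing and alignment issues;
// compilers typically turn these calls into plain word moves.
template <typename WordT>
std::size_t and_in_place(unsigned char* dst, const unsigned char* src, std::size_t bytes) {
    assert(bytes % sizeof(WordT) == 0);
    std::size_t iterations = 0;
    for (std::size_t off = 0; off < bytes; off += sizeof(WordT)) {
        WordT a;
        WordT b;
        std::memcpy(&a, dst + off, sizeof(WordT));
        std::memcpy(&b, src + off, sizeof(WordT));
        a &= b;
        std::memcpy(dst + off, &a, sizeof(WordT));
        ++iterations;
    }
    return iterations;
}
```

For a 64-byte buffer, `and_in_place<std::uint32_t>` runs 16 iterations and `and_in_place<std::uint64_t>` runs 8, with identical results. A 128-bit SSE2 version would halve the count again, but it is left out here to keep the sketch portable.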

64-bit systems and programs are going through a real Renaissance. Migration of programs to 64 bits
will be faster than the move from 16 to 32 bits. The appearance of 64-bit versions of Windows on the mass
market and the availability of toolkits (like the one your company is developing) will stimulate this
process. As systems keep growing in complexity and code size, a toolkit such as PVS-Studio is a good help
because it reduces effort and speeds up the release of products.


Tell us about the compression methods used in BitMagic, please.
The current 3.6.0 version of the library uses several compression methods.

    1. "Bitvectors" in memory are split into blocks. If a block is empty or completely full, it is
       not allocated. That is, the programmer can set bits in a range very far from zero. Setting bit
       100,000,000 doesn't lead to an explosion in memory consumption, which is often characteristic
       of vectors with a two-dimensional linear model.
    2. Blocks in memory can have an equivalent representation in the form of areas, or gaps. Actually,
       this is a kind of RLE coding. Unlike plain RLE, our library doesn't lose the ability to execute
       logical operations or access random bits.
    3. When serializing "bitvectors", a set of other methods is used: conversion into lists of integer
       numbers (representing zeros or ones) and coding of these lists with the Elias Gamma Coding
       method. When using these methods, we do lose the ability of random bit access, but this is not
       so crucial for writing to disk compared with the reduction of storage and input-output costs.
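
The second and third methods above can be sketched in a few lines. This is an illustration under stated assumptions, not BitMagic's actual block format: the `GapBlock` structure and the functions `to_gaps`, `test_bit` and `elias_gamma` are hypothetical names. The sketch stores a block as the end positions of its runs, finds a random bit by binary search instead of decompressing, and encodes a positive integer with Elias gamma as a readable string of '0' and '1' characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Run-boundary ("gap") form of a bit block. first_bit is the value of bit 0;
// run_ends[i] is the index of the last bit of run i; run values alternate.
struct GapBlock {
    bool first_bit;
    std::vector<unsigned> run_ends;
};

// Convert a plain bit sequence into its run-boundary form.
GapBlock to_gaps(const std::vector<bool>& bits) {
    assert(!bits.empty());
    GapBlock g;
    g.first_bit = bits[0];
    for (std::size_t i = 1; i < bits.size(); ++i) {
        if (bits[i] != bits[i - 1]) {
            g.run_ends.push_back(static_cast<unsigned>(i - 1));
        }
    }
    g.run_ends.push_back(static_cast<unsigned>(bits.size() - 1));
    return g;
}

// Random access without decompression: binary search for the run containing pos.
bool test_bit(const GapBlock& g, unsigned pos) {
    assert(pos <= g.run_ends.back());
    const std::size_t run = static_cast<std::size_t>(
        std::lower_bound(g.run_ends.begin(), g.run_ends.end(), pos) - g.run_ends.begin());
    return (run % 2 == 0) ? g.first_bit : !g.first_bit;
}

// Elias gamma code of n >= 1: floor(log2(n)) zeros followed by n in binary.
std::string elias_gamma(unsigned n) {
    assert(n >= 1);
    std::string binary;
    for (unsigned v = n; v > 0; v >>= 1) {
        binary.insert(binary.begin(), (v & 1u) ? '1' : '0');
    }
    return std::string(binary.size() - 1, '0') + binary;
}
```

Small numbers get short codes (1 becomes "1", 5 becomes "00101"), which suits lists of short run lengths. The price is that the encoded stream has to be read sequentially, matching the loss of random access described in item 3.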


Could you give some code examples demonstrating the use of BitMagic
library?

One of the examples simply creates two vectors, initializes them and performs the logical AND operation.
Then the enumerator class is used to iterate over the vector and print the values stored in it.

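#include <iostream>
#include "bm.h"

using namespace std;

int main(void)
{
    bm::bvector<> bv;

    // Set three bits.
    bv[10] = true; bv[100] = true; bv[10000] = true;

    // Copy the vector and clear one bit in the copy.
    bm::bvector<> bv2(bv);
    bv2[10000] = false;

    // Logical AND: bv keeps only the bits set in both vectors.
    bv &= bv2;

    // Iterate over the set bits and print their indexes.
    bm::bvector<>::enumerator en = bv.first();
    bm::bvector<>::enumerator en_end = bv.end();
    for (; en < en_end; ++en) {
        cout << *en << endl;
    }

    return 0;
}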

The next example demonstrates serialization of vectors and the use of compression mode.

#include <stdlib.h>
#include <iostream>
#include "bm.h"
#include "bmserial.h"

using namespace std;

// Number of bit positions to fill. The original listing omits this
// definition; the value used here is an assumption.
const unsigned MAX_VALUE = 1000000;

// This procedure creates a very dense bitvector.
// The resulting set consists mostly of ON (1) bits
// interrupted by small gaps of 0 bits.
void fill_bvector(bm::bvector<>* bv)
{
    for (unsigned i = 0; i < MAX_VALUE; ++i) {
        if (rand() % 2500) {
            bv->set_bit(i);
        }
    }
}

void print_statistics(const bm::bvector<>& bv)
{
    bm::bvector<>::statistics st;
    bv.calc_stat(&st);

    cout << "Bits count:" << bv.count() << endl;
    cout << "Bit blocks:" << st.bit_blocks << endl;
    cout << "GAP blocks:" << st.gap_blocks << endl;
    cout << "Memory used:" << st.memory_used << endl;
    cout << "Max.serialize mem.:" <<
            st.max_serialize_mem << endl << endl;
}

unsigned char* serialize_bvector(
    bm::serializer<bm::bvector<> >& bvs,
    bm::bvector<>& bv)
{
    // It is recommended to optimize
    // the vector before serialization.
    bv.optimize();

    bm::bvector<>::statistics st;
    bv.calc_stat(&st);

    cout << "Bits count:" << bv.count() << endl;
    cout << "Bit blocks:" << st.bit_blocks << endl;
    cout << "GAP blocks:" << st.gap_blocks << endl;
    cout << "Memory used:" << st.memory_used << endl;
    cout << "Max.serialize mem.:" <<
            st.max_serialize_mem << endl;

    // Allocate serialization buffer.
    // The caller owns the buffer and must release it with delete [].
    unsigned char* buf =
        new unsigned char[st.max_serialize_mem];

    // Serialization to memory.
    unsigned len = bvs.serialize(bv, buf, 0);
    cout << "Serialized size:" << len << endl << endl;

    return buf;
}

int main(void)
{
    bm::bvector<> bv1;
    bm::bvector<> bv2;

    // Set DGAP compression mode ON.
    bv2.set_new_blocks_strat(bm::BM_GAP);

    fill_bvector(&bv1);
    fill_bvector(&bv2);

    // Prepare a serializer class.
    // For best performance it is best
    // to create the serializer once and reuse it
    // (saves a lot of memory allocations).
    bm::serializer<bm::bvector<> > bvs;

    // The next settings provide the lowest serialized size.
    bvs.byte_order_serialization(false);
    bvs.gap_length_serialization(false);
    bvs.set_compression_level(4);

    unsigned char* buf1 = serialize_bvector(bvs, bv1);
    unsigned char* buf2 = serialize_bvector(bvs, bv2);

    // Serialized bvectors (buf1 and buf2) are now ready to be
    // saved to a database or file, or sent over a network.
    // ...

    // Deserialization.
    bm::bvector<> bv3;

    // As a result of deserialization bv3
    // will contain all bits from
    // bv1 and bv2:
    //      bv3 = bv1 OR bv2
    bm::deserialize(bv3, buf1);
    bm::deserialize(bv3, buf2);

    print_statistics(bv3);

    // After a complex operation
    // we can try to optimize bv3.
    bv3.optimize();

    print_statistics(bv3);

    delete [] buf1;
    delete [] buf2;

    return 0;
}


What are your plans for developing the BitMagic library?
We want to implement some new vector compression methods that allow parallel data
processing.

Due to the mass release of Intel Core i5-i7-i9, it is reasonable to release a version of the library for
SSE 4.2. Intel has added some interesting features which can be used efficiently. The most interesting is
the hardware support for counting set bits (Population Count).

We are experimenting with nVidia CUDA and other GPGPU platforms. Graphics cards can now perform integer
and logical operations, and their resources can be used for algorithms that work with sets and
compression.
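
To show what a hardware Population Count instruction replaces, here is a hedged sketch with two portable ways of counting set bits in a 64-bit word. The function names `popcount_swar`, `popcount_std` and `count_bits` are hypothetical and are not taken from BitMagic. The first function is the classic software bit-summing routine; the second relies on `std::bitset::count()`, which compilers may lower to a single POPCNT instruction when the target allows it (for example with `-mpopcnt` on GCC or Clang).

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Software population count: sums bits in parallel within the word.
inline unsigned popcount_swar(std::uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           // 2-bit sums
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); // 4-bit sums
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           // 8-bit sums
    return static_cast<unsigned>((x * 0x0101010101010101ULL) >> 56);      // add the bytes
}

// Standard-library route; may compile to a hardware POPCNT instruction.
inline unsigned popcount_std(std::uint64_t x) {
    return static_cast<unsigned>(std::bitset<64>(x).count());
}

// Count the set bits in a block of n words.
std::size_t count_bits(const std::uint64_t* words, std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += popcount_std(words[i]);
    }
    return total;
}
```

The software routine needs about a dozen arithmetic operations per word, which is why a single-instruction bit count is attractive for a library whose set cardinality and distance calculations are built on counting bits.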


References
    1. Elias Gamma encoding of bit-vector Delta gaps (D-Gaps).
       http://www.viva64.com/go.php?url=517
    2. Hierarchical Compression. http://www.viva64.com/go.php?url=518
    3. D-Gap Compression. http://www.viva64.com/go.php?url=519
    4. 64-bit Programming And Optimization. http://www.viva64.com/go.php?url=520
    5. Optimization of memory allocations. http://www.viva64.com/go.php?url=521
    6. Bitvector as a container. http://www.viva64.com/go.php?url=522
    7. 128-bit SSE2 optimization. http://www.viva64.com/go.php?url=523
    8. Using BM library in memory saving mode. http://www.viva64.com/go.php?url=524
    9. Efficient distance metrics. http://www.viva64.com/go.php?url=525

Weitere ähnliche Inhalte

Was ist angesagt?

Unpack mechanism of the msgpack-c
Unpack mechanism of the msgpack-cUnpack mechanism of the msgpack-c
Unpack mechanism of the msgpack-cTakatoshi Kondo
 
Memory mapping techniques and low power memory design
Memory mapping techniques and low power memory designMemory mapping techniques and low power memory design
Memory mapping techniques and low power memory designUET Taxila
 
Computer architecture cache memory
Computer architecture cache memoryComputer architecture cache memory
Computer architecture cache memoryMazin Alwaaly
 
Cache management
Cache managementCache management
Cache managementUET Taxila
 
Elements of cache design
Elements of cache designElements of cache design
Elements of cache designRohail Butt
 
Cache memory by Foysal
Cache memory by FoysalCache memory by Foysal
Cache memory by FoysalFoysal Mahmud
 
Address mapping
Address mappingAddress mapping
Address mappingrockymani
 
Cache memory ppt
Cache memory ppt  Cache memory ppt
Cache memory ppt Arpita Naik
 
04 cache memory.ppt 1
04 cache memory.ppt 104 cache memory.ppt 1
04 cache memory.ppt 1Anwal Mirza
 
Cache memory principles
Cache memory principlesCache memory principles
Cache memory principlesbit allahabad
 

Was ist angesagt? (20)

Cachememory
CachememoryCachememory
Cachememory
 
Unpack mechanism of the msgpack-c
Unpack mechanism of the msgpack-cUnpack mechanism of the msgpack-c
Unpack mechanism of the msgpack-c
 
Mapping
MappingMapping
Mapping
 
Memory mapping techniques and low power memory design
Memory mapping techniques and low power memory designMemory mapping techniques and low power memory design
Memory mapping techniques and low power memory design
 
Cache memory
Cache  memoryCache  memory
Cache memory
 
Computer architecture cache memory
Computer architecture cache memoryComputer architecture cache memory
Computer architecture cache memory
 
Cache management
Cache managementCache management
Cache management
 
Cache Memory
Cache MemoryCache Memory
Cache Memory
 
Cache memory
Cache memoryCache memory
Cache memory
 
Elements of cache design
Elements of cache designElements of cache design
Elements of cache design
 
Cache memory by Foysal
Cache memory by FoysalCache memory by Foysal
Cache memory by Foysal
 
04 cache memory
04 cache memory04 cache memory
04 cache memory
 
Address mapping
Address mappingAddress mapping
Address mapping
 
Memory Mapping Cache
Memory Mapping CacheMemory Mapping Cache
Memory Mapping Cache
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache Memory
Cache MemoryCache Memory
Cache Memory
 
Cache memory ppt
Cache memory ppt  Cache memory ppt
Cache memory ppt
 
04 cache memory.ppt 1
04 cache memory.ppt 104 cache memory.ppt 1
04 cache memory.ppt 1
 
Cache memory principles
Cache memory principlesCache memory principles
Cache memory principles
 

Andere mochten auch

20 Ideas for your Website Homepage Content
20 Ideas for your Website Homepage Content20 Ideas for your Website Homepage Content
20 Ideas for your Website Homepage ContentBarry Feldman
 
50 Essential Content Marketing Hacks (Content Marketing World)
50 Essential Content Marketing Hacks (Content Marketing World)50 Essential Content Marketing Hacks (Content Marketing World)
50 Essential Content Marketing Hacks (Content Marketing World)Heinz Marketing Inc
 
Prototyping is an attitude
Prototyping is an attitudePrototyping is an attitude
Prototyping is an attitudeWith Company
 
10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer ExperienceYuan Wang
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanPost Planner
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionIn a Rocket
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting PersonalKirsty Hulse
 

Andere mochten auch (7)

20 Ideas for your Website Homepage Content
20 Ideas for your Website Homepage Content20 Ideas for your Website Homepage Content
20 Ideas for your Website Homepage Content
 
50 Essential Content Marketing Hacks (Content Marketing World)
50 Essential Content Marketing Hacks (Content Marketing World)50 Essential Content Marketing Hacks (Content Marketing World)
50 Essential Content Marketing Hacks (Content Marketing World)
 
Prototyping is an attitude
Prototyping is an attitudePrototyping is an attitude
Prototyping is an attitude
 
10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media Plan
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming Convention
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting Personal
 

Ähnlich wie Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library

The reasons why 64-bit programs require more stack memory
The reasons why 64-bit programs require more stack memoryThe reasons why 64-bit programs require more stack memory
The reasons why 64-bit programs require more stack memoryPVS-Studio
 
Advanced High-Performance Computing Features of the OpenPOWER ISA
 Advanced High-Performance Computing Features of the OpenPOWER ISA Advanced High-Performance Computing Features of the OpenPOWER ISA
Advanced High-Performance Computing Features of the OpenPOWER ISAGanesan Narayanasamy
 
Comparison of analyzers' diagnostic possibilities at checking 64-bit code
Comparison of analyzers' diagnostic possibilities at checking 64-bit codeComparison of analyzers' diagnostic possibilities at checking 64-bit code
Comparison of analyzers' diagnostic possibilities at checking 64-bit codePVS-Studio
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...Rob Skillington
 
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docx
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docxPlease do ECE572 requirementECECS 472572 Final Exam Project (W.docx
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docxARIV4
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptxAndrew Lamb
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Daniel Lemire
 
ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata EvonCanales257
 
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docxECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docxtidwellveronique
 
ECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docxECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docxtidwellveronique
 
Optimization of 64-bit programs
Optimization of 64-bit programsOptimization of 64-bit programs
Optimization of 64-bit programsPVS-Studio
 
Case Study: Porting a set of point cloud and triangle mesh processing C++ lib...
Case Study: Porting a set of point cloud and triangle mesh processing C++ lib...Case Study: Porting a set of point cloud and triangle mesh processing C++ lib...
Case Study: Porting a set of point cloud and triangle mesh processing C++ lib...PVS-Studio
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmDmitri Zimine
 
PVS-Studio vs Chromium
PVS-Studio vs ChromiumPVS-Studio vs Chromium
PVS-Studio vs ChromiumPVS-Studio
 
PVS-Studio vs Chromium
PVS-Studio vs ChromiumPVS-Studio vs Chromium
PVS-Studio vs ChromiumAndrey Karpov
 
Challenges in Embedded Development
Challenges in Embedded DevelopmentChallenges in Embedded Development
Challenges in Embedded DevelopmentSQABD
 
Lesson 26. Optimization of 64-bit programs
Lesson 26. Optimization of 64-bit programsLesson 26. Optimization of 64-bit programs
Lesson 26. Optimization of 64-bit programsPVS-Studio
 

Ähnlich wie Interview with Anatoliy Kuznetsov, the author of BitMagic C++ library (20)

embedded C.pptx
embedded C.pptxembedded C.pptx
embedded C.pptx
 
The reasons why 64-bit programs require more stack memory
The reasons why 64-bit programs require more stack memoryThe reasons why 64-bit programs require more stack memory
The reasons why 64-bit programs require more stack memory
 
Advanced High-Performance Computing Features of the OpenPOWER ISA
 Advanced High-Performance Computing Features of the OpenPOWER ISA Advanced High-Performance Computing Features of the OpenPOWER ISA
Advanced High-Performance Computing Features of the OpenPOWER ISA
 
Comparison of analyzers' diagnostic possibilities at checking 64-bit code
Comparison of analyzers' diagnostic possibilities at checking 64-bit codeComparison of analyzers' diagnostic possibilities at checking 64-bit code
Comparison of analyzers' diagnostic possibilities at checking 64-bit code
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docx
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docxPlease do ECE572 requirementECECS 472572 Final Exam Project (W.docx
Please do ECE572 requirementECECS 472572 Final Exam Project (W.docx
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
 
ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata ECECS 472572 Final Exam ProjectRemember to check the errata
ECECS 472572 Final Exam ProjectRemember to check the errata
 
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docxECECS 472572 Final Exam ProjectRemember to check the errat.docx
ECECS 472572 Final Exam ProjectRemember to check the errat.docx
 
ECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docxECECS 472572 Final Exam ProjectRemember to check the err.docx
ECECS 472572 Final Exam ProjectRemember to check the err.docx
 
Architecture presentation 4
Architecture presentation 4Architecture presentation 4
Architecture presentation 4
 
Optimization of 64-bit programs
Optimization of 64-bit programsOptimization of 64-bit programs
Optimization of 64-bit programs
 
Case Study: Porting a set of point cloud and triangle mesh processing C++ lib...
Case Study: Porting a set of point cloud and triangle mesh processing C++ lib...Case Study: Porting a set of point cloud and triangle mesh processing C++ lib...
Case Study: Porting a set of point cloud and triangle mesh processing C++ lib...
 
Old code doesn't stink
Old code doesn't stinkOld code doesn't stink
Old code doesn't stink
 
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker SwarmGenomic Computation at Scale with Serverless, StackStorm and Docker Swarm
Genomic Computation at Scale with Serverless, StackStorm and Docker Swarm
 
PVS-Studio vs Chromium
PVS-Studio vs ChromiumPVS-Studio vs Chromium
PVS-Studio vs Chromium
 
PVS-Studio vs Chromium
PVS-Studio vs ChromiumPVS-Studio vs Chromium
PVS-Studio vs Chromium
 
Challenges in Embedded Development
Challenges in Embedded DevelopmentChallenges in Embedded Development
Challenges in Embedded Development
 
Lesson 26. Optimization of 64-bit programs
Lesson 26. Optimization of 64-bit programsLesson 26. Optimization of 64-bit programs
Lesson 26. Optimization of 64-bit programs
 

Kürzlich hochgeladen

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 