Remove Branches in
   BitVector Select Operations
         - marisa 0.2.2 -
                Susumu Yata
                  @s5yata
                 Brazil, Inc.

30 March 2013      Brazil, Inc.
Who I Am
Job
   Brazil, Inc. (groonga developer)
   We need R&D software engineers.


Personal research & development
   Tries
       darts-clone, marisa-trie, etc.
   Corpus
       Nihongo Web Corpus 2010 (NWC 2010)
Relationships between BitVector and Marisa.

  BitVector and Marisa

BitVector
What's BitVector?
   A sequence of bits


Operations
   BitVector::get(i)
   BitVector::rank(i)
   BitVector::select(i)


BitVector – Get Operations
Interface
   BitVector::get(i)


Description
   The i-th bit (“0” or “1”)

     0     1    2   …   i-1      i     i+1   …   n-2   n-1
     0     0    1   …    0       1      1    …   0     0


                              Get!
BitVector – Rank Operations
Interface
   BitVector::rank(i)


Description
   The number of “1”s up to the i-th bit

     0     1    2   …     i-1    i     i+1   …   n-2   n-1
     0     0    1   …     0      1      1    …   0     0


         How many “1”s?
BitVector – Select Operations
Interface
   BitVector::select(i)


Description
   The position of the i-th “1”

     0     1    2   …    …      …      …   …   n-2   n-1
     0     0    1   …    …      …      …   …   0     0


                Where is the i-th “1”?
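
As a reference point, the three operations can be sketched naively in C++. This is a minimal sketch, not marisa's implementation: the `NaiveBitVector` type is hypothetical, `rank(i)` is assumed to count the "1"s in positions [0, i), and `select(i)` is assumed to count from i = 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference type: linear-time, for illustration only.
struct NaiveBitVector {
  std::vector<bool> bits;

  // The i-th bit.
  bool get(std::size_t i) const {
    assert(i < bits.size());
    return bits[i];
  }

  // Number of "1"s in positions [0, i) (assumed convention).
  std::size_t rank(std::size_t i) const {
    assert(i <= bits.size());
    std::size_t count = 0;
    for (std::size_t k = 0; k < i; ++k) count += bits[k] ? 1 : 0;
    return count;
  }

  // Position of the i-th "1", counting from i = 0 (assumed convention).
  // Returns bits.size() if there are fewer than i + 1 set bits.
  std::size_t select(std::size_t i) const {
    for (std::size_t k = 0; k < bits.size(); ++k) {
      if (bits[k]) {
        if (i == 0) return k;
        --i;
      }
    }
    return bits.size();
  }
};
```

Under these conventions, `rank(select(i)) == i` holds for every valid i, which is a handy sanity check for faster implementations.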
Marisa
 Who's Marisa?
   An ordinary human magician

 What's Marisa?
   A static and space-efficient dictionary

 Data structure
   Recursive LOUDS-based Patricia tries

 Site
   http://code.google.com/p/marisa-trie
Marisa – Patricia
Patricia is a labeled tree.
      Keys = Tree + Labels

   ID   Key
    0   “Argentina”
    1   “Armenia”
    2   “Brazil”
    3   “Canada”
    4   “Cyprus”

   Tree (node IDs; children listed in order)
    0 -> 1, 2, 3
    1 -> 4, 5
    3 -> 6, 7

   Node   Label
    1     “Ar”
    2     “Brazil”
    3     'C'
    4     “gentina”
    5     “menia”
    6     “anada”
    7     “yprus”

Marisa – Recursiveness
Unfortunately, this margin is too small…
   Keys = Tree + Labels
   Labels = Tree + Labels
   Labels = Tree + Labels <- Reasonable
   Labels = Tree + Labels
   Labels = Tree + Labels
   Labels = Tree + Labels
   Labels = Tree + Labels
   …
Marisa – BitVector Usage
LOUDS
   Level-Order Unary Degree Sequence


Terminal flags
   A node is terminal (“1”) or not (“0”).


Link flags
   A node has a link to its multi-byte label
    (“1”) or has a built-in single-byte label (“0”).
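
To make the LOUDS bit sequence concrete, here is a minimal sketch that encodes a tree whose nodes are numbered in level order, as in the Patricia example. It assumes the common LOUDS convention: a leading super-root "10", then each node's child count in unary (one "1" per child, terminated by a "0"). marisa's actual bit layout may differ, and `louds_encode` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// child_counts[v] is the number of children of node v, with nodes
// numbered in level (breadth-first) order.
inline std::string louds_encode(const std::vector<std::size_t>& child_counts) {
  std::string bits = "10";  // Super-root with a single child (the root).
  for (std::size_t degree : child_counts) {
    bits.append(degree, '1');  // One "1" per child...
    bits.push_back('0');       // ...terminated by a "0".
  }
  return bits;
}
```

For the Patricia example (node 0 has children 1 to 3, node 1 has children 4 and 5, node 3 has children 6 and 7), `louds_encode({3, 2, 0, 2, 0, 0, 0, 0})` yields `10111011001100000`: 2n + 1 = 17 bits for n = 8 nodes.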
Marisa – BitVector Usage
LOUDS
   BitVector::get(), select()


Terminal flags
   BitVector::get(), rank(), select()


Link flags
   BitVector::get(), rank()

How to implement Rank/Select operations.

  Implementations

Rank Dictionary
Index structures
   r_idx[x].abs = rank(512·x)
       x = 0, 1, 2, …
   r_idx[x].rel[y] =
     rank(512·x + 64·y) - rank(512·x)
       y = 1, 2, 3, …, 7


Calculation
   abs + rel + popcnt()
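
The "abs + rel + popcnt()" calculation can be sketched as follows. This is a minimal sketch under stated assumptions, not marisa's code: `RankEntry` and `RankDict` are hypothetical names, `rank(i)` is assumed to count the "1"s in positions [0, i), a `rel[0]` slot (always 0) is stored to keep the indexing simple, and `__builtin_popcountll` is the GCC/Clang builtin.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical index entry: one per 512-bit block.
struct RankEntry {
  std::uint64_t abs;     // rank(512 * x)
  std::uint16_t rel[8];  // rel[y] = rank(512 * x + 64 * y) - rank(512 * x)
};

struct RankDict {
  std::vector<std::uint64_t> words;  // Bit k is bit (k % 64) of words[k / 64].
  std::vector<RankEntry> r_idx;

  // One extra entry so that rank(n) works when n is a multiple of 512.
  explicit RankDict(std::vector<std::uint64_t> w)
      : words(std::move(w)), r_idx(words.size() / 8 + 1) {
    std::uint64_t total = 0;
    for (std::size_t x = 0; x < r_idx.size(); ++x) {
      r_idx[x].abs = total;
      for (std::size_t y = 0; y < 8; ++y) {
        r_idx[x].rel[y] = static_cast<std::uint16_t>(total - r_idx[x].abs);
        std::size_t k = 8 * x + y;
        if (k < words.size()) total += __builtin_popcountll(words[k]);
      }
    }
  }

  // Number of "1"s in positions [0, i): abs + rel + popcnt().
  std::uint64_t rank(std::size_t i) const {
    assert(i <= words.size() * 64);
    const RankEntry& e = r_idx[i / 512];
    std::uint64_t r = e.abs + e.rel[(i / 64) % 8];
    if (i % 64 != 0) {
      // Count only the bits below position (i % 64) of the final word.
      std::uint64_t low_bits = words[i / 64] & ((1ULL << (i % 64)) - 1);
      r += __builtin_popcountll(low_bits);
    }
    return r;
  }
};
```

Each query touches one index entry and at most one 64-bit word, which is where the constant-time bound comes from.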
Rank Operations
Time complexity = O(1)
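   [Figure: the bit vector is split into 512-bit blocks, each with an r_idx.abs entry; each block is split into eight 64-bit words covered by r_idx.rel; popcnt() counts within the final 64-bit word.]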
Select Dictionary
Index structure
   s_idx[x] = select(512·x)
       x = 0, 1, 2, …


Calculation
   Limit the range by using s_idx.
   Limit the range by using r_idx[x].abs.
   Limit the range by using r_idx[x].rel[y].
   Find the i-th “1” in the range.
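
The four narrowing steps can be sketched as below. This is a simplified, hypothetical `SelectDict`, not marisa's layout: it assumes `select(i)` counts from i = 0, stores sentinel entries at the end of both indexes, replaces the `r_idx[x].rel[y]` lookup with a per-word popcount scan, and uses a naive loop for the final round. `__builtin_popcountll` and `__builtin_ctzll` are GCC/Clang builtins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct SelectDict {
  std::vector<std::uint64_t> words;  // Bit k is bit (k % 64) of words[k / 64].
  std::vector<std::uint64_t> abs;    // abs[x] = number of "1"s before bit 512 * x.
  std::vector<std::uint64_t> s_idx;  // s_idx[x] = position of the (512 * x)-th "1".
  std::uint64_t num_ones = 0;

  explicit SelectDict(std::vector<std::uint64_t> w) : words(std::move(w)) {
    for (std::size_t k = 0; k < words.size(); ++k) {
      if (k % 8 == 0) abs.push_back(num_ones);
      for (unsigned b = 0; b < 64; ++b) {
        if ((words[k] >> b) & 1) {
          if (num_ones % 512 == 0) s_idx.push_back(64 * k + b);
          ++num_ones;
        }
      }
    }
    abs.push_back(num_ones);             // Sentinel: total number of "1"s.
    s_idx.push_back(64 * words.size());  // Sentinel: one past the last bit.
  }

  // Position of the i-th "1", counting from i = 0.
  std::size_t select(std::uint64_t i) const {
    assert(i < num_ones);
    // 1. s_idx limits the candidate 512-bit blocks to [lo, hi].
    std::size_t lo = s_idx[i / 512] / 512;
    std::size_t hi =
        std::min<std::size_t>(s_idx[i / 512 + 1] / 512, abs.size() - 2);
    // 2. Binary search on abs for the last block with abs[x] <= i.
    while (lo < hi) {
      std::size_t mid = (lo + hi + 1) / 2;
      if (abs[mid] <= i) lo = mid; else hi = mid - 1;
    }
    // 3. Walk the 64-bit words of that block (stands in for rel[y]).
    std::uint64_t remaining = i - abs[lo];
    std::size_t w = 8 * lo;
    for (;;) {
      std::uint64_t count = __builtin_popcountll(words[w]);
      if (remaining < count) break;
      remaining -= count;
      ++w;
    }
    // 4. Final round: clear the lowest "1" `remaining` times, then locate.
    std::uint64_t word = words[w];
    for (; remaining > 0; --remaining) word &= word - 1;
    return 64 * w + __builtin_ctzll(word);
  }
};
```

Steps 1 to 3 only read index data and popcounts. Step 4 is the part that works inside a single 64-bit word.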
Select Operations
   [Figure: s_idx narrows the search to a run of 512-bit blocks; r_idx.abs picks one 512-bit block; r_idx.rel picks one 64-bit word within it; the final round searches inside that word.]
Select Final Round
Binary search & table lookup
   Three-level branches
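   [Figure: a three-level binary tree of "if" branches narrows the 64-bit word down to one of its eight 8-bit bytes; a table lookup then finishes the search within that byte.]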
How to remove the branches in the final round.

  Improvements

Original
// x is the final 64-bit block (uint64_t).
x = x - ((x >> 1) & MASK_55);
x = (x & MASK_33) + ((x >> 2) & MASK_33);
x = (x + (x >> 4)) & MASK_0F;
x *= MASK_01;          // Tricky popcount: byte k = number of “1”s in bytes 0..k
if (i < ((x >> 24) & 0xFF)) { // The first-level branch
  if (i < ((x >> 8) & 0xFF)) { // The second-level branch
    if (i < (x & 0xFF)) {       // The third-level branch
      // The first byte contains the i-th “1”.
    } else {
      // The second byte contains the i-th “1”.
    }
  } else {
    // … (the remaining branches follow the same pattern)
  }
} else {
  // …
}
Tips – Tricky PopCount
Bits of one byte:
       0   1   1   1   0   0   1   0

x = x - ((x >> 1) & MASK_55);               // “1”s in each 2-bit field
         1       2       0       1

x = (x & MASK_33) + ((x >> 2) & MASK_33);   // “1”s in each 4-bit field
             3               1

x = (x + (x >> 4)) & MASK_0F;               // “1”s in the byte
                     4

Tips – Tricky PopCount
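// MASK_01 = 0x0101010101010101ULL;
// x = x + (x << 8) + (x << 16) + (x << 24) + …;
x *= MASK_01;

   Per-byte counts before the multiplication (most significant byte first):
       4    1    3    5    2    6    3    4

   After the multiplication, each byte holds the running total from the
   least significant byte up to and including itself:
      28   24   23   20   15   13    7    4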
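
The two popcount slides can be checked with a small sketch. `MASK_01` is the value given above; the values of `MASK_55`, `MASK_33` and `MASK_0F` are assumed to be the usual repeated-byte constants, and the helper names `byte_counts` and `prefix_counts` are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t MASK_55 = 0x5555555555555555ULL;  // assumed value
constexpr std::uint64_t MASK_33 = 0x3333333333333333ULL;  // assumed value
constexpr std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;  // assumed value
constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;

// Byte k of the result is the number of "1"s in byte k of x.
inline std::uint64_t byte_counts(std::uint64_t x) {
  x = x - ((x >> 1) & MASK_55);
  x = (x & MASK_33) + ((x >> 2) & MASK_33);
  return (x + (x >> 4)) & MASK_0F;
}

// Byte k of the result is the number of "1"s in bytes 0..k of x.
// No byte overflows because the largest running total is 64.
inline std::uint64_t prefix_counts(std::uint64_t x) {
  return byte_counts(x) * MASK_01;
}
```

With the per-byte counts from the slide packed as `0x0401030502060304`, multiplying by `MASK_01` gives `0x1C1817140F0D0704`, that is 28, 24, 23, 20, 15, 13, 7, 4 from the most significant byte down.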



+ SSE2 (After PopCount)
// y[0 … 7] = i + 1;
__m128i y = _mm_cvtsi64_si128((i + 1) * MASK_01);
__m128i z = _mm_cvtsi64_si128(x);

// Compare the 16 8-bit signed integers in y and z.
// y[k] = (y[k] > z[k]) ? 0xFF : 0x00;
y = _mm_cmpgt_epi8(y, z);          // PCMPGTB

// The j-th byte contains the i-th “1”.
// TABLE is a 128-byte pre-computed table.
uint8_t j = TABLE[_mm_movemask_epi8(y)];
Tips – PCMPGTB
y = _mm_cvtsi64_si128((i + 1) * MASK_01);
      20        20   20     20      20     20     20     20


z = _mm_cvtsi64_si128(x);
      28        24   23     20       15     13     7      4


// y[k] = (y[k] > z[k]) ? 0xFF : 0x00;
y = _mm_cmpgt_epi8(y, z);

     0x00   0x00     0x00   0x00   0xFF    0xFF   0xFF   0xFF


+ Tricks (After Comparison)
uint64_t j = _mm_cvtsi128_si64(y);

// The three calculations below are alternatives; use one of them.

// Calculation without TABLE
j = ((j & MASK_01) * MASK_01) >> 56;

// Calculation with BSR
j = (63 - __builtin_clzll(j + 1)) / 8;

// Calculation with popcnt (SSE4.2 or SSE4a)
j = __builtin_popcountll(j) / 8;
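
A small sketch can confirm that the three alternatives agree. It assumes the comparison result has the shape produced on the previous slides: k low bytes of 0xFF followed by 8 - k bytes of 0x00, with k < 8. The function names are hypothetical, and the builtins are GCC/Clang specific.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;

// Each function returns k for a mask with k low bytes of 0xFF (k < 8).

// Keep one bit per byte, then sum the bytes into the top byte.
inline unsigned count_by_multiply(std::uint64_t y) {
  return static_cast<unsigned>(((y & MASK_01) * MASK_01) >> 56);
}

// y + 1 has a single "1" at bit 8 * k; find it with a bit scan.
inline unsigned count_by_clz(std::uint64_t y) {
  assert(y != ~0ULL);  // y + 1 would be 0, and __builtin_clzll(0) is undefined.
  return static_cast<unsigned>(63 - __builtin_clzll(y + 1)) / 8;
}

// Every 0xFF byte contributes eight "1"s.
inline unsigned count_by_popcount(std::uint64_t y) {
  return static_cast<unsigned>(__builtin_popcountll(y)) / 8;
}
```

For the mask from the PCMPGTB example (four low bytes of 0xFF), all three return 4.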

– SSE2 (Simple and Fast)
// x is the final 64-bit block (uint64_t).
x = x - ((x >> 1) & MASK_55);
x = (x & MASK_33) + ((x >> 2) & MASK_33);
x = (x + (x >> 4)) & MASK_0F;
x *= MASK_01;        // Tricky popcount

uint64_t y = (i + 1) * MASK_01;
uint64_t z = x | MASK_80;
// Compare the 8 7-bit unsigned integers in y and z.
z = (z - y) & MASK_80;
uint8_t j = __builtin_ctzll(z) / 8;
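
Put together as a complete function, the branch-free byte search might look like the sketch below. Assumptions: `select_in_word` is a hypothetical name, i counts from 0, the mask values other than `MASK_01` are the usual repeated-byte constants, and the search inside the chosen byte uses a short loop instead of the table lookup mentioned on the earlier slides. The builtins are GCC/Clang specific.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t MASK_55 = 0x5555555555555555ULL;  // assumed value
constexpr std::uint64_t MASK_33 = 0x3333333333333333ULL;  // assumed value
constexpr std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;  // assumed value
constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;
constexpr std::uint64_t MASK_80 = 0x8080808080808080ULL;  // assumed value

// Position (0..63) of the i-th "1" in x, counting from i = 0.
inline unsigned select_in_word(std::uint64_t x, unsigned i) {
  assert(i < static_cast<unsigned>(__builtin_popcountll(x)));
  // Tricky popcount: byte k of counts = number of "1"s in bytes 0..k.
  std::uint64_t counts = x - ((x >> 1) & MASK_55);
  counts = (counts & MASK_33) + ((counts >> 2) & MASK_33);
  counts = (counts + (counts >> 4)) & MASK_0F;
  counts *= MASK_01;

  // Bit 7 of byte k survives the subtraction iff counts[k] >= i + 1.
  std::uint64_t y = (i + 1) * MASK_01;
  std::uint64_t z = ((counts | MASK_80) - y) & MASK_80;
  unsigned j = static_cast<unsigned>(__builtin_ctzll(z)) / 8;

  // Number of "1"s in the bytes below byte j (0 when j == 0).
  unsigned below = static_cast<unsigned>(((counts << 8) >> (8 * j)) & 0xFF);

  // Stand-in for the table lookup: drop the lowest "1" (i - below) times.
  unsigned byte = static_cast<unsigned>((x >> (8 * j)) & 0xFF);
  for (unsigned k = below; k < i; ++k) byte &= byte - 1;
  return 8 * j + static_cast<unsigned>(__builtin_ctz(byte));
}
```

The subtraction never borrows across bytes: every byte of `counts | MASK_80` is at least 0x80 and every byte of y is at most 64.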
Tips – Comparison
uint64_t y = (i + 1) * MASK_01;
     0x14   0x14   0x14   0x14   0x14    0x14   0x14   0x14


uint64_t z = x | MASK_80;
     0x9C   0x98   0x97   0x94   0x8F    0x8D   0x87   0x84


// Compare the 8 7-bit unsigned integers in y and z.
z = (z - y) & MASK_80;

     0x80   0x80   0x80   0x80   0x00    0x00   0x00   0x00


+ SSSE3 (For PopCount)
// Get lower nibbles and upper nibbles of x.
__m128i lower = _mm_cvtsi64_si128(x & MASK_0F);
__m128i upper = _mm_cvtsi64_si128(x & MASK_F0);
upper = _mm_srli_epi32(upper, 4);
// Use PSHUFB for counting “1”s in each nibble.
__m128i table =
   _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0);
lower = _mm_shuffle_epi8(table, lower);
upper = _mm_shuffle_epi8(table, upper);
// Merge the counts to get the number of “1”s in each byte.
x = _mm_cvtsi128_si64(_mm_add_epi8(lower, upper));
x *= MASK_01;
Tips – PSHUFB
lower = _mm_cvtsi64_si128(x & MASK_0F);
         12           8           7           4           15           13           7           4


table = _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, …);
     4        3   3       2   3       2   2       1   3        2   2        1   2       1   1       0


// Perform a parallel 16-way lookup.
lower = _mm_shuffle_epi8(table, lower);

         2            1           3           1           4            3            3           1


How effective the improvements are.

  Evaluation

Environment
OS
   Mac OSX 10.8.3 (64-bit)
CPU
   Core i7 3720QM – Ivy Bridge
   2.6GHz – up to 3.6GHz
Compiler
   Apple LLVM version 4.2 (clang-425.0.24)
    (based on LLVM 3.2svn)

Data
Source
   Japanese Wikipedia page titles
   gzip -cd jawiki-20130328-all-titles-in-ns0.gz | LC_ALL=C sort -R > data


Details
   Number of keys: 1,367,750
   Average length: 21.14 bytes
   Total length: 28,919,893 bytes
Binaries
marisa 0.2.1
   ./configure CXX=clang++ --enable-popcnt
   make
   tools/marisa-benchmark < data


marisa 0.2.2
   ./configure CXX=clang++ --enable-sse4
   make
   tools/marisa-benchmark < data
Results – marisa 0.2.1
Without improvements
   #Tries     Size    Build   Lookup  Reverse   Prefix  Predict
              [KB]   [Kqps]   [Kqps]   [Kqps]   [Kqps]   [Kqps]
        1   11,811      724    1,105    1,223    1,038      711
        2    8,639      632      790      877      753      453
        3    8,001      621      750      816      708      406
        4    7,788      591      723      791      687      391
        5    7,701      590      712      781      680      384

   Baseline

Results – marisa 0.2.2
With improvements
   #Tries     Size    Build   Lookup  Reverse   Prefix  Predict
              [KB]   [Kqps]   [Kqps]   [Kqps]   [Kqps]   [Kqps]
        1   11,811      757    1,198    1,359    1,115      772
        2    8,639      657      873    1,000      820      503
        3    8,001      621      817      924      770      453
        4    7,788      613      797      900      752      438
        5    7,701      610      787      884      737      427

   Same size
   Faster operations
Results – Improvements
Improvement ratios
   #Tries     Size    Build   Lookup  Reverse   Prefix  Predict
               [%]      [%]      [%]      [%]      [%]      [%]
        1     0.00    +4.56    +8.42   +11.12    +7.42    +8.58
        2     0.00    +3.96   +10.52   +14.03    +8.90   +11.04
        3     0.00     0.00    +8.93   +13.24    +8.76   +11.58
        4     0.00    +3.72   +10.24   +13.78    +9.46   +12.02
        5     0.00    +3.39   +10.53   +13.19    +8.38   +11.20

   Same size
   Faster operations
Conclusion
   “Any sufficiently advanced technology
      is indistinguishable from magic.”


    “Any sufficiently advanced technique
      is indistinguishable from magic.”


                “You are a magician.”
                                           37
30 March 2013         Brazil, Inc.

Weitere ähnliche Inhalte

Was ist angesagt?

学術会議 ITシンポジウム資料「プライバシー保護技術の概観と展望」
学術会議 ITシンポジウム資料「プライバシー保護技術の概観と展望」学術会議 ITシンポジウム資料「プライバシー保護技術の概観と展望」
学術会議 ITシンポジウム資料「プライバシー保護技術の概観と展望」Hiroshi Nakagawa
 
Cache-Oblivious データ構造入門 @DSIRNLP#5
Cache-Oblivious データ構造入門 @DSIRNLP#5Cache-Oblivious データ構造入門 @DSIRNLP#5
Cache-Oblivious データ構造入門 @DSIRNLP#5Takuya Akiba
 
TiDB at PayPay
TiDB at PayPayTiDB at PayPay
TiDB at PayPayPingCAP
 
Popcntによるハミング距離計算
Popcntによるハミング距離計算Popcntによるハミング距離計算
Popcntによるハミング距離計算Norishige Fukushima
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Databricks
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Agile Testing Alliance
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer ManagementMIJIN AN
 
PHPとブロックチェーンを使ったwebアプリ開発
PHPとブロックチェーンを使ったwebアプリ開発PHPとブロックチェーンを使ったwebアプリ開発
PHPとブロックチェーンを使ったwebアプリ開発Naota Takahashi
 
Sparklens: Understanding the Scalability Limits of Spark Applications with R...
 Sparklens: Understanding the Scalability Limits of Spark Applications with R... Sparklens: Understanding the Scalability Limits of Spark Applications with R...
Sparklens: Understanding the Scalability Limits of Spark Applications with R...Databricks
 
emscriptenでC/C++プログラムをwebブラウザから使うまでの難所攻略
emscriptenでC/C++プログラムをwebブラウザから使うまでの難所攻略emscriptenでC/C++プログラムをwebブラウザから使うまでの難所攻略
emscriptenでC/C++プログラムをwebブラウザから使うまでの難所攻略祐司 伊藤
 
C++と仲良くなるためのn問 ~ポインタ編~ #ladiescpp
C++と仲良くなるためのn問 ~ポインタ編~ #ladiescppC++と仲良くなるためのn問 ~ポインタ編~ #ladiescpp
C++と仲良くなるためのn問 ~ポインタ編~ #ladiescppcocodrips
 
Skip list vinay khimsuriya_200430723005
Skip list vinay khimsuriya_200430723005Skip list vinay khimsuriya_200430723005
Skip list vinay khimsuriya_200430723005vinaykhimsuriya1
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxData
 
Optimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningOptimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningDatabricks
 
GAN을 이용한 캐릭터 리소스 제작 맛보기
GAN을 이용한 캐릭터 리소스 제작 맛보기GAN을 이용한 캐릭터 리소스 제작 맛보기
GAN을 이용한 캐릭터 리소스 제작 맛보기기룡 남
 
Fibonacci search
Fibonacci searchFibonacci search
Fibonacci searchneilluiz94
 
In-memory Database and MySQL Cluster
In-memory Database and MySQL ClusterIn-memory Database and MySQL Cluster
In-memory Database and MySQL Clustergrandis_au
 

Was ist angesagt? (20)

学術会議 ITシンポジウム資料「プライバシー保護技術の概観と展望」
学術会議 ITシンポジウム資料「プライバシー保護技術の概観と展望」学術会議 ITシンポジウム資料「プライバシー保護技術の概観と展望」
学術会議 ITシンポジウム資料「プライバシー保護技術の概観と展望」
 
Cache-Oblivious データ構造入門 @DSIRNLP#5
Cache-Oblivious データ構造入門 @DSIRNLP#5Cache-Oblivious データ構造入門 @DSIRNLP#5
Cache-Oblivious データ構造入門 @DSIRNLP#5
 
TiDB at PayPay
TiDB at PayPayTiDB at PayPay
TiDB at PayPay
 
Popcntによるハミング距離計算
Popcntによるハミング距離計算Popcntによるハミング距離計算
Popcntによるハミング距離計算
 
Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...Informational Referential Integrity Constraints Support in Apache Spark with ...
Informational Referential Integrity Constraints Support in Apache Spark with ...
 
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
Introduction To Big Data with Hadoop and Spark - For Batch and Real Time Proc...
 
MySQL Buffer Management
MySQL Buffer ManagementMySQL Buffer Management
MySQL Buffer Management
 
PHPとブロックチェーンを使ったwebアプリ開発
PHPとブロックチェーンを使ったwebアプリ開発PHPとブロックチェーンを使ったwebアプリ開発
PHPとブロックチェーンを使ったwebアプリ開発
 
Fp growth
Fp growthFp growth
Fp growth
 
Sparklens: Understanding the Scalability Limits of Spark Applications with R...
 Sparklens: Understanding the Scalability Limits of Spark Applications with R... Sparklens: Understanding the Scalability Limits of Spark Applications with R...
Sparklens: Understanding the Scalability Limits of Spark Applications with R...
 
emscriptenでC/C++プログラムをwebブラウザから使うまでの難所攻略
emscriptenでC/C++プログラムをwebブラウザから使うまでの難所攻略emscriptenでC/C++プログラムをwebブラウザから使うまでの難所攻略
emscriptenでC/C++プログラムをwebブラウザから使うまでの難所攻略
 
Lec 1 indexing and hashing
Lec 1 indexing and hashing Lec 1 indexing and hashing
Lec 1 indexing and hashing
 
C++と仲良くなるためのn問 ~ポインタ編~ #ladiescpp
C++と仲良くなるためのn問 ~ポインタ編~ #ladiescppC++と仲良くなるためのn問 ~ポインタ編~ #ladiescpp
C++と仲良くなるためのn問 ~ポインタ編~ #ladiescpp
 
Projection Matrices
Projection MatricesProjection Matrices
Projection Matrices
 
Skip list vinay khimsuriya_200430723005
Skip list vinay khimsuriya_200430723005Skip list vinay khimsuriya_200430723005
Skip list vinay khimsuriya_200430723005
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
 
Optimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningOptimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File Pruning
 
GAN을 이용한 캐릭터 리소스 제작 맛보기
GAN을 이용한 캐릭터 리소스 제작 맛보기GAN을 이용한 캐릭터 리소스 제작 맛보기
GAN을 이용한 캐릭터 리소스 제작 맛보기
 
Fibonacci search
Fibonacci searchFibonacci search
Fibonacci search
 
In-memory Database and MySQL Cluster
In-memory Database and MySQL ClusterIn-memory Database and MySQL Cluster
In-memory Database and MySQL Cluster
 

Kürzlich hochgeladen

Specialize in a MSc within Biomanufacturing, and work part-time as Process En...
Specialize in a MSc within Biomanufacturing, and work part-time as Process En...Specialize in a MSc within Biomanufacturing, and work part-time as Process En...
Specialize in a MSc within Biomanufacturing, and work part-time as Process En...Juli Boned
 
B.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak KumarB.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak KumarDeepak15CivilEngg
 
207095666-Book-Review-on-Ignited-Minds-Final.pptx
207095666-Book-Review-on-Ignited-Minds-Final.pptx207095666-Book-Review-on-Ignited-Minds-Final.pptx
207095666-Book-Review-on-Ignited-Minds-Final.pptxpawangadkhe786
 
drug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstingsdrug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstingsKarishma7720
 
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证eqaqen
 
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...gajnagarg
 
K Venkat Naveen Kumar | GCP Data Engineer | CV
K Venkat Naveen Kumar | GCP Data Engineer | CVK Venkat Naveen Kumar | GCP Data Engineer | CV
K Venkat Naveen Kumar | GCP Data Engineer | CVK VENKAT NAVEEN KUMAR
 
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...vershagrag
 
Guide to a Winning Interview May 2024 for MCWN
Guide to a Winning Interview May 2024 for MCWNGuide to a Winning Interview May 2024 for MCWN
Guide to a Winning Interview May 2024 for MCWNBruce Bennett
 
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...
Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...
Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...Angela Justice, PhD
 
Miletti Gabriela_Vision Plan for artist Jahzel.pdf
Miletti Gabriela_Vision Plan for artist Jahzel.pdfMiletti Gabriela_Vision Plan for artist Jahzel.pdf
Miletti Gabriela_Vision Plan for artist Jahzel.pdfGabrielaMiletti
 
Personal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando NegronPersonal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando Negronnegronf24
 
Joshua Minker Brand Exploration Sports Broadcaster .pptx
Joshua Minker Brand Exploration Sports Broadcaster .pptxJoshua Minker Brand Exploration Sports Broadcaster .pptx
Joshua Minker Brand Exploration Sports Broadcaster .pptxsportsworldproductio
 
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Novo Nordisk Kalundborg. We are expanding our manufacturing hub in Kalundborg...
Novo Nordisk Kalundborg. We are expanding our manufacturing hub in Kalundborg...Novo Nordisk Kalundborg. We are expanding our manufacturing hub in Kalundborg...
Novo Nordisk Kalundborg. We are expanding our manufacturing hub in Kalundborg...Juli Boned
 
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...gynedubai
 
怎样办理哥伦比亚大学毕业证(Columbia毕业证书)成绩单学校原版复制
怎样办理哥伦比亚大学毕业证(Columbia毕业证书)成绩单学校原版复制怎样办理哥伦比亚大学毕业证(Columbia毕业证书)成绩单学校原版复制
怎样办理哥伦比亚大学毕业证(Columbia毕业证书)成绩单学校原版复制yynod
 
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...nirzagarg
 

Kürzlich hochgeladen (20)

Specialize in a MSc within Biomanufacturing, and work part-time as Process En...
Specialize in a MSc within Biomanufacturing, and work part-time as Process En...Specialize in a MSc within Biomanufacturing, and work part-time as Process En...
Specialize in a MSc within Biomanufacturing, and work part-time as Process En...
 
B.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak KumarB.tech civil major project by Deepak Kumar
B.tech civil major project by Deepak Kumar
 
207095666-Book-Review-on-Ignited-Minds-Final.pptx
207095666-Book-Review-on-Ignited-Minds-Final.pptx207095666-Book-Review-on-Ignited-Minds-Final.pptx
207095666-Book-Review-on-Ignited-Minds-Final.pptx
 
drug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstingsdrug book file on obs. and gynae clinical pstings
drug book file on obs. and gynae clinical pstings
 
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
一比一定(购)中央昆士兰大学毕业证(CQU毕业证)成绩单学位证
 
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
Top profile Call Girls In bhubaneswar [ 7014168258 ] Call Me For Genuine Mode...
 
K Venkat Naveen Kumar | GCP Data Engineer | CV
K Venkat Naveen Kumar | GCP Data Engineer | CVK Venkat Naveen Kumar | GCP Data Engineer | CV
K Venkat Naveen Kumar | GCP Data Engineer | CV
 
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
Low Cost Coimbatore Call Girls Service 👉📞 6378878445 👉📞 Just📲 Call Ruhi Call ...
 
Guide to a Winning Interview May 2024 for MCWN
Guide to a Winning Interview May 2024 for MCWNGuide to a Winning Interview May 2024 for MCWN
Guide to a Winning Interview May 2024 for MCWN
 
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hubli [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In godhra [ 7014168258 ] Call Me For Genuine Models We...
 
Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...
Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...
Simple, 3-Step Strategy to Improve Your Executive Presence (Even if You Don't...
 
Miletti Gabriela_Vision Plan for artist Jahzel.pdf
Miletti Gabriela_Vision Plan for artist Jahzel.pdfMiletti Gabriela_Vision Plan for artist Jahzel.pdf
Miletti Gabriela_Vision Plan for artist Jahzel.pdf
 
Personal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando NegronPersonal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando Negron
 
Joshua Minker Brand Exploration Sports Broadcaster .pptx
Joshua Minker Brand Exploration Sports Broadcaster .pptxJoshua Minker Brand Exploration Sports Broadcaster .pptx
Joshua Minker Brand Exploration Sports Broadcaster .pptx
 
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Rampur [ 7014168258 ] Call Me For Genuine Models We...
 
Novo Nordisk Kalundborg. We are expanding our manufacturing hub in Kalundborg...
Novo Nordisk Kalundborg. We are expanding our manufacturing hub in Kalundborg...Novo Nordisk Kalundborg. We are expanding our manufacturing hub in Kalundborg...
Novo Nordisk Kalundborg. We are expanding our manufacturing hub in Kalundborg...
 
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
<DUBAI>Abortion pills IN UAE {{+971561686603*^Mifepristone & Misoprostol in D...
 
怎样办理哥伦比亚大学毕业证(Columbia毕业证书)成绩单学校原版复制
怎样办理哥伦比亚大学毕业证(Columbia毕业证书)成绩单学校原版复制怎样办理哥伦比亚大学毕业证(Columbia毕业证书)成绩单学校原版复制
怎样办理哥伦比亚大学毕业证(Columbia毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Shivamogga [ 7014168258 ] Call Me For Genuine Model...
 

X86opti 05 s5yata

  • 1. Remove Branches in BitVector Select Operations - marisa 0.2.2 - Susumu Yata @s5yata Brazil, Inc. 1 30 March 2013 Brazil, Inc.
  • 2. Who I Am Job Brazil, Inc. (groonga developer) We need R&D software engineers. Personal research & development Tries darts-clone, marisa-trie, etc. Corpus Nihongo Web Corpus 2010 (NWC 2010) 2 30 March 2013 Brazil, Inc.
  • 3. BitVector and Marisa: relationships between BitVector and Marisa.
  • 4. BitVector. What's BitVector? A sequence of bits. Operations: BitVector::get(i), BitVector::rank(i), BitVector::select(i).
  • 5. BitVector: Get Operations. Interface: BitVector::get(i). Description: the i-th bit ("0" or "1").
      Position:  0  1  2  …  i-1  i  i+1  …  n-2  n-1
      Bit:       0  0  1  …   0   1   1   …   0    0
      Get! (the bit at position i)
  • 6. BitVector: Rank Operations. Interface: BitVector::rank(i). Description: the number of "1"s up to the i-th bit.
      Position:  0  1  2  …  i-1  i  i+1  …  n-2  n-1
      Bit:       0  0  1  …   0   1   1   …   0    0
      How many "1"s? (counted over the bits up to position i)
  • 7. BitVector: Select Operations. Interface: BitVector::select(i). Description: the position of the i-th "1".
      Position:  0  1  2  …  n-2  n-1
      Bit:       0  0  1  …   0    0
      Where is the i-th "1"?
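As a reference point for the three operations, here is a minimal sketch that uses plain linear scans. The function names `naive_get`, `naive_rank`, and `naive_select` are made up for illustration, and the conventions are assumptions that the slides do not spell out: positions and i are 0-based, `rank(i)` counts the "1"s in [0, i), and `select(i)` returns the vector size when fewer than i + 1 bits are set.

```cpp
#include <cstddef>
#include <vector>

// Naive reference versions (hypothetical helpers, not marisa's API).
// Assumed conventions: 0-based positions, rank over the half-open range [0, i).
bool naive_get(const std::vector<bool>& bits, std::size_t i) {
    return bits[i];
}

std::size_t naive_rank(const std::vector<bool>& bits, std::size_t i) {
    std::size_t count = 0;
    for (std::size_t k = 0; k < i; ++k) {
        if (bits[k]) ++count;
    }
    return count;
}

std::size_t naive_select(const std::vector<bool>& bits, std::size_t i) {
    for (std::size_t k = 0; k < bits.size(); ++k) {
        if (!bits[k]) continue;
        if (i == 0) return k;  // this is the i-th "1"
        --i;
    }
    return bits.size();  // fewer than i + 1 bits are set
}
```

Under these conventions the two operations are inverses in the sense that `naive_rank(bits, naive_select(bits, i)) == i` for every valid i, which is a handy check for any faster implementation.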
  • 8. Marisa. Who's Marisa? An ordinary human magician. What's Marisa? A static and space-efficient dictionary. Data structure: recursive LOUDS-based Patricia tries. Site: http://code.google.com/p/marisa-trie
  • 9. Marisa: Patricia. Patricia is a labeled tree. Keys = Tree + Labels.
      Node  Label
      1     "Ar"
      2     "Brazil"
      3     'C'
      4     "gentina"
      5     "menia"
      6     "anada"
      7     "yprus"

      ID  Key
      0   "Argentina"
      1   "Armenia"
      2   "Brazil"
      3   "Canada"
      4   "Cyprus"

      [Tree figure: root 0 with children 1, 2, 3; node 1 with children 4, 5; node 3 with children 6, 7]
  • 10. Marisa: Recursiveness. Unfortunately, this margin is too small…
      Keys = Tree + Labels
      Labels = Tree + Labels
      Labels = Tree + Labels   <- Reasonable
      Labels = Tree + Labels
      Labels = Tree + Labels
      Labels = Tree + Labels
      Labels = Tree + Labels
      …
  • 11. Marisa: BitVector Usage. LOUDS: Level-Order Unary Degree Sequence. Terminal flags: a node is terminal ("1") or not ("0"). Link flags: a node has a link to its multi-byte label ("1") or has a built-in single-byte label ("0").
  • 12. Marisa: BitVector Usage. LOUDS: BitVector::get(), select(). Terminal flags: BitVector::get(), rank(), select(). Link flags: BitVector::get(), rank().
  • 13. Implementations: how to implement Rank/Select operations.
  • 14. Rank Dictionary. Index structures:
      r_idx[x].abs = rank(512·x), for x = 0, 1, 2, …
      r_idx[x].rel[y] = rank(512·x + 64·y) - rank(512·x), for y = 1, 2, 3, …, 7
      Calculation: abs + rel + popcnt()
  • 15. Rank Operations. Time complexity = O(1).
      [Figure: r_idx.abs covers the preceding 512-bit blocks, r_idx.rel covers the preceding 64-bit units inside the current block, and popcnt() covers the remaining bits of the final 64-bit unit]
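The abs + rel + popcnt() calculation can be sketched as follows. This is an illustration under stated assumptions, not marisa's actual layout: the names `RankEntry`, `RankIndex`, and `popcount64` are hypothetical, `rel` is stored unpacked with an extra `rel[0] = 0` entry, `rank(i)` counts the "1"s in [0, i), and a portable loop stands in for the popcnt instruction.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Portable stand-in for popcnt(): clears the lowest set bit until none remain.
inline unsigned popcount64(std::uint64_t x) {
    unsigned count = 0;
    for (; x != 0; x &= x - 1) ++count;
    return count;
}

// One entry per 512-bit block (hypothetical, unpacked layout).
struct RankEntry {
    std::uint32_t abs;     // rank(512 * x)
    std::uint16_t rel[8];  // rel[y] = rank(512 * x + 64 * y) - rank(512 * x)
};

struct RankIndex {
    std::vector<std::uint64_t> words;  // bit k is bit (k % 64) of words[k / 64]
    std::vector<RankEntry> r_idx;

    explicit RankIndex(std::vector<std::uint64_t> w) : words(std::move(w)) {
        std::uint32_t total = 0;
        for (std::size_t k = 0; k < words.size(); ++k) {
            if (k % 8 == 0) r_idx.push_back(RankEntry{total, {}});
            RankEntry& entry = r_idx.back();
            entry.rel[k % 8] = static_cast<std::uint16_t>(total - entry.abs);
            total += popcount64(words[k]);
        }
    }

    // Number of "1"s in [0, i); requires i < 64 * words.size().
    std::size_t rank(std::size_t i) const {
        const RankEntry& entry = r_idx[i / 512];
        const std::uint64_t below =
            words[i / 64] & ((std::uint64_t{1} << (i % 64)) - 1);
        return entry.abs + entry.rel[(i / 64) % 8] + popcount64(below);
    }
};
```

Every query touches one index entry and one 64-bit word, which is where the O(1) bound comes from.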
  • 16. Select Dictionary. Index structure:
      s_idx[x] = select(512·x), for x = 0, 1, 2, …
      Calculation:
      1. Limit the range by using s_idx.
      2. Limit the range by using r_idx[x].abs.
      3. Limit the range by using r_idx[x].rel[y].
      4. Find the i-th "1" in the range.
  • 17. Select Operations.
      [Figure: two adjacent s_idx entries bound a run of 512-bit blocks, r_idx.abs narrows it to one block, r_idx.rel narrows it to one 64-bit unit, and the final round searches that unit]
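The narrowing steps can be sketched as below. This is a simplified illustration, not marisa's code: the names `SelectIndex` and `popcount64` are hypothetical, `abs` is kept as a plain vector, a word-by-word walk stands in for the `rel[y]` step, and the final round is a simple loop (the branch-free versions are the subject of the later slides). It assumes 0-based i and relies on the GCC/Clang builtin `__builtin_ctzll`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

inline unsigned popcount64(std::uint64_t x) {
    unsigned count = 0;
    for (; x != 0; x &= x - 1) ++count;
    return count;
}

struct SelectIndex {
    std::vector<std::uint64_t> words;  // bit k is bit (k % 64) of words[k / 64]
    std::vector<std::size_t> abs;      // abs[b] = number of "1"s before block b
    std::vector<std::size_t> s_idx;    // s_idx[x] = select(512 * x), then a sentinel

    explicit SelectIndex(const std::vector<std::uint64_t>& w) : words(w) {
        std::size_t ones = 0;
        for (std::size_t k = 0; k < words.size(); ++k) {
            if (k % 8 == 0) abs.push_back(ones);
            for (std::uint64_t rest = words[k]; rest != 0; rest &= rest - 1) {
                if (ones % 512 == 0) {
                    s_idx.push_back(64 * k +
                        static_cast<std::size_t>(__builtin_ctzll(rest)));
                }
                ++ones;
            }
        }
        s_idx.push_back(64 * words.size());  // sentinel: one past the last bit
    }

    // Position of the i-th "1" (0-based); requires i < total number of "1"s.
    std::size_t select(std::size_t i) const {
        // 1. s_idx limits the search to a run of 512-bit blocks.
        std::size_t lo = s_idx[i / 512] / 512;
        std::size_t hi = (s_idx[i / 512 + 1] - 1) / 512;
        // 2. Binary search for the last block with abs[b] <= i.
        while (lo < hi) {
            const std::size_t mid = lo + (hi - lo + 1) / 2;
            if (abs[mid] <= i) lo = mid; else hi = mid - 1;
        }
        // 3. Walk the words of that block (stand-in for the rel[y] step).
        std::size_t k = 8 * lo;
        std::size_t r = i - abs[lo];
        for (unsigned c = popcount64(words[k]); r >= c; c = popcount64(words[k])) {
            r -= c;
            ++k;
        }
        // 4. Final round: drop the r lowest "1"s, then take the lowest one left.
        std::uint64_t word = words[k];
        while (r-- > 0) word &= word - 1;
        return 64 * k + static_cast<std::size_t>(__builtin_ctzll(word));
    }
};
```

Only step 4 works on raw bits; the other steps only read precomputed counts, which is why the slides concentrate on the final round.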
  • 18. Select Final Round. Binary search & table lookup.
      [Figure: three levels of branches (seven "if"s arranged as a binary tree) narrow the 64-bit unit down to one of eight 8-bit bytes, followed by a table lookup inside that byte]
  • 19. Improvements: how to remove the branches in the final round.
  • 20. Original
      // x is the final 64-bit block (uint64_t).
      x = x - ((x >> 1) & MASK_55);
      x = (x & MASK_33) + ((x >> 2) & MASK_33);
      x = (x + (x >> 4)) & MASK_0F;
      x *= MASK_01;  // Tricky popcount
      if (i < ((x >> 24) & 0xFF)) {    // The first-level branch
        if (i < ((x >> 8) & 0xFF)) {   // The second-level branch
          if (i < (x & 0xFF)) {        // The third-level branch
            // The first byte contains the i-th "1".
          } else {
            // The second byte contains the i-th "1".
            …
  • 21. Tips: Tricky PopCount
      Bits of one byte:                            0 1 1 1 0 0 1 0
      x = x - ((x >> 1) & MASK_55);                1 2 0 1   (count per 2 bits)
      x = (x & MASK_33) + ((x >> 2) & MASK_33);    3 1       (count per 4 bits)
      x = (x + (x >> 4)) & MASK_0F;                4         (count per byte)
  • 22. Tips: Tricky PopCount
      // MASK_01 = 0x0101010101010101ULL;
      // x = x + (x << 8) + (x << 16) + (x << 24) + …;
      x *= MASK_01;
      Per-byte counts (high byte to low byte):   4  1  3  5  2  6  3  4
      After x *= MASK_01 (cumulative counts):   28 24 23 20 15 13  7  4
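Put together, the popcount steps and the multiplication turn a 64-bit word into eight cumulative byte counts. The sketch below wraps the slide's statements in a function; the name `byte_prefix_counts` is made up for illustration, and the mask values are the usual ones implied by the mask names (only `MASK_01` is given explicitly on the slide).

```cpp
#include <cstdint>

const std::uint64_t MASK_55 = 0x5555555555555555ULL;
const std::uint64_t MASK_33 = 0x3333333333333333ULL;
const std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;
const std::uint64_t MASK_01 = 0x0101010101010101ULL;

// Returns a word whose byte k holds the number of "1"s in bytes 0..k of x
// (byte 0 is the least significant byte).
inline std::uint64_t byte_prefix_counts(std::uint64_t x) {
    x = x - ((x >> 1) & MASK_55);              // counts per 2 bits
    x = (x & MASK_33) + ((x >> 2) & MASK_33);  // counts per 4 bits
    x = (x + (x >> 4)) & MASK_0F;              // counts per byte (0..8)
    return x * MASK_01;                        // running sums; at most 64, so no carry
}
```

For example, a word whose bytes contain 4, 3, 6, 2, 5, 3, 1, 4 set bits (low byte first) yields the cumulative counts 4, 7, 13, 15, 20, 23, 24, 28, the same values used in the slide's figure. The top byte is the ordinary popcount of the whole word.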
  • 23. + SSE2 (After PopCount)
      // y[0 … 7] = i + 1;
      __m128i y = _mm_cvtsi64_si128((i + 1) * MASK_01);
      __m128i z = _mm_cvtsi64_si128(x);
      // Compare the 16 8-bit signed integers in y and z.
      // y[k] = (y[k] > z[k]) ? 0xFF : 0x00;
      y = _mm_cmpgt_epi8(y, z);  // PCMPGTB
      // The j-th byte contains the i-th "1".
      // TABLE is a 128-byte pre-computed table.
      uint8_t j = TABLE[_mm_movemask_epi8(y)];
  • 24. Tips: PCMPGTB
      y = _mm_cvtsi64_si128((i + 1) * MASK_01);
          20   20   20   20   20   20   20   20
      z = _mm_cvtsi64_si128(x);
          28   24   23   20   15   13    7    4
      // y[k] = (y[k] > z[k]) ? 0xFF : 0x00;
      y = _mm_cmpgt_epi8(y, z);
          0x00 0x00 0x00 0x00 0xFF 0xFF 0xFF 0xFF
  • 25. + Tricks (After Comparison)
      uint64_t j = _mm_cvtsi128_si64(y);
      // Calculation without TABLE
      j = ((j & MASK_01) * MASK_01) >> 56;
      // Calculation with BSR
      j = (63 - __builtin_clzll(j + 1)) / 8;
      // Calculation with popcnt (SSE4.2 or SSE4a)
      j = __builtin_popcountll(j) / 8;
  • 26. - SSE2 (Simple and Fast)
      // x is the final 64-bit block (uint64_t).
      x = x - ((x >> 1) & MASK_55);
      x = (x & MASK_33) + ((x >> 2) & MASK_33);
      x = (x + (x >> 4)) & MASK_0F;
      x *= MASK_01;  // Tricky popcount
      uint64_t y = (i + 1) * MASK_01;
      uint64_t z = x | MASK_80;
      // Compare the 8 7-bit unsigned integers in y and z.
      z = (z - y) & MASK_80;
      uint8_t j = __builtin_ctzll(z) / 8;
  • 27. Tips: Comparison
      uint64_t y = (i + 1) * MASK_01;
          0x14 0x14 0x14 0x14 0x14 0x14 0x14 0x14
      uint64_t z = x | MASK_80;
          0x9C 0x98 0x97 0x94 0x8F 0x8D 0x87 0x84
      // Compare the 8 7-bit unsigned integers in y and z.
      z = (z - y) & MASK_80;
          0x80 0x80 0x80 0x80 0x00 0x00 0x00 0x00
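The comparison trick can be wrapped into a complete in-word select, sketched below. The function name `select64` is hypothetical, the mask values are the usual ones implied by their names, i is assumed to be 0-based, and `__builtin_ctzll` is a GCC/Clang builtin. The slides finish with a table lookup inside the chosen byte; here a short loop stands in for that table, so only the byte search is branch-free.

```cpp
#include <cstdint>

const std::uint64_t MASK_55 = 0x5555555555555555ULL;
const std::uint64_t MASK_33 = 0x3333333333333333ULL;
const std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;
const std::uint64_t MASK_01 = 0x0101010101010101ULL;
const std::uint64_t MASK_80 = 0x8080808080808080ULL;

// Bit position (0..63) of the i-th "1" in word; requires i < popcount(word).
inline unsigned select64(std::uint64_t word, unsigned i) {
    std::uint64_t x = word;
    x = x - ((x >> 1) & MASK_55);
    x = (x & MASK_33) + ((x >> 2) & MASK_33);
    x = (x + (x >> 4)) & MASK_0F;
    x *= MASK_01;  // byte k = number of "1"s in bytes 0..k

    // Byte k of z keeps 0x80 exactly when its cumulative count exceeds i.
    // Each byte of (x | MASK_80) is at least 0x80 and i + 1 <= 64,
    // so the subtraction never borrows across bytes.
    const std::uint64_t y = (i + 1) * MASK_01;
    const std::uint64_t z = ((x | MASK_80) - y) & MASK_80;
    const unsigned j = static_cast<unsigned>(__builtin_ctzll(z)) / 8;

    // Number of "1"s in the bytes below byte j.
    const unsigned before = static_cast<unsigned>((x << 8) >> (8 * j)) & 0xFF;

    // Stand-in for the table lookup: drop the lower "1"s of byte j.
    std::uint64_t byte = (word >> (8 * j)) & 0xFF;
    for (unsigned r = i - before; r > 0; --r) byte &= byte - 1;
    return 8 * j + static_cast<unsigned>(__builtin_ctzll(byte));
}
```

With the word from the slide's example (cumulative counts 4, 7, 13, 15, 20, 23, 24, 28) and i + 1 = 20, the lowest byte that keeps 0x80 is byte 4, matching the figure.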
  • 28. + SSSE3 (For PopCount)
      // Get lower nibbles and upper nibbles of x.
      __m128i lower = _mm_cvtsi64_si128(x & MASK_0F);
      __m128i upper = _mm_cvtsi64_si128(x & MASK_F0);
      upper = _mm_srli_epi32(upper, 4);
      // Use PSHUFB for counting "1"s in each nibble.
      __m128i table = _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0);
      lower = _mm_shuffle_epi8(table, lower);
      upper = _mm_shuffle_epi8(table, upper);
      // Merge the counts to get the number of "1"s in each byte.
      x = _mm_cvtsi128_si64(_mm_add_epi8(lower, upper));
      x *= MASK_01;
  • 29. Tips: PSHUFB
      lower = _mm_cvtsi64_si128(x & MASK_0F);
          12  8  7  4 15 13  7  4
      table = _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, …);
           4  3  3  2  3  2  2  1  3  2  2  1  2  1  1  0
      // Perform a parallel 16-way lookup.
      lower = _mm_shuffle_epi8(table, lower);
           2  1  3  1  4  3  3  1
  • 30. Evaluation: how effective the improvements are.
  • 31. Environment. OS: Mac OSX 10.8.3 (64-bit). CPU: Core i7 3720QM (Ivy Bridge, 2.6GHz, up to 3.6GHz). Compiler: Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn).
  • 32. Data Source Japanese Wikipedia page titles gzip –cd jawiki-20130328-all-titles-in- ns0.gz | LC_ALL=C sort –R > data Details Number of keys: 1,367,750 Average length: 21.14 bytes Total length: 28,919,893 bytes 32 30 March 2013 Brazil, Inc.
  • 33. Binaries
      marisa 0.2.1:
        ./configure CXX=clang++ --enable-popcnt
        make
        tools/marisa-benchmark < data
      marisa 0.2.2:
        ./configure CXX=clang++ --enable-sse4
        make
        tools/marisa-benchmark < data
  • 34. Results: marisa 0.2.1 (without improvements; baseline)
      #Tries  Size [KB]  Build [Kqps]  Lookup [Kqps]  Reverse [Kqps]  Prefix [Kqps]  Predict [Kqps]
      1       11,811     724           1,105          1,223           1,038          711
      2        8,639     632             790            877             753          453
      3        8,001     621             750            816             708          406
      4        7,788     591             723            791             687          391
      5        7,701     590             712            781             680          384
  • 35. Results: marisa 0.2.2 (with improvements; same size, faster operations)
      #Tries  Size [KB]  Build [Kqps]  Lookup [Kqps]  Reverse [Kqps]  Prefix [Kqps]  Predict [Kqps]
      1       11,811     757           1,198          1,359           1,115          772
      2        8,639     657             873          1,000             820          503
      3        8,001     621             817            924             770          453
      4        7,788     613             797            900             752          438
      5        7,701     610             787            884             737          427
  • 36. Results: Improvements (improvement ratios; same size, faster operations)
      #Tries  Size [%]  Build [%]  Lookup [%]  Reverse [%]  Prefix [%]  Predict [%]
      1       0.00      +4.56       +8.42      +11.12       +7.42        +8.58
      2       0.00      +3.96      +10.52      +14.03       +8.90       +11.04
      3       0.00       0.00       +8.93      +13.24       +8.76       +11.58
      4       0.00      +3.72      +10.24      +13.78       +9.46       +12.02
      5       0.00      +3.39      +10.53      +13.19       +8.38       +11.20
  • 37. Conclusion. "Any sufficiently advanced technology is indistinguishable from magic." "Any sufficiently advanced technique is indistinguishable from magic." "You are a magician."