Fast indexes with roaring #gomtl-10

Presentation on Roaring bitmaps for the Go Montreal meetup (Go 10th anniversary).

Roaring bitmaps are a standard indexing data structure, widely used
in search and database engines. For example, Lucene, the search
engine powering Wikipedia, relies on Roaring. The Go library roaring
implements Roaring bitmaps in Go. It is used in production in several
popular systems, such as InfluxDB, Pilosa and Bleve, and it is part
of the Awesome Go collection. After presenting the library, we will
cover some advanced Go topics such as the use of assembly language
and unsafe mappings.

Published in: Technology

  1. Fast indexes with roaring
     Daniel Lemire and collaborators
     blog: https://lemire.me
     twitter: @lemire
     Université du Québec (TÉLUQ), Montreal
     Fast indexes with roaring - Daniel Lemire #gomtl-10 November 19th.
  2. The roaring Go library is used by
     Cloud Torrent, runv, InfluxDB, Pilosa, Bleve, lindb, Elasticell, SourceGraph, M3, trident
     Part of the Awesome Go collection.
  3. Sets
     A fundamental concept (sets of documents, identifiers, tuples...)
     → For performance, we often work with sets of integers (identifiers).
  4. tests: x ∈ S?
     intersections: S₂ ∩ S₁, unions: S₂ ∪ S₁, differences: S₂ ∖ S₁
     Similarity (Jaccard/Tanimoto): |S₁ ∩ S₂| / |S₁ ∪ S₂|
     Iteration
  5. How to implement sets?
     hash tables (map[int]bool)
     bitmap: willf/bitset
     compressed bitmaps
  6. Hash tables
         bitmap := map[int]bool{}
         for v := 0; v <= 1000000; v += 100 {
             bitmap[v] = true
         }
     Memory usage per entry?
  7. About 18 bytes (not bits) per entry (source)
  8. In-order access is kind of terrible
     Visiting the values 1 to 6 in order (marked in parentheses):
     [15,3,0,6,11,4,5,9,12,13,8,2,(1),14,10,7]
     [15,3,0,6,11,4,5,9,12,13,8,(2),1,14,10,7]
     [15,(3),0,6,11,4,5,9,12,13,8,2,1,14,10,7]
     [15,3,0,6,11,(4),5,9,12,13,8,2,1,14,10,7]
     [15,3,0,6,11,4,(5),9,12,13,8,2,1,14,10,7]
     [15,3,0,(6),11,4,5,9,12,13,8,2,1,14,10,7]
  9. Bitmaps
     Efficient way to represent sets of integers.
     For example, 0, 1, 3, 4 becomes 0b11011 or "27".
     {0} → 0b00001
     {0, 3} → 0b01001
     {0, 3, 4} → 0b11001
     {0, 1, 3, 4} → 0b11011
  10. Manipulate a bitmap
      64-bit processor.
      Given x, the word index is x / 64 and the bit index is x % 64.
          add(x) {
              array[x / 64] |= (1 << (x % 64))
          }
  11. How fast is it?
          index = x / 64          -> a shift
          mask = 1 << (x % 64)    -> a shift
          array[index] |= mask    -> an OR with memory
      One bit every ≈ 1.65 cycles because of superscalarity.
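
The add operation on the two slides above can be sketched in Go as follows. This is a minimal illustration, not the code of willf/bitset or roaring: the `Bitset` type, `NewBitset`, `Add` and `Contains` are hypothetical names, and the bitmap has a fixed capacity with no bounds handling.

```go
package main

import "fmt"

// wordSize is the number of bits per word (a 64-bit processor is assumed).
const wordSize = 64

// Bitset is a hypothetical, minimal uncompressed bitmap backed by 64-bit words.
type Bitset struct {
	words []uint64
}

// NewBitset allocates enough words to hold the values in [0, capacity).
func NewBitset(capacity uint) *Bitset {
	return &Bitset{words: make([]uint64, (capacity+wordSize-1)/wordSize)}
}

// Add sets the bit for x: the word index is x/64, the bit index is x%64.
func (b *Bitset) Add(x uint) {
	b.words[x/wordSize] |= 1 << (x % wordSize)
}

// Contains reports whether the bit for x is set.
func (b *Bitset) Contains(x uint) bool {
	return b.words[x/wordSize]&(1<<(x%wordSize)) != 0
}

func main() {
	b := NewBitset(128)
	for _, x := range []uint{0, 1, 3, 4, 100} {
		b.Add(x)
	}
	fmt.Printf("0b%b\n", b.words[0])                           // 0b11011
	fmt.Println(b.Contains(3), b.Contains(2), b.Contains(100)) // true false true
}
```

Because 64 is a power of two, a compiler can turn the division and the modulo into a shift and a mask, which is what the slide's cost estimate relies on.
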
  12. Bit parallelism
      Intersection between {0, 1, 3} and {1, 3}: a single AND operation between 0b1011 and 0b1010.
      Result is 0b1010 or {1, 3}.
      No branching!
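
As a small sketch of the bit-parallel intersection described above, the following Go program ANDs two bitmaps word by word. The `Intersect` helper is a hypothetical name introduced for illustration; each loop iteration processes 64 candidate values without branching on their contents.

```go
package main

import "fmt"

// Intersect returns the word-by-word AND of two bitmaps.
// It is an illustrative helper: the result has the length of the shorter input.
func Intersect(a, b []uint64) []uint64 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	out := make([]uint64, n)
	for i := 0; i < n; i++ {
		out[i] = a[i] & b[i] // one AND covers 64 values at once
	}
	return out
}

func main() {
	a := []uint64{0b1011} // {0, 1, 3}
	b := []uint64{0b1010} // {1, 3}
	fmt.Printf("0b%b\n", Intersect(a, b)[0]) // 0b1010, that is {1, 3}
}
```
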
  13. Bitsets can take too much memory
      {1, 32000, 64000}: about 8000 bytes (1000 64-bit words) for three values
      We use compression!
  14. Git (GitHub) uses EWAH
      Run-length encoding
      Example: 000000001111111100 is 00000000 − 11111111 − 00
      Code long runs of 0s or 1s efficiently.
      https://github.com/git/git/blob/master/ewah/bitmap.c
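
To make the run-length idea concrete, here is a minimal Go sketch that splits a string of '0' and '1' characters into runs. It shows plain run-length encoding only and is not the word-aligned EWAH format used by Git; the `Run` type and `RunLengthEncode` function are hypothetical names.

```go
package main

import "fmt"

// Run is a maximal sequence of identical bits (hypothetical type).
type Run struct {
	Bit    byte // '0' or '1'
	Length int
}

// RunLengthEncode splits a string of '0'/'1' characters into runs.
func RunLengthEncode(bits string) []Run {
	var runs []Run
	for i := 0; i < len(bits); i++ {
		if n := len(runs); n > 0 && runs[n-1].Bit == bits[i] {
			runs[n-1].Length++ // extend the current run
		} else {
			runs = append(runs, Run{Bit: bits[i], Length: 1}) // start a new run
		}
	}
	return runs
}

func main() {
	for _, r := range RunLengthEncode("000000001111111100") {
		fmt.Printf("%d x %c\n", r.Length, r.Bit)
	}
	// Prints:
	// 8 x 0
	// 8 x 1
	// 2 x 0
}
```
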
  15. Complexity
      Intersection: O(|S₁| + |S₂|) or O(min(|S₁|, |S₂|))
      In-place union (S₂ ← S₁ ∪ S₂): O(|S₁| + |S₂|) or O(|S₂|)
  16. Roaring Bitmaps
      Java, C, Go (interoperable)
  17. for v := 0; v <= 1000000; v += 100 { add v to set }
                        bytes/value
      map[int]bool      18 bytes
      willf.BitSet      22 bytes
      roaring           2 bytes
  18. Roaring bitmaps (http://roaringbitmap.org/) are found in:
      Apache Lucene and derivative systems such as Solr and Elasticsearch,
      Apache Druid,
      Apache Spark,
      Apache Hive,
      Yandex ClickHouse,
      Netflix Atlas,
      LinkedIn Pinot,
      Whoosh,
      Microsoft Visual Studio Team Services (VSTS),
      Intel's Optimized Analytics Package (OAP),
      eBay's Apache Kylin,
  19. and dozens more!
  20. Hybrid model
      Set of containers:
      sorted arrays ({1, 20, 144})
      bitset (0b10000101011)
      runs ([0, 10], [15, 20])
  21. [Figure: Roaring bitmaps]
  22. See https://github.com/RoaringBitmap/RoaringFormatSpec
  23. Roaring
      All containers are small (8 kB), fit in CPU cache
      We predict the output container type during computations
      E.g., when an array gets too large, we switch to a bitset
      Union of two large arrays is materialized as a bitset...
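
The container logic above can be sketched roughly as follows. This is a simplified illustration, not the roaring library's code. It assumes the layout in the Roaring format specification linked on slide 22: a 32-bit value is split into a 16-bit container key and a 16-bit low part, and an array container holds at most 4096 values (4096 two-byte values occupy 8 kB, the same as a 65536-bit bitset). The names `split`, `containerKind` and `predictUnionKind` are hypothetical, and run containers are left out.

```go
package main

import "fmt"

// arrayMaxSize is the largest cardinality kept as a sorted array
// (assumption taken from the Roaring format specification).
const arrayMaxSize = 4096

// split separates a 32-bit value into its container key (high 16 bits)
// and the value stored inside that container (low 16 bits).
func split(x uint32) (key, low uint16) {
	return uint16(x >> 16), uint16(x)
}

// containerKind picks a container type from the cardinality alone.
func containerKind(cardinality int) string {
	if cardinality <= arrayMaxSize {
		return "array"
	}
	return "bitset"
}

// predictUnionKind guesses the output type of a union of two array
// containers from an upper bound on the result's cardinality.
// Simplification: a real implementation may convert back to an array
// when the actual result turns out to be small.
func predictUnionKind(cardA, cardB int) string {
	return containerKind(cardA + cardB)
}

func main() {
	key, low := split(0x00010002)
	fmt.Println(key, low)                     // 1 2
	fmt.Println(containerKind(4096))          // array
	fmt.Println(containerKind(4097))          // bitset
	fmt.Println(predictUnionKind(3000, 3000)) // bitset
}
```
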
  24. Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods (Wang et al., SIGMOD 2017)
  25. Go issues
  26. Go is shy about inlining
      Won't inline some small functions that contain a branch?
          func (b *BitSet) Set(i uint) *BitSet {
              b.extendSetMaybe(i)
              b.set[i>>log2WordSize] |= 1 << (i & (wordSize - 1))
              return b
          }
      https://lemire.me/blog/2017/09/05/go-does-not-inline-functions-when-it-should/
  27. Go guards too much
          bits.OnesCount64(x)
  28.     0x1093534  0fb63d22810c00  MOVZX 0xc8122(IP), DI
          0x109353b  4084ff          TESTL DI, DI
          0x109353e  7407            JE 0x1093547
          0x1093540  f3480fb8f6      POPCNT SI, SI
          0x1093545  ebd6            JMP 0x109351d
          0x1093547  4889442418      MOVQ AX, 0x18(SP)
          0x109354c  4889542410      MOVQ DX, 0x10(SP)
          0x1093551  48894c2420      MOVQ CX, 0x20(SP)
          0x1093556  48893424        MOVQ SI, 0(SP)
          0x109355a  e801ffffff      CALL math/bits.OnesCount64(SB)
          0x109355f  488b742408      MOVQ 0x8(SP), SI
          0x1093564  488b442418      MOVQ 0x18(SP), AX
          0x1093569  488b4c2420      MOVQ 0x20(SP), CX
          0x109356e  488b542410      MOVQ 0x10(SP), DX
          0x1093573  488b5c2440      MOVQ 0x40(SP), BX
          0x1093578  eba3            JMP 0x109351d
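
The listing above appears to test a flag (MOVZX, TESTL, JE) before choosing between the POPCNT instruction and a fallback CALL to math/bits.OnesCount64. A loop of the following shape is the kind of Go code where such a guard would sit on the hot path. This is a sketch: the `Cardinality` helper is a hypothetical name, and whether the guard is emitted depends on the compiler version and target settings.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Cardinality counts the set bits across all words of a bitmap.
func Cardinality(words []uint64) int {
	total := 0
	for _, w := range words {
		total += bits.OnesCount64(w) // population count of one 64-bit word
	}
	return total
}

func main() {
	// 0b11011 has four bits set, 1<<63 has one.
	fmt.Println(Cardinality([]uint64{0b11011, 0, 1 << 63})) // 5
}
```
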
  29. Casting a slice is tricky
          import (
              "reflect"
              "runtime"
              "unsafe"
          )

          func byteSliceAsUint16Slice(slice []byte) (result []uint16) {
              if len(slice)%2 != 0 {
                  panic("Slice size should be divisible by 2")
              }
              // reference: https://go101.org/article/unsafe.html

              // view both slices through their headers
              bHeader := (*reflect.SliceHeader)(unsafe.Pointer(&slice))
              rHeader := (*reflect.SliceHeader)(unsafe.Pointer(&result))

              // point the result at the same data, with length and capacity in uint16 units
              rHeader.Data = bHeader.Data
              rHeader.Len = bHeader.Len / 2
              rHeader.Cap = bHeader.Cap / 2

              // KeepAlive is still crucial: without it the GC can free the data
              runtime.KeepAlive(&slice)

              return
          }
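
On Go 1.17 or later, the same reinterpretation can be sketched with `unsafe.Slice` instead of editing slice headers by hand. This is an alternative illustration, not the code from the talk: `bytesAsUint16s` is a hypothetical name, the input is assumed to be suitably aligned for `uint16`, and unlike the version above the result's capacity equals its length.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesAsUint16s reinterprets b as a []uint16 without copying.
// Assumptions: Go 1.17+ and b aligned for uint16 access.
func bytesAsUint16s(b []byte) []uint16 {
	if len(b)%2 != 0 {
		panic("slice size should be divisible by 2")
	}
	if len(b) == 0 {
		return nil
	}
	// The result shares memory with b and keeps it reachable for the GC.
	return unsafe.Slice((*uint16)(unsafe.Pointer(&b[0])), len(b)/2)
}

func main() {
	b := make([]byte, 4)
	u := bytesAsUint16s(b)
	u[1] = 0xFFFF          // writes through to b[2] and b[3]
	fmt.Println(len(u), b) // 2 [0 0 255 255]
}
```

The value 0xFFFF is used in the example so that the printed bytes do not depend on the machine's byte order.
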
  30. To learn more...
      Blog (twice a week): https://lemire.me/blog/
      GitHub: https://github.com/lemire
      Home page: https://lemire.me/en/
      CRSNG: Faster Compressed Indexes On Next-Generation Hardware (2017-2022)
      Twitter: @lemire
