Advanced querying

Advanced Querying
Brian Mitchell (strmpnk)

Query
finding the right information
scanning and processing data
traversing data structures

Everything ends up in some sort of data structure.

[Diagram: B-tree nodes arranged as a shallow tree]
shallow, append only, compressed, awesome

I/O
all of your data structures are limited by the medium
[Chart: throughput (MB/s) and latency (microseconds) for HDD, SSD, and RAM; axis from 0 to 3000]
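The speaker notes ask you to imagine how many CPU cycles a single HDD seek costs. The sketch below is a back-of-the-envelope estimate. It assumes round figures of a 3 GHz clock, a 10 ms HDD seek and a 100 ns RAM access. These are assumptions, not values read off the chart.

```python
# Back-of-the-envelope cost of a storage access, measured in CPU cycles.
# All three constants are assumed round numbers, not measurements.
CPU_HZ = 3_000_000_000      # assumed 3 GHz clock
HDD_SEEK_NS = 10_000_000    # assumed 10 ms average HDD seek
RAM_ACCESS_NS = 100         # assumed 100 ns RAM access


def cycles(latency_ns: int, hz: int = CPU_HZ) -> int:
    """Clock cycles that elapse while waiting latency_ns nanoseconds."""
    return latency_ns * hz // 1_000_000_000


hdd = cycles(HDD_SEEK_NS)
ram = cycles(RAM_ACCESS_NS)
print(f"HDD seek: {hdd:,} cycles; RAM access: {ram:,} cycles; ratio {hdd // ram:,}x")
```

With these assumptions a seek stalls the CPU for tens of millions of cycles, about five orders of magnitude more than a RAM access. Plug in your own hardware's numbers to redo the estimate.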

Obviously RAM is good.
Cheap too.
Not unlimited.

[Diagram: the working set, a subset of all your data]
Keep it in RAM

"Working Set"
• recently accessed documents
• replicating documents
• compaction files
• index files

Controlling Working Set Size
• smaller documents
    • short object keys, less repetition
• smaller databases
    • increases locality and minimizes compaction overhead
• fewer or smaller views
    • multi-purpose
    • avoid repeating document data
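JSON documents carry their field names with them, so every key name is repeated in every document. The sketch below uses Python's json module on a made-up contact document. It shows how much of the encoded size the long key names account for.

```python
import json

# Hypothetical contact document with descriptive key names.
verbose = {
    "contact_first_name": "Ada",
    "contact_last_name": "Lovelace",
    "contact_email_address": "ada@example.com",
}
# The same data with short keys.
terse = {"fn": "Ada", "ln": "Lovelace", "em": "ada@example.com"}


def encoded_size(doc: dict) -> int:
    """Size in bytes of the compact JSON encoding of doc."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


saved_per_doc = encoded_size(verbose) - encoded_size(terse)
# Key names repeat in every document, so the saving scales with document count.
print(encoded_size(verbose), encoded_size(terse), saved_per_doc * 1_000_000)
```

These sizes are measured before any compression the storage engine may apply, so the on-disk saving may be smaller.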

Primary Index
Your first line of defense against bloat

Function of an Index
Key → Value

Function of a Primary Index
In Couchbase
Key → Doc
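A primary index maps each document key to its document and keeps the keys in order. That lets both point lookups and key-range scans avoid a full scan. The sketch below shows the idea in memory, using a sorted key list and the bisect module. It illustrates the principle only and does not model CouchDB's append-only on-disk B-tree.

```python
from __future__ import annotations

import bisect


class PrimaryIndex:
    """Sorted key -> document map: O(log n) lookups and ordered range scans."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._docs: dict[str, dict] = {}

    def put(self, key: str, doc: dict) -> None:
        if key not in self._docs:
            bisect.insort(self._keys, key)  # keep keys sorted on insert
        self._docs[key] = doc

    def get(self, key: str) -> dict | None:
        return self._docs.get(key)

    def range(self, start: str, end: str) -> list[tuple[str, dict]]:
        """All (key, doc) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._docs[k]) for k in self._keys[lo:hi]]


idx = PrimaryIndex()
for key in ["user:carol", "user:alice", "order:17", "user:bob"]:
    idx.put(key, {"_id": key})
print([k for k, _ in idx.range("user:", "user:\ufff0")])
```

Because the keys stay sorted, a prefix such as "user:" turns into one contiguous range. This is one reason meaningful key names pay off.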

Uniqueness
[Diagram: documents keyed A, B, C, then a second document keyed B]
Semantic Keying
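CouchDB will not overwrite an existing _id unless the write carries that document's current _rev. Without it, the server answers 409 Conflict. Encoding a unique field into the key therefore enforces uniqueness for free. The class below is a hypothetical in-memory model of that rule, not a client library. The "user:<email>" key scheme is an illustrative assumption.

```python
from __future__ import annotations

import uuid


class Conflict(Exception):
    """Raised where CouchDB would answer 409 Conflict."""


class DocStore:
    """Toy model of revision-checked writes keyed by _id (not a client library)."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def put(self, doc_id: str, body: dict, rev: str | None = None) -> str:
        current = self._docs.get(doc_id)
        # Overwriting an existing _id requires its current revision.
        if current is not None and current["_rev"] != rev:
            raise Conflict(doc_id)
        new_rev = uuid.uuid4().hex  # toy revision token, not CouchDB's rev format
        self._docs[doc_id] = {**body, "_id": doc_id, "_rev": new_rev}
        return new_rev

    def get(self, doc_id: str) -> dict | None:
        return self._docs.get(doc_id)


def user_key(email: str) -> str:
    """Hypothetical semantic key: the unique field becomes part of _id."""
    return "user:" + email.strip().lower()


store = DocStore()
rev1 = store.put(user_key("Ada@example.com"), {"name": "Ada"})
rejected = False
try:
    store.put(user_key("ada@example.com"), {"name": "Impostor"})  # no rev supplied
except Conflict:
    rejected = True
store.put(user_key("ada@example.com"), {"name": "Ada L."}, rev=rev1)  # proper update
print(rejected, store.get("user:ada@example.com")["name"])
```

The same revision check is what makes concurrent updates safe. A writer holding a stale _rev is rejected rather than silently overwriting newer data.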

One File
Always Fresh, No Extra Cleaning

Secondary Index
a.k.a. View
• Projects a new sequence
• Custom mapped values
• M-N
• Links back to source document
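A view is a second sorted index built by running a map function over every document. Each row carries the emitted key, the emitted value, and the id of the source document. One document may emit any number of rows, which gives the M-N relationship. The sketch below mimics this in memory. The emit-callback shape is an assumption modelled on CouchDB's JavaScript map functions, and Python's ordering stands in for CouchDB's collation.

```python
from typing import Any, Callable

Emit = Callable[[Any, Any], None]


def build_view(docs: list, map_fn: Callable[[dict, Emit], None]) -> list:
    """Run map_fn over every doc; return (key, doc id, value) rows sorted by key, then id."""
    rows = []
    for doc in docs:
        def emit(key: Any, value: Any = None, _id: str = doc["_id"]) -> None:
            rows.append((key, _id, value))  # each row links back to its source doc
        map_fn(doc, emit)
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def by_tag(doc: dict, emit: Emit) -> None:
    # One document can emit several rows: one per tag.
    for tag in doc.get("tags", []):
        emit(tag, doc["title"])


docs = [
    {"_id": "p1", "title": "B-trees", "tags": ["storage", "index"]},
    {"_id": "p2", "title": "Views", "tags": ["index"]},
]
for row in build_view(docs, by_tag):
    print(row)
```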

View Techniques
• Join by collation
• Page by key
• Foreign includes
• Cheap aggregates
• Flexible grouping

Join By Collation
Contact A  Contact B  Note for A  Note for B  Note for A
Emit (document order): A  B  A-note  B-note  A-note
View order (sorted by key): A  A-note  A-note  B  B-note
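One common way to get the A, A-note, A-note, B, B-note ordering is to emit a compound key. Its first element is the contact id, and its second element sorts the contact before its notes. The sketch below uses Python tuples in place of CouchDB's JSON-array collation. The field names type and contact_id are assumptions.

```python
import bisect

docs = [
    {"_id": "A", "type": "contact"},
    {"_id": "B", "type": "contact"},
    {"_id": "n1", "type": "note", "contact_id": "A"},
    {"_id": "n2", "type": "note", "contact_id": "B"},
    {"_id": "n3", "type": "note", "contact_id": "A"},
]


def map_contact_notes(doc):
    """Emit (contact_id, 0) for a contact and (contact_id, 1) for each of its notes."""
    if doc["type"] == "contact":
        return [((doc["_id"], 0), doc["_id"])]
    return [((doc["contact_id"], 1), doc["_id"])]


# Sorting the emitted rows by key is what interleaves each contact with its notes.
rows = sorted(row for doc in docs for row in map_contact_notes(doc))


def contact_with_notes(contact_id):
    """One range query returns the contact followed by all of its notes."""
    lo = bisect.bisect_left(rows, ((contact_id, 0),))
    hi = bisect.bisect_left(rows, ((contact_id, 2),))
    return [doc_id for _, doc_id in rows[lo:hi]]


print([doc_id for _, doc_id in rows])
print(contact_with_notes("A"))
```

In CouchDB itself the equivalent range would be roughly startkey=["A"]&endkey=["A",{}], because objects collate after numbers.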

Page By Key
A B C D E
limit=2
limit=2&start_key=B\ufff0
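Paging by key means remembering the last key of a page and starting the next request just past it, instead of using an offset. Appending \ufff0, a very high code point, to the last key moves the start past every key that begins with it. The sketch below works over a sorted Python list. It is an approximation, because CouchDB compares strings with Unicode collation rather than raw code points.

```python
import bisect


def page(keys, limit, start_key=None):
    """Return up to `limit` keys >= start_key from an already-sorted list."""
    lo = 0 if start_key is None else bisect.bisect_left(keys, start_key)
    return keys[lo:lo + limit]


keys = ["A", "B", "C", "D", "E"]
first = page(keys, limit=2)                                    # like limit=2
second = page(keys, limit=2, start_key=first[-1] + "\ufff0")   # like limit=2&start_key=B\ufff0
print(first, second)
```

The trade-off is that a key such as "Bob" would be skipped along with "B". A common alternative is to fetch limit+1 rows and start the next page at the extra row's key, together with its document id.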

Foreign Includes
A B
Emit a a
Reference _id=A _id=B
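Sometimes a view row's value is an object containing an _id. If you then query with include_docs=true, CouchDB attaches the referenced document instead of the one that emitted the row. The speaker notes add that _rev can be passed as well. The sketch below is a hypothetical in-memory imitation of that lookup, using made-up note and contact documents.

```python
docs = {
    "A": {"_id": "A", "name": "Alice"},
    "B": {"_id": "B", "name": "Bob"},
    "n1": {"_id": "n1", "type": "note", "about": "A"},
    "n2": {"_id": "n2", "type": "note", "about": "B"},
}


def map_note_refs(doc):
    """Emit one row per note whose value references the note's contact."""
    if doc.get("type") == "note":
        yield doc["_id"], {"_id": doc["about"]}


def query(include_docs=False):
    rows = sorted(
        ((key, doc["_id"], value) for doc in docs.values() for key, value in map_note_refs(doc)),
        key=lambda r: (r[0], r[1]),
    )
    out = []
    for key, source_id, value in rows:
        row = {"key": key, "id": source_id, "value": value}
        if include_docs:
            # A value carrying an _id pulls in that document, not the emitting one.
            target = value["_id"] if isinstance(value, dict) and "_id" in value else source_id
            row["doc"] = docs.get(target)
        out.append(row)
    return out


for row in query(include_docs=True):
    print(row["id"], "->", row["doc"]["name"])
```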

Cheap Aggregates
• It pays to know your data well
• Reduce values are stored inline with the view b-tree
• Small values take very little space
• Nice built-in reduce functions
• Not just for user visible data
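Reduce values are cheap because each node of the view B-tree stores the reduction of everything beneath it. A range query then combines a few stored partial results instead of revisiting every row. Built-ins such as _count, _sum and _stats work this way. The sketch below flattens that into a single level of fixed-size blocks, each caching a _sum-style total. The block size and class name are illustrative assumptions. A real B-tree also descends only into nodes that overlap the range.

```python
class ReducedBlocks:
    """Sorted (key, value) rows grouped into blocks that cache their own sum."""

    def __init__(self, rows, block_size=4):
        rows = sorted(rows)
        self.blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
        # Stored alongside each block, like a reduction kept in a B-tree node.
        self.sums = [sum(v for _, v in block) for block in self.blocks]

    def range_sum(self, start, end):
        """Sum of values with start <= key < end, reusing cached block sums."""
        total = 0
        for block, block_sum in zip(self.blocks, self.sums):
            first, last = block[0][0], block[-1][0]
            if last < start or first >= end:
                continue                      # block entirely outside the range
            if start <= first and last < end:
                total += block_sum            # whole block inside: use cached sum
            else:
                total += sum(v for k, v in block if start <= k < end)  # partial block
        return total


rows = [(day, 10 * day) for day in range(1, 13)]   # keys 1..12, values 10..120
index = ReducedBlocks(rows)
print(index.range_sum(3, 11))
```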

Flexible Grouping
2008-10-02 2008-08-17 2009-02-12
Emit
[2008, 10] [2008, 8] [2009, 2]
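Emitting [year, month] keys lets one view answer both per-month and per-year questions. CouchDB's group_level query parameter truncates keys to their first N elements before reducing. The sketch below applies that grouping to the three dates from the slide, using a _count-style reduce and Python tuples in place of JSON arrays.

```python
from collections import Counter
from datetime import date

dates = ["2008-10-02", "2008-08-17", "2009-02-12"]


def map_year_month(iso_date):
    """Emit (year, month) for a date string, like the slide's Emit step."""
    d = date.fromisoformat(iso_date)
    return (d.year, d.month)


keys = sorted(map_year_month(s) for s in dates)


def grouped_count(keys, group_level):
    """Count rows per key prefix of length group_level (like reduce=_count)."""
    return dict(sorted(Counter(k[:group_level] for k in keys).items()))


print(grouped_count(keys, 1))
print(grouped_count(keys, 2))
```

Emitting the month as a number rather than a string keeps [2008, 8] ahead of [2008, 10] in the sorted view.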

Traditional CouchDB
[Diagram: one database, split 20% / 10% / 70%]
[Diagram: the same 20% / 10% / 70% split repeated across twelve smaller databases]

Clustering

Single Key

Query
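In a cluster, a single-key read can be routed straight to the one node that owns the key. A view query, by contrast, has to ask every node and merge the sorted partial results. This is why the speaker notes call key access fast and simple and warn that merged queries can return partial results. The sketch below shows both paths. The CRC32-modulo placement and three-node layout are simplifying assumptions, not Couchbase's exact vBucket mapping.

```python
import heapq
import zlib

NODES = 3  # assumed cluster size


def node_for(key: str) -> int:
    """Hash-partition a key onto one node (simplified placement rule)."""
    return zlib.crc32(key.encode("utf-8")) % NODES


# Each node holds its own documents and its own sorted slice of a view.
nodes = [{"docs": {}, "view": []} for _ in range(NODES)]
for key in ["alice", "bob", "carol", "dave", "erin", "frank"]:
    shard = nodes[node_for(key)]
    shard["docs"][key] = {"_id": key}
    shard["view"].append((key, key))   # view row: (emitted key, doc id)
for shard in nodes:
    shard["view"].sort()


def get(key: str):
    """Single key: exactly one node is consulted."""
    return nodes[node_for(key)]["docs"].get(key)


def query(available=range(NODES)) -> list:
    """Scatter-gather: merge the sorted rows of every reachable node."""
    return [k for k, _ in heapq.merge(*(nodes[i]["view"] for i in available))]


print(get("carol"))
print(query())          # all rows, globally sorted
print(query([0, 1]))    # node 2 unreachable: its rows are simply absent
```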

Alternatives

Manual Indexing
• Store an index as a document
• Good properties for mostly static indexing
• Cluster friendly
• Create custom constraints (uniqueness)
• Snapshot of a slow query for speed
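A manual index is an ordinary document that the application maintains itself. One example is a lookup table from a field value to document ids, which also makes a uniqueness check possible. The sketch below uses a plain dict as a stand-in for the database. The index document's _id and layout are hypothetical. Against a real database, the index update would need the index document's _rev to avoid lost updates.

```python
db = {"index:email": {"_id": "index:email", "entries": {}}}  # hypothetical index doc


class DuplicateEmail(Exception):
    """Raised when an email is already claimed by another document."""


def save_user(doc_id: str, email: str) -> None:
    """Store a user and record it in the hand-maintained email index."""
    entries = db["index:email"]["entries"]
    key = email.lower()
    owner = entries.get(key)
    if owner is not None and owner != doc_id:
        raise DuplicateEmail(email)   # custom uniqueness constraint
    db[doc_id] = {"_id": doc_id, "email": email}
    entries[key] = doc_id             # against a real database: a second, _rev-guarded write


def find_by_email(email: str):
    """Answer a lookup with two key reads instead of a view query."""
    doc_id = db["index:email"]["entries"].get(email.lower())
    return db.get(doc_id) if doc_id else None


save_user("u1", "ada@example.com")
save_user("u2", "bob@example.com")
rejected = False
try:
    save_user("u3", "ADA@example.com")
except DuplicateEmail:
    rejected = True
print(rejected, find_by_email("Bob@Example.com"))
```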

GeoCouch
• R-tree based
• First-class Erlang
    • Improved with the view engine refactor
• Can be abused for multi-dimensional queries
    • More than just geo-data
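An R-tree groups entries under minimum bounding rectangles (MBRs). A box query can then skip whole groups whose rectangle does not overlap it. Nothing requires the two dimensions to be latitude and longitude. The sketch below is a deliberately flat two-level version with fixed-size leaf groups and no rebalancing. It indexes hypothetical products by price and rating, which is not GeoCouch's actual implementation.

```python
def mbr(points):
    """Minimum bounding rectangle (min_x, min_y, max_x, max_y) of some points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class FlatRTree:
    def __init__(self, entries, leaf_size=2):
        # entries: (x, y, payload); sorted by x so each leaf covers a narrow strip.
        entries = sorted(entries)
        self.leaves = [entries[i:i + leaf_size] for i in range(0, len(entries), leaf_size)]
        self.boxes = [mbr(leaf) for leaf in self.leaves]
        self.leaves_visited = 0

    def search(self, box):
        """Payloads of entries inside box, skipping leaves whose MBR misses it."""
        hits = []
        self.leaves_visited = 0
        for leaf, leaf_box in zip(self.leaves, self.boxes):
            if not intersects(leaf_box, box):
                continue
            self.leaves_visited += 1
            hits += [p for x, y, p in leaf if box[0] <= x <= box[2] and box[1] <= y <= box[3]]
        return hits


# Multi-dimensional use: x = price, y = rating (no geography involved).
products = [(5, 4.5, "pen"), (12, 3.9, "mug"), (40, 4.8, "lamp"),
            (55, 2.1, "fan"), (90, 4.2, "chair"), (120, 4.9, "desk")]
tree = FlatRTree(products)
print(tree.search((30, 4.0, 100, 5.0)), tree.leaves_visited)
```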

CouchDB Lucene
• Based on CouchDB Externals
• Limited to Couchbase Single Server
• Faceted queries
• Full-text indexing

Hybrid
• Application managed
• Allows a stand-alone service to work with a Couchbase cluster
    • e.g. Solr, Redis, PostgreSQL
• Complex concurrency
• More moving parts
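In a hybrid setup the application itself keeps the external service in step. For instance, it can read a change feed and remember the last sequence it applied, so it can resume after a crash. CouchDB does expose such a feed (_changes, resumable through its since parameter). The in-memory feed, the SearchIndex class and the integer revision check below are hypothetical stand-ins. They show where the extra concurrency care goes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Change:
    seq: int
    doc_id: str
    rev: int                 # simplified integer revision
    body: Optional[dict]     # None means the document was deleted


@dataclass
class SearchIndex:
    """Stand-in for an external service such as Solr or PostgreSQL."""
    docs: dict = field(default_factory=dict)
    revs: dict = field(default_factory=dict)
    last_seq: int = 0        # checkpoint: resume the feed from here

    def apply(self, change: Change) -> None:
        # Feeds can redeliver; never let an older rev overwrite a newer one.
        if self.revs.get(change.doc_id, -1) >= change.rev:
            return
        self.revs[change.doc_id] = change.rev
        if change.body is None:
            self.docs.pop(change.doc_id, None)
        else:
            self.docs[change.doc_id] = change.body


def sync(feed: list, index: SearchIndex) -> None:
    """Apply every change after the checkpoint, then advance the checkpoint."""
    for change in feed:
        if change.seq > index.last_seq:
            index.apply(change)
            index.last_seq = change.seq


feed = [
    Change(1, "a", 1, {"title": "draft"}),
    Change(2, "b", 1, {"title": "other"}),
    Change(3, "a", 2, {"title": "final"}),
    Change(4, "b", 2, None),
]
index = SearchIndex()
sync(feed[:2], index)   # first run stops part-way
sync(feed, index)       # restart: seqs 1-2 are skipped via the checkpoint
print(index.docs, index.last_seq)
```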

Fin
twitter: @strmpnk
email: b@p2p.io

Speaker notes

This presentation shares some tips on how I've gotten CouchDB to perform well for me in the past, as well as things to look forward to in the future. "Advanced" is kind of a distraction: CouchDB is simple, so what you see here shouldn't be that different from basic queries.
Queries always end up being about data. All of our data lives inside special-purpose data structures, and our control of the query depends on understanding and controlling these structures.
Everything, even when it's calculated live, in memory. Not all of these structures are created equal, however. Fortunately, CouchDB keeps it simple and presents one general structure for most use cases.
I won't cover B-trees in depth here; Wikipedia is a good start if you're wondering. Keep in mind that CouchDB has a specific incarnation that gives us some special properties.
I/O is the cornerstone of all databases and will decide whether your ideas fly or fail. Feeding today's intense, networked, interactive software requires a serious study of I/O characteristics.
Throughput and latency tend to be the measurements of choice. Notice how big a jump RAM is. Imagine how many CPU cycles one HDD seek costs.
So let's keep RAM in mind. Couchbase does make good use of RAM for documents in its clustered product, but that RAM isn't available for queries.
Usually enough, but this should actually be measured. How? Let's look at what I call a "working set".
All of your data might exist somewhere on disk. That doesn't mean its disk pages can't be cached in RAM, so keep them there. Try to keep data clustered on disk for better page cache and buffer cache efficiency.
What a working set is.
Controlling the working set by tuning your database design. This talk will focus on views for queries, but all of these points matter. Measure, because it had better add up or your performance will be painfully slow.
I always like to start talking about indexing by declaring that it's already there: we already have an automatic index. I call this the primary index, but that's just me.
Key-value, anyone? How do we make key-based access fast? How do we accelerate random access vs. sequential access? It's all about data layout, which equates to an index.
Key-value applies to CouchDB.
A nice property of this key index is that it provides a way to enforce uniqueness. I hear this question all the time: "How do I constrain fields of a document to a unique value?" The short answer is _id.
This leads beautifully to revision-based concurrency. Semantic keying is a good idea even when the key isn't in your primary index, but why wait to build a view?
Finally, my favorite part of the primary document tree is that it's just one file. There's no duplication of information, so your overhead is nice and small. It's always fresh too, unlike views.
These are just a few ideas I've made up names for.
It's pretty obvious how this key design helps turn joins into a range query.
_rev can also be passed, but be careful, as revisions can be pruned during compaction.
Reduce values don't cost much, so it pays to have default reduce functions. It's all about knowing your data better.
When you have one big database, you pay all costs at once. Compaction costs, for example, can be huge.
When you have many smaller databases, costs can be paid incrementally. Compaction, for example, takes much less overhead.
Key access is fast. Simple.
Merging queries means you might have cases with partial results.
It's still an option, especially if you need a certain level of performance on a cluster.
Available as part of Couchbase Single/Mobile.
CouchDB and Couchbase Single only.
A good way to extend an existing cluster; it's up to the application layer.
