I have a good shard key now what - Advanced Sharding

Are you currently sharding your deployment, or thinking about doing so? This talk covers what to expect and what to consider during the process, with references to basic sharding resources. It also covers what to look for when running a sharded cluster. Finally, it gives an overview of a new tool that makes understanding chunk sizes easy.

I have a good shard key now what?
David Murphy, Mongo Master
Lead DBA, ObjectRocket
@dmurphy_data @objectrocket

Background
• 16 yrs in databases, development, & system engineering
• Lead DBA @ ObjectRocket
• Mongo Master with a focus on sharding, chunks, and
scaling mongo beyond normal means.

Quick Sharding Introduction
Why you might need to shard:
• Scaling Reads
• Fast Queries
• Scaling Writes
• Balancing nodes
What should you consider?
• Shard Key is Immutable
• Targeted vs Scatter-Gather
• Sharding Overhead
• Some commands no longer work
http://bit.ly/1oXYDfm - Kenny Gorman sharding talk
http://bit.ly/ZTtDI1 - Several other sharding talks

The balancer is great…
… but you can also perform some proactive checks to predict issues.
• Understanding the reasons why not all chunks are equal
• Counting chunks per shard without sh.status()
• Finding out how many chunk moves or splits occur at a time
• Getting the size of all chunks for a collection

Ways to Count Chunks
Many functions for this:
• sh.status()
• db.collection.stats()
Simple query for counting chunks (run against the config database):
• db.chunks.count({ns:"database.collection"});
Why I prefer using Aggregation:
• Faster as you only look at one type of data
• Provides chunk count per shard
• Can programmatically use the output
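
The aggregation approach can be sketched in Python as follows. This is a minimal sketch, not the author's script: `chunks_per_shard_pipeline` and `count_chunks_per_shard` are hypothetical helper names, and the sample documents are made up. The pipeline assumes the `ns` and `shard` fields of config.chunks used in the count query above; newer server versions may identify a chunk's collection differently, so check your version. The second function mirrors the pipeline on plain dictionaries so the logic can be checked without a cluster.

```python
from collections import Counter


def chunks_per_shard_pipeline(namespace):
    """Build an aggregation pipeline counting chunks per shard.

    Intended for config.chunks; field names are assumptions (see lead-in).
    """
    return [
        {"$match": {"ns": namespace}},
        {"$group": {"_id": "$shard", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


def count_chunks_per_shard(chunk_docs, namespace):
    """Pure-Python equivalent of the pipeline, for in-memory documents."""
    counts = Counter(
        doc["shard"] for doc in chunk_docs if doc.get("ns") == namespace
    )
    # Same shape as the aggregation output, largest shard first.
    return [{"_id": shard, "count": n} for shard, n in counts.most_common()]


# Made-up sample data standing in for config.chunks documents.
SAMPLE_CHUNKS = [
    {"ns": "app.users", "shard": "shard0"},
    {"ns": "app.users", "shard": "shard0"},
    {"ns": "app.users", "shard": "shard1"},
    {"ns": "app.orders", "shard": "shard1"},
]

if __name__ == "__main__":
    for row in count_chunks_per_shard(SAMPLE_CHUNKS, "app.users"):
        print(row["_id"], row["count"])
```

With a driver such as PyMongo, the pipeline would be passed to `aggregate()` on the chunks collection of the config database; that call is not exercised here.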

Tracking Chunks and Moves
The config.changelog tracks chunk changes.
The types of changes to chunks are
• split
• moveChunk
• moveChunk.start
• moveChunk.from
• moveChunk.to
• moveChunk.commit
Chunk Collection Examples
http://bit.ly/112tXEx
ChangeLog Examples
http://bit.ly/1oZlyqN
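
To see how many splits or moves happen at a time, changelog entries can be bucketed by hour and event type. The sketch below is an illustration only: `changes_per_hour` is a hypothetical helper, the sample entries are made up, and it assumes each entry carries a `time` (datetime), a `what` (event type such as split or moveChunk.commit) and an `ns` field.

```python
from collections import Counter
from datetime import datetime


def changes_per_hour(changelog_docs, namespace=None):
    """Count changelog events per (hour, event type).

    Assumes each entry has 'time' (datetime), 'what' and 'ns' fields.
    """
    tally = Counter()
    for entry in changelog_docs:
        if namespace is not None and entry.get("ns") != namespace:
            continue
        # Truncate the timestamp to the start of its hour.
        hour = entry["time"].replace(minute=0, second=0, microsecond=0)
        tally[(hour, entry["what"])] += 1
    return tally


# Made-up entries standing in for config.changelog documents.
SAMPLE_CHANGELOG = [
    {"time": datetime(2014, 10, 1, 9, 5), "what": "split", "ns": "app.users"},
    {"time": datetime(2014, 10, 1, 9, 40), "what": "split", "ns": "app.users"},
    {"time": datetime(2014, 10, 1, 9, 41), "what": "moveChunk.commit", "ns": "app.users"},
    {"time": datetime(2014, 10, 1, 10, 2), "what": "moveChunk.start", "ns": "app.orders"},
]

if __name__ == "__main__":
    for (hour, what), n in sorted(changes_per_hour(SAMPLE_CHANGELOG).items()):
        print(hour.isoformat(), what, n)
```

A sudden spike in one bucket is the kind of early signal the proactive checks are meant to surface.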

Sizing Chunks
• dataSize command - very expensive
  Scans all documents to be 100% accurate
• find().count() * avgObjSize on chunk ranges
  Rough size estimate based on stats
• ChunkHunter.py (on GitHub: http://bit.ly/1yWQf9T)
  New community script from ObjectRocket to:
  a) copy chunk metadata to a new collection
  b) scan each chunk there with dataSize or count
  c) output a summary
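
The count-times-average estimate is simple arithmetic, sketched below. This is not ChunkHunter's code: the function names and sample counts are hypothetical, the per-chunk document counts are assumed to come from a count over each chunk's key range, and `avg_obj_size` is assumed to come from the collection stats. The 64 MB default mirrors the ">64M per Chunk" threshold shown in the example output slides.

```python
def estimate_chunk_bytes(doc_count, avg_obj_size):
    """Rough chunk size: documents in the chunk range times avgObjSize."""
    return doc_count * avg_obj_size


def flag_oversized(chunks, avg_obj_size, max_bytes=64 * 1024 * 1024):
    """Return (chunk_id, estimated_bytes) for chunks above max_bytes.

    'chunks' is an iterable of (chunk_id, doc_count) pairs.
    """
    oversized = []
    for chunk_id, doc_count in chunks:
        size = estimate_chunk_bytes(doc_count, avg_obj_size)
        if size > max_bytes:
            oversized.append((chunk_id, size))
    # Largest first, so the most urgent split candidates lead the list.
    return sorted(oversized, key=lambda item: item[1], reverse=True)


# Made-up document counts per chunk.
SAMPLE_COUNTS = [("chunk-a", 10_000), ("chunk-b", 300_000), ("chunk-c", 150_000)]

if __name__ == "__main__":
    for chunk_id, size in flag_oversized(SAMPLE_COUNTS, avg_obj_size=512):
        print(chunk_id, round(size / 1024 / 1024, 1), "MB")
```

Because the estimate uses a collection-wide average, chunks holding unusually large or small documents will be misjudged; the exact but expensive dataSize scan remains the fallback for borderline cases.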

Example ChunkHunter Output
Screenshots (not reproduced here) highlight:
• Total chunks in cluster
• Total chunks populated
• Total chunks processed
• >250k documents per chunk
• >64M per chunk

Some example queries might be:
• Give me all chunks with jumbo: true
• Order jumbo chunks by document count so I can split them
• Order jumbo chunks by size so I can split them
• Aggregate jumbo chunks to count them per shard
• Give me all jumbo chunks on shard "X"
Find all these and more at:
http://bit.ly/1oYVGeJ
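
The queries listed above can be sketched on in-memory documents as follows. This is an illustration under stated assumptions, not the linked example scripts: the helper names are hypothetical, the sample documents are made up, and `docs` and `size` are hypothetical field names for the per-chunk document count and byte size that a sizing pass would record. Only the `jumbo` and `shard` fields are assumed to come from the chunk metadata itself.

```python
from collections import Counter


def jumbo_chunks(chunk_docs, shard=None):
    """All chunks flagged jumbo, optionally restricted to one shard."""
    return [
        doc for doc in chunk_docs
        if doc.get("jumbo") is True and (shard is None or doc["shard"] == shard)
    ]


def order_jumbo_by(chunk_docs, field):
    """Jumbo chunks sorted by a numeric field, largest first."""
    return sorted(jumbo_chunks(chunk_docs), key=lambda doc: doc[field], reverse=True)


def jumbo_per_shard(chunk_docs):
    """Number of jumbo chunks on each shard."""
    return Counter(doc["shard"] for doc in jumbo_chunks(chunk_docs))


# Made-up documents; 'docs' and 'size' are hypothetical field names.
SAMPLE = [
    {"_id": "c1", "shard": "shard0", "jumbo": True, "docs": 400_000, "size": 90_000_000},
    {"_id": "c2", "shard": "shard1", "jumbo": True, "docs": 900_000, "size": 70_000_000},
    {"_id": "c3", "shard": "shard0", "docs": 1_000, "size": 500_000},
    {"_id": "c4", "shard": "shard0", "jumbo": True, "docs": 300_000, "size": 120_000_000},
]

if __name__ == "__main__":
    print([doc["_id"] for doc in order_jumbo_by(SAMPLE, "size")])
    print(dict(jumbo_per_shard(SAMPLE)))
```

Sorting by document count and by size can give different orders, which is why the slide lists both: a chunk with few very large documents needs a different split point than one with many small documents.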

The Future…
Will be releasing tools to simplify:
• Manual Splits
• Moving of Chunks
These are not replacements for the balancer, but ways to
help it stay on track if things deviate, as opposed to waiting
for an incident and fixing it in a panic.

Contact
@dmurphy_data
@objectrocket
david@objectrocket.com
https://www.objectrocket.com
WE ARE HIRING! (DBA, DevOps, and more)
https://www.objectrocket.com/careers
Resources:
ChunkHunter.py - http://bit.ly/1yWQf9T
Chunk Hunter Example Scripts - http://bit.ly/1oYVGeJ
Chunk Collection Queries - http://bit.ly/112tXEx
Changelog Queries - http://bit.ly/1oZlyqN
Presentations:
Kenny Gorman Sharding - http://bit.ly/1oXYDfm
Other Sharding Links - http://bit.ly/ZTtDI1

Editor's notes

collections - Contains the collection name, whether it has been dropped, and any shard key.
One-line query for similar results (chunk count per shard): db.chunks.aggregate([{$match:{ns:"<<full.namespace>>"}},{$group:{_id:"$shard",count:{$sum:1}}}])
Examples of aggregation commands?
Caution on using dataSize?