
MongoDB: An All-in-One NoSQL Combining the Performance of a KVS, the Indexes of an RDBMS, and MapReduce

Hiroaki Kubota, Rakuten, Inc.
Presentation slides from Mongo Tokyo 2012

An introduction to MongoDB's characteristics and ways to take advantage of them, through examples from Rakuten's Infoseek News. Also shared: verification results for features and performance, operational know-how, and weaknesses! Patches for the PHP driver are published as well.


  1. 1. The performance of a KVS, the indexes of an RDBMS, plus MapReduce: an All-in-one NoSQL. Rakuten, Inc., Development Unit, Architect Group, Hiroaki Kubota | January 18, 2012 1
  2. 2. Introduction Agenda • Introduction • How to use mongo on news.infoseek.co.jp 2
  3. 3. Introduction 3
  4. 4. Who am I ? 4
  5. 5. Introduction Profile Name: 窪田 博昭 Hiroaki Kubota Company: Rakuten Inc. Unit: ACT = Development Unit Architect Group Mail: hiroaki.kubota@mail.rakuten.com Hobby: Futsal , Golf Recent: My physical power has gradually declined... twitter : crumbjp github: crumbjp 5
  6. 6. How to take advantage of Mongo for Infoseek News 6
  7. 7. For instance of our page 7
  8. 8. Page structure 8
  9. 9. Layout / Components Layout Components 9
  10. 10. Albatross structure Internet Request SessionDB LayoutDB Get page layout MongoDB WEB ReplSet MongoDB ReplSet Get components Call APIs Memcache API Retrieve data ContentsDB MongoDB ReplSet 10
  11. 11. Albatross structure Developer HTML markup LayoutDB Set page layout & Deploy API API settings CMS Batch servers MongoDB ReplSet Set components Insert Data API servers ContentsDB MongoDB ReplSet 11
  12. 12. CMS Layout editor 12
  13. 13. CMS 13
  14. 14. CMS 14
  15. 15. MapReduce 15
  16. 16. MapReduce Our usage We have never used MapReduce as a regular operation. However, we have used it for some irregular cases. • To search for invalid articles that should be removed because of someone's mistakes... • To analyze the number of new articles posted per day. • To analyze the number of updates per article. • Before long, we will start considering using it regularly for social-data analysis... 16
  17. 17. Structure & Performance 17
  18. 18. Structure We are using a very poor machine ( virtual machine ) !! • Intel(R) Xeon(R) CPU X5650 2.67GHz 1 core !! • 4GB memory • 50 GB disk space ( iSCSI ) • CentOS5.5 64bit • mongodb 1.8.0 – ReplicaSet 5 nodes ( + 1 Arbiter ) – Oplog size 1.2GB – Average object size 1KB 18
  19. 19. Structure Researched environments We've also researched the following environments... • Virtual machine 1 core – 1kb data , 6,000,000 documents – 8kb data , 200,000 documents • Virtual machine 3 core – 1kb data , 6,000,000 documents – 8kb data , 200,000 documents • EC2 large instance – 2kb data , 60,000,000 documents. ( 100GB ) 19
  20. 20. Performance I found a formula for making a rough estimate of QPS: 1~8 kb documents + 1 unique index C = Number of CPU cores (Xeon 2.67 GHz) DD = Score of 'dd' command (bytes/sec) S = Document size (bytes) • GET qps = 4500 × C • SET(fsync) qps = 0.05 × DD ÷ S • SET(nsync) qps = 4500 BUT... risks going STALE 20
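As a quick sanity check, the rule of thumb above can be turned into a small calculator. This is only a sketch: the 100 MB/s `dd` figure in the example is an assumed input, not a number from the talk.

```javascript
// Rough capacity estimate from the slide's rule of thumb.
// C  = number of CPU cores (Xeon 2.67 GHz class)
// DD = sequential throughput reported by `dd` (bytes/sec)
// S  = document size (bytes)
function estimateGetQps(cores) {
  return 4500 * cores;                          // GET qps = 4500 × C
}

function estimateFsyncSetQps(ddBytesPerSec, docSizeBytes) {
  return 0.05 * ddBytesPerSec / docSizeBytes;   // SET(fsync) = 0.05 × DD ÷ S
}

// Example: a 1-core VM whose `dd` reports 100 MB/s, writing 1 KB documents.
console.log(estimateGetQps(1));                  // 4500
console.log(estimateFsyncSetQps(100e6, 1024));   // 4882.8125
```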
  21. 21. Performance example (on EC2 large) 21
  22. 22. Performance example (on EC2 large) Environment and amount of data EC2 large instance – 2kb data , 60,000,000 documents. ( 100GB ) – 1 unique index Data-type { shop: 'someone', item: 'something', description: 'item explanation sentences...' } 22
  23. 23. Performance example (on EC2 large) Batch insert (1000 documents) fsync=true 17906 sec (=289 min) (=3358 docs/sec) Ensure index (background=false) 4049 sec (=67min) 1. primary 2101 sec (=35min) 2. secondary 1948 sec (=32min) 23
  24. 24. Performance example (on EC2 large) Add one node 5833sec (=97min) 1. Get files 2GB×48 2120 sec (=35min) 2. _id indexing 1406 sec (=23min) 3. uniq indexing 2251 sec (=38min) 4. other processes 56 sec (=1 min) 24
  25. 25. Performance example (on EC2 large) Group by • Reduce by unique index & map & reduce – 368 msec db.data.group({ key: { shop: 1}, cond: { shop: 'someone' }, reduce: function ( o , p ) { p.sum++; }, initial: { sum: 0 } }); 25
  26. 26. Performance example (on EC2 large) MapReduce • Scan all data 3116sec (=52min) – number of key = 39092 db.data.mapReduce( function(){ emit(this.shop,1); }, function(k,v){ var ret=0; v.forEach( function (value){ ret+=value; }); return ret; }, { query: {}, inline: 1, out: 'Tmp' } ); 26
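Since the mapReduce job above just counts documents per shop key, the same reduction can be sketched in plain JavaScript over an in-memory array. This is a toy emulation of the map/reduce logic, not the server-side implementation; the sample documents are invented.

```javascript
// Plain-JavaScript emulation of the mapReduce job above:
// count documents per shop over an in-memory array.
function mapReduceCountByShop(docs) {
  const emitted = new Map();                 // map phase: emit(this.shop, 1)
  for (const doc of docs) {
    if (!emitted.has(doc.shop)) emitted.set(doc.shop, []);
    emitted.get(doc.shop).push(1);
  }
  const out = {};                            // reduce phase: sum values per key
  for (const [shop, values] of emitted) {
    out[shop] = values.reduce((ret, value) => ret + value, 0);
  }
  return out;
}

const docs = [
  { shop: 'someone', item: 'a' },
  { shop: 'someone', item: 'b' },
  { shop: 'another', item: 'c' },
];
console.log(mapReduceCountByShop(docs));   // { someone: 2, another: 1 }
```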
  27. 27. Major problems... 27
  28. 28. Indexing 28
  29. 29. Index problem Online indexing is completely useless, even in the latest version (2.0.2). Indexing is a locking operation by default. The indexing operation can run in the background on the primary. But... it CANNOT run in the background on the secondaries. Moreover, all the secondaries' indexing runs at the same time !! As a result of the above... all slaves freeze ! orz... 29
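For context, the command at issue, in 1.8/2.0-era mongo shell syntax (the collection and field names here are illustrative, not from the talk):

```javascript
// Builds the index without locking the primary...
db.data.ensureIndex({ shop: 1 }, { background: true });

// ...but on the versions discussed here, the secondaries replay the build
// as a foreground, locking operation, and all of them do it at once.
// Progress can be watched from another shell with:
db.currentOp();
```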
  30. 30. Present indexing ( default ) 30
  31. 31. Index problem Present indexing ( default ) Primary save Batch Secondary Secondary Secondary Client Client Client Client Client 31
  32. 32. Index problem Present indexing ( default ) Primary ensureIndex Lock Cannot Batch write Indexing Secondary Secondary Secondary Client Client Client Client Client 32
  33. 33. Index problem Present indexing ( default ) Primary finished Batch Complete SYNC SYNC SYNC Secondary Secondary Secondary Lock Lock Lock Indexing Indexing Indexing Cannot read !! Client Client Client Client Client 33
  34. 34. Index problem Ideal indexing ( default ) Primary Batch Complete Secondary Secondary Secondary Complete Complete Complete Client Client Client Client Client 34
  35. 35. Present indexing ( background ) 35
  36. 36. Index problem Present indexing ( background ) Primary save Batch Secondary Secondary Secondary Client Client Client Client Client 36
  37. 37. Index problem Present indexing ( background ) ensureIndex(background) Primary Slow down... Slowdown Batch Indexing Secondary Secondary Secondary Client Client Client Client Client 37
  38. 38. Index problem Present indexing ( background ) Primary finished Batch Complete SYNC SYNC SYNC Secondary Secondary Secondary Lock Lock Lock Indexing Indexing Indexing Cannot read !! Client Client Client Client Client 38
  39. 39. Index problem Present indexing ( background ) Primary finished Batch Complete Background indexing doesn't work on the secondaries SYNC SYNC SYNC Secondary Secondary Secondary Lock Lock Lock Indexing Indexing Indexing Cannot read !! Client Client Client Client Client 39
  40. 40. Index problem Present indexing ( background ) Primary finished Batch Complete SYNC SYNC SYNC Secondary Secondary Secondary Lock Lock Lock Indexing Indexing Indexing Cannot read !! Client Client Client Client Client 40
  41. 41. Index problem Ideal indexing ( background ) Primary Batch Complete Secondary Secondary Secondary Complete Complete Complete Client Client Client Client Client 41
  42. 42. Probable 2.1.X indexing 42
  43. 43. Index problem According to mongodb.org, this problem will be fixed in 2.1.0, but that is not formally released yet. So I checked out the source code up to date. Certainly it'll be fixed ! Moreover, it sounds like it'll run in the foreground when the slave status isn't SECONDARY ( which means RECOVERING ) 43
  44. 44. Index problem Probable 2.1.X indexing Primary save Batch Secondary Secondary Secondary Client Client Client Client Client 44
  45. 45. Index problem Probable 2.1.X indexing ensureIndex(background) Primary Slow down... Slowdown Batch Indexing Secondary Secondary Secondary Client Client Client Client Client 45
  46. 46. Index problem Probable 2.1.X indexing Primary finished Batch Complete SYNC SYNC SYNC Secondary Secondary Secondary Slowdown Slowdown Slowdown Indexing Indexing Indexing Slow down... Client Client Client Client Client 46
  47. 47. Index problem Probable 2.1.X indexing Primary Batch Complete Secondary Secondary Secondary Complete Complete Complete Client Client Client Client Client 47
  48. 48. Index problem Background indexing 2.1.X But I think it's not enough. It can be fatal for the system that all the secondaries slow down at the same time !! So... 48
  49. 49. Ideal indexing 49
  50. 50. Index problem Ideal indexing Primary save Batch Secondary Secondary Secondary Client Client Client Client Client 50
  51. 51. Index problem Ideal indexing ensureIndex(background) Primary Slow down... Slowdown Batch Indexing Secondary Secondary Secondary Client Client Client Client Client 51
  52. 52. Index problem Ideal indexing Primary finished Batch Complete ensureIndex Recovering Secondary Secondary Indexing Client Client Client Client Client 52
  53. 53. Index problem Ideal indexing Primary Batch Complete ensureIndex Secondary Recovering Secondary Complete Indexing Client Client Client Client Client 53
  54. 54. Index problem Ideal indexing Primary Batch Complete ensureIndex Secondary Secondary Recovering Complete Complete Indexing Client Client Client Client Client 54
  55. 55. Index problem Ideal indexing Primary Batch Complete Secondary Secondary Secondary Complete Complete Complete Client Client Client Client Client 55
  56. 56. Index problem But ... I can easily guess it's difficult to apply to the current Oplog. It would be great if I could run indexing manually on each secondary 56
  57. 57. I suggest Manual indexing 57
  58. 58. Index problem Manual indexing Primary save Batch Secondary Secondary Secondary Client Client Client Client Client 58
  59. 59. Index problem Manual indexing Primary ensureIndex(manual,background) Slow down... Slowdown Batch Indexing Secondary Secondary Secondary Client Client Client Client Client 59
  60. 60. Index problem Manual indexing Primary finished Batch Complete Secondary Secondary Secondary Client Client Client Client Client 60
  61. 61. Index problem Manual indexing Primary finished Batch Complete Secondary Secondary Secondary The secondaries don't sync automatically Client Client Client Client Client 61
  62. 62. Index problem Manual indexing Primary finished Batch Complete Secondary Secondary Secondary Client Client Client Client Client 62
  63. 63. Index problem Manual indexing Primary Batch Complete ensureIndex(manual) Recovering Secondary Secondary Indexing Client Client Client Client Client 63
  64. 64. Index problem Manual indexing Primary Batch Complete ensureIndex(manual) Secondary Recovering Secondary Complete Indexing Client Client Client Client Client 64
  65. 65. Index problem Manual indexing Primary Batch Complete ensureIndex(manual,background) Secondary Secondary Secondary Slowdown Complete Complete Indexing Client Client Client Client Client 65
  66. 66. Index problem Manual indexing Primary Batch Complete It needs to support ensureIndex(manual,background) background operation Secondary Secondary Secondary Slowdown Complete Complete Indexing Just in case, if the ReplSet has only one Secondary Client Client Client Client Client 66
  67. 67. Index problem Manual indexing Primary Batch Complete ensureIndex(manual,background) Secondary Secondary Secondary Slowdown Complete Complete Indexing Client Client Client Client Client 67
  68. 68. Index problem Manual indexing Primary Batch Complete Secondary Secondary Secondary Complete Complete Complete Client Client Client Client Client 68
  69. 69. That’s all about Indexing problem 69
  70. 70. Struggle to control the sync 70
  71. 71. STALE 71
  72. 72. Unknown log & ReplSet out of control We often suffered from the Secondaries going out of control... • The Secondaries repeatedly changed status in a moment between Secondary and Recovering (1.8.0) • Then we found a strange line in the log... [rsSync] replSet error RS102 too stale to catch up 72
  73. 73. What's Stale ? stale [stéil] (level: essential vocabulary for working adults) powered by goo.ne.jp • (of food or drink) not fresh (⇔ fresh); • flat; (of coffee) having lost its aroma; • (of bread) dried out, gone hard; • (of air or a smell) stuffy; • foul-smelling 73
  74. 74. What's Stale ? stale [stéil] (level: essential vocabulary for working adults) powered by goo.ne.jp • (of food or drink) not fresh (⇔ fresh); • flat; (of coffee) having lost its aroma; • (of bread) dried out, gone hard; • (of air or a smell) stuffy; • foul-smelling Apparently this is not good at all... 74
  75. 75. Mechanism of going stale 75
  76. 76. ReplicaSet Client mongod mongod Database Oplog Database Oplog Primary Secondary 76
  77. 77. Replication (simple case) 77
  78. 78. ReplicaSet Client mongod mongod Database Oplog Database Oplog Primary Secondary 78
  79. 79. Insert & Replication 1 A Client Insert mongod mongod Insert A A Database Oplog Database Oplog Primary Secondary 79
  80. 80. Insert & Replication 1 Client Sync Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 80
  81. 81. Replication (busy case) 81
  82. 82. Stale Client mongod mongod Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 82
  83. 83. Insert & Replication 2 B Client Insert Insert B B Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 83
  84. 84. Insert & Replication 2 C Client Insert Insert C C Insert B B Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 84
  85. 85. Insert & Replication 2 A Client Update Update A Insert C C Insert B B Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 85
  86. 86. Insert & Replication 2 Client Check Oplog Update A Insert C C Insert B B Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 86
  87. 87. Insert & Replication 2 Client Sync Update A Update A Insert C Insert C C Insert B C Insert B B Insert A B Insert A A A Database Oplog Database Oplog Primary Secondary 87
  88. 88. Replication (more busy) 88
  89. 89. Stale Client mongod mongod Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 89
  90. 90. Stale B Client Insert Insert B B Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 90
  91. 91. Stale C Client Insert Insert C C Insert B B Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 91
  92. 92. Stale A Client Update Update A Insert C C Insert B B Insert A Insert A A A Database Oplog Database Oplog Primary Secondary 92
  93. 93. Stale C Client Update Update C Update A C Insert C B Insert B Insert A A Insert A A Database Oplog Database Oplog Primary Secondary 93
  94. 94. Stale D Client Insert Insert D D Update C C Update A B Insert C Insert A A Insert B A Database Insert A Database Oplog Primary Secondary 94
  95. 95. Stale Client [Insert A] not found !! Check Oplog Insert D D Update C C Update A B Insert C Insert A A Insert B A Database Insert A Database Oplog Primary Secondary 95
  96. 96. Stale Client [Insert A] not found !! Check Oplog It cannot get information about [Insert B]. Insert D D Update C C Update A So it cannot sync !! B Insert C Insert A A Insert B A It's called STALE Database Insert A Database Oplog Primary Recovering 96
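The diagrams above can be condensed into a toy model: the oplog is a fixed-size ring buffer, and a secondary goes stale once the next operation it needs has been overwritten. This is a sketch for illustration only; the capacity and operation names loosely mirror the example.

```javascript
// Toy model of how a capped oplog causes a stale secondary.
// The primary keeps only the last `capacity` operations; a secondary
// that falls further behind than that can never catch up again.
class Oplog {
  constructor(capacity) {
    this.capacity = capacity;
    this.ops = [];            // oldest entry first
    this.nextId = 0;
  }
  append(op) {
    this.ops.push({ id: this.nextId++, op });
    if (this.ops.length > this.capacity) this.ops.shift();  // overwrite oldest
  }
  // A secondary that has applied ops up to `lastAppliedId` can sync
  // only if the next op it needs is still present in the oplog.
  canSync(lastAppliedId) {
    return this.ops.length === 0 || this.ops[0].id <= lastAppliedId + 1;
  }
}

const oplog = new Oplog(3);
['Insert A', 'Insert B', 'Insert C'].forEach(op => oplog.append(op));
// The secondary replicated only 'Insert A' (id 0) before getting busy.
console.log(oplog.canSync(0));   // true: 'Insert B' (id 1) is still in the oplog
oplog.append('Update A');
oplog.append('Insert D');        // 'Insert B' has now been overwritten
console.log(oplog.canSync(0));   // false: the secondary is STALE
```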
  97. 97. Stale We have to understand the importance of adjusting the oplog size. We can specify the oplog size as a command-line option, but only the first time for each dbpath (which is also specified on the command line). Also, we cannot change the oplog size without clearing the dbpath. Be careful ! 97
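A sketch of the startup options involved (the replica set name and paths are invented examples; the size is in megabytes):

```shell
# The oplog size takes effect only the first time mongod starts on this dbpath.
# Changing it later means wiping the dbpath and resyncing the node from scratch.
mongod --replSet myReplSet --dbpath /data/db --oplogSize 1200
```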
  98. 98. Replication (Join as a new node) 98
  99. 99. InitialSync Client mongod Insert D D Update C C Update A B Insert C A Database Oplog Primary 99
  100. 100. InitialSync Client mongod mongod Insert D D Update C C Update A B Insert C A Database Oplog Database Oplog Primary Startup 100
  101. 101. InitialSync Client Get last Oplog Insert D D Update C C Update A B Insert C Insert D A Database Oplog Database Oplog Primary Recovering 101
  102. 102. InitialSync D Client C B A Cloning DB Insert D D Update C C Update A B Insert C Insert D A Database Oplog Database Oplog Primary Recovering 102
  103. 103. InitialSync D Client C B A Cloning DB Insert D D Update C C Update A B Insert C Insert D A A Database Oplog Database Oplog Primary Recovering 103
  104. 104. InitialSync E D Client Insert C B A Cloning DB E Insert E D Insert D C Update C B B Update A Insert D A A Insert C Database Oplog Database Oplog Primary Recovering 104
  105. 105. InitialSync B Client Update Cloning DB complete E Update B D Insert E D C Insert D C B Update C B Insert D A Update A A Database Oplog Database Oplog Primary Recovering 105
  106. 106. InitialSync Client Check Oplog E Update B D Insert E D C Insert D C B Update C B Insert D A A Database Oplog Database Oplog Primary Recovering 106
  107. 107. InitialSync Client Sync E Update B E D Insert E D Update B C Insert D C Insert E B Update C B Insert D A A Database Oplog Database Oplog Primary Secondary 107
  108. 108. Additional information From the source code. ( I've never examined these... ) A Secondary will try to sync from other Secondaries when it cannot reach the Primary or might be stale against the Primary. There is a bit of a chance that the sync problem does not occur if the secondary has an older Oplog or a larger Oplog space than the Primary 108
  109. 109. Sync from another secondary Client Insert D Insert D D Update C D Update C C Update A C Update A B Insert C Insert A B Insert C A Insert B A A Insert B Database Insert A Database Oplog Database Insert A Primary Secondary Secondary 109
  110. 110. Sync from another secondary Client [Insert A] not found !! Check Oplog Insert D Insert D D Update C D Update C C Update A C Update A B Insert C Insert A B Insert C A Insert B A A Insert B Database Insert A Database Oplog Database Insert A Primary Secondary Secondary 110
  111. 111. Sync from another secondary Client But found at the other secondary So it’s able to sync Check Oplog Insert D Insert D D Update C D Update C C Update A C Update A B Insert C Insert A B Insert C A Insert B A A Insert B Database Insert A Database Oplog Database Insert A Primary Secondary Secondary 111
  112. 112. Sync from the other secondary Client But found at the other secondary So it’s able to sync Sync Insert D Insert D Insert D D Update C D Update C D Update C C Update A C Update A C Update A B Insert C B Insert C B Insert C A Insert B A Insert B A Insert B Insert A Insert A Insert A Database Database Database Primary Secondary Secondary 112
  113. 113. That’s all about sync 113
  114. 114. Others... 114
  115. 115. Disk space 115
  116. 116. Disk space Data fragments sparsely across the DB files... We met an unfavorable circumstance in our DBs. It appeared in some of our collections around 3 months after we launched the services. db.ourcol.storageSize() = 16200727264 (15GB) db.ourcol.totalSize() = 16200809184 db.ourcol.totalIndexSize() = 81920 db.ourcol.dataSize() = 2032300 (2MB) What's happening to them !! 116
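The gap between those two numbers is easy to quantify (plain JavaScript, using the figures from the slide):

```javascript
// Fragmentation check using the numbers from the slide:
// ~15 GB allocated on disk while only ~2 MB of documents are live.
const storageSize = 16200727264;   // db.ourcol.storageSize()
const dataSize    = 2032300;       // db.ourcol.dataSize()

const overhead = storageSize / dataSize;
console.log(overhead.toFixed(0) + 'x');   // "7972x": almost all allocated space is dead
```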
  117. 117. Disk space Data fragments sparsely across the DB files... It seems to be caused by specific operations that insert, update and delete over and over. Anyway, we have to shrink the used disk space regularly, just like PostgreSQL's vacuum. But how to do it ? 117
  118. 118. Disk space Shrinking the used disk space MongoDB offers some functions for this case, but we couldn't use them in our case ! repairDatabase: Only runnable on the Primary. It needs a long time and BLOCKS all operations !! compact: Only runnable on the Secondary. It zero-fills the blank space instead of shrinking the disk space. So it cannot shrink... 118
  119. 119. Disk space Our measures For temporary collections: issue a drop command regularly. For the other collections: 1. Remove one secondary from the ReplSet. 2. Shut it down. 3. Remove all its DB files. 4. Rejoin it to the ReplSet. 5. Do these operations one node after another. 6. Step down the Primary. (Change the Primary node) 7. At last, do operations 1 – 4 on the prior Primary. 119
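The rolling procedure above might look like this in practice (hostnames, paths, and the replica set name are invented for illustration):

```shell
# Steps 1-3: shut down one secondary and clear its data files.
mongo secondary1:27017/admin --eval 'db.shutdownServer()'
rm -rf /data/db/*

# Step 4: restart it; it rejoins the ReplSet and clones a compact copy of the data.
mongod --replSet myReplSet --dbpath /data/db --fork --logpath /var/log/mongod.log

# Step 5: repeat for each secondary in turn.

# Step 6: demote the old primary, then rebuild it the same way (step 7).
mongo primary:27017 --eval 'rs.stepDown()'
```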
  120. 120. PHP client 120
  121. 121. PHP client We tried 1.4.4 and 1.2.2 1.4.4: There are some critical bugs around the connection pool. We struggled to invalidate broken connections. I think you should use 1.2.X instead of 1.4.X. 1.2.2: It seems to be fixed around the connection pool. But there are 2 critical bugs ! – Socket handle leak – Useless sleep However, this version is relatively stable as long as you fix these bugs 121
  122. 122. PHP client We tried 1.4.4 and 1.2.2 https://github.com/crumbjp/Personal - mongo1.2.2.non-wait.patch - mongo1.2.2.sock-leak.patch 122
  123. 123. PHP client 123
  124. 124. Closing 124
  125. 125. Closing What's MongoDB ? It has very good READ performance. We can use mongo instead of memcached, if we can accept the limited write performance. Die hard ! MongoDB has high availability even under severe stress. It can be used easily without deep consideration. We can manage to do anything after getting started with it. Let's forget the awkward trivial things that have bothered us: How to treat huge data ? How to put in a cache system ? How to keep availability ? And so on .... 125
  126. 126. Closing Keep in mind Sharding is challenging... It's a last resort ! It's hard to operate, in particular maintaining the config-servers. [Mongos] is also difficult to keep alive. I want a way to fail over Mongos. Mongo is able to run in a poor environment but... you should set aside a large disk space. Heavy writes are sensitive: adjust the oplog size carefully. The indexing function is unfinished: indexes cannot be built online 126
  127. 127. All right, Have fun !! 127
  128. 128. Thank you for listening 128
