MongoDB tuning on AWS
Information
•Tweet with the hashtags #jawsdays #ijaws
•Please register for ijaws on Doorkeeper (next meetup in mid-April)
•There’s a job board behind the wall
Self-Introduction
•Ryuji Tamagawa@facebook
•tamagawa_ryuji@twitter
•Software developer working in Osaka
•Translator (for O’Reilly)
•Loves performance tuning
Introducing MongoDB
•Hybrid of NoSQL and RDB
•Scales easily (up to a certain point)
•Stores JSON documents as BSON (a binary JSON format)
•Has secondary indexes (on any field, including fields in embedded documents) and a query optimizer
•Supports replication and sharding
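To make “stores JSON documents as BSON” concrete, here is a minimal encoder for the layout described in the public BSON specification: a little-endian int32 total length, a sequence of typed elements, and a trailing NUL byte. It is a sketch that handles only strings, 32-bit integers, and embedded documents; real drivers (such as the `bson` package bundled with pymongo) cover the full type set.

```python
import struct

def _cstring(s: str) -> bytes:
    # BSON cstring: UTF-8 bytes followed by a NUL terminator
    return s.encode("utf-8") + b"\x00"

def encode_bson(doc: dict) -> bytes:
    """Encode a dict of str / int / dict values as a BSON document.

    Only three element types are handled in this sketch:
    0x02 (UTF-8 string), 0x10 (int32) and 0x03 (embedded document).
    """
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):
            # bool is a subclass of int; BSON has its own type (0x08) for it
            raise TypeError("bool is not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # string: int32 byte length (including the NUL), then the bytes
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # int32: 4 bytes little-endian (struct.error if out of range)
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        elif isinstance(value, dict):
            # embedded document: same layout as a top-level document
            body += b"\x03" + _cstring(key) + encode_bson(value)
        else:
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    # total length counts the 4-byte length itself and the trailing NUL
    return struct.pack("<i", len(body) + 5) + body + b"\x00"
```

For example, `encode_bson({"hello": "world"})` produces a 22-byte document whose first four bytes hold the length 22; the length prefix is what lets MongoDB skip over a document without parsing it.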
To make MongoDB run fast on AWS
•You have to understand:
•Its memory management architecture
•Your application’s workload pattern
•The size of your ‘HOT’ data
What’s the ‘HOT’ data?
•‘Hot’ data is data that is accessed frequently
•Example: if you simply write data such as access logs and then transfer them elsewhere, the ‘hot’ spot can be very small
•If the collection has indexes, a single write can make many places hot, because every index has to be updated as well
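A toy model makes the last two bullets concrete: treat an index as a sorted array cut into fixed-size pages and count how many distinct pages the most recent writes touch. The key-space size, keys per page, and window below are hypothetical, and real B-tree pages split and vary in fill, so only the contrast between the two key patterns matters.

```python
import random

def pages_touched(keys, keys_per_page):
    """Count distinct index pages hit by a batch of inserted keys.

    Simplification: key k is assumed to live on page k // keys_per_page.
    """
    return len({k // keys_per_page for k in keys})

KEY_SPACE = 1_000_000   # hypothetical number of distinct key values
KEYS_PER_PAGE = 100     # hypothetical keys per index page
WINDOW = 1_000          # the most recent 1,000 inserts

# Ever-increasing key (e.g. a timestamp): recent inserts all land
# at the right-hand edge of the index.
sequential = range(KEY_SPACE - WINDOW, KEY_SPACE)

# Randomly distributed key (e.g. a hash): recent inserts are
# scattered across the whole index.
rng = random.Random(42)
scattered = [rng.randrange(KEY_SPACE) for _ in range(WINDOW)]

seq_pages = pages_touched(sequential, KEYS_PER_PAGE)
rand_pages = pages_touched(scattered, KEYS_PER_PAGE)
print(f"sequential key: {seq_pages} hot pages, random key: {rand_pages} hot pages")
```

The append-only pattern keeps 10 pages hot, while the random key spreads the same 1,000 writes over hundreds of pages; each extra index on the collection adds its own set of hot pages on top.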
MongoDB does not manage memory
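•Most DBMSs have built-in memory management (their own buffer pool), but MongoDB doesn’t.
•MongoDB accesses its database files through memory-mapped files and lets the OS manage the buffer (this describes the MMAPv1 storage engine of the time; WiredTiger, the default engine since MongoDB 3.2, maintains its own cache)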
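(Diagram: in a traditional RDB, the DBMS keeps its own buffer in memory in front of the DB files; in MongoDB, the app reaches the DB files through memory mapping, with the OS managing the buffer.)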
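The mechanism on this slide can be illustrated with Python’s standard `mmap` module on a temporary file: writes go into pages of the OS page cache with no application-level buffer pool, and `flush()` asks the OS to write the dirty pages back. This is a sketch of memory-mapped I/O in general, not MongoDB’s own code.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE  # the OS page size, commonly 4096 bytes

# Create a small "data file" of 4 pages as a stand-in for a database file.
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, 4 * PAGE)
    with mmap.mmap(fd, 4 * PAGE) as mm:
        # Writing through the mapping dirties a page in the OS page cache.
        mm[PAGE:PAGE + 5] = b"hello"
        # Reads are served from the same cached page.
        data = bytes(mm[PAGE:PAGE + 5])
        # Ask the OS to write dirty pages back to the file.
        mm.flush()
    # An ordinary read of the file sees what the mapping wrote.
    with open(path, "rb") as f:
        f.seek(PAGE)
        on_disk = f.read(5)
finally:
    os.close(fd)
    os.remove(path)

print(data, on_disk)
```

Because the OS decides which pages stay resident, any memory taken by other processes on the same machine directly shrinks the cache available for the database files.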
The Rules of Thumb about Memory
•Give the OS enough memory to hold the ‘HOT’ data
•Don’t forget about the indexes: they need memory too
•Use EC2 instances dedicated to MongoDB, so other processes don’t compete for memory
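The first two rules can be turned into a back-of-the-envelope check. In this sketch the number of hot documents is an estimate you supply; the average document size and index sizes could come from the `collStats` command (`avgObjSize`, `totalIndexSize`). Counting every index in full and reserving 1 GiB for the OS are conservative assumptions, and the example figures are hypothetical.

```python
GiB = 1024 ** 3

def working_set_bytes(hot_docs, avg_doc_bytes, index_bytes):
    """Rough working set: the hot documents plus every index in full.

    Counting whole indexes is conservative, since a single write can
    touch a page in each of them.
    """
    return hot_docs * avg_doc_bytes + sum(index_bytes)

def fits_in_memory(working_set, instance_mem_bytes, reserve_bytes=1 * GiB):
    """True if the working set fits in RAM after a reserve for the OS
    and other processes (the 1 GiB default is an arbitrary assumption)."""
    return working_set <= instance_mem_bytes - reserve_bytes

# Hypothetical collection: 5 million hot 2 KiB documents, two indexes.
ws = working_set_bytes(hot_docs=5_000_000, avg_doc_bytes=2048,
                       index_bytes=[1 * GiB, 2 * GiB])
print(f"working set ~ {ws / GiB:.1f} GiB")
print("m3.large (7.5 GiB):", fits_in_memory(ws, 7.5 * GiB))
print("m3.xlarge (15 GiB):", fits_in_memory(ws, 15 * GiB))
```

Here the indexes alone add 3 GiB, which is the difference between fitting and not fitting on several of the instance types compared later.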
Keep your data safe with Replication
•With a replica set, you can easily distribute copies of your data to many places
•You have several options for keeping your data safe from crashes
•EBS or instance store: a trade-off between cost, safety, and performance
(Diagram: a Primary replicating to two Secondaries)
Try a MongoDB replica set with this Vagrant playground:
https://bitbucket.org/tamagawa_ryuji/mongodb_replicaset_playground_on_vagrant
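The Primary/Secondary layout in the diagram corresponds to a replica-set configuration document like the one built below. This is a sketch: the host names are hypothetical, and giving the first member a higher priority (so it is preferred as primary) is an illustrative choice the slide does not prescribe.

```python
def replica_set_config(name, hosts):
    """Build a replica-set configuration document for replSetInitiate.

    The first host gets priority 2 so it is preferred as primary;
    the others get priority 1 and normally act as secondaries.
    """
    if len(hosts) < 3:
        # Three members (as in the diagram) keep a voting majority
        # after one member fails, so a new primary can be elected.
        raise ValueError("use at least three members")
    members = [
        {"_id": i, "host": host, "priority": 2 if i == 0 else 1}
        for i, host in enumerate(hosts)
    ]
    return {"_id": name, "members": members}

config = replica_set_config(
    "rs0", ["db1.example.internal:27017",   # hypothetical hosts
            "db2.example.internal:27017",
            "db3.example.internal:27017"])
print(config)
```

Each `mongod` would be started with the `--replSet rs0` option; then, connected to one member with pymongo, the document can be passed to the `replSetInitiate` admin command, e.g. `client.admin.command("replSetInitiate", config)`.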
Storage Performance Evaluated
• Converted Japanese Wikipedia’s page data (about 1,700,000 documents) to JSON
• Wrote them to MongoDB on EC2 from another instance
• The data writer is a simple Python application using the pymongo driver, running 4 processes
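The slide only says the writer used pymongo with 4 processes; the sketch below is one way to structure such a loader, not the author’s code. The URI, database, and collection names are hypothetical, and holding all documents in a list before splitting is a simplification (a real loader for 1.7 million pages would stream them).

```python
from multiprocessing import Pool

# Hypothetical connection settings; replace with your own.
MONGO_URI = "mongodb://db-host:27017"
DB_NAME, COLL_NAME = "wikipedia", "pages"

def split_into_batches(docs, n_workers):
    """Deal documents round-robin into n_workers lists."""
    batches = [[] for _ in range(n_workers)]
    for i, doc in enumerate(docs):
        batches[i % n_workers].append(doc)
    return batches

def write_batch(docs):
    """Insert one batch; runs inside a worker process."""
    # Import and connect inside the worker: MongoClient is not fork-safe,
    # so each process creates its own client.
    from pymongo import MongoClient
    client = MongoClient(MONGO_URI)
    try:
        if docs:  # insert_many rejects an empty list
            client[DB_NAME][COLL_NAME].insert_many(docs, ordered=False)
        return len(docs)
    finally:
        client.close()

def load_in_parallel(docs, n_workers=4):
    """Write all docs using n_workers processes; returns the count sent."""
    with Pool(n_workers) as pool:
        return sum(pool.map(write_batch, split_into_batches(docs, n_workers)))
```

Call `load_in_parallel(docs)` from under an `if __name__ == "__main__":` guard so that platforms which spawn worker processes do not re-run the loader on import. `ordered=False` lets the server continue past individual failed inserts instead of stopping at the first one.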
Storage Performance Evaluated
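Instance types (spot price per hour):
•m3.large: $0.09
•m3.xlarge (SSD instance store): $0.16
•hi1.4xlarge (Storage Optimized): $0.50
Storage and time to finish, in the order listed on the slide (ebs-normal = standard EBS volume; PIOPS 1500 = EBS volume with 1,500 Provisioned IOPS; ephemeral0 = the first instance-store volume):
•ebs-normal: 0:10:55
•ephemeral0: 0:07:36
•PIOPS 1500: 0:08:26
•ephemeral0: 0:10:22
•PIOPS 1500: 0:09:02
•ephemeral0: 0:05:19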
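Because every run wrote the same set of about 1.7 million documents, the times above can be turned into rough throughput figures. A small helper for that conversion, using the times exactly as listed (the document count is approximate, so the rates are too):

```python
DOCS = 1_700_000  # "about 1,700,000 documents", per the slide

def to_seconds(hms):
    """Convert an 'H:MM:SS' duration string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def docs_per_second(duration, docs=DOCS):
    """Average insert rate for a run that took `duration`."""
    return docs / to_seconds(duration)

# Times in the order they appear on the slide.
results = [
    ("ebs-normal", "0:10:55"),
    ("ephemeral0", "0:07:36"),
    ("PIOPS 1500", "0:08:26"),
    ("ephemeral0", "0:10:22"),
    ("PIOPS 1500", "0:09:02"),
    ("ephemeral0", "0:05:19"),
]
for storage, t in results:
    print(f"{storage:<11} {t}  ~{docs_per_second(t):,.0f} docs/s")
```

By this measure the fastest run (0:05:19) wrote roughly twice as many documents per second as the slowest (0:10:55).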
Comparing Instance Types
Instance type  vCPU  ECU  Memory (GiB)  Storage (GB)  Cost ($/h)  Memory ($/GB)  CPU ($/ECU)  Storage ($/100GB)
m3.medium      1     3    3.75          1 x 4 SSD     $0.17       $0.05          $0.06        $4.28
m3.large       2     6.5  7.5           1 x 32 SSD    $0.34       $0.05          $0.05        $1.07
m3.xlarge      4     13   15            2 x 40 SSD    $0.68       $0.05          $0.05        $0.86
m3.2xlarge     8     26   30            2 x 80 SSD    $1.37       $0.05          $0.05        $0.86
m2.xlarge      2     6.5  17.1          1 x 420       $0.51       $0.03          $0.08        $0.12
m2.2xlarge     4     13   34.2          1 x 850       $1.01       $0.03          $0.08        $0.12
m2.4xlarge     8     26   68.4          2 x 840       $2.02       $0.03          $0.08        $0.12
cr1.8xlarge    32    88   244           2 x 120 SSD   $4.31       $0.02          $0.05        $1.80
i2.xlarge      4     14   30.5          1 x 800 SSD   $1.05       $0.03          $0.08        $0.13
i2.2xlarge     8     27   61            2 x 800 SSD   $2.10       $0.03          $0.08        $0.13
i2.4xlarge     16    53   122           4 x 800 SSD   $4.20       $0.03          $0.08        $0.13
i2.8xlarge     32    104  244           8 x 800 SSD   $8.40       $0.03          $0.08        $0.13
hs1.8xlarge    16    35   117           24 x 2048     $5.67       $0.05          $0.16        $0.01
Cost is the hourly price; the last three columns are that price per GB of memory, per ECU, and per 100 GB of instance storage. Storage entries without “SSD” are HDD.
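Following the rule of thumb of holding the ‘HOT’ data in memory, the table can be used to pick the cheapest instance type with enough RAM. This sketch looks at memory and hourly price only, using the figures from the slide; it ignores CPU and storage, and the prices are those of the time.

```python
# (name, memory GiB, hourly cost $), taken from the comparison table
INSTANCES = [
    ("m3.medium", 3.75, 0.17),
    ("m3.large", 7.5, 0.34),
    ("m3.xlarge", 15, 0.68),
    ("m3.2xlarge", 30, 1.37),
    ("m2.xlarge", 17.1, 0.51),
    ("m2.2xlarge", 34.2, 1.01),
    ("m2.4xlarge", 68.4, 2.02),
    ("cr1.8xlarge", 244, 4.31),
    ("i2.xlarge", 30.5, 1.05),
    ("i2.2xlarge", 61, 2.10),
    ("i2.4xlarge", 122, 4.20),
    ("i2.8xlarge", 244, 8.40),
    ("hs1.8xlarge", 117, 5.67),
]

def cheapest_for_memory(required_gib, instances=INSTANCES):
    """Return (name, memory, cost) of the cheapest instance whose memory
    is at least required_gib, or None if no listed type is large enough."""
    candidates = [inst for inst in instances if inst[1] >= required_gib]
    return min(candidates, key=lambda inst: inst[2], default=None)

for need in (10, 20, 100):
    print(need, "GiB ->", cheapest_for_memory(need))
```

The m2 family wins at the smaller sizes because of its low price per GB of memory, which matches the $0.03/GB figures in the table.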
THANK YOU!
FEEL FREE TO CONTACT ME!!