Azure DocumentDB: Advanced Features for Large Scale-Apps

  1. 1. { "name": "Andrew Liu", "e-mail": "andrl@microsoft.com", "twitter": "@aliuy8" }
  2. 2. • NoSQL is a buzzword
      • NoSQL is varied: key-value, wide-column, graph, document-oriented
  3. 3. Perfect for these documents: schema-agnostic JSON store for hierarchical and de-normalized data at scale
      { "name": "SmugMug", "permalink": "smugmug", "homepage_url": "http://www.smugmug.com", "blog_url": "http://blogs.smugmug.com/", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ] }
  4. 4. Not these documents
  5. 5. Perfect for these documents: schema-agnostic JSON store for hierarchical and de-normalized data at scale
      { "name": "SmugMug", "permalink": "smugmug", "homepage_url": "http://www.smugmug.com", "blog_url": "http://blogs.smugmug.com/", "category_code": "photo_video", "products": [ { "name": "SmugMug", "permalink": "smugmug" } ], "offices": [ { "description": "", "address1": "67 E. Evelyn Ave", "address2": "", "zip_code": "94041", "city": "Mountain View", "state_code": "CA", "country_code": "USA", "latitude": 37.390056, "longitude": -122.067692 } ] }
  6. 6. Azure DocumentDB: blazing fast, planet-scale NoSQL service
      • Millions of RPS, many TBs of data
      • Transparent partitioning
      • <10 ms reads, <15 ms writes @ P99
      • Low-latency access around the globe
      • Automatic indexing
      • Easy-to-learn query grammar
      • Multi-record transactions
      • 99.99% SLAs for availability, latency, and throughput
  7. 7. How does this fit in the Azure family?
  8. 8. Item | Author | Pages | Language
      Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
      Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
  9. 9. Item | Author | Pages | Language
      Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
      Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
      Lenovo Thinkpad X1 Carbon | ??? | ??? | ???
  10. 10. Item | Author | Pages | Language | Processor | Memory | Storage
      Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English | ??? | ??? | ???
      Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English | ??? | ??? | ???
      Lenovo Thinkpad X1 Carbon | ??? | ??? | ??? | Core i7 3.3 GHz | 8 GB | 256 GB SSD
  11. 11. Item | Author | Pages | Language
      Harry Potter and the Sorcerer’s Stone | J.K. Rowling | 309 | English
      Game of Thrones: A Song of Ice and Fire | George R.R. Martin | 864 | English
      
      Item | CPU | Memory | Storage
      Lenovo Thinkpad X1 Carbon | Core i7 3.3 GHz | 8 GB | 256 GB SSD
  12. 12. ProductId | Item
      1 | Harry Potter and the Sorcerer’s Stone
      2 | Game of Thrones: A Song of Ice and Fire
      3 | Lenovo Thinkpad X1 Carbon
      
      ProductId | Attribute | Value
      1 | Author | J.K. Rowling
      1 | Pages | 309
      …
      2 | Author | George R.R. Martin
      2 | Pages | 864
      …
      3 | Processor | Core i7 3.3 GHz
      3 | Memory | 8 GB
      …
  13. 13. (icon-only slide; no text)
  14. 14. The Challenge: More Users, More Problems
      • Scale with the expectation of millions of users on day 1
      • Deliver real-time responsiveness for a lag-free gaming experience
      • Highly competitive: high scores and global leaderboards are critical
  15. 15. The Results
      • #1 among free apps in the Apple App Store during launch week
      • >1M downloads
      • ~1B queries per day
      • 99th-percentile queries served in under 10 ms
  16. 16. How?
  17. 17. Just throw some data in a database!
  18. 18. Just throw some data in a database!
  19. 19. Not that easy…
  20. 20. Why is this such a hard problem?
      • Caches
        ◦ Scoreboard keeps updating…
      • SQL database
        ◦ Need to shard
        ◦ Schema and index management
        ◦ Loss of relational benefits
      • Azure Table Storage
        ◦ Secondary indexes
        ◦ Latency
        ◦ Throughput
  21. 21. Planet-Scale NoSQL
      • Horizontal scaling for storage and throughput
      • High performance with SSDs and automatic indexing
      • Operating on a global scale
  22. 22. really painful
  23. 23. Predictable Performance
      • Request Unit (RU) is the normalized currency: % memory, % IOPS, % CPU
      • Each replica gets a fixed budget of Request Units
      [Diagram labels: resource, resource set, documents, SQL, sprocs, args]
  24. 24. Creating partitioned collections
      // Pre-defined collections
      DocumentCollection collectionSpec = new DocumentCollection { Id = "Walkers" };
      RequestOptions options = new RequestOptions { OfferType = "S3" };
      DocumentCollection documentCollection = await client.CreateDocumentCollectionAsync(
          "dbs/" + database.Id, collectionSpec, options);
      
      // Partitioned collections (alternative to the snippet above)
      DocumentCollection collectionSpec = new DocumentCollection { Id = "Walkers" };
      collectionSpec.PartitionKey.Paths.Add("/walkerId");
      int collectionThroughput = 100000;
      RequestOptions options = new RequestOptions { OfferThroughput = collectionThroughput };
      DocumentCollection documentCollection = await client.CreateDocumentCollectionAsync(
          "dbs/" + database.Id, collectionSpec, options);
  25. 25. Globally Distributed
      • Not just for disaster recovery: DocumentDB is unreasonably highly available
      • Replicate data across any number of regions of your choice
      • Low-latency access to your data around the globe
      • Dynamically configure your write and read regions
      Azure DocumentDB gives you the ability to cheat the speed of light!
  26. 26. Consistency levels: Strong → Bounded Staleness → Session → Eventual
      Left to right: relaxed consistency => better performance and availability (strong consistency, high latency; eventual consistency, low latency)
      
      Consistency level | Strong | Bounded Staleness | Session | Eventual
      Total global order | Yes | Yes, outside of the “staleness window” | No, partial “session” order | No
      Consistent prefix guarantee | Yes | Yes | Yes | Yes
      Monotonic reads | Yes | Yes, across regions outside of the staleness window and within a region all the time | Yes, for the given session | No
      Monotonic writes | Yes | Yes | Yes | Yes
      Read your writes | Yes | Yes (in the write region) | Yes | No
      
      [Chart: Observed Distribution; values shown: 27%, 3%, 54%, 16%; legend entries: Bounded Staleness, Eventual, Session]
  27. 27. App-defined regional preferences
      ConnectionPolicy docClientConnectionPolicy = new ConnectionPolicy {
          ConnectionMode = ConnectionMode.Direct,
          ConnectionProtocol = Protocol.Tcp
      };
      // Preferred regions, in priority order
      docClientConnectionPolicy.PreferredLocations.Add(LocationNames.EastUS2);
      docClientConnectionPolicy.PreferredLocations.Add(LocationNames.WestUS);
      docClient = new DocumentClient(
          new Uri("https://myglobaldb.documents.azure.com:443"),
          "PARvqUuBw2QTO4rRXr6d1GnLCR7VinERcYrBQvDRh6EDTJLOHtZxgjTS4pv8nQv2Lg1QQLBLfO6TVziOZKvYow==",
          docClientConnectionPolicy);
  28. 28. Automatic Indexing
      • Index is a union of all the document trees
      • No need to define secondary indices / schema hints!
      [Diagram: common structure of the document trees]
      
      Terms | Postings list/values
      $/location/0/ | 1, 2
      location/0/country/ | 1, 2
      location/0/city/ | 1, 2
      0/country/Germany | 1, 2
      1/country/France | 2
      … | …
      0/city/Moscow | 2
      0/dealers/0 | 2
      
      http://aka.ms/docdbvldb
  29. 29. Index policies: customize index management, including storage overhead, throughput, and query consistency
      • Range, hash, and spatial indexes
      • Included and excluded paths
      • Indexing mode: consistent or lazy
      • Index precision
      • Online, in-place index transformations
      {
        "indexingMode": "consistent",
        "automatic": true,
        "includedPaths": [
          {
            "path": "/*",
            "indexes": [
              { "kind": "Range", "dataType": "Number", "precision": -1 },
              { "kind": "Hash", "dataType": "String", "precision": 3 },
              { "kind": "Spatial", "dataType": "Point" }
            ]
          }
        ],
        "excludedPaths": []
      }
  30. 30. SQL Query Grammar
      -- Nested lookup against index
      SELECT Books.Author
      FROM Books
      WHERE Books.Author.Name = "Leo Tolstoy"
      
      -- Transformation, filters, array access
      SELECT { Name: Books.Title, Author: Books.Author.Name }
      FROM Books
      WHERE Books.Price > 10 AND Books.Languages[0] = "English"
      
      -- Joins, user-defined functions (UDF)
      SELECT CalculateRegionalTax(Books.Price, "USA", "WA")
      FROM Books
      JOIN LanguagesArr IN Books.Languages
      WHERE LanguagesArr.Language = "Russian"
  31. 31. Database (stored procedure):
      function(playerId1, playerId2) {
        var playersToSwap = __.filter(function (document) {
          return (document.id == playerId1 || document.id == playerId2);
        });
        var player1 = playersToSwap[0], player2 = playersToSwap[1];
        // Swap the two players' items
        var player1ItemTemp = player1.item;
        player1.item = player2.item;
        player2.item = player1ItemTemp;
        __.replaceDocument(player1)
          .then(function() { return __.replaceDocument(player2); })
          .fail(function(error) { throw 'Unable to update players, abort'; });
      }
      
      Client:
      client.executeStoredProcedureAsync("procs/1234", ["MasterChief", "SolidSnake"])
        .then(
          function (response) { console.log("success!"); },
          function (err) { console.log("Failed to swap!", err); }
        );
  32. 32. API and Toolchain Options for DocumentDB: REST over HTTPS/TCP, Java, .NET, PowerBI
  33. 33. Try to model your entity as a self-contained document. Generally, use embedded data models when the data:
      • has "contains" relationships
      • is one-to-few
      • changes infrequently
      • won’t grow
      • is integral
      Embedding typically provides better read performance.
      {
        "id": "1",
        "firstName": "Thomas",
        "lastName": "Andersen",
        "addresses": [
          { "line1": "100 Some Street", "line2": "Unit 1", "city": "Seattle", "state": "WA", "zip": 98012 }
        ],
        "contactDetails": [
          { "email": "thomas@andersen.com" },
          { "phone": "+1 555 555-5555", "extension": 5555 }
        ]
      }
  34. 34. In general, use normalized data models when:
      • write performance is the priority
      • relationships are one-to-many or many-to-many
      • data changes frequently
      Normalizing typically provides better write performance.
      { "id": "xyz", "username": "user xyz" }
      { "id": "address_xyz", "userid": "xyz", "address": { … } }
      { "id": "contact_xyz", "userid": "xyz", "email": "user@user.com", "phone": "555 5555" }
  35. 35. No magic bullet: think about how your data is going to be written and read, and model accordingly.
      {
        "id": "1",
        "firstName": "Thomas",
        "lastName": "Andersen",
        "countOfBooks": 3,
        "books": [1, 2, 3],
        "images": [
          { "thumbnail": "http://....png" },
          { "profile": "http://....png" }
        ]
      }
      {
        "id": 1,
        "name": "DocumentDB 101",
        "authors": [
          { "id": 1, "name": "Thomas Andersen", "thumbnail": "http://....png" },
          { "id": 2, "name": "William Wakefield", "thumbnail": "http://....png" }
        ]
      }
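
Slide 28 describes the automatic index as a union of all document trees, with path terms mapped to postings lists of document ids. The idea can be sketched in plain JavaScript as below. This is only an illustration under stated assumptions: `collectTerms`, `buildPathIndex`, and the two sample documents are hypothetical, and the sketch says nothing about how DocumentDB actually stores or encodes its index.

```javascript
// Hypothetical sketch: flatten each JSON document into path terms and
// record which document ids contain each term (a postings list).
function collectTerms(value, path, terms) {
  if (value !== null && typeof value === "object") {
    // Objects and arrays both contribute one path segment per key or index.
    for (const key of Object.keys(value)) {
      collectTerms(value[key], path.concat(key), terms);
    }
  } else {
    // Leaf: the term is the full path followed by the scalar value.
    terms.add("/" + path.concat(String(value)).join("/"));
  }
  return terms;
}

function buildPathIndex(docs) {
  const index = new Map(); // term -> array of document ids, in insertion order
  for (const doc of docs) {
    const { id, ...body } = doc; // the id is the posting, not an indexed term here
    for (const term of collectTerms(body, [], new Set())) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(id);
    }
  }
  return index;
}

// Two made-up documents, loosely modeled on the slide's location example.
const docs = [
  { id: 1, location: [{ country: "Germany", city: "Berlin" }] },
  {
    id: 2,
    location: [
      { country: "Germany", city: "Bonn" },
      { country: "France", city: "Paris" },
    ],
  },
];

const pathIndex = buildPathIndex(docs);
console.log(pathIndex.get("/location/0/country/Germany")); // [ 1, 2 ]
console.log(pathIndex.get("/location/1/country/France")); // [ 2 ]
```

Because every path in every document becomes a term, a lookup such as "country equals Germany at position 0" needs no schema or secondary index declared up front, which is the point the slide makes.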

Editor's notes

  • Image licensed under the Creative Commons Attribution-Share Alike 2.0 Generic license.
    http://commons.wikimedia.org/wiki/File:Crying-girl.jpg
  • Well nested, multiple properties and values
  • Not word documents
  • Query over heterogeneous documents without defining schema or managing indexes

    Query arbitrary paths, properties and values without specifying secondary indexes or indexing hints
    Execute queries with consistent results in the face of sustained writes
    Query through fluent language integration, including LINQ for .NET developers and a "document-oriented" SQL grammar for traditional SQL developers
    Extend query execution through application-supplied JavaScript UDFs
    Supported SQL features include: predicates, iterations (arrays), sub-queries, logical operators, UDFs, intra-document JOINs, JSON transforms
  • Stored Procedures and Triggers
    Familiar programming model constructs for executing application logic
    Registered as named, URI-addressable, durable resources
    Scoped to a DocumentDB collection
    JavaScript as a procedural language to express business logic
    Language integration: a JavaScript throw statement aborts the transaction
    Execution: the JavaScript runtime is hosted on each replica
    Pre-compiled on registration
    The entire procedure is wrapped in an implicit database transaction
    Fully resource-governed and sandboxed execution

  • Source: http://en.wikipedia.org/wiki/Denormalization

    In computing, denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. In some cases, denormalization is a means of addressing performance or scalability in relational database software.
  • With DocumentDB, you can also choose a hybrid model to mimic the advantages of normalization.
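
The hybrid model in the last note can be sketched along the lines of slide 35: a book document keeps author ids as references and also embeds a small copy of the fields it reads most often. The sketch below is a minimal illustration in plain JavaScript; the helper `embedAuthorSummaries`, the `authorIds` field, and the sample data are hypothetical and are not part of any DocumentDB API.

```javascript
// Hypothetical sketch of a hybrid data model: keep references (ids) and
// embed a small copy of frequently read fields, so a book can be rendered
// without a second lookup.
const authors = new Map([
  [1, { id: 1, name: "Thomas Andersen", countOfBooks: 3, books: [1, 2, 3] }],
  [2, { id: 2, name: "William Wakefield", countOfBooks: 1, books: [1] }],
]);

function embedAuthorSummaries(book, authorsById) {
  return {
    id: book.id,
    name: book.name,
    // Copy only what the read path needs; the full author document
    // remains the source of truth.
    authors: book.authorIds.map((authorId) => {
      const author = authorsById.get(authorId);
      if (!author) throw new Error("Unknown author id: " + authorId);
      return { id: author.id, name: author.name };
    }),
  };
}

const book = embedAuthorSummaries(
  { id: 1, name: "DocumentDB 101", authorIds: [1, 2] },
  authors
);
console.log(JSON.stringify(book, null, 2));
```

The trade-off is the one the slides describe: reads of the book need a single document, but when an embedded field such as an author name changes, every copy has to be rewritten, so this fits fields that change infrequently.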
