In this session we will analyse the characteristics of this service and give a brief introduction to the architecture that supports it. We will look at the considerations to take into account when creating and using this type of storage, analysing the impact that the decisions made have on performance and scalability targets.
We will also show some usage examples in different scenarios, including some optimizations that can be made to improve performance.
NetPonto Community, the .NET community in Portugal!
http://netponto.org
14. Extent Nodes (EN)
[Diagram: an incoming write request passes through the Front End layer (FE), the Partition Layer (Partition Servers, Partition Master, Lock Service) and the Stream Layer (Extent Nodes); an Ack is returned.]
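To reach a Partition Server, the front end has to work out which server owns the key of the request. A minimal sketch, assuming a hypothetical in-memory partition map of lexically sorted, non-overlapping PartitionKey ranges (the server names are made up), shows how such a lookup can be done with a binary search:

```python
import bisect

# Hypothetical partition map: (range_start, server) pairs sorted by range_start.
# Each range runs from its start up to (not including) the next start; the
# first range starts at "" so every possible key is covered.
PARTITION_MAP = [
    ("", "partition-server-1"),
    ("G", "partition-server-2"),
    ("N", "partition-server-3"),
    ("T", "partition-server-4"),
]
_STARTS = [start for start, _ in PARTITION_MAP]


def route(partition_key: str) -> str:
    """Return the server whose key range contains partition_key."""
    # bisect_right gives the first range start greater than the key;
    # the owning range is the one just before it.
    index = bisect.bisect_right(_STARTS, partition_key) - 1
    return PARTITION_MAP[index][1]


for key in ["Alice", "Guimaraes", "Porto", "Zurich"]:
    print(key, "->", route(key))
```

Because the comparison is plain string ordering, it is case-sensitive: a key such as "alice" sorts after "T" and would land on the last server.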
37. Common Design & Scalability
Access pattern lexically sorted by Partition Key values
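Since keys are strings, the ordering is lexical rather than numeric: "10" sorts before "9". A minimal sketch, assuming keys derived from non-negative integers and a fixed width of 10 digits (an illustrative choice), shows how zero-padding makes lexical order match numeric order:

```python
def padded_key(value: int, width: int = 10) -> str:
    """Format a non-negative integer as a fixed-width, zero-padded key string."""
    if value < 0 or len(str(value)) > width:
        raise ValueError(f"{value} does not fit in {width} digits")
    return str(value).zfill(width)


ids = [9, 10, 2, 100]
print(sorted(str(i) for i in ids))         # ['10', '100', '2', '9']
print(sorted(padded_key(i) for i in ids))  # ['0000000002', '0000000009', '0000000010', '0000000100']
```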
38. Common Design & Scalability
• Turn on analytics & take control of your investigations – Logging and Metrics
• Who deleted my container? – Look at the client IP for the delete container request
• Why has my request latency increased? – Look at E2E (end-to-end) vs. server latency
• What are my user demographics? – Use the client request ID and the client IP to trace requests
• How can I tune my service usage? – Use metrics to analyze API usage & peak traffic stats
• And many more…
• Use an appropriate retry policy for intermittent errors
• The storage client uses exponential retry by default
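The idea behind an exponential retry policy can be sketched in a few lines. This is a general illustration, not the storage client's own implementation: the delay doubles after each failed attempt, is capped, and gets some random jitter. The `TransientError` class and the base, cap and attempt values are assumptions:

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, server busy, ...)."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based), doubling each time."""
    return min(cap, base * (2 ** attempt))


def with_retries(operation, max_attempts: int = 5, sleep=time.sleep):
    """Run operation(); on TransientError, wait with exponential backoff and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the last error
            # Random jitter (up to +20%) keeps many clients from retrying in lockstep.
            sleep(backoff_delay(attempt) * random.uniform(1.0, 1.2))
```

Passing `sleep` as a parameter keeps the policy testable without real waiting; non-transient errors (for example, a 404) are deliberately not retried.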
42. [Charts: Batch Stress Scenario, per-entity latencies (Time, ms) for Insert, Query and Delete, plus Processor Time (s) and Test Duration (s), comparing Storage Client 1.7, Storage Client 2.0: DataServices, Storage Client 2.0: Reflection and Storage Client 2.0: No Reflection]
Faster NoSQL table access
Up to 72.06% reduction in execution time
Up to 31.92% reduction in processor time
Up to 69-90% reduction in latency
43. [Charts: Large Blob Scenario (256MB), Resource Utilization (Total Test Time and Total Processor Time, s) and Latencies (Upload and Download, s), comparing Storage Client 1.7 and Storage Client 2.0]
Faster uploads and downloads
31.46% reduction in processor time
Up to 22.07% reduction in latency
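A common way to move a large file such as the 256MB one above is as a block blob: the file is cut into blocks that can be sent in parallel (Put Block) and then committed as a list (Put Block List). The REST API requires Base64-encoded block IDs that all have the same length within a blob. A minimal sketch of the planning step, assuming a 4 MB block size (an illustrative value, not a figure from these slides) and a made-up ID format:

```python
import base64

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size: 4 MB


def plan_blocks(blob_size: int, block_size: int = BLOCK_SIZE):
    """Split a blob into (block_id, offset, length) tuples.

    IDs are Base64 encodings of a zero-padded counter, so every ID has
    the same length, as Put Block requires for blocks of one blob.
    """
    blocks = []
    for index, offset in enumerate(range(0, blob_size, block_size)):
        raw_id = f"block-{index:06d}".encode("ascii")  # hypothetical ID format
        block_id = base64.b64encode(raw_id).decode("ascii")
        length = min(block_size, blob_size - offset)
        blocks.append((block_id, offset, length))
    return blocks


plan = plan_blocks(256 * 1024 * 1024)
print(len(plan))  # 64
```

Each (offset, length) slice could then be uploaded concurrently, and the blob committed by listing the block IDs in order.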
47. Upcoming in-person meetings
23/03/2013 – March (Lisbon)
20/04/2013 – April (Lisbon)
22/06/2013 – June (Lisbon)
??/??/2013 – ? (Porto)
??/??/2013 – ? (Coimbra)
Save these dates in your calendar! :)
Slide Objectives: Explain the different Storage Libraries and languages that can be used to work with Windows Azure Storage.
VALUE PROP: Programmatic access to the Blob, Queue, and Table services is available via the Windows Azure client libraries and the Windows Azure storage services REST API.
Speaking Points: Windows Azure is an open cloud platform that enables you to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters. You can build applications using any language, tool or framework.
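Whatever the library or language, every call ends up as an HTTP request against the account's service endpoint. A small sketch of how those URLs are put together; the account name "myaccount" is hypothetical, and authentication (Shared Key signing or a SAS token) is deliberately left out:

```python
from urllib.parse import quote

SERVICES = ("blob", "queue", "table")


def service_endpoint(account: str, service: str) -> str:
    """Default public HTTPS endpoint of a storage account's service."""
    if service not in SERVICES:
        raise ValueError(f"unknown storage service: {service}")
    return f"https://{account}.{service}.core.windows.net"


def resource_url(account: str, service: str, *segments: str) -> str:
    """Endpoint plus percent-encoded path segments (e.g. container, blob name)."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{service_endpoint(account, service)}/{path}"


print(resource_url("myaccount", "blob", "photos", "my cat.jpg"))
# https://myaccount.blob.core.windows.net/photos/my%20cat.jpg
```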
Slide Objectives: Understand Tables
VALUE PROP: Enable customers to easily migrate, maintain, and monitor their existing SQL Server applications to the Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics.
Speaker Notes: The Table service provides structured storage in the form of tables. The Table service supports a REST API that is compliant with the ADO.NET Data Services REST API. Developers may also use the .NET Client Library for ADO.NET Data Services to access the Table service.
Notes: Within a storage account, a developer may create named tables. Tables store data as entities. An entity is a collection of named properties and their values, similar to a row. Tables are partitioned to support load balancing across storage nodes. Each table has as its first property a partition key that specifies the partition an entity belongs to. The second property is a row key that identifies an entity within a given partition. The combination of the partition key and the row key forms a primary key that identifies each entity uniquely within the table.
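The composite primary key described in the notes can be modelled with a toy in-memory table. This is only an illustration of the key rules, not how the Table service stores data; the class and method names are hypothetical:

```python
class InMemoryTable:
    """Toy model of a table: entities are identified by (PartitionKey, RowKey)."""

    def __init__(self):
        self._entities = {}

    def insert(self, entity: dict) -> None:
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._entities:
            # Same PartitionKey + RowKey means the same primary key: rejected.
            raise KeyError(f"entity {key} already exists")
        self._entities[key] = dict(entity)

    def get(self, partition_key: str, row_key: str) -> dict:
        """Point lookup by the full primary key."""
        return self._entities[(partition_key, row_key)]

    def partition(self, partition_key: str) -> list:
        """All entities of one partition, ordered by RowKey."""
        return [entity for (pk, _), entity in sorted(self._entities.items())
                if pk == partition_key]
```

The same RowKey may appear in different partitions; only the pair has to be unique.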
Slide Objectives: Understand Flexible Entities
VALUE PROP: Enable customers to easily migrate, maintain, and monitor their existing SQL Server applications to the Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics.
Speaker Notes: Tables store data as entities. A table can contain entities of any shape:
• There is no fixed schema.
• There is no schema checking.
• There is no strong typing; note that Birthdate is stored both as a datetime value and as a string.
• Note that we can add additional columns.
Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx
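Without schema checking, the reading code has to cope with whatever shapes and types were written. A minimal sketch using plain dictionaries as stand-ins for entities; the property values and the "YYYY-MM-DD" string format are assumptions for illustration:

```python
from datetime import datetime

# Two entities from the same table: different properties, and Birthdate
# stored as a datetime in one and as a string in the other.
people = [
    {"PartitionKey": "customers", "RowKey": "1", "Name": "Ana",
     "Birthdate": datetime(1980, 5, 17)},
    {"PartitionKey": "customers", "RowKey": "2", "Name": "Rui",
     "Birthdate": "1975-11-02", "Email": "rui@example.com"},
]


def birthdate_of(entity: dict):
    """Read Birthdate whatever its stored type; None if the entity lacks it."""
    value = entity.get("Birthdate")
    if value is None or isinstance(value, datetime):
        return value
    return datetime.strptime(value, "%Y-%m-%d")  # assumed string format


for person in people:
    print(person["Name"], birthdate_of(person).date())
```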
Slide Objectives: Understand the Windows Azure Storage scalability model
VALUE PROP: Windows Azure Storage scales automatically to provide the best performance.
Speaker Notes:
• Fan-out is automatic and handled by Windows Azure. The key here is “elasticity”: the ability to scale automatically based on load.
• Fan-out is based on the load, but it isn’t immediate: Windows Azure waits several seconds to ensure that the load is a true load and not just a temporary spike.
• Partitioning is based on the Partition Key, so the choice of partition key is critical.
• Partitions can be condensed (merged) when load decreases.
• Reads are load balanced across the three replicas.
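The "wait to make sure the load is real" behaviour can be illustrated with a sustained-threshold check. This is not Windows Azure's actual algorithm; it is a hypothetical sketch in which a partition is flagged for splitting only after its request rate stays above a threshold for several consecutive samples. The threshold and window values are invented:

```python
from collections import deque


class SplitDetector:
    """Flags a partition as hot only when its load is sustained, not a spike."""

    def __init__(self, threshold: float, window: int):
        self.threshold = threshold            # requests/s considered hot (assumed)
        self.samples = deque(maxlen=window)   # most recent `window` samples

    def observe(self, requests_per_second: float) -> bool:
        """Record a sample; True once a full window of samples is all above threshold."""
        self.samples.append(requests_per_second)
        return (len(self.samples) == self.samples.maxlen
                and all(s > self.threshold for s in self.samples))


detector = SplitDetector(threshold=500, window=3)
print([detector.observe(x) for x in [900, 100, 800, 850, 900]])
# [False, False, False, False, True]
```

The single 900 request/s spike at the start is ignored; only three hot samples in a row trigger a split.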
Slide Objectives: Understand the importance of the Windows Azure Table scalability model and how Partition Key and Row Key are critical for table scalability
VALUE PROP: Enable customers to easily migrate, maintain, and monitor their existing SQL Server applications to the Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics.
Speaker Notes: Table entities represent the units of data stored in a table and are similar to rows in a typical relational database table. Each entity defines a collection of properties. Each property is a key/value pair defined by its name, value, and the value's data type. Entities must define the following three system properties as part of the property collection:
• PartitionKey – The PartitionKey property stores string values that identify the partition that an entity belongs to. This means that entities with the same PartitionKey values belong in the same partition. Partitions, as discussed later, are integral to the scalability of the table.
• RowKey – The RowKey property stores string values that uniquely identify entities within each partition.
• Timestamp – The Timestamp property is a date/time value maintained by the server that records when the entity was last modified.
Notes: Tables are partitioned to support load balancing across storage nodes. A table's entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value. The partition key is a unique identifier for the partition within a given table, specified by the PartitionKey property. The partition key forms the first part of an entity's primary key. The partition key may be a string value up to 1 KB in size.
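Key values can be checked before they are sent. The sketch below applies the 1 KB limit from the notes plus the characters the Table service data model disallows in PartitionKey and RowKey (/, \, #, ? and control characters). Measuring the size in UTF-8 bytes is an assumption of this sketch:

```python
# Characters the Table service does not allow in PartitionKey/RowKey values.
_FORBIDDEN = set("/\\#?")


def validate_key(value: str, max_bytes: int = 1024) -> str:
    """Raise ValueError if `value` cannot be used as a PartitionKey or RowKey."""
    if len(value.encode("utf-8")) > max_bytes:  # size measured in UTF-8 (assumption)
        raise ValueError("key is larger than 1 KB")
    for ch in value:
        code = ord(ch)
        # Control characters U+0000-U+001F and U+007F-U+009F are also rejected.
        if ch in _FORBIDDEN or code <= 0x1F or 0x7F <= code <= 0x9F:
            raise ValueError(f"character {ch!r} is not allowed in a key")
    return value
```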
Slide Objective: Discuss horizontal partitioning in Windows Azure Table storage in more detail
Speaking notes:
• Understanding the sequential nature of cross-partition queries is important.
• Continuation tokens may be returned at any time (i.e. data comes back in multiple pages).
• You will always get a continuation token if you cross a hardware boundary, i.e. you move between partitions that sit on different nodes.
• The Storage API handles continuation tokens elegantly, but it may mask a poor architecture. YOU DO NOT WANT TO RUN A QUERY THAT CROSSES HUNDREDS OF SERVERS!
• Be aggressive with partitioning: if you’ll only ever query something by a single key, use an empty Row key and a unique partition key, for a partition size of 1.
• You can also just use blob storage, which is already partitioned by blob name.
Notes:
• Queue storage is partitioned by queue name.
• Blob storage is partitioned by blob name (i.e. a partition size of 1).
• http://www.syringe.net.nz/2009/08/08/SimplePartitioningWithWindowsAzureTableStorage.aspx
• http://nmackenzie.spaces.live.com/Blog/cns!B863FF075995D18A!417.entry
• Good article from Julie Lerman, worth reading when discussing table storage: http://msdn.microsoft.com/en-us/magazine/ff796231.aspx
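Draining a query means following continuation tokens until none is returned, and each token is another round trip. A minimal sketch, assuming a hypothetical `fetch_page(token)` callable that performs one request and returns `(entities, next_token)`:

```python
def query_all(fetch_page):
    """Follow continuation tokens until the query is exhausted.

    Returns all entities and the number of round trips it took.
    """
    results, token, round_trips = [], None, 0
    while True:
        page, token = fetch_page(token)
        round_trips += 1
        results.extend(page)  # a page may legitimately be empty
        if token is None:
            return results, round_trips


# Simulated pages: the empty middle page still carries a continuation token.
pages = {None: (["e1", "e2"], "t1"), "t1": ([], "t2"), "t2": (["e3"], None)}
print(query_all(lambda token: pages[token]))
# (['e1', 'e2', 'e3'], 3)
```

The loop must not stop on an empty page; only a missing token means the query is finished. Counting round trips makes the cost of a query that crosses many partitions visible.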
Slide Objective: Understand why we need to partition; understand the cloud-specific drivers
Speaking notes: Partitioning is hardly a new topic; DBAs have been partitioning databases for a long, long time. There are two main reasons to partition:
• Data volume: there are just too many bytes to fit. For example, SQL Azure has a maximum DB size of 50GB; if you have more data than that, you’ll need to partition.
• Workload: each partition can only handle so many transactions per second. In Windows Azure tables, for example, partitioning is used to spread the request load over nodes in the storage system.
There are some new cloud-focused reasons too:
• Cost: different types of storage have different costs. Arguably we’ve been doing cost-driven partitioning on premises for some time too, for example partitioning a table across both expensive 15k RPM drives and cheaper 7200 RPM drives. In the cloud the cost difference can be far more pronounced.
• Elastic partitioning: whereas on premises a partition is often a separate server or separate disks, with the related capital cost and lead time, a partition in the cloud can be created and destroyed in a matter of seconds. This presents the opportunity to create partitions just for a short period of time, say a period of peak load.
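The two main drivers combine into a simple sizing rule: you need enough partitions for the data and enough for the load, whichever is larger. A sketch with the 50GB cap from the notes and an assumed per-partition throughput of 500 transactions/s (not a published figure):

```python
import math


def partitions_needed(data_gb: float, peak_tps: float,
                      max_gb_per_partition: float,
                      max_tps_per_partition: float) -> int:
    """Smallest partition count that satisfies both data volume and workload."""
    by_volume = math.ceil(data_gb / max_gb_per_partition)
    by_workload = math.ceil(peak_tps / max_tps_per_partition)
    return max(1, by_volume, by_workload)


print(partitions_needed(120, 2000, 50, 500))  # peak: 4 (workload-bound)
print(partitions_needed(120, 300, 50, 500))   # off-peak: 3 (volume-bound)
```

The gap between the two results is the extra, temporary partition that elastic partitioning lets you create only for the peak.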
Slide Objective: Discuss how to choose a partition key
Speaking notes:
• Natural keys are often very good for partitioning. For example, you may choose to break up data by geographical region.
• Natural keys can also cause problems. Partitioning by things like the first letter of the last name can be bad: your ‘S’ partition will be too full and your ‘Z’ partition will be all but empty… unless you’re in an Asian country where the opposite is true.
• You may want to use a mathematical operator to assist in partitioning; we’ll discuss these shortly.
• Finally, you may want to use a lookup table. In a SaaS application, for example, you may partition each customer into their own database and then look up the database to use at runtime based on the host header that was used to visit the site.
Notes: SQL Azure horizontal partitioning: http://blogs.msdn.com/b/sqlazure/archive/2010/06/24/10029719.aspx
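The lookup-table approach for a SaaS application can be sketched as a directory from host header to tenant database. The host names and database names below are hypothetical; in practice the directory itself would live in storage rather than in code:

```python
# Hypothetical tenant directory: host header -> database holding that tenant.
TENANT_DATABASES = {
    "contoso.example.com": "tenant-db-01",
    "fabrikam.example.com": "tenant-db-02",
}


def database_for(host_header: str) -> str:
    """Resolve the tenant's database from the Host header of the request."""
    host = host_header.split(":", 1)[0].lower()  # drop any port, normalize case
    try:
        return TENANT_DATABASES[host]
    except KeyError:
        raise LookupError(f"no tenant registered for {host}") from None
```

Unlike a natural or computed key, the mapping can be changed per customer, for example to move a large tenant to its own database.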
Slide Objective: Describe modulo partitioning
Speaking notes: The modulo operator is very useful for partitioning exercises. The important thing here is having a good distribution.
Notes: http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/985a3198-ba54-4dcc-932c-0e6bdb166a46
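A sketch of modulo partitioning for string keys: hash the key, then take the remainder by the number of partitions. Using SHA-256 here is an illustrative choice; the point is that the hash must be stable across processes and spread keys evenly:

```python
import hashlib
from collections import Counter


def modulo_partition(key: str, partitions: int) -> int:
    """Map a key to a partition number in [0, partitions) via hash + modulo."""
    # A cryptographic digest is stable across processes and machines, unlike
    # Python's built-in hash() for strings, which is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


counts = Counter(modulo_partition(f"customer-{i}", 4) for i in range(10_000))
print(sorted(counts.items()))  # expect roughly 2,500 keys per partition
```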
Slide Objective: Describe the challenge of managing partitions over time
Speaking notes: As applications grow and change, so may our partitioning needs. How do we deal with this? What happens if we need to re-partition our data?
• We will need to process it into a new partitioning scheme.
• We can also version our partitioning scheme, so that our partition keys include an identifier that resolves which partition scheme to use. In the example above we’ll end up with 14 partitions: 4 for the v1 scheme and 10 for the v2 scheme.
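A versioned scheme can be sketched by prefixing each partition key with the version that produced it. The 4 and 10 partition counts come from the slide; using modulo as the scheme itself and the "v2-07" key format are assumptions:

```python
# Number of partitions in each version of the scheme (4 for v1, 10 for v2).
SCHEMES = {1: 4, 2: 10}


def versioned_partition_key(entity_id: int, version: int) -> str:
    """Partition key that records which scheme version produced it."""
    return f"v{version}-{entity_id % SCHEMES[version]:02d}"


def scheme_version(partition_key: str) -> int:
    """Recover the scheme version from a key such as 'v2-07'."""
    return int(partition_key[1:partition_key.index("-")])


all_keys = {versioned_partition_key(i, v) for v in SCHEMES for i in range(1000)}
print(len(all_keys))  # 14
```

Old data can stay under v1 keys while new writes use v2; a reader that sees the prefix knows which scheme to apply.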
Slide Objective: The next few slides build on each other; run through the worked example.
Speaking notes: Suppose we want to build a tweet search engine. Twitter creates quite a bit of data, and it’s well suited to storing in Windows Azure tables. In SQL land we might start with a simple LIKE query. This scans the table every time… we soon realize this is no good.
Notes: See also Sriram Krishnan’s Programming Windows Azure (O’Reilly), which contains a more detailed example of this.
Next we’d probably pull the words out into a separate table, i.e. split each tweet into separate words. We’d soon realize that we could collapse the Word table back into the index: otherwise the primary keys on the associative table would end up longer than the word itself, so we’re better off duplicating the word itself in each row of the index.
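Splitting a tweet into the words that become index rows could look like the sketch below. Lower-casing and treating `\w+` runs as words are simplifying assumptions, not rules from the slides:

```python
import re


def index_words(tweet_text: str) -> set:
    """Distinct lower-case words of a tweet, one index row per word."""
    # \w+ matches runs of Unicode letters, digits and underscores,
    # so accented Portuguese words such as "Olá" are kept intact.
    return set(re.findall(r"\w+", tweet_text.lower()))


print(sorted(index_words("Olá NetPonto! Azure tables, azure blobs")))
# ['azure', 'blobs', 'netponto', 'olá', 'tables']
```

Using a set means a word repeated in the same tweet produces a single index row.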
In Windows Azure tables we take this one step further: we basically use worker roles to create indexes for us. So in the above example I can:
• Retrieve all the Tweets made by a certain user by querying the Tweet table and including the user ID (there is a partition per user).
• Retrieve all the Tweets that contain a particular word by querying the TweetIndex table and including the Word (there is a partition per word).
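The entities a worker role might write for one tweet can be sketched as plain dictionaries. The property names (`Text`, `UserId`) and the choice of tweet ID as RowKey are a hypothetical layout consistent with the two queries above:

```python
import re


def tweet_entities(tweet_id: str, user_id: str, text: str):
    """Entities for one tweet (hypothetical layout).

    - Tweet table:      PartitionKey = user id, RowKey = tweet id
    - TweetIndex table: PartitionKey = word,    RowKey = tweet id
    The tweet text is copied into each index row so a word query needs
    no second lookup (tweets are small enough for this).
    """
    tweet = {"PartitionKey": user_id, "RowKey": tweet_id, "Text": text}
    words = sorted(set(re.findall(r"\w+", text.lower())))
    index_rows = [{"PartitionKey": word, "RowKey": tweet_id,
                   "UserId": user_id, "Text": text} for word in words]
    return tweet, index_rows


tweet, rows = tweet_entities("0001", "ana", "Azure tables love Azure")
print(tweet["PartitionKey"], [row["PartitionKey"] for row in rows])
# ana ['azure', 'love', 'tables']
```

Both queries then hit a single partition: the user's partition in Tweet, or the word's partition in TweetIndex.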
We may then choose to create a MentionIndex, where the data is partitioned not by the person who wrote the tweet but by the person(s) mentioned in it. If a tweet mentions 4 users, it’ll appear 4 times in the MentionIndex table, in four different partitions.
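The fan-out into MentionIndex can be sketched the same way: one row per distinct mentioned user, each in that user's partition. Treating `@name` as a mention and lower-casing handles are assumptions of this sketch:

```python
import re

MENTION = re.compile(r"@(\w+)")


def mention_index_rows(tweet_id: str, author: str, text: str) -> list:
    """One MentionIndex row per distinct user mentioned in the tweet."""
    mentioned = sorted({name.lower() for name in MENTION.findall(text)})
    return [{"PartitionKey": user, "RowKey": tweet_id,
             "Author": author, "Text": text} for user in mentioned]


rows = mention_index_rows("0002", "ana", "@rui @eva @tiago @joana see you at NetPonto")
print(len(rows), [row["PartitionKey"] for row in rows])
# 4 ['eva', 'joana', 'rui', 'tiago']
```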
Slide Objective: Provide some final notes on Tables data modeling
Speaking notes:
• There are no secondary indexes, so querying on any property other than the Row key will result in a partition scan; keep partitions to a manageable size for this.
• You should ALWAYS include the partition key in your queries; build your data model to support this.
• If you are building your own indexes, you can often include related data if it is small enough. Tweets are conveniently small for our example!
Notes: See also Sriram Krishnan’s Programming Windows Azure (O’Reilly), which contains a more detailed example of this.
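Queries reach the Table service as OData `$filter` expressions, where string literals are single-quoted and an embedded single quote is doubled. A sketch of two filters that both include the partition key; the `Lang` property is hypothetical, and the strings would still need URL-encoding as the `$filter` query parameter:

```python
def odata_literal(value: str) -> str:
    """Quote a string for an OData $filter expression (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def point_filter(partition_key: str, row_key: str) -> str:
    """Both keys given: the cheapest query, a single entity."""
    return (f"PartitionKey eq {odata_literal(partition_key)}"
            f" and RowKey eq {odata_literal(row_key)}")


def partition_scan_filter(partition_key: str, prop: str, value: str) -> str:
    """Filter on a non-key property, but scoped to one partition."""
    return (f"PartitionKey eq {odata_literal(partition_key)}"
            f" and {prop} eq {odata_literal(value)}")


print(point_filter("azure", "0001"))
# PartitionKey eq 'azure' and RowKey eq '0001'
print(partition_scan_filter("ana", "Lang", "pt"))
# PartitionKey eq 'ana' and Lang eq 'pt'
```

Dropping the `PartitionKey` clause from the second filter would turn a single-partition scan into a scan of the whole table.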