NoSQL em Windows Azure Table Storage - Vitor Tomaz


In this session we will look at the characteristics of this service and give a brief introduction to the architecture that supports it. We will review the considerations to take into account when creating and using this type of storage, analysing the impact that design decisions have on performance and scalability targets. Some usage examples in different scenarios will also be shown, including some optimizations that can improve performance. Comunidade NetPonto, the .NET community in Portugal! http://netponto.org


NoSQL em Windows Azure Table Storage
Vítor Tomaz
http://netponto.org
37th In-Person Meeting @ Lisbon - 23/03/2013

Vítor Tomaz
ISEL – LEIC
SAFIRA
NetPonto
AzurePT
Revista Programar
Portugal@Programar
SQLPort
MSDN

Agenda
• Characteristics & Concepts
• Service Architecture
• Scalability Targets
• Non-Relational Data Modeling
• Best Practices

[Diagram: service architecture. Labels: Incoming Write Request and Ack; Front End Layer (FE); Partition Layer (Partition Servers, Partition Master, Lock Service); Stream Layer (Extent Nodes, EN).]

Scalability Targets - Storage Account
Geo Redundant
Locally Redundant

You’d soon realize that LIKE isn’t so wonderful.
You’d do a little normalization

Common Design & Scalability
Access pattern lexically sorted by Partition Key values

Common Design & Scalability
• Turn on analytics and take control of your investigations – Logging and Metrics
• Who deleted my container? – Look at the client IP for the delete container request
• Why has my request latency increased? – Look at E2E vs. server latency
• What are my user demographics? – Use the client request ID to trace requests, together with the client IP
• How can I tune my service usage? – Use metrics to analyze API usage and peak traffic stats
• And many more…
• Use an appropriate retry policy for intermittent errors
• Storage client uses exponential retry by default
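The retry advice above can be sketched in a few lines of Python. This is a minimal, library-independent illustration under stated assumptions, not the Storage Client's actual retry policy: `TransientError`, `backoff_delays`, `retry_with_backoff` and the delay values are hypothetical names and numbers chosen for the example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for an intermittent failure (a timeout, a 'server busy' reply)."""


def backoff_delays(max_attempts, base_delay=0.5, max_delay=30.0):
    """Un-jittered wait before each retry: base, 2*base, 4*base, ... capped at max_delay."""
    return [min(max_delay, base_delay * (2 ** i)) for i in range(max_attempts - 1)]


def retry_with_backoff(operation, max_attempts=4, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Only TransientError is retried; any other exception propagates immediately.
    """
    delays = backoff_delays(max_attempts, base_delay, max_delay)
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Randomise the wait slightly so many clients do not retry in lockstep.
            sleep(delays[attempt] * random.uniform(0.8, 1.2))
```

The `sleep` parameter is injectable so the policy can be exercised in tests without actually waiting.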

[Charts: Batch Stress Scenario. Per Entity Latencies, Time (ms), for Delete, Query and Insert; Processor Time (s) and Test Duration (s). Series: Storage Client 1.7; Storage Client 2.0: DataServices; Storage Client 2.0: Reflection; Storage Client 2.0: No Reflection.]
Faster NoSQL table access
Up to 72.06% reduction in execution time
Up to 31.92% reduction in processor time
Up to 69-90% reduction in latency

[Charts: Large Blob Scenario (256MB). Resource Utilization, Time (s): Total Test Time (s) and Total Processor Time (s). Latencies, Time (s): Upload and Download. Series: Storage Client 1.7; Storage Client 2.0.]
Faster uploads and downloads
31.46% reduction in processor time
Up to 22.07% reduction in latency

http://blogs.msdn.com/b/windowsazurestorage/
https://www.windowsazure.com/en-us/develop/overview/
https://www.windowsazure.com/en-us/pricing/details

Upcoming in-person meetings
23/03/2013 – March (Lisbon)
20/04/2013 – April (Lisbon)
22/06/2013 – June (Lisbon)
??/??/2013 – ? (Porto)
??/??/2013 – ? (Coimbra)
Save these dates in your calendar! :)

“GOLD” Sponsor
Twitter: @PTMicrosoft http://www.microsoft.com/portugal

Thank you!
Vítor Tomaz
vitorbstomaz AT gmail.com
http://twitter.com/vitortomaz

NoSQL em Windows Azure Table Storage - Vitor Tomaz

1. NoSQL em Windows Azure Table Storage Vítor Tomaz http://netponto.org 37th In-Person Meeting @ Lisbon - 23/03/2013

2. Vítor Tomaz ISEL – LEIC SAFIRA NetPonto AzurePT Revista Programar Portugal@Programar SQLPort MSDN

3. Agenda • Characteristics & Concepts • Service Architecture • Scalability Targets • Non-Relational Data Modeling • Best Practices

7. South Central US West US East US

10. Table Details

11.

12.

13. Service Architecture

14. [Diagram: service architecture. Labels: Incoming Write Request and Ack; Front End Layer (FE); Partition Layer (Partition Servers, Partition Master, Lock Service); Stream Layer (Extent Nodes, EN).]

15.

16.

17. http://tinyurl.com/ContToken

18. Scalability Targets

19. Scalability Targets - Storage Account: Geo Redundant, Locally Redundant

20. Scalability Targets – Partition

21. Non-Relational Data Modeling

22.

23.

24.

25.

26.

27.

28. You’d soon realize that LIKE isn’t so wonderful. You’d do a little normalization

29.

30.

31. Entity Group Transactions

32.

33.

34.

35.

36. Best Practices

37. Common Design & Scalability Access pattern lexically sorted by Partition Key values

38. Common Design & Scalability
• Turn on analytics and take control of your investigations – Logging and Metrics
• Who deleted my container? – Look at the client IP for the delete container request
• Why has my request latency increased? – Look at E2E vs. server latency
• What are my user demographics? – Use the client request ID to trace requests, together with the client IP
• How can I tune my service usage? – Use metrics to analyze API usage and peak traffic stats
• And many more…
• Use an appropriate retry policy for intermittent errors
• Storage client uses exponential retry by default

39. Storage Accounts

40. Storage Accounts

41.

42. [Charts: Batch Stress Scenario. Per Entity Latencies, Time (ms), for Delete, Query and Insert; Processor Time (s) and Test Duration (s). Series: Storage Client 1.7; Storage Client 2.0: DataServices; Storage Client 2.0: Reflection; Storage Client 2.0: No Reflection.] Faster NoSQL table access: up to 72.06% reduction in execution time, up to 31.92% reduction in processor time, up to 69-90% reduction in latency.

43. [Charts: Large Blob Scenario (256MB). Resource Utilization, Time (s): Total Test Time (s) and Total Processor Time (s). Latencies, Time (s): Upload and Download. Series: Storage Client 1.7; Storage Client 2.0.] Faster uploads and downloads: 31.46% reduction in processor time, up to 22.07% reduction in latency.

44.

45. http://blogs.msdn.com/b/windowsazurestorage/ https://www.windowsazure.com/en-us/develop/overview/ https://www.windowsazure.com/en-us/pricing/details

46. Questions?

47. Upcoming in-person meetings: 23/03/2013 – March (Lisbon); 20/04/2013 – April (Lisbon); 22/06/2013 – June (Lisbon); ??/??/2013 – ? (Porto); ??/??/2013 – ? (Coimbra). Save these dates in your calendar! :)

48. “GOLD” Sponsor. Twitter: @PTMicrosoft http://www.microsoft.com/portugal

49. “Silver” Sponsors

50. “Bronze” Sponsors

51. Thank you! Vítor Tomaz vitorbstomaz AT gmail.com http://twitter.com/vitortomaz

Editor's Notes

Slide Objectives: Explain the different storage libraries and languages that can be used to work with Windows Azure Storage.
Value Prop: Programmatic access to the Blob, Queue, and Table services is available via the Windows Azure client libraries and the Windows Azure storage services REST API.
Speaking Points:
• Windows Azure is an open cloud platform that enables you to quickly build, deploy and manage applications across a global network of Microsoft-managed datacenters.
• You can build applications using any language, tool or framework.

Slide Objectives: Understand Tables.
Value Prop: Enable customers to easily migrate, maintain, and monitor their existing SQL Server applications to Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics.
Speaker Notes: The Table service provides structured storage in the form of tables. The Table service supports a REST API that is compliant with the ADO.NET Data Services REST API. Developers may also use the .NET Client Library for ADO.NET Data Services to access the Table service.
Notes: Within a storage account, a developer may create named tables. Tables store data as entities. An entity is a collection of named properties and their values, similar to a row. Tables are partitioned to support load balancing across storage nodes. Each table has as its first property a partition key that specifies the partition an entity belongs to. The second property is a row key that identifies an entity within a given partition. The combination of the partition key and the row key forms a primary key that identifies each entity uniquely within the table.

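The composite primary key described in these notes can be illustrated with a toy in-memory table. This is a sketch under stated assumptions, not the Table service or any client library: `InMemoryTable` and its methods are hypothetical names, and entities are modelled as plain dictionaries.

```python
class InMemoryTable:
    """Toy stand-in for a table: entities are dicts keyed by (PartitionKey, RowKey)."""

    def __init__(self, name):
        self.name = name
        self._entities = {}

    def insert(self, entity):
        # The pair (PartitionKey, RowKey) acts as the primary key.
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._entities:
            raise KeyError(f"entity {key} already exists")
        self._entities[key] = dict(entity)

    def get(self, partition_key, row_key):
        return self._entities[(partition_key, row_key)]

    def partition(self, partition_key):
        """Return all entities of one partition, ordered by RowKey."""
        rows = [e for (pk, _), e in self._entities.items() if pk == partition_key]
        return sorted(rows, key=lambda e: e["RowKey"])
```

The same RowKey may appear in two different partitions; only the combination has to be unique.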
Slide Objectives: Understand Flexible Entities.
Value Prop: Enable customers to easily migrate, maintain, and monitor their existing SQL Server applications to Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics.
Speaker Notes: Tables store data as entities.
• A table can contain entities of any shape
• There is no fixed schema
• There is no schema checking
• There is no strong typing: note that Birthdate is stored as both a datetime value and as a string
• Note that we can add additional columns
Notes: http://msdn.microsoft.com/en-us/library/dd573356.aspx

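To make the "no fixed schema" point concrete, here is a small Python sketch. The entities and their values are made up for illustration (they are not taken from the slide), and `property_types` is a hypothetical helper.

```python
from datetime import datetime

# Three entities that could live in the same table even though their shapes differ.
customers = [
    {"PartitionKey": "Lisboa", "RowKey": "001", "Name": "Ana",
     "Birthdate": datetime(1980, 5, 17)},      # Birthdate as a datetime value
    {"PartitionKey": "Lisboa", "RowKey": "002", "Name": "Rui",
     "Birthdate": "17/05/1980"},               # Birthdate as a string
    {"PartitionKey": "Porto", "RowKey": "001", "Name": "Eva",
     "FavouriteColour": "blue"},               # an extra column, no Birthdate at all
]


def property_types(entities, name):
    """Return the set of type names used for property `name` across the entities."""
    return {type(e[name]).__name__ for e in entities if name in e}
```

Because nothing enforces a schema, code that reads such a table has to cope with missing properties and mixed types, which is what `property_types` makes visible.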
Slide Objectives: Understand the Windows Azure Storage scalability model.
Value Prop: Windows Azure Storage scales automatically to provide the best performance.
Speaker Notes:
• Fanout is automatic, handled by Windows Azure
• The key here is “elasticity”: the ability to automatically scale based on load
• Fanout is based on the load. Fanout isn’t immediate… Windows Azure will wait several seconds to ensure that the load is a true load and not just a temporary spike
• Partitioning is based on Partition Key – the choice of the partition key is critical
• Partitions can be condensed when load decreases
• Reads are load balanced against the three replicas

Slide Objectives: Understand the importance of the Windows Azure Table scalability model and how Partition Key and Row Key are critical for table scalability.
Value Prop: Enable customers to easily migrate, maintain, and monitor their existing SQL Server applications to Windows Azure VM role, and run them with competitive reliability, performance, and TCO characteristics.
Speaker Notes: Table entities represent the units of data stored in a table and are similar to rows in a typical relational database table. Each entity defines a collection of properties. Each property is a key/value pair defined by its name, value, and the value's data type. Entities must define the following three system properties as part of the property collection:
• PartitionKey – The PartitionKey property stores string values that identify the partition that an entity belongs to. This means that entities with the same PartitionKey values belong in the same partition. Partitions, as discussed later, are integral to the scalability of the table.
• RowKey – The RowKey property stores string values that uniquely identify entities within each partition.
• Timestamp – A DateTime value, maintained by the service, that records when the entity was last modified.
Notes: Tables are partitioned to support load balancing across storage nodes. A table's entities are organized by partition. A partition is a consecutive range of entities possessing the same partition key value. The partition key is a unique identifier for the partition within a given table, specified by the PartitionKey property. The partition key forms the first part of an entity's primary key. The partition key may be a string value up to 1 KB in size.

Slide Objective: More detail on horizontal partitioning in Windows Azure Table storage.
Speaking Notes:
• Understanding the sequential nature of cross-partition queries is important
• Continuation tokens may be returned at any time (i.e. data comes back in multiple pages)
• You will always get a continuation token if you cross a hardware boundary, i.e. you move between partitions that sit on different nodes
• The Storage API handles continuation tokens elegantly, but it may mask a poor architecture: YOU DO NOT WANT TO RUN A QUERY THAT CROSSES HUNDREDS OF SERVERS!
• Be aggressive with partitioning: if you’ll only ever query something by a single key, use an empty Row key and a unique partition key for a partition of 1
• You can also just use blob storage, which is already partitioned by Blob name
Notes:
• Queue storage is partitioned by Queue name
• Blob storage is partitioned by Blob name (i.e. partition size of 1)
• http://www.syringe.net.nz/2009/08/08/SimplePartitioningWithWindowsAzureTableStorage.aspx
• http://nmackenzie.spaces.live.com/Blog/cns!B863FF075995D18A!417.entry
• Good article from Julie Lerman, worth reading when discussing table storage: http://msdn.microsoft.com/en-us/magazine/ff796231.aspx

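The paging behaviour behind continuation tokens can be sketched as a simple loop. This is an in-memory simulation under stated assumptions, not the Storage API: `query_page` and `query_all` are hypothetical names, and the "token" here is just the index to resume from, whereas a real token is opaque to the caller.

```python
def query_page(rows, page_size, continuation=None):
    """Return (page, next_token); next_token is None once the data is exhausted."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = continuation or 0
    page = rows[start:start + page_size]
    next_start = start + len(page)
    next_token = next_start if next_start < len(rows) else None
    return page, next_token


def query_all(rows, page_size):
    """Follow continuation tokens until none is returned; also count the round trips."""
    results, token, round_trips = [], None, 0
    while True:
        page, token = query_page(rows, page_size, token)
        results.extend(page)
        round_trips += 1
        if token is None:
            return results, round_trips
```

A client library that follows tokens automatically hides this loop, which is convenient but also hides how many sequential round trips a broad query really costs.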
Slide Objective: Understand why we need to partition. Understand the cloud-specific drivers.
Speaking Notes:
• Partitioning is hardly a new topic. DBAs have been partitioning databases for a long, long time
• There are two main reasons to partition:
  • Data volume: there are just too many bytes to fit. For example, SQL Azure has a maximum DB size of 50GB. If you have more data than that then you’ll need to partition
  • Workload: each partition can only handle so many transactions per second. In Windows Azure tables, for example, partitioning is used to spread the request load over nodes in the storage system
• There are some new cloud-focussed reasons too:
  • Cost: different types of storage have different costs. Arguably we’ve been doing cost-driven partitioning on premise for some time too, for example partitioning a table across both expensive 15k RPM drives and cheaper 7200 RPM drives. In the cloud the cost difference can be far more pronounced
  • Elasticity: the cloud also provides a concept of elastic partitioning. Whereas on premise a partition is often a separate server or separate disks, with the related capital cost and lead time, a partition in the cloud can be created and destroyed in a matter of seconds. This presents the opportunity to create partitions just for a short period of time, say a period of peak load

Slide Objective: Discusses how to choose a partition key.
Speaking Notes:
• Natural keys are often very good for partitioning. For example, you may choose to break up data by geographical region
• Natural keys can also cause problems. Partitioning by things like the first letter of the last name can be bad: your ‘S’ partition will be too full and your ‘Z’ partition will be all but empty… unless you’re in an Asian country where the opposite is true
• You may want to use a mathematical operator to assist in partitioning. We’ll discuss these shortly
• Finally, you may want to use a lookup table. For example, in a SaaS application you may partition each customer into their own database and then look up the database to use at runtime, based on the host header that was used to visit the site
Notes: SQL Azure horizontal partitioning: http://blogs.msdn.com/b/sqlazure/archive/2010/06/24/10029719.aspx

Slide Objective: Describes modulo partitioning.
Speaking Notes:
• The modulo operator is very useful for partitioning exercises
• The important thing here is having a good distribution
Notes: http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/985a3198-ba54-4dcc-932c-0e6bdb166a46

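A minimal sketch of modulo partitioning in Python follows. The function names and the `p000`-style key format are assumptions made for this example, and CRC32 is simply one stable hash that happens to be in the standard library; the slide does not say which function the author used.

```python
import zlib
from collections import Counter


def modulo_partition_key(entity_id, partition_count):
    """Map an identifier to one of partition_count buckets with the modulo operator.

    zlib.crc32 is used because it gives the same value on every run; Python's
    built-in hash() is randomised per process for strings.
    """
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    bucket = zlib.crc32(str(entity_id).encode("utf-8")) % partition_count
    return f"p{bucket:03d}"


def distribution(ids, partition_count):
    """Count how many identifiers land in each partition, to check the spread."""
    return Counter(modulo_partition_key(i, partition_count) for i in ids)
```

Running `distribution` over a sample of real identifiers is a cheap way to check the "good distribution" requirement before committing to a key scheme.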
Slide Objective: Describes the challenge of managing partitions over time.
Speaking Notes:
• As applications grow and change, so may our partitioning needs. How do we deal with this?
• What happens if we need to re-partition our data? We will need to process it into a new partitioning scheme
• We can also version our partitioning scheme, such that our partition keys include an identifier to resolve the partition scheme to be used
• In the example on the slide we’ll end up with 14 partitions: 4 for the v1 scheme, 10 for the v2 scheme

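One way to read the versioned scheme is sketched below. The slide's actual key layout is not in the text, so the `v1-3` style format, the `SCHEMES` registry and the CRC32 hash are all assumptions made for illustration; only the partition counts (4 for v1, 10 for v2) come from the notes.

```python
import zlib

# Hypothetical registry: scheme version -> number of partitions.
SCHEMES = {"v1": 4, "v2": 10}


def versioned_partition_key(entity_id, version):
    """Build a partition key that carries the scheme version as a prefix."""
    bucket = zlib.crc32(str(entity_id).encode("utf-8")) % SCHEMES[version]
    return f"{version}-{bucket}"


def all_partition_keys():
    """Enumerate every partition key that can exist across all scheme versions."""
    return [f"{version}-{bucket}"
            for version, count in SCHEMES.items()
            for bucket in range(count)]
```

Because the version is part of the key, data written under v1 stays addressable while new writes use v2, and the two sets of partitions never collide.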
Slide Objective: The next few slides build on each other. Run through the worked example.
Speaking Notes:
• Suppose we want to build a tweet search engine
• Twitter creates quite a bit of data; it’s well suited to storing in Windows Azure tables
• In SQL land we might start with a simple LIKE query. This scans the table every time… We soon realize this is no good
Notes: See also Sriram Krishnan’s Programming Windows Azure (O’Reilly), which contains a more detailed example of this.

Next we’d probably pull the words out into a separate table, i.e. split each tweet into separate words. We’d soon realize that we could collapse the Word table back into the index, as we’d end up in a situation where the primary keys on the associative table were longer than the word itself, so we’re better off duplicating the word as rows in the word table.

In Windows Azure tables we take this one step further. We basically use worker roles to create indexes for us. So in the above example I can:
• Retrieve all the Tweets made by a certain user by querying the Tweet table and including the user ID (there is a partition per user)
• Retrieve all the Tweets that contain a particular word by querying the TweetIndex table and including the Word (there is a partition per word)

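The two tables described here can be sketched as plain entity builders. This is a hedged illustration, not the author's code: the property names `Text` and `UserId`, the function names, and the naive whitespace tokenisation are assumptions. A worker role would run something like `tweet_index_entities` for each new tweet.

```python
def tweet_entity(user_id, tweet_id, text):
    """Entity for the Tweet table: one partition per user."""
    return {"PartitionKey": user_id, "RowKey": tweet_id, "Text": text}


def tweet_index_entities(user_id, tweet_id, text):
    """Entities for the TweetIndex table: one partition per word.

    The tweet text is duplicated into every index entity so that a query by
    word can be answered without a second lookup in the Tweet table.
    """
    words = {w.strip(".,!?:;\"'").lower() for w in text.split()}
    words.discard("")  # drop tokens that were pure punctuation
    return [
        {"PartitionKey": word, "RowKey": tweet_id, "UserId": user_id, "Text": text}
        for word in sorted(words)
    ]
```

A real implementation would also need to sanitise words before using them as keys, since the Table service does not allow characters such as `/`, `\`, `#` and `?` in PartitionKey values.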
We may then choose to create a MentionIndex, where the data is not partitioned by the person who wrote the tweet but rather by the person(s) who were mentioned in the tweet. If a tweet mentions 4 users, it’ll appear 4 times in the MentionIndex table, in four different partitions.

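A MentionIndex along these lines could be populated as follows. The `@name` pattern, the property names and `mention_index_entities` are assumptions for this sketch; the notes only state that there is one copy of the tweet per mentioned user.

```python
import re

# Assumed mention syntax: "@" followed by letters, digits or underscores.
MENTION = re.compile(r"@(\w+)")


def mention_index_entities(author_id, tweet_id, text):
    """One entity per distinct mentioned user, partitioned by that user."""
    mentioned = sorted({name.lower() for name in MENTION.findall(text)})
    return [
        {"PartitionKey": user, "RowKey": tweet_id, "AuthorId": author_id, "Text": text}
        for user in mentioned
    ]
```

For a tweet such as "Thanks @ana @rui @eva and @Ze!" this yields four entities, one in each mentioned user's partition.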
Slide Objective: Provide some final notes on Tables data modeling.
Speaking Notes:
• There are no secondary indexes, so querying on any variable other than the Row key will result in a partition scan; keep partitions of manageable size for this
• You should ALWAYS include the partition key in your queries; build your data model to support this
• If you are building your own indexes then you can often include related data if it is small enough. Tweets are conveniently small for our example!
Notes: See also Sriram Krishnan’s Programming Windows Azure (O’Reilly), which contains a more detailed example of this.
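The cost difference between the query shapes in these notes can be made visible with a toy model that counts how many entities each query has to examine. This is a simulation under stated assumptions, not a measurement of the real service: `PartitionedTable` and its methods are hypothetical, and "entities examined" is only a rough proxy for work done.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy table: PartitionKey -> {RowKey: entity}. Each query reports its scan size."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def insert(self, entity):
        self._partitions[entity["PartitionKey"]][entity["RowKey"]] = entity

    def point_query(self, partition_key, row_key):
        """Both keys known: direct lookup, at most one entity examined."""
        entity = self._partitions.get(partition_key, {}).get(row_key)
        return entity, (1 if entity is not None else 0)

    def partition_scan(self, partition_key, predicate):
        """Partition key known, filter on another property: scans one partition."""
        rows = list(self._partitions.get(partition_key, {}).values())
        return [e for e in rows if predicate(e)], len(rows)

    def table_scan(self, predicate):
        """No partition key: every entity in every partition is examined."""
        rows = [e for part in self._partitions.values() for e in part.values()]
        return [e for e in rows if predicate(e)], len(rows)
```

With three partitions of 100 entities each, a point query examines 1 entity, a partition scan 100, and a table scan 300, which is why the notes insist on including the partition key and keeping partitions of manageable size.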