The document summarizes the architecture of Windows Azure services. It discusses how Windows Azure provides scalable and reliable cloud services through its datacenter infrastructure and platform services. Key points include:
- Windows Azure uses a multi-tenant datacenter architecture with fault domains and update domains to provide high availability.
- The storage architecture uses a partitioned and replicated design to store and access blobs, tables, queues and files in a reliable manner.
- SQL Azure provides a scalable relational database service running on top of the Windows Azure infrastructure, with automatic replication and failover.
5. Agenda
• Introduction
• Datacenter and Windows Azure Architecture
• Windows Azure Storage Architecture
• SQL Azure Architecture
6. Deploying A Service Manually
• Resource allocation
– Machines must be chosen to host roles of the service
– Procure additional hardware if necessary
– IP addresses must be acquired
• Provisioning
– Machines must be set up
– Virtual machines created
– Applications configured
– DNS setup
– Load balancers must be programmed
This is ongoing work… you're never done.
• Upgrades
– Locate appropriate machines
– Update the software/settings as necessary
– Only bring down a subset of the service at a time
• Maintaining service health
– Software faults must be handled
– Hardware failures will occur
– Logging infrastructure is provided to diagnose issues
7. [Figure: capacity vs. time with fixed provisioning. Labels: Available resources, Predicted capacity, Actual capacity, Too few resources, Too many resources.]
8. [Figure: capacity vs. time with capacity on demand. Labels: Capacity on demand, Predicted capacity, Actual capacity, Scalability, Elasticity, No wasted resources, Low investment.]
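The two charts contrast fixed, forecast-based provisioning with capacity on demand. A minimal sketch, assuming demand sampled at discrete time steps in arbitrary capacity units and a hypothetical `provisioning_gap` helper, makes the arithmetic concrete: fixed capacity wastes resources when demand is low and falls short when demand spikes, while capacity that tracks demand does neither.

```python
from typing import Sequence, Tuple


def provisioning_gap(demand: Sequence[float], capacity: Sequence[float]) -> Tuple[float, float]:
    """Return (wasted, shortfall) summed over all time steps.

    wasted    -- capacity provisioned but not used ("too many resources")
    shortfall -- demand that could not be served ("too few resources")
    """
    wasted = sum(max(c - d, 0.0) for d, c in zip(demand, capacity))
    shortfall = sum(max(d - c, 0.0) for d, c in zip(demand, capacity))
    return wasted, shortfall


# Hypothetical demand curve, one value per time step.
demand = [2.0, 3.0, 8.0, 4.0]

# Fixed capacity sized from a forecast of 5 units.
fixed = [5.0] * len(demand)

# Elastic capacity: provision exactly what is demanded at each step.
elastic = list(demand)

print(provisioning_gap(demand, fixed))    # (6.0, 3.0)
print(provisioning_gap(demand, elastic))  # (0.0, 0.0)
```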
17. [Figure: management stack. Top: System Center, Windows Azure Portal, AppManager. Below: one Fabric Controller per Datacenter (three datacenters shown).]
20. Datacenter network
[Figure: Aggregation Routers (AG) and Load Balancers (LB) connect to the Top of Rack switches (TOR); each rack is powered by a Power Distribution Unit (PDU).]
21. Datacenter network
[Figure: the same topology, showing the Nodes in each rack below their TOR switch, with a PDU per rack.]
22. [Figure: a datacenter is managed like a server. Server: the Kernel runs Processes (e.g., Word, SQL Server). Datacenter: the Fabric Controller runs Services (e.g., Exchange Online, SQL Azure).]
26. [Figure: node provisioning. Components: an Image Repository holding Maintenance OS, Parent OS, Windows Azure OS and Role images; the Fabric Controller; a Windows Deployment Server and PXE Server; and a Node running the Windows Azure Hypervisor, the Windows Azure OS and the FC Host Agent.]
27. [Figure: a service at www.mycloudapp.net behind a Load Balancer. Role B: Worker Role, Count: 2, Update Domains: 2, Size: Medium. Instance addresses shown: 10.100.0.36, 10.100.0.185, 10.100.0.122.]
28. My Service
                  Role: Front-End    Role: Middle-Tier
Definition
  Type            Web                Worker
  VM Size         Large              Medium
  Endpoints       External-1         Internal-1
Configuration
  Instances       3                  2
  Update Domains  3                  2
  Fault Domains   3                  2
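The configuration above asks for each role's instances to be spread across update and fault domains. A minimal sketch, assuming simple round-robin placement (the Fabric Controller's actual placement algorithm is not described in these slides); `RoleConfig`, `Placement`, `place_instances` and the instance naming are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoleConfig:
    name: str
    instances: int
    update_domains: int
    fault_domains: int


@dataclass
class Placement:
    instance: str
    update_domain: int
    fault_domain: int


def place_instances(role: RoleConfig) -> List[Placement]:
    """Round-robin: instance i goes to update domain i % UD and fault domain i % FD."""
    return [
        Placement(f"{role.name}_IN_{i}", i % role.update_domains, i % role.fault_domains)
        for i in range(role.instances)
    ]


front_end = RoleConfig("Front-End", instances=3, update_domains=3, fault_domains=3)
middle_tier = RoleConfig("Middle-Tier", instances=2, update_domains=2, fault_domains=2)

for p in place_instances(front_end) + place_instances(middle_tier):
    print(p.instance, "UD", p.update_domain, "FD", p.fault_domain)
```

With as many domains as instances, losing one fault domain (rack) or taking one update domain down for an upgrade affects only one instance of each role.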
31. Physical node
[Figure: a physical node runs a Host Partition with the FC Host Agent and several Guest Partitions, each containing a Role Instance and a Guest Agent; a trust boundary separates the guests from the host. The Fabric Controller runs as a Primary with Replicas (Primary, Replica, …, Replica) and uses an Image Repository of OS VHDs and role ZIP files.]
34. [Figure: role instance layout. Volumes: OS Volume, Resource Volume, Role Volume. Components: Guest Agent, Role Host, Role Entry Point.]
35. [Figure: the same service as slide 27 (Role B: Worker Role, Count: 2, Update Domains: 2, Size: Medium) behind the Load Balancer for www.mycloudapp.net, now with addresses 10.100.0.36, 10.100.0.185, 10.100.0.122 and 10.100.0.191.]
36. Problem / How detected / Fabric response
• Role instance crashes
– How detected: the FC guest agent monitors role termination
– Fabric response: FC restarts the role
• Guest VM or agent crashes
– How detected: the FC host agent notices missing guest agent heartbeats
– Fabric response: FC restarts the VM and hosted role
• Host OS or agent crashes
– How detected: FC notices a missing host agent heartbeat
– Fabric response: tries to recover the node; FC reallocates roles to other nodes
• Detected node hardware issue
– How detected: the host agent informs the FC
– Fabric response: FC migrates roles to other nodes and marks the node "out for repair"
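Two of the rows above rely on heartbeats: an agent that stops reporting is treated as crashed. A minimal sketch of that detection, assuming a hypothetical `HeartbeatMonitor` with a made-up 10-second timeout (the slides give no timeout) and explicit timestamps instead of a real clock; the responses are copied from the table:

```python
from typing import Dict, List

# Fabric responses from the table above, keyed by a hypothetical problem id.
RESPONSES = {
    "role_crash": "restart role",
    "guest_missing": "restart VM and hosted role",
    "host_missing": "try to recover node; reallocate roles to other nodes",
    "hardware_issue": "migrate roles to other nodes; mark node 'out for repair'",
}


class HeartbeatMonitor:
    """Tracks agent heartbeats; an agent silent for longer than timeout_s is missing."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s  # hypothetical value, not stated in the slides
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, agent: str, now: float) -> None:
        self.last_seen[agent] = now

    def missing(self, now: float) -> List[str]:
        return sorted(a for a, t in self.last_seen.items() if now - t > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=10.0)
monitor.heartbeat("host-agent/node-1", now=0.0)
monitor.heartbeat("guest-agent/vm-7", now=0.0)
monitor.heartbeat("host-agent/node-1", now=12.0)  # node-1 keeps reporting

for agent in monitor.missing(now=15.0):
    kind = "host_missing" if agent.startswith("host-agent") else "guest_missing"
    print(agent, "->", RESPONSES[kind])
# guest-agent/vm-7 -> restart VM and hosted role
```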
37. [Figure: two Fault Domains, each a Rack hosting a Web Role instance and a Worker Role instance; the instances of each role are spread across U/G (upgrade) Domain #1 and U/G Domain #2.]
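Upgrade domains make rolling upgrades possible: only one domain is taken down at a time, so every role keeps serving from the others. A minimal sketch, assuming instances are identified by hypothetical names mapped to upgrade domain numbers; `rolling_upgrade` is illustrative, not the Fabric Controller's implementation:

```python
from collections import defaultdict
from typing import Dict, List


def rolling_upgrade(instances: Dict[str, int], new_version: str,
                    versions: Dict[str, str]) -> List[List[str]]:
    """Upgrade one upgrade domain at a time.

    instances maps instance name -> upgrade domain; versions maps instance -> version
    and is updated in place. Returns the batches in the order they were upgraded.
    """
    by_domain: Dict[int, List[str]] = defaultdict(list)
    for name, domain in instances.items():
        by_domain[domain].append(name)
    batches = []
    for domain in sorted(by_domain):
        batch = sorted(by_domain[domain])
        # Only this batch is offline; instances in other domains keep serving.
        for name in batch:
            versions[name] = new_version
        batches.append(batch)
    return batches


# Layout from the figure: each role has one instance in U/G Domain #1 and one in #2.
instances = {"web_0": 1, "web_1": 2, "worker_0": 1, "worker_1": 2}
versions = {name: "v1" for name in instances}
print(rolling_upgrade(instances, "v2", versions))
# [['web_0', 'worker_0'], ['web_1', 'worker_1']]
```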
40. [Figure: the Production VIP (VIP1, <dnsname>.cloudapp.net) routes to Deployment A (Role A, Role B); the Staging VIP (VIP2, <guid>.cloudapp.net) routes to Deployment A' (Role A', Role B'). Each VIP exposes ports 80, 3389 and 3390.]
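Because production and staging differ only in which VIP routes to which deployment, promoting a tested staging deployment can be modeled as swapping that mapping while the VIPs and DNS names stay put. A minimal sketch with a hypothetical `Endpoint` record and `vip_swap` function; in the platform the swap is done by reprogramming the load balancers, not by client code:

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    vip: str
    dns_name: str
    deployment: str


def vip_swap(production: Endpoint, staging: Endpoint) -> None:
    """Swap which deployment each VIP routes to; VIPs and DNS names are unchanged."""
    production.deployment, staging.deployment = staging.deployment, production.deployment


prod = Endpoint("VIP1", "<dnsname>.cloudapp.net", "Deployment A")
stage = Endpoint("VIP2", "<guid>.cloudapp.net", "Deployment A'")
vip_swap(prod, stage)
print(prod.deployment)   # Deployment A'
print(stage.deployment)  # Deployment A
```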
45. Access blob storage via the URL: http://<account>.blob.core.windows.net/
[Figure: the Storage Location Service directs data access to Storage Stamps. Each stamp has a load balancer (LB), Front-Ends, a Partition Layer and a Stream Layer with intra-stamp replication; stamps replicate to one another through inter-stamp (geo) replication.]
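The URL above names the storage account; containers and blobs are addressed as path segments below it. In the Windows Azure Storage design, DNS maps the account host name to the stamp that holds the account. A small sketch, assuming hypothetical account, container and blob names:

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob: str, scheme: str = "http") -> str:
    """Build <scheme>://<account>.blob.core.windows.net/<container>/<blob>."""
    # Percent-encode the blob name but keep '/' so virtual directory paths survive.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob, safe='/')}"


print(blob_url("myaccount", "photos", "2010/holiday pic.jpg"))
# http://myaccount.blob.core.windows.net/photos/2010/holiday%20pic.jpg
```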
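47. [Figure: processing an Incoming Write Request. The request enters the Front End (FE) Layer, which forwards it to the Partition Layer (Partition Master, Lock Service and Partition Servers); the Partition Servers persist the data in the Stream Layer's Extent Nodes (EN), and an Ack is returned to the client.]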
49. [Figure: the Partition Master managing four Partition Servers.]
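The Partition Master assigns parts of the key space to Partition Servers. A minimal lookup sketch, assuming range partitioning over string keys as described in the Windows Azure Storage design; the `PartitionMap` class, boundary keys and server names are hypothetical:

```python
import bisect
from typing import List


class PartitionMap:
    """Maps a partition key to the Partition Server owning its key range.

    boundaries[i] is the inclusive lower bound of range i; the first boundary
    must be "" so that every key falls into some range.
    """

    def __init__(self, boundaries: List[str], servers: List[str]) -> None:
        assert boundaries and boundaries[0] == "" and len(boundaries) == len(servers)
        assert boundaries == sorted(boundaries)
        self.boundaries = boundaries
        self.servers = servers

    def lookup(self, key: str) -> str:
        # bisect_right finds the first boundary > key; the range before it owns the key.
        return self.servers[bisect.bisect_right(self.boundaries, key) - 1]


pmap = PartitionMap(["", "g", "n", "t"], ["PS1", "PS2", "PS3", "PS4"])
print(pmap.lookup("alice"))  # PS1
print(pmap.lookup("mike"))   # PS2
print(pmap.lookup("zoe"))    # PS4
```

Rebalancing load then amounts to the Partition Master moving a boundary or reassigning a range to a different server; the lookup itself stays the same.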
50. Extent Nodes (EN)
• Distributed, append-only file system
• Data is stored in files called extents
• Every extent is replicated 3 times across different fault and upgrade domains
• All data is checksummed
• Data is re-replicated if there is a disk, node or rack failure, or a checksum mismatch
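The append-only and checksum properties above can be shown on a single replica. A minimal sketch, using CRC32 from Python's `zlib` as a stand-in checksum (the slides do not name the algorithm) and a hypothetical `Extent` class whose offsets are record indexes rather than byte offsets:

```python
import zlib
from typing import List, Tuple


class Extent:
    """Append-only list of records, each stored with a CRC32 checksum.

    Offsets here are record indexes, a simplification of byte offsets.
    """

    def __init__(self) -> None:
        self._records: List[Tuple[bytes, int]] = []
        self.sealed = False

    def append(self, data: bytes) -> int:
        """Append a record and return its offset; existing records are never modified."""
        if self.sealed:
            raise ValueError("extent is sealed; no more appends")
        self._records.append((data, zlib.crc32(data)))
        return len(self._records) - 1

    def read(self, offset: int) -> bytes:
        """Return a record after verifying its checksum."""
        data, crc = self._records[offset]
        if zlib.crc32(data) != crc:
            # Per the slide, a checksum failure triggers re-replication from a healthy copy.
            raise IOError(f"checksum mismatch at offset {offset}")
        return data


ext = Extent()
off = ext.append(b"hello")
print(off, ext.read(off))  # 0 b'hello'
```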
57. [Figure: creating a stream/extent. The Partition Layer asks the SM (Stream Manager, replicated with Paxos) to create a stream/extent; the SM allocates the extent replica set: EN 1 as Primary, EN 2 as Secondary A and EN 3 as Secondary B.]
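58. [Figure: appending to an extent. The Partition Layer sends the Append to EN 1 (Primary); EN 2 (Secondary A) and EN 3 (Secondary B) also receive it, and an Ack is returned to the Partition Layer. The SM (Paxos) records EN 1 as primary and EN 2, EN 3 as secondaries.]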
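In the Windows Azure Storage design the primary EN chooses the append offset, forwards the data to the secondaries, and acknowledges the append only once all three replicas have written it. A sketch with in-memory replicas (`ExtentReplica`, `PrimaryEN` and their methods are hypothetical). A failed append can leave replicas at different lengths, which is the situation the sealing slides that follow deal with:

```python
from typing import List


class ExtentReplica:
    """One extent node's copy of an extent (in memory, for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = bytearray()
        self.up = True

    def write_at(self, offset: int, payload: bytes) -> bool:
        # Accept the write only if this node is up and its length matches the offset.
        if not self.up or len(self.data) != offset:
            return False
        self.data += payload
        return True


class PrimaryEN(ExtentReplica):
    def __init__(self, name: str, secondaries: List[ExtentReplica]) -> None:
        super().__init__(name)
        self.secondaries = secondaries

    def append(self, payload: bytes) -> bool:
        """Pick the offset, write locally, forward to every secondary; ack only if all succeed."""
        offset = len(self.data)
        local_ok = self.write_at(offset, payload)
        remote_ok = [s.write_at(offset, payload) for s in self.secondaries]
        return local_ok and all(remote_ok)


sec_a, sec_b = ExtentReplica("EN 2"), ExtentReplica("EN 3")
primary = PrimaryEN("EN 1", [sec_a, sec_b])
print(primary.append(b"abc"))  # True
sec_b.up = False
print(primary.append(b"de"))   # False: the append is not acknowledged
print(len(primary.data), len(sec_a.data), len(sec_b.data))  # 5 5 3
```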
60. [Figure: sealing an extent. The Partition Layer asks the SM (Paxos) to seal the extent; the SM asks the extent nodes for their current length, they report 120, and the SM records the extent as sealed at 120.]
61. [Figure: after the extent is sealed at 120, an extent node syncs with the SM and adopts the sealed length of 120.]
62. [Figure: sealing when replicas disagree. When the SM asks for the current length, the replicas hold different lengths (120 and 100); the SM seals the extent at 100.]
63. [Figure: the replica that held 120 syncs with the SM and is brought to the sealed length of 100.]
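Slides 60 to 63 show the SM sealing an extent: it asks the reachable extent nodes for their current length, seals at the smallest length it hears, and a node that missed the seal syncs with the SM later. A sketch under those assumptions; `seal_extent` and `sync_with_sm` are hypothetical names, and the catch-up branch for a shorter replica is an assumption, not something the slides show:

```python
from typing import Dict, Optional


def seal_extent(reported_lengths: Dict[str, Optional[int]]) -> int:
    """Seal at the smallest length reported by the extent nodes the SM could reach.

    None marks an EN that did not answer. Because an append is acknowledged only
    after every replica has it, no acknowledged data lies beyond this length.
    """
    answers = [n for n in reported_lengths.values() if n is not None]
    if not answers:
        raise RuntimeError("no extent node reachable; cannot seal")
    return min(answers)


def sync_with_sm(local_length: int, sealed_length: int) -> str:
    """What a replica that missed the seal does when it later syncs with the SM."""
    if local_length > sealed_length:
        return f"truncate to {sealed_length}"
    if local_length < sealed_length:
        # Assumption: the missing bytes are copied from a replica at the sealed length.
        return f"copy bytes {local_length}..{sealed_length} from another replica"
    return "already in sync"


print(seal_extent({"EN 1": 120, "EN 2": 120, "EN 3": None}))  # 120
sealed = seal_extent({"EN 1": None, "EN 2": 100, "EN 3": 100})
print(sealed)                     # 100
print(sync_with_sm(120, sealed))  # truncate to 100
```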
69. Client Layer
PHP, ASP.NET, WCF Data Services, ODBC and ADO.NET clients all connect using the Tabular Data Stream (TDS) protocol.
70. Services Layer (TDS Gateway)
• Parses commands (parser)
• SSL handshake
• "Denial of Service" guard
• Validates access credentials
• Validates firewall rules
• Maps the database name used by the client to the internal name
• Creates the session between the physical database and the client
• Acts as a proxy for the TDS session
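Two of the gateway's steps, checking firewall rules and mapping the client's database name to the internal one, can be sketched in a few lines. A minimal sketch, assuming IPv4 start/end firewall ranges and a hypothetical `Gateway` class with a made-up name map; credential checks, SSL and the TDS proxying itself are not modeled:

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FirewallRule:
    start: str  # first allowed client IPv4 address
    end: str    # last allowed client IPv4 address (inclusive)

    def allows(self, client_ip: str) -> bool:
        ip = ipaddress.IPv4Address(client_ip)
        return ipaddress.IPv4Address(self.start) <= ip <= ipaddress.IPv4Address(self.end)


class Gateway:
    """Hypothetical gateway state: firewall rules plus client-visible -> internal names."""

    def __init__(self, rules: List[FirewallRule], name_map: Dict[str, str]) -> None:
        self.rules = rules
        self.name_map = name_map

    def open_session(self, client_ip: str, database: str) -> Tuple[str, str]:
        if not any(rule.allows(client_ip) for rule in self.rules):
            raise PermissionError(f"client {client_ip} blocked by firewall rules")
        if database not in self.name_map:
            raise LookupError(f"unknown database {database!r}")
        # At this point the gateway would proxy the TDS session to the physical database.
        return client_ip, self.name_map[database]


gw = Gateway([FirewallRule("203.0.113.0", "203.0.113.255")], {"salesdb": "node14/UserDB3"})
print(gw.open_session("203.0.113.10", "salesdb"))  # ('203.0.113.10', 'node14/UserDB3')
```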
71. Platform Layer
[Figure: Node 14 and Node 15, each running a SQL Instance whose SQL DB holds User DB1 to User DB4, plus the SQL Azure Fabric.]
• Each node contains:
– A single SQL Server instance
– Holding a single database (SQL DB)
– Divided into multiple partitions (up to 650)
– Each partition is a SQL Azure database, which can be a primary or a secondary replica
– One instance of SQL Azure Fabric, which provides Failure detection, the Reconfiguration Agent, Engine Throttling, Ring Topology and Partition Manager Location Resolution
72. • Failure detection
– Detects failures of a primary or secondary replica in order to trigger the Reconfiguration Agent
• Reconfiguration Agent
– Manages re-establishing replicas after a node fails
• Engine Throttling
– Manages resource usage
• Ring Topology
– A mechanism that helps with failure detection
• Partition Manager Location Resolution
– Manages communication with the Partition Manager
73. • Failure detection
– A logical ring topology gives each machine two neighboring machines that can detect failures on that machine
– Each transaction must be committed by the primary and by at least one secondary
• Reconfiguration
– Triggered by hardware failure, an operating system crash, problems in the SQL Server instance, or upgrades (OS, SQL Server, SQL Azure)
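The two rules on this slide, ring neighbors watching each other and a commit requiring the primary plus at least one secondary, are small enough to state as code. A minimal sketch with hypothetical function names and node names:

```python
from typing import List, Tuple


def ring_neighbors(nodes: List[str], node: str) -> Tuple[str, str]:
    """Return the (predecessor, successor) of node in the logical ring."""
    i = nodes.index(node)
    return nodes[i - 1], nodes[(i + 1) % len(nodes)]


def is_committed(primary_ack: bool, secondary_acks: List[bool]) -> bool:
    """A transaction commits only if the primary and at least one secondary acknowledge it."""
    return primary_ack and any(secondary_acks)


nodes = ["node13", "node14", "node15", "node16"]
print(ring_neighbors(nodes, "node13"))     # ('node16', 'node14')
print(is_committed(True, [False, True]))   # True
print(is_committed(True, [False, False]))  # False
```

Requiring one secondary acknowledgment means a committed transaction survives the loss of the primary, since at least one surviving replica already has it.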
74. • Primary replica failure
– The least-loaded secondary replica becomes the primary
– The client receives a disconnection
– It can take 30 seconds to propagate the change to the gateways
• Secondary replica failure
– If the failure is permanent, a new secondary replica is created and the data is copied from the primary
– This copy is one of the main reasons for the database size limit in SQL Azure
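The primary-failure rule above, promote the least-loaded secondary, can be sketched as follows. `Replica`, `fail_over` and the `load` metric are hypothetical; the slides do not say how load is measured:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    node: str
    role: str    # "primary", "secondary" or "failed"
    load: float  # hypothetical load metric, e.g. CPU fraction on the hosting node
    alive: bool = True


def fail_over(replicas: List[Replica]) -> Replica:
    """If the primary is down, promote the least-loaded live secondary; return the primary."""
    primary = next(r for r in replicas if r.role == "primary")
    if primary.alive:
        return primary
    candidates = [r for r in replicas if r.role == "secondary" and r.alive]
    if not candidates:
        raise RuntimeError("no live secondary to promote")
    new_primary = min(candidates, key=lambda r: r.load)
    primary.role = "failed"
    new_primary.role = "primary"
    return new_primary


replicas = [
    Replica("node14", "primary", 0.5, alive=False),
    Replica("node15", "secondary", 0.7),
    Replica("node16", "secondary", 0.3),
]
print(fail_over(replicas).node)  # node16
```

After promotion the fabric would also create a new secondary to restore three replicas, and clients reconnect through the gateway once it learns the new primary.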
77. • Customer A is using 30% CPU on a machine
• Customer B kicks off an additional 70% CPU load on the same machine
• Customer B gets throttled
• Customer A is using 70% CPU on a machine
• Customer B kicks off an additional 30% CPU load on the same machine
• Customer A gets throttled
• The machine has no active workload
• Customer A kicks off a load of 100% CPU
• Customer A gets throttled (repeatedly)
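All three scenarios are consistent with one simple rule: once the machine saturates, the tenant currently using the most CPU is throttled, regardless of who started first. The sketch encodes that rule; it is an inference from the examples, not SQL Azure's documented throttling algorithm, and the function name and the 100% threshold are assumptions:

```python
from typing import Dict, Optional


def pick_throttled(cpu_percent: Dict[str, int], saturation: int = 100) -> Optional[str]:
    """Return the tenant to throttle, or None while the machine is below saturation.

    Hypothetical policy: once total CPU reaches `saturation`, throttle the tenant
    currently using the most CPU.
    """
    if sum(cpu_percent.values()) < saturation:
        return None
    return max(cpu_percent, key=cpu_percent.get)


print(pick_throttled({"A": 30, "B": 70}))  # B
print(pick_throttled({"A": 70, "B": 30}))  # A
print(pick_throttled({"A": 100}))          # A
print(pick_throttled({"A": 40, "B": 20}))  # None
```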
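78. Storage used by the current database, in MB:
-- Each reserved page is 8 KB, so pages * 8.0 / 1024 gives megabytes
select
    sum(reserved_page_count) * 8.0 / 1024 as [Storage_in_MB]
from
    sys.dm_db_partition_stats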
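79. Top 50 queries by total CPU (worker) time:
-- Inner query: the 50 cached plans with the highest cumulative worker (CPU) time
-- cross apply: fetch the SQL text for each plan handle
select
    highest_cpu_queries.total_worker_time,
    q.text as [Query_Text],
    highest_cpu_queries.plan_handle
from
    (select top 50
        qs.plan_handle,
        qs.total_worker_time
    from
        sys.dm_exec_query_stats qs
    order by qs.total_worker_time desc) as highest_cpu_queries
cross apply sys.dm_exec_sql_text(highest_cpu_queries.plan_handle) as q
order by highest_cpu_queries.total_worker_time desc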
80. Top 25 queries by total logical I/O, with per-execution averages:
-- The averages use integer division because the counters are bigint
select top 25
    (total_logical_reads / execution_count) as avg_logical_reads,
    (total_logical_writes / execution_count) as avg_logical_writes,
    (total_physical_reads / execution_count) as avg_phys_reads,
    execution_count,
    sql_handle,
    plan_handle
from sys.dm_exec_query_stats
order by
    (total_logical_reads + total_logical_writes) desc
Slide Objectives: Explain the differences and relationship between IaaS, PaaS, and SaaS in more detail.
Speaking Points: Here's another way to look at the cloud services taxonomy and how it maps to the components of an IT infrastructure.
Packaged Software: With packaged software, the customer is responsible for managing the entire stack, from network connectivity up to the applications.
IaaS: With Infrastructure as a Service, the lower levels of the stack are managed by a vendor. Some of these components can be provided by traditional hosters; in fact, most of them have moved to a virtualized offering. Very few actually provide an OS. The customer is still responsible for managing everything from the OS up through the applications. For the developer, an obvious benefit of IaaS is that it removes many of the concerns involved in provisioning physical or virtual machines. This was one of the earliest and primary use cases for Amazon Web Services Elastic Compute Cloud (EC2). Developers could readily provision virtual machines from Amazon Machine Images (AMIs) on EC2, develop and test solutions and, often, run the results "in production". The only requirement was a credit card to pay for the services.
PaaS: With Platform as a Service, everything from network connectivity through the runtime is provided and managed by the platform vendor. The Windows Azure Platform best fits this category today. In fact, because we don't currently provide access to the underlying virtualization or operating system, we're often described as not providing IaaS. PaaS offerings further reduce the developer's burden by also supporting the platform runtime and related application services. With PaaS, the developer can begin creating the business logic for an application almost immediately. The potential productivity gains are considerable and, because the hardware and operational aspects of the platform are also managed by the cloud provider, applications can be taken from idea to reality very quickly.
SaaS: Finally, with SaaS, a vendor provides the application and abstracts you from all of the underlying components.
Speaking Points: At PDC10, in just over a month, we will introduce several new services, including Caching and Reporting. We will also have a new CTP of the Data Sync Service, and Project Dallas will finally be available. Let's drill into these services in a bit more detail.
--
Speaking Points: I suspect most, if not all, of you in this room are familiar with the Windows Azure Platform. Today the platform consists of a set of foundational services, such as the SQL Azure relational database and AppFabric. AppFabric provides services that can be used by any app, whether hosted in Windows Azure, on-premises, or in another environment.
Questions: How many of you are building applications for Windows Azure? How many are using SQL Azure? How many are using the Access Control service today? The Service Bus?
Notes: The Windows Azure story. We are building an open platform to run your applications in the cloud. Your apps are .NET, Java, PHP, etc.; we love everyone. We are going to help you migrate your existing apps to the cloud. The cloud platform is the future: it enables scale and self-service and lowers friction. We provide the best cloud platform for building new apps (n-tier, web services, etc.).
Slide Objective: Use this slide to transition into an explanation of SQL Azure Database (Reporting and Data Sync will be covered later), and explain at a high level how SQL Azure works.
Speaker Notes: Design principle of SQL Azure: focus on combining the best features of SQL Server running at scale with low friction.
SQL Azure is a high-availability database:
• There are always three transactionally consistent replicas of the database: one primary replica and two secondary (slave) replicas
• Failure of a replica results in another replica being spun up immediately by the fabric
• Failure of the primary replica means a secondary replica becomes the primary and a new secondary spins up
• Minimal downtime: typically just a few dropped connections
• The failover scenario is easy to code for: if you do good connection management and error handling, you will be fine
• A clustered index is required on all tables to allow replication
Notes: Useful article from the SQL Azure team: http://msdn.microsoft.com/en-us/magazine/ee321567.aspx
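"Good connection management and error handling" usually means retrying operations that fail because a connection was dropped during failover. A minimal sketch in Python, assuming a hypothetical `TransientError` that your data-access layer raises for dropped connections (real code would map specific driver errors to it); the retry count and delays are illustrative and could be raised given that slide 74 says a primary change can take 30 seconds to reach the gateways:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error for a connection dropped during a replica failover."""


def with_retries(op: Callable[[], T], attempts: int = 5, base_delay_s: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op, retrying on TransientError with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return op()
        except TransientError:
            # Back off 0.5 s, 1 s, 2 s, ... plus up to 10% random jitter.
            delay = base_delay_s * (2 ** attempt)
            sleep(delay * (1 + random.uniform(0, 0.1)))
    return op()  # last attempt: let any error propagate to the caller


calls = {"n": 0}


def flaky_query() -> str:
    """Simulated query that fails twice (as during a failover) and then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("connection dropped during failover")
    return "ok"


print(with_retries(flaky_query, sleep=lambda s: None))  # ok
print(calls["n"])                                       # 3
```

Retrying is only safe for operations that are idempotent or wrapped in a transaction that is rolled back when the connection drops.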