2. Microsoft アカウント
Microsoft Azure 無料サブスクリプション
Visual Studio 無料サブスクリプション
Python SDK Windows版 のインストール
Python SDK Mac のインストール
Microsoft Developer Network
Azure コマンドラインインタフェース のインストール
AzCopy コマンドライン ツール (Azure Storage 用)
ストレージエクスプローラー・Windows版 のインストール
ストレージエクスプローラー・Mac版 のインストール
3. Platform Services
Infrastructure Services
Web Apps
Mobile
Apps
API
Management
API Apps
Logic Apps
Notification
Hubs
Content
Delivery
Network (CDN)
Media
Services
BizTalk
Services
Hybrid
Connections
Service Bus
Storage
Queues
Hybrid
Operations
Backup
StorSimple
Azure Site
Recovery
Import/Export
SQL
Database
DocumentDB
Redis
Cache
Azure
Search
Storage
Tables
Data
Warehouse Azure AD
Health Monitoring
AD Privileged
Identity
Management
Operational
Analytics
Cloud
Services
Batch
RemoteApp
Service
Fabric
Visual Studio
App
Insights
Azure
SDK
VS Online
Domain Services
HDInsight Machine
Learning
Stream
Analytics
Data
Factory
Event
Hubs
Mobile
Engagement
Data
Lake
IoT Hub
Data
Catalog
Security &
Management
Azure Active
Directory
Multi-Factor
Authentication
Automation
Portal
Key Vault
Store/
Marketplace
VM Image Gallery
& VM Depot
Azure AD
B2C
Scheduler
The Azure Platform
4. 様々なアプリ開発が行われています
Web & mobile Business apps Microservice apps
Development & test Big data & analytics Internet of Things
Backup, recovery
& archive
High Performance
Computing
Digital media
21. Azure SQL Database Query Performance Insight
https://docs.microsoft.com/ja-jp/azure/sql-database/sql-database-query-performance
Query Store is turned on by default for Azure SQL Database
https://azure.microsoft.com/ja-jp/updates/query-store-on-is-the-new-default-for-azure-sql-database/
57. REFERENCE ASSEMBLY WebLogExtASM;
@rs =
EXTRACT
UserID string,
Start DateTime,
End DateTime,
Region string,
SitesVisited string,
PagesVisited string
FROM “/Logs/WebLogRecords.txt”
USING WebLogExtractor ();
@result = SELECT UserID,
(End.Subtract(Start)).TotalSeconds AS Duration
FROM @rs ORDER BY Duration DESC FETCH 10;
OUTPUT @result TO “/Logs/Results/top10.tsv"
USING Outputter.Tsv();
• 型定義は C# の型定義と同じ
• データをファイルから抽出・読み込み
するときに、スキーマが必要
Data Lake Store 内 のファイル
独自形式を解析するカスタム関数
C# の関数
行セット:
(中間テーブルの概念
に近い)
TSV形式で書き込む関数
60. 従来型の処理・分析 Azure Data Lake を中心とした処理・分析
Business
apps
Custom
apps
Sensors
and devices
ADL Store
People
非構造化データも
含めてあらゆる
データを格納
Azure SQL
DW
Azure AD
Power BI
ADF
ADL
Analytics
• 処理・分析業務の大半はデータ準備作業が占める
• 処理・分析業務に手間・時間が必要
Business
apps
Custom
apps
Sensors
and devices
HDInsight
ユーザー管理、認証
データの連携
Power BI
File System
Database
Database
Hadoop
DWH
Data Mart
62. Microsoft data platform solutions
Product Category Description More Info
SQL Server 2016 RDBMS Earned top spot in Gartner’s Operational Database magic
quadrant. JSON support. Linux TBD
https://www.microsoft.com/en-us/server-
cloud/products/sql-server-2016/
SQL Database RDBMS/DBaaS Cloud-based service that is provisioned and scaled quickly.
Has built-in high availability and disaster recovery. JSON
support
https://azure.microsoft.com/en-
us/services/sql-database/
SQL Data Warehouse MPP RDBMS/DBaaS Cloud-based service that handles relational big data.
Provision and scale quickly. Can pause service to reduce cost
https://azure.microsoft.com/en-
us/services/sql-data-warehouse/
Analytics Platform System (APS) MPP RDBMS Big data analytics appliance for high performance and
seamless integration of all your data
https://www.microsoft.com/en-us/server-
cloud/products/analytics-platform-system/
Azure Data Lake Store Hadoop storage Removes the complexities of ingesting and storing all of your
data while making it faster to get up and running with batch,
streaming, and interactive analytics
https://azure.microsoft.com/en-
us/services/data-lake-store/
Azure Data Lake Analytics On-demand analytics job
service/Big Data-as-a-
service
Cloud-based service that dynamically provisions resources so
you can run queries on exabytes of data. Includes U-SQL, a
new big data query language
https://azure.microsoft.com/en-
us/services/data-lake-analytics/
HDInsight PaaS Hadoop
compute/Hadoop
clusters-as-a-service
A managed Apache Hadoop, Spark, R, HBase, Kafka, and
Storm cloud service made easy
https://azure.microsoft.com/en-
us/services/hdinsight/
DocumentDB PaaS NoSQL: Document
Store
Get your apps up and running in hours with a fully managed
NoSQL database service that indexes, stores, and queries
data using familiar SQL syntax
https://azure.microsoft.com/en-
us/services/documentdb/
Azure Table Storage PaaS NoSQL: Key-value
Store
Store large amount of semi-structured data in the cloud https://azure.microsoft.com/en-
us/services/storage/tables/
63. Microsoft Big Data Portfolio
SQL Server Stretch
Business intelligence
Machine learning analytics
Insights
Azure SQL Database
SQL Server 2016
SQL Server 2016 Fast Track
Azure SQL DW
ADLS & ADLA
DocumentDB
HDInsight
Hadoop
Analytics Platform System
Sequential Scale Out + AcrossScale Up
Key
Relational Non-relational
On-premisesCloud
Microsoft has solutions covering
and connecting all four
quadrants – that’s why SQL
Server is one of the most utilized
databases in the world
64. Azure
Data Lake Store
Azure
Blob Storage
Purpose Optimized for big data analytics General purpose bulk storage
Use Cases Batch, Interactive, Streaming App backend, backup data, media storage for
streaming
Units of Storage Accounts / Folders / Files Accounts / Containers / Blobs
Structure Hierarchical File System Flat namespace
WebHDFS Implements WebHDFS No (WASB)
Security AD SAS keys
Storage Auto Shared/Files chunked Manually manage expansion/Files intact
Service State Generally Available. PolyBase
just supported!
Generally Available
Billing Pay for data stored and for I/O Pay for data stored and for I/O
Region Availability Two US regions (Other regions coming soon) All Azure Regions
ADL Store vs Blob Store
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage
65. Want
Hadoop?
Need exact
same on-
prem
Need
interactive /
streaming?
Mandatory
No strong opinion
Azure Marketplace (IaaS)
• Need all workloads exactly like on-
premises
• Need 100% Hortonworks/Cloudera/MapR
Azure HDInsight
• Most Hadoop workloads
• Fully managed by Microsoft
• Sell HDI + ADLS
• Stickier to Microsoft than VMs
• Can do interactive (Spark) and streaming
(Storm/Spark)
Azure Data Lake Analytics
• Easiest experience for admin: no sense of
clusters, instant scale per job
• Easiest experience for developers: Visual
Studio/U-SQL (C#+SQL)
• Sell ADLA + ADLS
• Batch workloads only
Need everything exactly
like on-prem
Need core
projects Yes Batch is OK
Always present
ADLA if .NET or
Visual Studio Shop
If .NET or
VS shop?
66. Azure SQL DW HDInsight Hive HDInsight Spark Azure Data Lake SQL Server (IaaS)
Volume Petabytes Petabytes Petabytes Petabytes Terabytes
Security Encryption, TD,
Audit
ADLS / Apache
Ranger
ADLS AAD Security
Groups (data)
Encryption, TD
Audit
Languages T-SQL HiveQL SparkSQL, HiveQL,
Scala, Java, Python,
R
U-SQL T-SQL
Extensibility No Yes, .NET/SerDe Yes, Packages Yes, .NET Yes, .NET CLR
External File
Types
ORC, TXT,
Parquet, RCFile
ORC, CSV, Parquet
+ others
Parquet, JSON,
Hive + others
Many ORC, TXT, Parquet,
RCFile
Admin Low-Medium Medium-High Medium-High Low High
Cost Model DWU Nodes & VM Nodes & VM Units/Jobs VM
Schema
Definition
Schema on Write
/ Polybase
Schema on Read Schema on Read Schema on Read Schema on Write /
Polybase
Max DB Size 240TB Comp (5X
= 1PB)
Unlimited 64TB (64 1TB drives)