SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Windows Azure HDInsight
        Service


   Hadoop on Windows Azure


        NEIL MACKENZIE
Who Am I?

 Neil Mackenzie
 Windows Azure Architect @ Satory Global


 Windows Azure MVP
 Blog: http://convective.wordpress.com/
 Twitter: @mknz


 Book:
 Microsoft Windows Azure Development Cookbook
Goals and Agenda

 Goals
   Introduce Windows Azure HDInsight Service to the Windows
    Azure developer
   Introduce Windows Azure to the Hadoop user

   Not a tutorial on how to use Hadoop features

 Agenda
   Big Data

   Windows Azure

   Windows Azure HDInsight Service
Big Data

 Problem:
   How do we create value from enormous amounts of low-value
    data?


 Solution:
   Analyze it using a lot of commodity hardware.
Three Vs of Big Data

 Volume
   How much data is there?

 Variety
   What are the sources of the data?

 Velocity
   How fast is the data being generated?
MapReduce

 Distributed computational model for data analysis.
   Map function:
        Processes a key-value pair to generate intermediate pairs
    Reduce function:
        Merges all intermediate values with the same intermediate key.


 Map and reduce functions allocated to many
  compute nodes with data stored locally.
 Raw MapReduce functions are written in Java.
Apache Hadoop

 Modules:
   Hadoop Distributed File System (HDFS)

   MapReduce

 Related projects:
   HBase – scalable, distributed database

   Hive – data warehouse infrastructure

   Mahout – scalable machine learning library

   Pig – high-level data-flow language

 Other:
   Sqoop –import and export to relational database
Windows Azure

 Compute
   PaaS: Cloud Services, Windows Azure Web Sites
   IaaS: Virtual Machines

 Storage
   Windows Azure Storage Service: blobs, tables, queues
   Windows Azure SQL Database
   IaaS: Microsoft SQL Server, MongoDB, Cassandra, etc.

 Connectivity
   HTTP, TCP, UDP, Site-to-Site VPN

 Administration
   Portal, Service Management API
Windows Azure HDInsight Service

 Components:
   HadoopCore – v1.0.1

   HDFS & ASV

   Pig – v0.9.3

   Hive – v0.8.1

   Sqoop – v1.4.2

   Excel/Hive



 Note: this was formerly known as Hadoop on Azure.
Hadoop Administration

 Portal
   http://www.hadooponazure.com

   Apply to join preview

   Create and manage Hadoop cluster
         3 nodes for 5 days
     Access the Interactive console
       Hive
         Invoke Hive statements

       JavaScript
         Invoke HDFS commands

         Invoke Hive & Pig statements
Distributed File Systems

 HDFS
   Contents deleted when cluster deleted

 ASV
   Azure Storage Vault

   Data stored in Windows Azure Blob Storage

   Configured on Hadoop on Azure portal

   Contents survive deletion of Hadoop cluster

   Supports multi-level structure, e.g.:
       containername/input/file1
Pig

 Hadoop feature to perform data-flow operations:
   Execution environment

   Language: Pig Latin

 Execution Environment
   Local in local JVM or distributed on Hadoop cluster

 Pig Latin
   High-level language

   Describes data-flow operations

   Automatically invokes MapReduce jobs

   Much simpler than using MapReduce directly
Pig Example


records = LOAD 'asv://flightdata/input/flightdata.txt'
AS
(year:int, month:int, day:int, carrier:chararray, origin:char
array, dest:chararray, depdelay:int, arrdelay:int);

modified_records = FOREACH records
GENERATE origin, depdelay;

STORE modified_records
INTO 'my_output' using PigStorage(',');
Hive

 Hadoop feature to perform data warehouse
  operations
 HiveQL
     high-level, SQL-like language
     Supports equi-joins
     Schema on read NOT schema on write
     Automatically invokes MapReduce jobs
     Much simpler than using MapReduce directly
 Metadata store
   Contains descriptions of tables
Hive Example

FROM flightdata_asv

INSERT OVERWRITE TABLE origin_counts
SELECT origin, COUNT(*)
GROUP BY origin

INSERT OVERWRITE TABLE dest_counts
SELECT dest, COUNT(*)
GROUP BY dest
Sqoop

 Feature allowing import and export from SQL
 databases
    Uses JDBC connector
    Works with Windows Azure SQL Database
    Table must exist before export
Sqoop Example

 Exporting a table:
sqoop.cmd export –connect
"jdbc:sqlserver://sql_database_server.database.windows.net:1433;database=
sql_database_instance;user=sqoop_login@sql_database_server;password=s
qoop_login_password"
--table sql_database_table
--export-dir "/user/hive/warehouse/hive_table"
--input-fields-terminated-by "001"
Excel and Hadoop on Azure

 Example of Microsoft business intelligence strategy
   Expose Hadoop to existing tools

 HiveODBC connector for Excel
   Create Hive queries from Excel

   Invoke them from Excel
More Information

 Sign up for preview:
  http://www.hadooponazure.com
 Support:
  http://social.msdn.microsoft.com/Forums/en-US/hdinsight
 Avkash Chauhan’s blog:
  http://blogs.msdn.com/b/avkashchauhan/archive/tags/hadoop
 Roger Jennings’ blog:
  http://oakleafblog.blogspot.com/2012/04/using-data-in-
  windows-azure-blobs-with.html
Summary

 Hadoop:
   De-facto solution to the Big Data problem

 Windows Azure HDInsight Service
   Native Hadoop implementation

   Managed Hadoop service for Windows Azure

   Currently in preview
Windows Azure HDInsight Service

Weitere ähnliche Inhalte

Was ist angesagt?

Hands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCacheHands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCacheAmazon Web Services
 
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and moreScaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and moreDropsolid
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and DrupalPromet Source
 
Windows Azure Virtual Machines
Windows Azure Virtual MachinesWindows Azure Virtual Machines
Windows Azure Virtual MachinesNeil Mackenzie
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisAmazon Web Services
 
Infrastructure as Code on Azure - NET Conf CO v2018
Infrastructure as Code on Azure - NET Conf CO v2018 Infrastructure as Code on Azure - NET Conf CO v2018
Infrastructure as Code on Azure - NET Conf CO v2018 Victor Silva
 
More Cache for Less Cash
More Cache for Less CashMore Cache for Less Cash
More Cache for Less CashMichael Collier
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at AirbnbBill Liu
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best PracticesDoiT International
 
Azure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosAzure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosBrian Benz
 
Automating Your Microsoft Azure Environment (DevLink 2014)
Automating Your Microsoft Azure Environment (DevLink 2014)Automating Your Microsoft Azure Environment (DevLink 2014)
Automating Your Microsoft Azure Environment (DevLink 2014)Michael Collier
 
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar SeriesBest Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar SeriesAmazon Web Services
 
Automating Your Azure Environment
Automating Your Azure EnvironmentAutomating Your Azure Environment
Automating Your Azure EnvironmentMichael Collier
 
Amazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAmazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAcquia
 
Windows Azure Blob Storage
Windows Azure Blob StorageWindows Azure Blob Storage
Windows Azure Blob Storageylew15
 
Zero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBZero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBAdnan Hashmi
 
Bacd zenoss
Bacd zenossBacd zenoss
Bacd zenosske4qqq
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articlesKaran Gulati
 

Was ist angesagt? (20)

Hands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCacheHands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCache
 
Accelerating DynamoDB with DAX
Accelerating DynamoDB with DAXAccelerating DynamoDB with DAX
Accelerating DynamoDB with DAX
 
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and moreScaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
Windows Azure Virtual Machines
Windows Azure Virtual MachinesWindows Azure Virtual Machines
Windows Azure Virtual Machines
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for Redis
 
Infrastructure as Code on Azure - NET Conf CO v2018
Infrastructure as Code on Azure - NET Conf CO v2018 Infrastructure as Code on Azure - NET Conf CO v2018
Infrastructure as Code on Azure - NET Conf CO v2018
 
More Cache for Less Cash
More Cache for Less CashMore Cache for Less Cash
More Cache for Less Cash
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at Airbnb
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best Practices
 
Azure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosAzure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment Scenarios
 
Automating Your Microsoft Azure Environment (DevLink 2014)
Automating Your Microsoft Azure Environment (DevLink 2014)Automating Your Microsoft Azure Environment (DevLink 2014)
Automating Your Microsoft Azure Environment (DevLink 2014)
 
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar SeriesBest Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
 
Automating Your Azure Environment
Automating Your Azure EnvironmentAutomating Your Azure Environment
Automating Your Azure Environment
 
Amazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAmazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and Hosting
 
Azure IaaS
Azure IaaSAzure IaaS
Azure IaaS
 
Windows Azure Blob Storage
Windows Azure Blob StorageWindows Azure Blob Storage
Windows Azure Blob Storage
 
Zero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBZero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DB
 
Bacd zenoss
Bacd zenossBacd zenoss
Bacd zenoss
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articles
 

Andere mochten auch

Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureKhalid Salama
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhDAdnan Masood
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLJen Stirrup
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 
Drive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsightDrive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsightDataWorks Summit
 
Functional Reactive Programming without Black Magic (UIKonf 2015)
Functional Reactive Programming without Black Magic (UIKonf 2015)Functional Reactive Programming without Black Magic (UIKonf 2015)
Functional Reactive Programming without Black Magic (UIKonf 2015)Jens Ravens
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachAdnan Masood
 
Visualising the tabular model for power view upload
Visualising the tabular model for power view uploadVisualising the tabular model for power view upload
Visualising the tabular model for power view uploadJen Stirrup
 
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationDigital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationJen Stirrup
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure FunctionsJim O'Neil
 

Andere mochten auch (20)

Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureML
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
Drive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsightDrive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsight
 
Functional Reactive Programming without Black Magic (UIKonf 2015)
Functional Reactive Programming without Black Magic (UIKonf 2015)Functional Reactive Programming without Black Magic (UIKonf 2015)
Functional Reactive Programming without Black Magic (UIKonf 2015)
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Realtime analytics with_hadoop
Realtime analytics with_hadoopRealtime analytics with_hadoop
Realtime analytics with_hadoop
 
Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality Approach
 
Cloud computing by Bhavesh
Cloud computing by BhaveshCloud computing by Bhavesh
Cloud computing by Bhavesh
 
Visualising the tabular model for power view upload
Visualising the tabular model for power view uploadVisualising the tabular model for power view upload
Visualising the tabular model for power view upload
 
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationDigital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure Functions
 

Ähnlich wie Windows Azure HDInsight Service

Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopAvkash Chauhan
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive AnalyticsManish Chopra
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Lace Lofranco
 
It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?Srihari Srinivasan
 
Srikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydSrikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydsrikanth K
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platformnvvrajesh
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop StoryMichael Rys
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackAndrew Brust
 
Escalando Aplicaciones Web
Escalando Aplicaciones WebEscalando Aplicaciones Web
Escalando Aplicaciones WebSantiago Coffey
 

Ähnlich wie Windows Azure HDInsight Service (20)

Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
מיכאל
מיכאלמיכאל
מיכאל
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 
It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?
 
Srikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydSrikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hyd
 
Hive
HiveHive
Hive
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platform
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
 
Escalando Aplicaciones Web
Escalando Aplicaciones WebEscalando Aplicaciones Web
Escalando Aplicaciones Web
 
1. Apache HIVE
1. Apache HIVE1. Apache HIVE
1. Apache HIVE
 
Sureh hadoop 3 years t
Sureh hadoop 3 years tSureh hadoop 3 years t
Sureh hadoop 3 years t
 

Mehr von Neil Mackenzie

Project Orleans - Actor Model framework
Project Orleans - Actor Model frameworkProject Orleans - Actor Model framework
Project Orleans - Actor Model frameworkNeil Mackenzie
 
Windows Azure SQL Database Federations
Windows Azure SQL Database FederationsWindows Azure SQL Database Federations
Windows Azure SQL Database FederationsNeil Mackenzie
 
Brokered Messaging in Windows Azure
Brokered Messaging in Windows AzureBrokered Messaging in Windows Azure
Brokered Messaging in Windows AzureNeil Mackenzie
 
Windows Azure Diagnostics
Windows Azure DiagnosticsWindows Azure Diagnostics
Windows Azure DiagnosticsNeil Mackenzie
 
Introduction to Windows Azure AppFabric Applications
Introduction to Windows Azure AppFabric ApplicationsIntroduction to Windows Azure AppFabric Applications
Introduction to Windows Azure AppFabric ApplicationsNeil Mackenzie
 

Mehr von Neil Mackenzie (6)

Azure DocumentDB
Azure DocumentDBAzure DocumentDB
Azure DocumentDB
 
Project Orleans - Actor Model framework
Project Orleans - Actor Model frameworkProject Orleans - Actor Model framework
Project Orleans - Actor Model framework
 
Windows Azure SQL Database Federations
Windows Azure SQL Database FederationsWindows Azure SQL Database Federations
Windows Azure SQL Database Federations
 
Brokered Messaging in Windows Azure
Brokered Messaging in Windows AzureBrokered Messaging in Windows Azure
Brokered Messaging in Windows Azure
 
Windows Azure Diagnostics
Windows Azure DiagnosticsWindows Azure Diagnostics
Windows Azure Diagnostics
 
Introduction to Windows Azure AppFabric Applications
Introduction to Windows Azure AppFabric ApplicationsIntroduction to Windows Azure AppFabric Applications
Introduction to Windows Azure AppFabric Applications
 

Kürzlich hochgeladen

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 

Kürzlich hochgeladen (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 

Windows Azure HDInsight Service

  • 1. Windows Azure HDInsight Service Hadoop on Windows Azure NEIL MACKENZIE
  • 2. Who Am I?  Neil Mackenzie  Windows Azure Architect @ Satory Global  Windows Azure MVP  Blog: http://convective.wordpress.com/  Twitter: @mknz  Book: Microsoft Windows Azure Development Cookbook
  • 3. Goals and Agenda  Goals  Introduce Windows Azure HDInsight Service to the Windows Azure developer  Introduce Windows Azure to the Hadoop user  Not a tutorial on how to use Hadoop features  Agenda  Big Data  Windows Azure  Windows Azure HDInsight Service
  • 4. Big Data  Problem:  How do we create value from enormous amounts of low-value data?  Solution:  Analyze it using a lot of commodity hardware.
  • 5. Three Vs of Big Data  Volume  How much data is there?  Variety  What are the sources of the data?  Velocity  How fast is the data being generated?
  • 6. MapReduce  Distributed computational model for data analysis.  Map function:  Processes a key-value pair to generate intermediate pairs  Reduce function:  Merges all intermediate values with the same intermediate key.  Map and reduce functions allocated to many compute nodes with data stored locally.  Raw MapReduce functions are written in Java.
  • 7. Apache Hadoop  Modules:  Hadoop Distributed File System (HDFS)  MapReduce  Related projects:  HBase – scalable, distributed database  Hive – data warehouse infrastructure  Mahout – scalable machine learning library  Pig – high-level data-flow language  Other:  Sqoop –import and export to relational database
  • 8. Windows Azure  Compute  PaaS: Cloud Services, Windows Azure Web Sites  IaaS: Virtual Machines  Storage  Windows Azure Storage Service: blobs, tables, queues  Windows Azure SQL Database  IaaS: Microsoft SQL Server, MongoDB, Cassandra, etc.  Connectivity  HTTP, TCP, UDP, Site-to-Site VPN  Administration  Portal, Service Management API
  • 9. Windows Azure HDInsight Service  Components:  HadoopCore – v1.0.1  HDFS & ASV  Pig – v0.9.3  Hive – v0.8.1  Sqoop – v1.4.2  Excel/Hive  Note: this was formerly known as Hadoop on Azure.
  • 10. Hadoop Administration  Portal  http://www.hadooponazure.com  Apply to join preview  Create and manage Hadoop cluster  3 nodes for 5 days  Access the Interactive console  Hive  Invoke Hive statements  JavaScript  Invoke HDFS commands  Invoke Hive & Pig statements
  • 11. Distributed File Systems  HDFS  Contents deleted when cluster deleted  ASV  Azure Storage Vault  Data stored in Windows Azure Blob Storage  Configured on Hadoop on Azure portal  Contents survive deletion of Hadoop cluster  Supports multi-level structure, e.g.:  containername/input/file1
  • 12. Pig  Hadoop feature to perform data-flow operations:  Execution environment  Language: Pig Latin  Execution Environment  Local in local JVM or distributed on Hadoop cluster  Pig Latin  High-level language  Describes data-flow operations  Automatically invokes MapReduce jobs  Much simpler than using MapReduce directly
  • 13. Pig Example records = LOAD 'asv://flightdata/input/flightdata.txt' AS (year:int, month:int, day:int, carrier:chararray, origin:char array, dest:chararray, depdelay:int, arrdelay:int); modified_records = FOREACH records GENERATE origin, depdelay; STORE modified_records INTO 'my_output' using PigStorage(',');
  • 14. Hive  Hadoop feature to perform data warehouse operations  HiveQL  high-level, SQL-like language  Supports equi-joins  Schema on read NOT schema on write  Automatically invokes MapReduce jobs  Much simpler than using MapReduce directly  Metadata store  Contains descriptions of tables
  • 15. Hive Example FROM flightdata_asv INSERT OVERWRITE TABLE origin_counts SELECT origin, COUNT(*) GROUP BY origin INSERT OVERWRITE TABLE dest_counts SELECT dest, COUNT(*) GROUP BY dest
  • 16. Sqoop  Feature allowing import and export from SQL databases  Uses JDBC connector  Works with Windows Azure SQL Database  Table must exist before export
  • 17. Sqoop Example  Exporting a table: sqoop.cmd export –connect "jdbc:sqlserver://sql_database_server.database.windows.net:1433;database= sql_database_instance;user=sqoop_login@sql_database_server;password=s qoop_login_password" --table sql_database_table --export-dir "/user/hive/warehouse/hive_table" --input-fields-terminated-by "001"
  • 18. Excel and Hadoop on Azure  Example of Microsoft business intelligence strategy  Expose Hadoop to existing tools  HiveODBC connector for Excel  Create Hive queries from Excel  Invoke them from Excel
  • 19. More Information  Sign up for preview: http://www.hadooponazure.com  Support: http://social.msdn.microsoft.com/Forums/en-US/hdinsight  Avkash Chauhan’s blog: http://blogs.msdn.com/b/avkashchauhan/archive/tags/hadoop  Roger Jennings’ blog: http://oakleafblog.blogspot.com/2012/04/using-data-in- windows-azure-blobs-with.html
  • 20. Summary  Hadoop:  De-facto solution to the Big Data problem  Windows Azure HDInsight Service  Native Hadoop implementation  Managed Hadoop service for Windows Azure  Currently in preview