SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Windows Azure HDInsight
        Service


   Hadoop on Windows Azure


        NEIL MACKENZIE
Who Am I?

 Neil Mackenzie
 Windows Azure Architect @ Satory Global


 Windows Azure MVP
 Blog: http://convective.wordpress.com/
 Twitter: @mknz


 Book:
 Microsoft Windows Azure Development Cookbook
Goals and Agenda

 Goals
   Introduce Windows Azure HDInsight Service to the Windows
    Azure developer
   Introduce Windows Azure to the Hadoop user

   Not a tutorial on how to use Hadoop features

 Agenda
   Big Data

   Windows Azure

   Windows Azure HDInsight Service
Big Data

 Problem:
   How do we create value from enormous amounts of low-value
    data?


 Solution:
   Analyze it using a lot of commodity hardware.
Three Vs of Big Data

 Volume
   How much data is there?

 Variety
   What are the sources of the data?

 Velocity
   How fast is the data being generated?
MapReduce

 Distributed computational model for data analysis.
   Map function:
        Processes a key-value pair to generate intermediate pairs
    Reduce function:
        Merges all intermediate values with the same intermediate key.


 Map and reduce functions allocated to many
  compute nodes with data stored locally.
 Raw MapReduce functions are written in Java.
Apache Hadoop

 Modules:
   Hadoop Distributed File System (HDFS)

   MapReduce

 Related projects:
   HBase – scalable, distributed database

   Hive – data warehouse infrastructure

   Mahout – scalable machine learning library

   Pig – high-level data-flow language

 Other:
   Sqoop –import and export to relational database
Windows Azure

 Compute
   PaaS: Cloud Services, Windows Azure Web Sites
   IaaS: Virtual Machines

 Storage
   Windows Azure Storage Service: blobs, tables, queues
   Windows Azure SQL Database
   IaaS: Microsoft SQL Server, MongoDB, Cassandra, etc.

 Connectivity
   HTTP, TCP, UDP, Site-to-Site VPN

 Administration
   Portal, Service Management API
Windows Azure HDInsight Service

 Components:
   HadoopCore – v1.0.1

   HDFS & ASV

   Pig – v0.9.3

   Hive – v0.8.1

   Sqoop – v1.4.2

   Excel/Hive



 Note: this was formerly known as Hadoop on Azure.
Hadoop Administration

 Portal
   http://www.hadooponazure.com

   Apply to join preview

   Create and manage Hadoop cluster
         3 nodes for 5 days
     Access the Interactive console
       Hive
         Invoke Hive statements

       JavaScript
         Invoke HDFS commands

         Invoke Hive & Pig statements
Distributed File Systems

 HDFS
   Contents deleted when cluster deleted

 ASV
   Azure Storage Vault

   Data stored in Windows Azure Blob Storage

   Configured on Hadoop on Azure portal

   Contents survive deletion of Hadoop cluster

   Supports multi-level structure, e.g.:
       containername/input/file1
Pig

 Hadoop feature to perform data-flow operations:
   Execution environment

   Language: Pig Latin

 Execution Environment
   Local in local JVM or distributed on Hadoop cluster

 Pig Latin
   High-level language

   Describes data-flow operations

   Automatically invokes MapReduce jobs

   Much simpler than using MapReduce directly
Pig Example


records = LOAD 'asv://flightdata/input/flightdata.txt'
AS
(year:int, month:int, day:int, carrier:chararray, origin:char
array, dest:chararray, depdelay:int, arrdelay:int);

modified_records = FOREACH records
GENERATE origin, depdelay;

STORE modified_records
INTO 'my_output' using PigStorage(',');
Hive

 Hadoop feature to perform data warehouse
  operations
 HiveQL
     high-level, SQL-like language
     Supports equi-joins
     Schema on read NOT schema on write
     Automatically invokes MapReduce jobs
     Much simpler than using MapReduce directly
 Metadata store
   Contains descriptions of tables
Hive Example

FROM flightdata_asv

INSERT OVERWRITE TABLE origin_counts
SELECT origin, COUNT(*)
GROUP BY origin

INSERT OVERWRITE TABLE dest_counts
SELECT dest, COUNT(*)
GROUP BY dest
Sqoop

 Feature allowing import and export from SQL
 databases
    Uses JDBC connector
    Works with Windows Azure SQL Database
    Table must exist before export
Sqoop Example

 Exporting a table:
sqoop.cmd export –connect
"jdbc:sqlserver://sql_database_server.database.windows.net:1433;database=
sql_database_instance;user=sqoop_login@sql_database_server;password=s
qoop_login_password"
--table sql_database_table
--export-dir "/user/hive/warehouse/hive_table"
--input-fields-terminated-by "001"
Excel and Hadoop on Azure

 Example of Microsoft business intelligence strategy
   Expose Hadoop to existing tools

 HiveODBC connector for Excel
   Create Hive queries from Excel

   Invoke them from Excel
More Information

 Sign up for preview:
  http://www.hadooponazure.com
 Support:
  http://social.msdn.microsoft.com/Forums/en-US/hdinsight
 Avkash Chauhan’s blog:
  http://blogs.msdn.com/b/avkashchauhan/archive/tags/hadoop
 Roger Jennings’ blog:
  http://oakleafblog.blogspot.com/2012/04/using-data-in-
  windows-azure-blobs-with.html
Summary

 Hadoop:
   De-facto solution to the Big Data problem

 Windows Azure HDInsight Service
   Native Hadoop implementation

   Managed Hadoop service for Windows Azure

   Currently in preview
Windows Azure HDInsight Service

Weitere ähnliche Inhalte

Was ist angesagt?

Hands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCacheHands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCacheAmazon Web Services
 
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and moreScaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and moreDropsolid
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and DrupalPromet Source
 
Windows Azure Virtual Machines
Windows Azure Virtual MachinesWindows Azure Virtual Machines
Windows Azure Virtual MachinesNeil Mackenzie
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisAmazon Web Services
 
Infrastructure as Code on Azure - NET Conf CO v2018
Infrastructure as Code on Azure - NET Conf CO v2018 Infrastructure as Code on Azure - NET Conf CO v2018
Infrastructure as Code on Azure - NET Conf CO v2018 Victor Silva
 
More Cache for Less Cash
More Cache for Less CashMore Cache for Less Cash
More Cache for Less CashMichael Collier
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at AirbnbBill Liu
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best PracticesDoiT International
 
Azure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosAzure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosBrian Benz
 
Automating Your Microsoft Azure Environment (DevLink 2014)
Automating Your Microsoft Azure Environment (DevLink 2014)Automating Your Microsoft Azure Environment (DevLink 2014)
Automating Your Microsoft Azure Environment (DevLink 2014)Michael Collier
 
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar SeriesBest Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar SeriesAmazon Web Services
 
Automating Your Azure Environment
Automating Your Azure EnvironmentAutomating Your Azure Environment
Automating Your Azure EnvironmentMichael Collier
 
Amazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAmazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAcquia
 
Windows Azure Blob Storage
Windows Azure Blob StorageWindows Azure Blob Storage
Windows Azure Blob Storageylew15
 
Zero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBZero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBAdnan Hashmi
 
Bacd zenoss
Bacd zenossBacd zenoss
Bacd zenosske4qqq
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articlesKaran Gulati
 

Was ist angesagt? (20)

Hands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCacheHands-on Lab: Amazon ElastiCache
Hands-on Lab: Amazon ElastiCache
 
Accelerating DynamoDB with DAX
Accelerating DynamoDB with DAXAccelerating DynamoDB with DAX
Accelerating DynamoDB with DAX
 
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and moreScaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
Windows Azure Virtual Machines
Windows Azure Virtual MachinesWindows Azure Virtual Machines
Windows Azure Virtual Machines
 
Getting Started with ElastiCache for Redis
Getting Started with ElastiCache for RedisGetting Started with ElastiCache for Redis
Getting Started with ElastiCache for Redis
 
Infrastructure as Code on Azure - NET Conf CO v2018
Infrastructure as Code on Azure - NET Conf CO v2018 Infrastructure as Code on Azure - NET Conf CO v2018
Infrastructure as Code on Azure - NET Conf CO v2018
 
More Cache for Less Cash
More Cache for Less CashMore Cache for Less Cash
More Cache for Less Cash
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at Airbnb
 
AWS Cyber Security Best Practices
AWS Cyber Security Best PracticesAWS Cyber Security Best Practices
AWS Cyber Security Best Practices
 
Azure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment ScenariosAzure Virtual Machines Deployment Scenarios
Azure Virtual Machines Deployment Scenarios
 
Automating Your Microsoft Azure Environment (DevLink 2014)
Automating Your Microsoft Azure Environment (DevLink 2014)Automating Your Microsoft Azure Environment (DevLink 2014)
Automating Your Microsoft Azure Environment (DevLink 2014)
 
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar SeriesBest Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
Best Practices for Running MongoDB on AWS - AWS May 2016 Webinar Series
 
Automating Your Azure Environment
Automating Your Azure EnvironmentAutomating Your Azure Environment
Automating Your Azure Environment
 
Amazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and HostingAmazon Web Services Building Blocks for Drupal Applications and Hosting
Amazon Web Services Building Blocks for Drupal Applications and Hosting
 
Azure IaaS
Azure IaaSAzure IaaS
Azure IaaS
 
Windows Azure Blob Storage
Windows Azure Blob StorageWindows Azure Blob Storage
Windows Azure Blob Storage
 
Zero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DBZero to 60 with Azure Cosmos DB
Zero to 60 with Azure Cosmos DB
 
Bacd zenoss
Bacd zenossBacd zenoss
Bacd zenoss
 
HDInsight Informative articles
HDInsight Informative articlesHDInsight Informative articles
HDInsight Informative articles
 

Andere mochten auch

Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Herman Wu
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureKhalid Salama
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhDAdnan Masood
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLJen Stirrup
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 
Drive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsightDrive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsightDataWorks Summit
 
Functional Reactive Programming without Black Magic (UIKonf 2015)
Functional Reactive Programming without Black Magic (UIKonf 2015)Functional Reactive Programming without Black Magic (UIKonf 2015)
Functional Reactive Programming without Black Magic (UIKonf 2015)Jens Ravens
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 
Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachAdnan Masood
 
Visualising the tabular model for power view upload
Visualising the tabular model for power view uploadVisualising the tabular model for power view upload
Visualising the tabular model for power view uploadJen Stirrup
 
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationDigital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationJen Stirrup
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure FunctionsJim O'Neil
 

Andere mochten auch (20)

Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhDSpark with Azure HDInsight  - Tampa Bay Data Science - Adnan Masood, PhD
Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureML
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
Drive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsightDrive Smarter Decisions with Hadoop and Windows Azure HDInsight
Drive Smarter Decisions with Hadoop and Windows Azure HDInsight
 
Functional Reactive Programming without Black Magic (UIKonf 2015)
Functional Reactive Programming without Black Magic (UIKonf 2015)Functional Reactive Programming without Black Magic (UIKonf 2015)
Functional Reactive Programming without Black Magic (UIKonf 2015)
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 
Realtime analytics with_hadoop
Realtime analytics with_hadoopRealtime analytics with_hadoop
Realtime analytics with_hadoop
 
Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality Approach
 
Cloud computing by Bhavesh
Cloud computing by BhaveshCloud computing by Bhavesh
Cloud computing by Bhavesh
 
Visualising the tabular model for power view upload
Visualising the tabular model for power view uploadVisualising the tabular model for power view upload
Visualising the tabular model for power view upload
 
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data VisualisationDigital Pragmatism with Business Intelligence, Big Data and Data Visualisation
Digital Pragmatism with Business Intelligence, Big Data and Data Visualisation
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Go Serverless with Azure Functions
Go Serverless with Azure FunctionsGo Serverless with Azure Functions
Go Serverless with Azure Functions
 

Ähnlich wie Windows Azure HDInsight Service

Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopAvkash Chauhan
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive AnalyticsManish Chopra
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Lace Lofranco
 
It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?Srihari Srinivasan
 
Srikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydSrikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydsrikanth K
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platformnvvrajesh
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop StoryMichael Rys
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackAndrew Brust
 
Escalando Aplicaciones Web
Escalando Aplicaciones WebEscalando Aplicaciones Web
Escalando Aplicaciones WebSantiago Coffey
 

Ähnlich wie Windows Azure HDInsight Service (20)

Introduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache HadoopIntroduction to Big Data Analytics on Apache Hadoop
Introduction to Big Data Analytics on Apache Hadoop
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
מיכאל
מיכאלמיכאל
מיכאל
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
Microsoft Ignite AU 2017 - Orchestrating Big Data Pipelines with Azure Data F...
 
It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?It takes two to tango! : Is SQL-on-Hadoop the next big step?
It takes two to tango! : Is SQL-on-Hadoop the next big step?
 
Srikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hydSrikanth hadoop 3.6yrs_hyd
Srikanth hadoop 3.6yrs_hyd
 
Hive
HiveHive
Hive
 
Hadoop in action
Hadoop in actionHadoop in action
Hadoop in action
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
HdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft PlatformHdInsight essentials Hadoop on Microsoft Platform
HdInsight essentials Hadoop on Microsoft Platform
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Microsoft's Hadoop Story
Microsoft's Hadoop StoryMicrosoft's Hadoop Story
Microsoft's Hadoop Story
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stackBig Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
 
Escalando Aplicaciones Web
Escalando Aplicaciones WebEscalando Aplicaciones Web
Escalando Aplicaciones Web
 
1. Apache HIVE
1. Apache HIVE1. Apache HIVE
1. Apache HIVE
 
Sureh hadoop 3 years t
Sureh hadoop 3 years tSureh hadoop 3 years t
Sureh hadoop 3 years t
 

Mehr von Neil Mackenzie

Project Orleans - Actor Model framework
Project Orleans - Actor Model frameworkProject Orleans - Actor Model framework
Project Orleans - Actor Model frameworkNeil Mackenzie
 
Windows Azure SQL Database Federations
Windows Azure SQL Database FederationsWindows Azure SQL Database Federations
Windows Azure SQL Database FederationsNeil Mackenzie
 
Brokered Messaging in Windows Azure
Brokered Messaging in Windows AzureBrokered Messaging in Windows Azure
Brokered Messaging in Windows AzureNeil Mackenzie
 
Windows Azure Diagnostics
Windows Azure DiagnosticsWindows Azure Diagnostics
Windows Azure DiagnosticsNeil Mackenzie
 
Introduction to Windows Azure AppFabric Applications
Introduction to Windows Azure AppFabric ApplicationsIntroduction to Windows Azure AppFabric Applications
Introduction to Windows Azure AppFabric ApplicationsNeil Mackenzie
 

Mehr von Neil Mackenzie (6)

Azure DocumentDB
Azure DocumentDBAzure DocumentDB
Azure DocumentDB
 
Project Orleans - Actor Model framework
Project Orleans - Actor Model frameworkProject Orleans - Actor Model framework
Project Orleans - Actor Model framework
 
Windows Azure SQL Database Federations
Windows Azure SQL Database FederationsWindows Azure SQL Database Federations
Windows Azure SQL Database Federations
 
Brokered Messaging in Windows Azure
Brokered Messaging in Windows AzureBrokered Messaging in Windows Azure
Brokered Messaging in Windows Azure
 
Windows Azure Diagnostics
Windows Azure DiagnosticsWindows Azure Diagnostics
Windows Azure Diagnostics
 
Introduction to Windows Azure AppFabric Applications
Introduction to Windows Azure AppFabric ApplicationsIntroduction to Windows Azure AppFabric Applications
Introduction to Windows Azure AppFabric Applications
 

Kürzlich hochgeladen

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Kürzlich hochgeladen (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 

Windows Azure HDInsight Service

  • 1. Windows Azure HDInsight Service Hadoop on Windows Azure NEIL MACKENZIE
  • 2. Who Am I?  Neil Mackenzie  Windows Azure Architect @ Satory Global  Windows Azure MVP  Blog: http://convective.wordpress.com/  Twitter: @mknz  Book: Microsoft Windows Azure Development Cookbook
  • 3. Goals and Agenda  Goals  Introduce Windows Azure HDInsight Service to the Windows Azure developer  Introduce Windows Azure to the Hadoop user  Not a tutorial on how to use Hadoop features  Agenda  Big Data  Windows Azure  Windows Azure HDInsight Service
  • 4. Big Data  Problem:  How do we create value from enormous amounts of low-value data?  Solution:  Analyze it using a lot of commodity hardware.
  • 5. Three Vs of Big Data  Volume  How much data is there?  Variety  What are the sources of the data?  Velocity  How fast is the data being generated?
  • 6. MapReduce  Distributed computational model for data analysis.  Map function:  Processes a key-value pair to generate intermediate pairs  Reduce function:  Merges all intermediate values with the same intermediate key.  Map and reduce functions allocated to many compute nodes with data stored locally.  Raw MapReduce functions are written in Java.
  • 7. Apache Hadoop  Modules:  Hadoop Distributed File System (HDFS)  MapReduce  Related projects:  HBase – scalable, distributed database  Hive – data warehouse infrastructure  Mahout – scalable machine learning library  Pig – high-level data-flow language  Other:  Sqoop –import and export to relational database
  • 8. Windows Azure  Compute  PaaS: Cloud Services, Windows Azure Web Sites  IaaS: Virtual Machines  Storage  Windows Azure Storage Service: blobs, tables, queues  Windows Azure SQL Database  IaaS: Microsoft SQL Server, MongoDB, Cassandra, etc.  Connectivity  HTTP, TCP, UDP, Site-to-Site VPN  Administration  Portal, Service Management API
  • 9. Windows Azure HDInsight Service  Components:  HadoopCore – v1.0.1  HDFS & ASV  Pig – v0.9.3  Hive – v0.8.1  Sqoop – v1.4.2  Excel/Hive  Note: this was formerly known as Hadoop on Azure.
  • 10. Hadoop Administration  Portal  http://www.hadooponazure.com  Apply to join preview  Create and manage Hadoop cluster  3 nodes for 5 days  Access the Interactive console  Hive  Invoke Hive statements  JavaScript  Invoke HDFS commands  Invoke Hive & Pig statements
  • 11. Distributed File Systems  HDFS  Contents deleted when cluster deleted  ASV  Azure Storage Vault  Data stored in Windows Azure Blob Storage  Configured on Hadoop on Azure portal  Contents survive deletion of Hadoop cluster  Supports multi-level structure, e.g.:  containername/input/file1
  • 12. Pig  Hadoop feature to perform data-flow operations:  Execution environment  Language: Pig Latin  Execution Environment  Local in local JVM or distributed on Hadoop cluster  Pig Latin  High-level language  Describes data-flow operations  Automatically invokes MapReduce jobs  Much simpler than using MapReduce directly
  • 13. Pig Example records = LOAD 'asv://flightdata/input/flightdata.txt' AS (year:int, month:int, day:int, carrier:chararray, origin:char array, dest:chararray, depdelay:int, arrdelay:int); modified_records = FOREACH records GENERATE origin, depdelay; STORE modified_records INTO 'my_output' using PigStorage(',');
  • 14. Hive  Hadoop feature to perform data warehouse operations  HiveQL  high-level, SQL-like language  Supports equi-joins  Schema on read NOT schema on write  Automatically invokes MapReduce jobs  Much simpler than using MapReduce directly  Metadata store  Contains descriptions of tables
  • 15. Hive Example FROM flightdata_asv INSERT OVERWRITE TABLE origin_counts SELECT origin, COUNT(*) GROUP BY origin INSERT OVERWRITE TABLE dest_counts SELECT dest, COUNT(*) GROUP BY dest
  • 16. Sqoop  Feature allowing import and export from SQL databases  Uses JDBC connector  Works with Windows Azure SQL Database  Table must exist before export
  • 17. Sqoop Example  Exporting a table: sqoop.cmd export –connect "jdbc:sqlserver://sql_database_server.database.windows.net:1433;database= sql_database_instance;user=sqoop_login@sql_database_server;password=s qoop_login_password" --table sql_database_table --export-dir "/user/hive/warehouse/hive_table" --input-fields-terminated-by "001"
  • 18. Excel and Hadoop on Azure  Example of Microsoft business intelligence strategy  Expose Hadoop to existing tools  HiveODBC connector for Excel  Create Hive queries from Excel  Invoke them from Excel
  • 19. More Information  Sign up for preview: http://www.hadooponazure.com  Support: http://social.msdn.microsoft.com/Forums/en-US/hdinsight  Avkash Chauhan’s blog: http://blogs.msdn.com/b/avkashchauhan/archive/tags/hadoop  Roger Jennings’ blog: http://oakleafblog.blogspot.com/2012/04/using-data-in- windows-azure-blobs-with.html
  • 20. Summary  Hadoop:  De-facto solution to the Big Data problem  Windows Azure HDInsight Service  Native Hadoop implementation  Managed Hadoop service for Windows Azure  Currently in preview