SlideShare ist ein Scribd-Unternehmen logo
1 von 44
Downloaden Sie, um offline zu lesen
Big Data on Azure
@davidgiard
David Giard
Microsoft Technical Evangelist
dgiard@Microsoft.com
Davidgiard.com
@davidgiard
@davidgiard
Cloud Computing
Host some or all of your data or application
on a third-party server
in a highly-scalable, highly-reliable way
@davidgiard
Advantages of Cloud Computing
• Lower capital costs
• Flexible operating cost (Rent vs Buy)
• Platform as a Service
• Freedom from infrastructure / hardware
• Redundancy
• Automatic monitoring and failover
@davidgiard
0
1
2
3
4
5
6
7
8
9
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Demand and Capacity
@davidgiard
0
1
2
3
4
5
6
7
8
9
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Demand and Capacity
@davidgiard
0
1
2
3
4
5
6
7
8
9
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Demand and Capacity
@davidgiard
0
1
2
3
4
5
6
7
8
9
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Demand and Capacity
@davidgiard
0
1
2
3
4
5
6
7
8
9
Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri
Demand and Capacity
@davidgiard
0
1
2
3
4
5
6
7
8
9
1:00 2:00 3:00 4:00 5:00 6:00 7:00 8:00 9:00 10:00 11:00 12:00
Demand and Capacity
@davidgiard
0
1
2
3
4
5
6
7
8
9
1:00 2:00 3:00 4:00 5:00 6:00 7:00 8:00 9:00 10:00 11:00 12:00
Big Data Demand
@davidgiard
Cost Factors
• Service
• VM Size
• # VMs
• Time
@davidgiard
HDInsight
@davidgiard
Azure HDInsight
• Microsoft Azure’s big-data solution using Hadoop
• Open-source framework for storing and analyzing massive amounts of data
on clusters built from commodity hardware
• Uses Hadoop Distributed File System (HDFS) for storage
• Employs the open-source Hortonworks Data Platform implementation
of Hadoop
• Includes HBase, Hive, Pig, Storm, Spark, and more
• Integrates with popular BI tools
• Includes Power BI, Excel, SSAS, SSRS, Tableau
@davidgiard
Apache Hadoop on Azure
• Automatic cluster provisioning and configuration
• Bypass an otherwise manual-intensive process
• Cluster scaling
• Change number of nodes without deleting/re-creating the cluster
• High availability/reliability
• Managed solution - 99.9% SLA
• HDInsight includes a secondary head node
• Reliable and economical storage
• HDFS mapped over Azure Blob Storage
• Accessed through “wasb://” protocol prefix
@davidgiard
Lambda Architecture
• Batch Layer
• Speed Layer
• Serving Layer
@davidgiard
Clusters
@davidgiard
Clusters
Blob Storage
@davidgiard
HDInsight Cluster Types
• Hadoop: Query workloads
• Reliable data storage, simple MapReduce
• HBase: NoSQL workloads
• Distributed database offering random access to large amounts of
data
• Apache Storm: Stream workloads
• Real-time analysis of moving data streams
• Apache Spark: High-performance workloads
• In-memory parallel processing
@davidgiard
Cluster Creation
@davidgiard
Cluster Creation
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"clusterName": {
"type": "string",
"metadata": {
"description": "The name of the HDInsight cluster to create."
}
},
"clusterLoginUserName": {
"type": "string",
"defaultValue": "admin",
"metadata": {
"description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards."
}
},
"clusterLoginPassword": {
"type": "securestring",
@davidgiard
Demo
@davidgiard
@davidgiard
Storm
• Apache Storm is a distributed, fault-tolerant, open-source computation
system that allows you to process data in real-time with Hadoop.
• Apache Storm on HDInsight allows you to create distributed, real-time
analytics solutions in the Azure environment by using Apache Hadoop.
• Storm solutions can also provide guaranteed processing of data, with the
ability to replay data that was not successfully processed the first time.
• Ability to write Storm components in C#, JAVA and Python.
• Azure Scale up or Scale down without an impact for running Storm
topologies.
• Ease of provision and use in Azure portal.
• Visual Studio project templates for Storm apps
@davidgiard
Storm
• Apache Storm apps are submitted as Topologies.
• A topology is a graph of computation that processes streams
• Stream: An unbound collection of tuples. Streams are produced by spouts
and bolts, and they are consumed by bolts.
• Tuple: A named list of dynamically typed values.
• Spout: Consumes data from a data source and emits one or more streams.
• Bolt: Consumes streams, performs processing on tuples, and may emit
streams. Bolts are also responsible for writing data to external storage,
such as a queue, HDInsight, HBase, a blob, or other data store.
• Nimbus: JobTracker in Hadoop that distribute jobs, monitoring failures.
@davidgiard
Stream
Apache Storm Topology
Event Source
Tuple
(
“timestamp:: 1234567890,
“measurement”: “123”,
“location”: “ABC123”
)
Tuple
{
key1: “value1”,
key2: “value2”,
key3: “value3”,
}
{
key1: “value1”,
key2: “value2”,
key3: “value3”,
}
Tuple Tuple
{
key1: “value1”,
key4: “value4”
}
Bolt
Spout
Bolt Bolt
Bolt
Tuple
{
key1: “value1”,
key2: “value2”,
key3: “value3”,
}
@davidgiard
Demo
@DavidGiard
@davidgiard
HBase
• Apache HBase is an open-source, NoSQL database that is built on Hadoop
and modeled after Google BigTable.
• HBase provides random access and strong consistency for large amounts of
unstructured and semistructured data in a schemaless database organized
by column families
• Data is stored in the rows of a table, and data within a row is grouped by
column family.
• The open-source code scales linearly to handle petabytes of data on
thousands of nodes. It can rely on data redundancy, batch processing, and
other features that are provided by distributed applications in the Hadoop
ecosystem.
@davidgiard
HBase
• HBase Commands:
• create  Equivalent to create table in T-SQL
• get  Equivalent to select statements in T-SQL
• put  Equivalent to update, Insert statement in T-SQL
• scan  Equivalent to select (no where condition) in T-SQL
• delete/deleteall  Equivalent to delete in T-SQL
• HBase shell is your query tool to execute in CRUD commands to a HBase cluster.
• Data can also be managed using the HBase C# API, which provides a client library on top
of the HBase REST API.
• An HBase database can also be queried by using Hive.
@davidgiard
HBase
RowKey a:1 a:2 a:3 a:4 b:1 b:2 c:numA
982069 10 20 30 40 5 7 4
926025 9 11 21 4 9 3
254114 11 15 22 35 7 11 4
881514 8 14 2 3 2
Column family “a”
Column family “b”
Column family “c”
@davidgiard
HBase
RowKey a:1 a:2 a:3 a:4 b:1 b:2 c:numA
982069 10 20 30 40 5 7 4
926025 9 11 21 4 9 3
254114 11 15 22 35 7 11 4
881514 8 14 2 3 2
Column family “temperature”
Column family
“pressure” Column family
“avg_temp”
@DavidGiard
@davidgiard
Hive
• Apache Hive is a data warehouse system for Hadoop, which enables data
summarization, querying, and analysis of data by using HiveQL (a query
language similar to SQL).
• Hive understands how to work with structured and semi-structured data,
such as text files where the fields are delimited by specific characters.
• Hive also supports custom serializer/deserializers for complex or
irregularly structured data.
• Hive can also be extended through user-defined functions (UDF).
• A UDF allows you to implement functionality or logic that isn't easily
modeled in HiveQL.
@davidgiard
HiveQL
# Number of Records
SELECT COUNT(1) FROM www_access;
# Number of Unique IPs
SELECT COUNT(1) FROM ( 
SELECT DISTINCT ip FROM www_access 
) t;
# Number of Unique IPs that Accessed the Top Page
SELECT COUNT(distinct ip) FROM www_access 
WHERE url='/';
# Number of Accesses per Unique IP
SELECT ip, COUNT(1) FROM www_access 
GROUP BY ip LIMIT 30;
# Unique IPs Sorted by Number of Accesses
SELECT ip, COUNT(1) AS cnt FROM www_access 
GROUP BY ip
ORDER BY cnt DESC LIMIT 30;
# Number of Accesses After a Certain Time
SELECT COUNT(1) FROM www_access 
WHERE TD_TIME_RANGE(time, "2011-08-19", NULL, "PDT")
@DavidGiard
@davidgiard
Apache Spark
• Interactive manipulation and visualization of data
• Scala, Python, and R Interactive Shells
• Jupyter Notebook with PySpark (Python) and Spark (Scala) kernels provide in-
browser interaction
• Unified platform for processing multiple workloads
• Real-time processing, Machine Learning, Stream Analytics, Interactive
Querying, Graphing
• Leverages in-memory processing for really big data
• Resilient distributed datasets (RDDs)
• APIs for processing large datasets
• Up to 100x faster than MapReduce
@davidgiard
Spark Components on HDInsight
• Spark Core
• Includes Spark SQL, Spark Streaming,
GraphX, and MLlib
• Anaconda
• Livy
• Jupyter Notebooks
• ODBC Driver for connecting from BI tools
(Power BI, Tableau)
@davidgiard
Jupyter Notebooks on HDInsight
• Browser-based interface for working with text, code, equations, plots,
graphics, and interactive controls in a single document.
• Include preset Spark and Hive contexts (sc and sqlContext)
@davidgiard
Demo
@davidgiard
Items of Note About HDInsight
• There is no “suspend” on HDInsight clusters
• Provision the cluster, do work, then delete the cluster to avoid unnecessary
charges
• Storage can be decoupled from the cluster and reused across deployments
• Can deploy from the portal, but often scripted in practice
• Easier/repeatable creation and deletion
@davidgiard
Get Started
azure.com
@DavidGiard
Thank you!
@DavidGiard
Links
www.slideshare.net/dgiard/big-data-on-azure-70554456
github.com/MSFTImagine/computerscience/tree/master/Workshop/7.%20HDInsight

Weitere ähnliche Inhalte

Was ist angesagt?

Tracking crime as it occurs with apache phoenix, apache hbase and apache nifi
Tracking crime as it occurs with apache phoenix, apache hbase and apache nifiTracking crime as it occurs with apache phoenix, apache hbase and apache nifi
Tracking crime as it occurs with apache phoenix, apache hbase and apache nifiTimothy Spann
 
Apache deep learning 202 Washington DC - DWS 2019
Apache deep learning 202   Washington DC - DWS 2019Apache deep learning 202   Washington DC - DWS 2019
Apache deep learning 202 Washington DC - DWS 2019Timothy Spann
 
RHTE2015_CloudForms_Containers
RHTE2015_CloudForms_ContainersRHTE2015_CloudForms_Containers
RHTE2015_CloudForms_ContainersJerome Marc
 
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Databricks
 
Open Source Applied - Real World Use Cases
Open Source Applied - Real World Use CasesOpen Source Applied - Real World Use Cases
Open Source Applied - Real World Use CasesAll Things Open
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsDeath of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsHostedbyConfluent
 
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...HostedbyConfluent
 
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Flink Forward
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...HostedbyConfluent
 
Maximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamMaximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamFlink Forward
 
Solving Hybrid Cloud Data Replication with Apache Cassandra
Solving Hybrid Cloud Data Replication with Apache CassandraSolving Hybrid Cloud Data Replication with Apache Cassandra
Solving Hybrid Cloud Data Replication with Apache CassandraAaron Ploetz
 
Kafka Summit 2019 Microservice Orchestration
Kafka Summit 2019 Microservice OrchestrationKafka Summit 2019 Microservice Orchestration
Kafka Summit 2019 Microservice Orchestrationlarsfrancke
 
Ted Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftTed Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftFlink Forward
 
Building a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference applicationBuilding a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference applicationDataWorks Summit
 
Spark at Airbnb
Spark at AirbnbSpark at Airbnb
Spark at AirbnbHao Wang
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...HostedbyConfluent
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...HostedbyConfluent
 

Was ist angesagt? (20)

Tracking crime as it occurs with apache phoenix, apache hbase and apache nifi
Tracking crime as it occurs with apache phoenix, apache hbase and apache nifiTracking crime as it occurs with apache phoenix, apache hbase and apache nifi
Tracking crime as it occurs with apache phoenix, apache hbase and apache nifi
 
Apache deep learning 202 Washington DC - DWS 2019
Apache deep learning 202   Washington DC - DWS 2019Apache deep learning 202   Washington DC - DWS 2019
Apache deep learning 202 Washington DC - DWS 2019
 
RHTE2015_CloudForms_Containers
RHTE2015_CloudForms_ContainersRHTE2015_CloudForms_Containers
RHTE2015_CloudForms_Containers
 
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
 
Open Source Applied - Real World Use Cases
Open Source Applied - Real World Use CasesOpen Source Applied - Real World Use Cases
Open Source Applied - Real World Use Cases
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsDeath of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projects
 
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cl...
 
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs
Zoltán Zvara - Advanced visualization of Flink and Spark jobs

Zoltán Zvara - Advanced visualization of Flink and Spark jobs

 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
Maximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamMaximilian Michels - Flink and Beam
Maximilian Michels - Flink and Beam
 
Solving Hybrid Cloud Data Replication with Apache Cassandra
Solving Hybrid Cloud Data Replication with Apache CassandraSolving Hybrid Cloud Data Replication with Apache Cassandra
Solving Hybrid Cloud Data Replication with Apache Cassandra
 
Kafka Summit 2019 Microservice Orchestration
Kafka Summit 2019 Microservice OrchestrationKafka Summit 2019 Microservice Orchestration
Kafka Summit 2019 Microservice Orchestration
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Ted Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink DriftTed Dunning-Faster and Furiouser- Flink Drift
Ted Dunning-Faster and Furiouser- Flink Drift
 
Building a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference applicationBuilding a modern end-to-end open source Big Data reference application
Building a modern end-to-end open source Big Data reference application
 
Spark at Airbnb
Spark at AirbnbSpark at Airbnb
Spark at Airbnb
 
Apache flink
Apache flinkApache flink
Apache flink
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
Qlik and Confluent Success Stories with Kafka - How Generali and Skechers Kee...
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 

Andere mochten auch

One Azure Monitor to Rule Them All? (IT Camp 2017, Cluj, RO)
One Azure Monitor to Rule Them All? (IT Camp 2017, Cluj, RO)One Azure Monitor to Rule Them All? (IT Camp 2017, Cluj, RO)
One Azure Monitor to Rule Them All? (IT Camp 2017, Cluj, RO)Marius Zaharia
 
Migrating to Continuous Delivery with TFS 2017 - Liviu Mandras-Iura
 Migrating to Continuous Delivery with TFS 2017 - Liviu Mandras-Iura Migrating to Continuous Delivery with TFS 2017 - Liviu Mandras-Iura
Migrating to Continuous Delivery with TFS 2017 - Liviu Mandras-IuraITCamp
 
The fight for surviving in the IoT world - Radu Vunvulea
The fight for surviving in the IoT world - Radu VunvuleaThe fight for surviving in the IoT world - Radu Vunvulea
The fight for surviving in the IoT world - Radu VunvuleaITCamp
 
Building Powerful Applications with AngularJS 2 and TypeScript - David Giard
Building Powerful Applications with AngularJS 2 and TypeScript - David GiardBuilding Powerful Applications with AngularJS 2 and TypeScript - David Giard
Building Powerful Applications with AngularJS 2 and TypeScript - David GiardITCamp
 
The best of Hyper-V 2016 - Thomas Maurer
 The best of Hyper-V 2016 - Thomas Maurer The best of Hyper-V 2016 - Thomas Maurer
The best of Hyper-V 2016 - Thomas MaurerITCamp
 
Columnstore indexes - best practices for the ETL process - Damian Widera
Columnstore indexes - best practices for the ETL process - Damian WideraColumnstore indexes - best practices for the ETL process - Damian Widera
Columnstore indexes - best practices for the ETL process - Damian WideraITCamp
 
How to secure and manage modern IT - Ondrej Vysek
 How to secure and manage modern IT - Ondrej Vysek How to secure and manage modern IT - Ondrej Vysek
How to secure and manage modern IT - Ondrej VysekITCamp
 
Testing your PowerShell code with Pester - Florin Loghiade
Testing your PowerShell code with Pester - Florin LoghiadeTesting your PowerShell code with Pester - Florin Loghiade
Testing your PowerShell code with Pester - Florin LoghiadeITCamp
 
From Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines KergosienFrom Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines KergosienITCamp
 
The Vision of Computer Vision: The bold promise of teaching computers to unde...
The Vision of Computer Vision: The bold promise of teaching computers to unde...The Vision of Computer Vision: The bold promise of teaching computers to unde...
The Vision of Computer Vision: The bold promise of teaching computers to unde...ITCamp
 
The Secret of Engaging Presentations - Boris Hristov
The Secret of Engaging Presentations - Boris HristovThe Secret of Engaging Presentations - Boris Hristov
The Secret of Engaging Presentations - Boris HristovITCamp
 
The best of Windows Server 2016 - Thomas Maurer
 The best of Windows Server 2016 - Thomas Maurer The best of Windows Server 2016 - Thomas Maurer
The best of Windows Server 2016 - Thomas MaurerITCamp
 
Forget Process, Focus on People - Peter Leeson
Forget Process, Focus on People - Peter LeesonForget Process, Focus on People - Peter Leeson
Forget Process, Focus on People - Peter LeesonITCamp
 
Scaling face recognition with big data - Bogdan Bocse
 Scaling face recognition with big data - Bogdan Bocse Scaling face recognition with big data - Bogdan Bocse
Scaling face recognition with big data - Bogdan BocseITCamp
 
#NoAgile - Dan Suciu
 #NoAgile - Dan Suciu #NoAgile - Dan Suciu
#NoAgile - Dan SuciuITCamp
 
Great all this new stuff, but how do I convince my management - Erwin Derksen
 Great all this new stuff, but how do I convince my management - Erwin Derksen Great all this new stuff, but how do I convince my management - Erwin Derksen
Great all this new stuff, but how do I convince my management - Erwin DerksenITCamp
 
ITCamp 2017 - Raffaele Rialdi - A Deep Dive Into Bridging Node-js with .NET Core
ITCamp 2017 - Raffaele Rialdi - A Deep Dive Into Bridging Node-js with .NET CoreITCamp 2017 - Raffaele Rialdi - A Deep Dive Into Bridging Node-js with .NET Core
ITCamp 2017 - Raffaele Rialdi - A Deep Dive Into Bridging Node-js with .NET CoreITCamp
 
ITCamp 2017 - Florin Coros - Decide between In-Process or Inter-Processes Com...
ITCamp 2017 - Florin Coros - Decide between In-Process or Inter-Processes Com...ITCamp 2017 - Florin Coros - Decide between In-Process or Inter-Processes Com...
ITCamp 2017 - Florin Coros - Decide between In-Process or Inter-Processes Com...ITCamp
 
Serverless Single Page Apps with React and Redux at ItCamp 2017
Serverless Single Page Apps with React and Redux at ItCamp 2017Serverless Single Page Apps with React and Redux at ItCamp 2017
Serverless Single Page Apps with React and Redux at ItCamp 2017Melania Andrisan (Danciu)
 
Provisioning Windows instances at scale on Azure, AWS and OpenStack - Adrian ...
Provisioning Windows instances at scale on Azure, AWS and OpenStack - Adrian ...Provisioning Windows instances at scale on Azure, AWS and OpenStack - Adrian ...
Provisioning Windows instances at scale on Azure, AWS and OpenStack - Adrian ...ITCamp
 

Andere mochten auch (20)

One Azure Monitor to Rule Them All? (IT Camp 2017, Cluj, RO)
One Azure Monitor to Rule Them All? (IT Camp 2017, Cluj, RO)One Azure Monitor to Rule Them All? (IT Camp 2017, Cluj, RO)
One Azure Monitor to Rule Them All? (IT Camp 2017, Cluj, RO)
 
Migrating to Continuous Delivery with TFS 2017 - Liviu Mandras-Iura
 Migrating to Continuous Delivery with TFS 2017 - Liviu Mandras-Iura Migrating to Continuous Delivery with TFS 2017 - Liviu Mandras-Iura
Migrating to Continuous Delivery with TFS 2017 - Liviu Mandras-Iura
 
The fight for surviving in the IoT world - Radu Vunvulea
The fight for surviving in the IoT world - Radu VunvuleaThe fight for surviving in the IoT world - Radu Vunvulea
The fight for surviving in the IoT world - Radu Vunvulea
 
Building Powerful Applications with AngularJS 2 and TypeScript - David Giard
Building Powerful Applications with AngularJS 2 and TypeScript - David GiardBuilding Powerful Applications with AngularJS 2 and TypeScript - David Giard
Building Powerful Applications with AngularJS 2 and TypeScript - David Giard
 
The best of Hyper-V 2016 - Thomas Maurer
 The best of Hyper-V 2016 - Thomas Maurer The best of Hyper-V 2016 - Thomas Maurer
The best of Hyper-V 2016 - Thomas Maurer
 
Columnstore indexes - best practices for the ETL process - Damian Widera
Columnstore indexes - best practices for the ETL process - Damian WideraColumnstore indexes - best practices for the ETL process - Damian Widera
Columnstore indexes - best practices for the ETL process - Damian Widera
 
How to secure and manage modern IT - Ondrej Vysek
 How to secure and manage modern IT - Ondrej Vysek How to secure and manage modern IT - Ondrej Vysek
How to secure and manage modern IT - Ondrej Vysek
 
Testing your PowerShell code with Pester - Florin Loghiade
Testing your PowerShell code with Pester - Florin LoghiadeTesting your PowerShell code with Pester - Florin Loghiade
Testing your PowerShell code with Pester - Florin Loghiade
 
From Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines KergosienFrom Developer to Data Scientist - Gaines Kergosien
From Developer to Data Scientist - Gaines Kergosien
 
The Vision of Computer Vision: The bold promise of teaching computers to unde...
The Vision of Computer Vision: The bold promise of teaching computers to unde...The Vision of Computer Vision: The bold promise of teaching computers to unde...
The Vision of Computer Vision: The bold promise of teaching computers to unde...
 
The Secret of Engaging Presentations - Boris Hristov
The Secret of Engaging Presentations - Boris HristovThe Secret of Engaging Presentations - Boris Hristov
The Secret of Engaging Presentations - Boris Hristov
 
The best of Windows Server 2016 - Thomas Maurer
 The best of Windows Server 2016 - Thomas Maurer The best of Windows Server 2016 - Thomas Maurer
The best of Windows Server 2016 - Thomas Maurer
 
Forget Process, Focus on People - Peter Leeson
Forget Process, Focus on People - Peter LeesonForget Process, Focus on People - Peter Leeson
Forget Process, Focus on People - Peter Leeson
 
Scaling face recognition with big data - Bogdan Bocse
 Scaling face recognition with big data - Bogdan Bocse Scaling face recognition with big data - Bogdan Bocse
Scaling face recognition with big data - Bogdan Bocse
 
#NoAgile - Dan Suciu
 #NoAgile - Dan Suciu #NoAgile - Dan Suciu
#NoAgile - Dan Suciu
 
Great all this new stuff, but how do I convince my management - Erwin Derksen
 Great all this new stuff, but how do I convince my management - Erwin Derksen Great all this new stuff, but how do I convince my management - Erwin Derksen
Great all this new stuff, but how do I convince my management - Erwin Derksen
 
ITCamp 2017 - Raffaele Rialdi - A Deep Dive Into Bridging Node-js with .NET Core
ITCamp 2017 - Raffaele Rialdi - A Deep Dive Into Bridging Node-js with .NET CoreITCamp 2017 - Raffaele Rialdi - A Deep Dive Into Bridging Node-js with .NET Core
ITCamp 2017 - Raffaele Rialdi - A Deep Dive Into Bridging Node-js with .NET Core
 
ITCamp 2017 - Florin Coros - Decide between In-Process or Inter-Processes Com...
ITCamp 2017 - Florin Coros - Decide between In-Process or Inter-Processes Com...ITCamp 2017 - Florin Coros - Decide between In-Process or Inter-Processes Com...
ITCamp 2017 - Florin Coros - Decide between In-Process or Inter-Processes Com...
 
Serverless Single Page Apps with React and Redux at ItCamp 2017
Serverless Single Page Apps with React and Redux at ItCamp 2017Serverless Single Page Apps with React and Redux at ItCamp 2017
Serverless Single Page Apps with React and Redux at ItCamp 2017
 
Provisioning Windows instances at scale on Azure, AWS and OpenStack - Adrian ...
Provisioning Windows instances at scale on Azure, AWS and OpenStack - Adrian ...Provisioning Windows instances at scale on Azure, AWS and OpenStack - Adrian ...
Provisioning Windows instances at scale on Azure, AWS and OpenStack - Adrian ...
 

Ähnlich wie Big Data Solutions in Azure - David Giard

Big Data on azure
Big Data on azureBig Data on azure
Big Data on azureDavid Giard
 
Best Practices: Hadoop migration to Azure HDInsight
Best Practices: Hadoop migration to Azure HDInsightBest Practices: Hadoop migration to Azure HDInsight
Best Practices: Hadoop migration to Azure HDInsightRevin Chalil
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Cascading introduction
Cascading introductionCascading introduction
Cascading introductionAlex Su
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Michael Rys
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real WorldJeremy Hanna
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...Facultad de Informática UCM
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWSPaolo latella
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupVictor Coustenoble
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 
Survey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data LandscapeSurvey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data LandscapeIke Ellis
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopAvkash Chauhan
 
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...PROIDEA
 
Webinar: The Anatomy of the Cloudant Data Layer
Webinar: The Anatomy of the Cloudant Data LayerWebinar: The Anatomy of the Cloudant Data Layer
Webinar: The Anatomy of the Cloudant Data LayerIBM Cloud Data Services
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventTrivadis
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introductionfardinjamshidi
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 

Ähnlich wie Big Data Solutions in Azure - David Giard (20)

Big Data on azure
Big Data on azureBig Data on azure
Big Data on azure
 
Best Practices: Hadoop migration to Azure HDInsight
Best Practices: Hadoop migration to Azure HDInsightBest Practices: Hadoop migration to Azure HDInsight
Best Practices: Hadoop migration to Azure HDInsight
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
Cascading introduction
Cascading introductionCascading introduction
Cascading introduction
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...Paradigmas de procesamiento en  Big Data: estado actual,  tendencias y oportu...
Paradigmas de procesamiento en Big Data: estado actual, tendencias y oportu...
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWS
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
 
Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Survey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data LandscapeSurvey of the Microsoft Azure Data Landscape
Survey of the Microsoft Azure Data Landscape
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
4Developers 2018: Przetwarzanie Big Data w oparciu o architekturę Lambda na p...
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Webinar: The Anatomy of the Cloudant Data Layer
Webinar: The Anatomy of the Cloudant Data LayerWebinar: The Anatomy of the Cloudant Data Layer
Webinar: The Anatomy of the Cloudant Data Layer
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 

Mehr von ITCamp

ITCamp 2019 - Stacey M. Jenkins - Protecting your company's data - By psychol...
ITCamp 2019 - Stacey M. Jenkins - Protecting your company's data - By psychol...ITCamp 2019 - Stacey M. Jenkins - Protecting your company's data - By psychol...
ITCamp 2019 - Stacey M. Jenkins - Protecting your company's data - By psychol...ITCamp
 
ITCamp 2019 - Silviu Niculita - Supercharge your AI efforts with the use of A...
ITCamp 2019 - Silviu Niculita - Supercharge your AI efforts with the use of A...ITCamp 2019 - Silviu Niculita - Supercharge your AI efforts with the use of A...
ITCamp 2019 - Silviu Niculita - Supercharge your AI efforts with the use of A...ITCamp
 
ITCamp 2019 - Peter Leeson - Managing Skills
ITCamp 2019 - Peter Leeson - Managing SkillsITCamp 2019 - Peter Leeson - Managing Skills
ITCamp 2019 - Peter Leeson - Managing SkillsITCamp
 
ITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
ITCamp 2019 - Mihai Tataran - Governing your Cloud ResourcesITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
ITCamp 2019 - Mihai Tataran - Governing your Cloud ResourcesITCamp
 
ITCamp 2019 - Ivana Milicic - Color - The Shadow Ruler of UX
ITCamp 2019 - Ivana Milicic - Color - The Shadow Ruler of UXITCamp 2019 - Ivana Milicic - Color - The Shadow Ruler of UX
ITCamp 2019 - Ivana Milicic - Color - The Shadow Ruler of UXITCamp
 
ITCamp 2019 - Florin Coros - Implementing Clean Architecture
ITCamp 2019 - Florin Coros - Implementing Clean ArchitectureITCamp 2019 - Florin Coros - Implementing Clean Architecture
ITCamp 2019 - Florin Coros - Implementing Clean ArchitectureITCamp
 
ITCamp 2019 - Florin Loghiade - Azure Kubernetes in Production - Field notes...
ITCamp 2019 - Florin Loghiade -  Azure Kubernetes in Production - Field notes...ITCamp 2019 - Florin Loghiade -  Azure Kubernetes in Production - Field notes...
ITCamp 2019 - Florin Loghiade - Azure Kubernetes in Production - Field notes...ITCamp
 
ITCamp 2019 - Florin Flestea - How 3rd Level support experience influenced m...
ITCamp 2019 - Florin Flestea -  How 3rd Level support experience influenced m...ITCamp 2019 - Florin Flestea -  How 3rd Level support experience influenced m...
ITCamp 2019 - Florin Flestea - How 3rd Level support experience influenced m...ITCamp
 
ITCamp 2019 - Emil Craciun - RoboRestaurant of the future powered by serverle...
ITCamp 2019 - Emil Craciun - RoboRestaurant of the future powered by serverle...ITCamp 2019 - Emil Craciun - RoboRestaurant of the future powered by serverle...
ITCamp 2019 - Emil Craciun - RoboRestaurant of the future powered by serverle...ITCamp
 
ITCamp 2019 - Eldert Grootenboer - Cloud Architecture Recipes for The Enterprise
ITCamp 2019 - Eldert Grootenboer - Cloud Architecture Recipes for The EnterpriseITCamp 2019 - Eldert Grootenboer - Cloud Architecture Recipes for The Enterprise
ITCamp 2019 - Eldert Grootenboer - Cloud Architecture Recipes for The EnterpriseITCamp
 
ITCamp 2019 - Cristiana Fernbach - Blockchain Legal Trends
ITCamp 2019 - Cristiana Fernbach - Blockchain Legal TrendsITCamp 2019 - Cristiana Fernbach - Blockchain Legal Trends
ITCamp 2019 - Cristiana Fernbach - Blockchain Legal TrendsITCamp
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp
 
ITCamp 2019 - Andy Cross - Business Outcomes from AI
ITCamp 2019 - Andy Cross - Business Outcomes from AIITCamp 2019 - Andy Cross - Business Outcomes from AI
ITCamp 2019 - Andy Cross - Business Outcomes from AIITCamp
 
ITCamp 2019 - Andrea Saltarello - Modernise your app. The Cloud Story
ITCamp 2019 - Andrea Saltarello - Modernise your app. The Cloud StoryITCamp 2019 - Andrea Saltarello - Modernise your app. The Cloud Story
ITCamp 2019 - Andrea Saltarello - Modernise your app. The Cloud StoryITCamp
 
ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Az...
ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Az...ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Az...
ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Az...ITCamp
 
ITCamp 2019 - Alex Mang - I'm Confused Should I Orchestrate my Containers on ...
ITCamp 2019 - Alex Mang - I'm Confused Should I Orchestrate my Containers on ...ITCamp 2019 - Alex Mang - I'm Confused Should I Orchestrate my Containers on ...
ITCamp 2019 - Alex Mang - I'm Confused Should I Orchestrate my Containers on ...ITCamp
 
ITCamp 2019 - Alex Mang - How Far Can Serverless Actually Go Now
ITCamp 2019 - Alex Mang - How Far Can Serverless Actually Go NowITCamp 2019 - Alex Mang - How Far Can Serverless Actually Go Now
ITCamp 2019 - Alex Mang - How Far Can Serverless Actually Go NowITCamp
 
ITCamp 2019 - Peter Leeson - Vitruvian Quality
ITCamp 2019 - Peter Leeson - Vitruvian QualityITCamp 2019 - Peter Leeson - Vitruvian Quality
ITCamp 2019 - Peter Leeson - Vitruvian QualityITCamp
 
ITCamp 2018 - Ciprian Sorlea - Million Dollars Hello World Application
ITCamp 2018 - Ciprian Sorlea - Million Dollars Hello World ApplicationITCamp 2018 - Ciprian Sorlea - Million Dollars Hello World Application
ITCamp 2018 - Ciprian Sorlea - Million Dollars Hello World ApplicationITCamp
 
ITCamp 2018 - Ciprian Sorlea - Enterprise Architectures with TypeScript And F...
ITCamp 2018 - Ciprian Sorlea - Enterprise Architectures with TypeScript And F...ITCamp 2018 - Ciprian Sorlea - Enterprise Architectures with TypeScript And F...
ITCamp 2018 - Ciprian Sorlea - Enterprise Architectures with TypeScript And F...ITCamp
 

Mehr von ITCamp (20)

ITCamp 2019 - Stacey M. Jenkins - Protecting your company's data - By psychol...
ITCamp 2019 - Stacey M. Jenkins - Protecting your company's data - By psychol...ITCamp 2019 - Stacey M. Jenkins - Protecting your company's data - By psychol...
ITCamp 2019 - Stacey M. Jenkins - Protecting your company's data - By psychol...
 
ITCamp 2019 - Silviu Niculita - Supercharge your AI efforts with the use of A...
ITCamp 2019 - Silviu Niculita - Supercharge your AI efforts with the use of A...ITCamp 2019 - Silviu Niculita - Supercharge your AI efforts with the use of A...
ITCamp 2019 - Silviu Niculita - Supercharge your AI efforts with the use of A...
 
ITCamp 2019 - Peter Leeson - Managing Skills
ITCamp 2019 - Peter Leeson - Managing SkillsITCamp 2019 - Peter Leeson - Managing Skills
ITCamp 2019 - Peter Leeson - Managing Skills
 
ITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
ITCamp 2019 - Mihai Tataran - Governing your Cloud ResourcesITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
ITCamp 2019 - Mihai Tataran - Governing your Cloud Resources
 
ITCamp 2019 - Ivana Milicic - Color - The Shadow Ruler of UX
ITCamp 2019 - Ivana Milicic - Color - The Shadow Ruler of UXITCamp 2019 - Ivana Milicic - Color - The Shadow Ruler of UX
ITCamp 2019 - Ivana Milicic - Color - The Shadow Ruler of UX
 
ITCamp 2019 - Florin Coros - Implementing Clean Architecture
ITCamp 2019 - Florin Coros - Implementing Clean ArchitectureITCamp 2019 - Florin Coros - Implementing Clean Architecture
ITCamp 2019 - Florin Coros - Implementing Clean Architecture
 
ITCamp 2019 - Florin Loghiade - Azure Kubernetes in Production - Field notes...
ITCamp 2019 - Florin Loghiade -  Azure Kubernetes in Production - Field notes...ITCamp 2019 - Florin Loghiade -  Azure Kubernetes in Production - Field notes...
ITCamp 2019 - Florin Loghiade - Azure Kubernetes in Production - Field notes...
 
ITCamp 2019 - Florin Flestea - How 3rd Level support experience influenced m...
ITCamp 2019 - Florin Flestea -  How 3rd Level support experience influenced m...ITCamp 2019 - Florin Flestea -  How 3rd Level support experience influenced m...
ITCamp 2019 - Florin Flestea - How 3rd Level support experience influenced m...
 
ITCamp 2019 - Emil Craciun - RoboRestaurant of the future powered by serverle...
ITCamp 2019 - Emil Craciun - RoboRestaurant of the future powered by serverle...ITCamp 2019 - Emil Craciun - RoboRestaurant of the future powered by serverle...
ITCamp 2019 - Emil Craciun - RoboRestaurant of the future powered by serverle...
 
ITCamp 2019 - Eldert Grootenboer - Cloud Architecture Recipes for The Enterprise
ITCamp 2019 - Eldert Grootenboer - Cloud Architecture Recipes for The EnterpriseITCamp 2019 - Eldert Grootenboer - Cloud Architecture Recipes for The Enterprise
ITCamp 2019 - Eldert Grootenboer - Cloud Architecture Recipes for The Enterprise
 
ITCamp 2019 - Cristiana Fernbach - Blockchain Legal Trends
ITCamp 2019 - Cristiana Fernbach - Blockchain Legal TrendsITCamp 2019 - Cristiana Fernbach - Blockchain Legal Trends
ITCamp 2019 - Cristiana Fernbach - Blockchain Legal Trends
 
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data LakeITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
ITCamp 2019 - Andy Cross - Machine Learning with ML.NET and Azure Data Lake
 
ITCamp 2019 - Andy Cross - Business Outcomes from AI
ITCamp 2019 - Andy Cross - Business Outcomes from AIITCamp 2019 - Andy Cross - Business Outcomes from AI
ITCamp 2019 - Andy Cross - Business Outcomes from AI
 
ITCamp 2019 - Andrea Saltarello - Modernise your app. The Cloud Story
ITCamp 2019 - Andrea Saltarello - Modernise your app. The Cloud StoryITCamp 2019 - Andrea Saltarello - Modernise your app. The Cloud Story
ITCamp 2019 - Andrea Saltarello - Modernise your app. The Cloud Story
 
ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Az...
ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Az...ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Az...
ITCamp 2019 - Andrea Saltarello - Implementing bots and Alexa skills using Az...
 
ITCamp 2019 - Alex Mang - I'm Confused Should I Orchestrate my Containers on ...
ITCamp 2019 - Alex Mang - I'm Confused Should I Orchestrate my Containers on ...ITCamp 2019 - Alex Mang - I'm Confused Should I Orchestrate my Containers on ...
ITCamp 2019 - Alex Mang - I'm Confused Should I Orchestrate my Containers on ...
 
ITCamp 2019 - Alex Mang - How Far Can Serverless Actually Go Now
ITCamp 2019 - Alex Mang - How Far Can Serverless Actually Go NowITCamp 2019 - Alex Mang - How Far Can Serverless Actually Go Now
ITCamp 2019 - Alex Mang - How Far Can Serverless Actually Go Now
 
ITCamp 2019 - Peter Leeson - Vitruvian Quality
ITCamp 2019 - Peter Leeson - Vitruvian QualityITCamp 2019 - Peter Leeson - Vitruvian Quality
ITCamp 2019 - Peter Leeson - Vitruvian Quality
 
ITCamp 2018 - Ciprian Sorlea - Million Dollars Hello World Application
ITCamp 2018 - Ciprian Sorlea - Million Dollars Hello World ApplicationITCamp 2018 - Ciprian Sorlea - Million Dollars Hello World Application
ITCamp 2018 - Ciprian Sorlea - Million Dollars Hello World Application
 
ITCamp 2018 - Ciprian Sorlea - Enterprise Architectures with TypeScript And F...
ITCamp 2018 - Ciprian Sorlea - Enterprise Architectures with TypeScript And F...ITCamp 2018 - Ciprian Sorlea - Enterprise Architectures with TypeScript And F...
ITCamp 2018 - Ciprian Sorlea - Enterprise Architectures with TypeScript And F...
 

Kürzlich hochgeladen

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Big Data Solutions in Azure - David Giard

  • 1. Big Data on Azure
  • 2. @davidgiard David Giard Microsoft Technical Evangelist dgiard@Microsoft.com Davidgiard.com @davidgiard
  • 3. @davidgiard Cloud Computing Host some or all of your data or application on a third-party server in a highly-scalable, highly-reliable way
  • 4. @davidgiard Advantages of Cloud Computing • Lower capital costs • Flexible operating cost (Rent vs Buy) • Platform as a Service • Freedom from infrastructure / hardware • Redundancy • Automatic monitoring and failover
  • 5. @davidgiard 0 1 2 3 4 5 6 7 8 9 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Demand and Capacity
  • 6. @davidgiard 0 1 2 3 4 5 6 7 8 9 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Demand and Capacity
  • 7. @davidgiard 0 1 2 3 4 5 6 7 8 9 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Demand and Capacity
  • 8. @davidgiard 0 1 2 3 4 5 6 7 8 9 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Demand and Capacity
  • 9. @davidgiard 0 1 2 3 4 5 6 7 8 9 Mon Tue Wed Thu Fri Sat Sun Mon Tue Wed Thu Fri Demand and Capacity
  • 10. @davidgiard 0 1 2 3 4 5 6 7 8 9 1:00 2:00 3:00 4:00 5:00 6:00 7:00 8:00 9:00 10:00 11:00 12:00 Demand and Capacity
  • 11. @davidgiard 0 1 2 3 4 5 6 7 8 9 1:00 2:00 3:00 4:00 5:00 6:00 7:00 8:00 9:00 10:00 11:00 12:00 Big Data Demand
  • 12. @davidgiard Cost Factors • Service • VM Size • # VMs • Time
  • 14. @davidgiard Azure HDInsight • Microsoft Azure’s big-data solution using Hadoop • Open-source framework for storing and analyzing massive amounts of data on clusters built from commodity hardware • Uses Hadoop Distributed File System (HDFS) for storage • Employs the open-source Hortonworks Data Platform implementation of Hadoop • Includes HBase, Hive, Pig, Storm, Spark, and more • Integrates with popular BI tools • Includes Power BI, Excel, SSAS, SSRS, Tableau
  • 15. @davidgiard Apache Hadoop on Azure • Automatic cluster provisioning and configuration • Bypass an otherwise manual-intensive process • Cluster scaling • Change number of nodes without deleting/re-creating the cluster • High availability/reliability • Managed solution - 99.9% SLA • HDInsight includes a secondary head node • Reliable and economical storage • HDFS mapped over Azure Blob Storage • Accessed through “wasb://” protocol prefix
  • 16. @davidgiard Lambda Architecture • Batch Layer • Speed Layer • Serving Layer
  • 19. @davidgiard HDInsight Cluster Types • Hadoop: Query workloads • Reliable data storage, simple MapReduce • HBase: NoSQL workloads • Distributed database offering random access to large amounts of data • Apache Storm: Stream workloads • Real-time analysis of moving data streams • Apache Spark: High-performance workloads • In-memory parallel processing
  • 21. @davidgiard Cluster Creation { "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#", "contentVersion": "1.0.0.0", "parameters": { "clusterName": { "type": "string", "metadata": { "description": "The name of the HDInsight cluster to create." } }, "clusterLoginUserName": { "type": "string", "defaultValue": "admin", "metadata": { "description": "These credentials can be used to submit jobs to the cluster and to log into cluster dashboards." } }, "clusterLoginPassword": { "type": "securestring",
  • 24. @davidgiard Storm • Apache Storm is a distributed, fault-tolerant, open-source computation system that allows you to process data in real-time with Hadoop. • Apache Storm on HDInsight allows you to create distributed, real-time analytics solutions in the Azure environment by using Apache Hadoop. • Storm solutions can also provide guaranteed processing of data, with the ability to replay data that was not successfully processed the first time. • Ability to write Storm components in C#, JAVA and Python. • Azure Scale up or Scale down without an impact for running Storm topologies. • Ease of provision and use in Azure portal. • Visual Studio project templates for Storm apps
  • 25. @davidgiard Storm • Apache Storm apps are submitted as Topologies. • A topology is a graph of computation that processes streams • Stream: An unbound collection of tuples. Streams are produced by spouts and bolts, and they are consumed by bolts. • Tuple: A named list of dynamically typed values. • Spout: Consumes data from a data source and emits one or more streams. • Bolt: Consumes streams, performs processing on tuples, and may emit streams. Bolts are also responsible for writing data to external storage, such as a queue, HDInsight, HBase, a blob, or other data store. • Nimbus: JobTracker in Hadoop that distribute jobs, monitoring failures.
  • 26. @davidgiard Stream Apache Storm Topology Event Source Tuple ( “timestamp:: 1234567890, “measurement”: “123”, “location”: “ABC123” ) Tuple { key1: “value1”, key2: “value2”, key3: “value3”, } { key1: “value1”, key2: “value2”, key3: “value3”, } Tuple Tuple { key1: “value1”, key4: “value4” } Bolt Spout Bolt Bolt Bolt Tuple { key1: “value1”, key2: “value2”, key3: “value3”, }
  • 29. @davidgiard HBase • Apache HBase is an open-source, NoSQL database that is built on Hadoop and modeled after Google BigTable. • HBase provides random access and strong consistency for large amounts of unstructured and semistructured data in a schemaless database organized by column families • Data is stored in the rows of a table, and data within a row is grouped by column family. • The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop ecosystem.
  • 30. @davidgiard HBase • HBase Commands: • create  Equivalent to create table in T-SQL • get  Equivalent to select statements in T-SQL • put  Equivalent to update, Insert statement in T-SQL • scan  Equivalent to select (no where condition) in T-SQL • delete/deleteall  Equivalent to delete in T-SQL • HBase shell is your query tool to execute in CRUD commands to a HBase cluster. • Data can also be managed using the HBase C# API, which provides a client library on top of the HBase REST API. • An HBase database can also be queried by using Hive.
  • 31. @davidgiard HBase RowKey a:1 a:2 a:3 a:4 b:1 b:2 c:numA 982069 10 20 30 40 5 7 4 926025 9 11 21 4 9 3 254114 11 15 22 35 7 11 4 881514 8 14 2 3 2 Column family “a” Column family “b” Column family “c”
  • 32. @davidgiard HBase RowKey a:1 a:2 a:3 a:4 b:1 b:2 c:numA 982069 10 20 30 40 5 7 4 926025 9 11 21 4 9 3 254114 11 15 22 35 7 11 4 881514 8 14 2 3 2 Column family “temperature” Column family “pressure” Column family “avg_temp”
  • 34. @davidgiard Hive • Apache Hive is a data warehouse system for Hadoop, which enables data summarization, querying, and analysis of data by using HiveQL (a query language similar to SQL). • Hive understands how to work with structured and semi-structured data, such as text files where the fields are delimited by specific characters. • Hive also supports custom serializer/deserializers for complex or irregularly structured data. • Hive can also be extended through user-defined functions (UDF). • A UDF allows you to implement functionality or logic that isn't easily modeled in HiveQL.
  • 35. @davidgiard HiveQL # Number of Records SELECT COUNT(1) FROM www_access; # Number of Unique IPs SELECT COUNT(1) FROM ( SELECT DISTINCT ip FROM www_access ) t; # Number of Unique IPs that Accessed the Top Page SELECT COUNT(distinct ip) FROM www_access WHERE url='/'; # Number of Accesses per Unique IP SELECT ip, COUNT(1) FROM www_access GROUP BY ip LIMIT 30; # Unique IPs Sorted by Number of Accesses SELECT ip, COUNT(1) AS cnt FROM www_access GROUP BY ip ORDER BY cnt DESC LIMIT 30; # Number of Accesses After a Certain Time SELECT COUNT(1) FROM www_access WHERE TD_TIME_RANGE(time, "2011-08-19", NULL, "PDT")
  • 37. @davidgiard Apache Spark • Interactive manipulation and visualization of data • Scala, Python, and R Interactive Shells • Jupyter Notebook with PySpark (Python) and Spark (Scala) kernels provide in- browser interaction • Unified platform for processing multiple workloads • Real-time processing, Machine Learning, Stream Analytics, Interactive Querying, Graphing • Leverages in-memory processing for really big data • Resilient distributed datasets (RDDs) • APIs for processing large datasets • Up to 100x faster than MapReduce
  • 38. @davidgiard Spark Components on HDInsight • Spark Core • Includes Spark SQL, Spark Streaming, GraphX, and MLlib • Anaconda • Livy • Jupyter Notebooks • ODBC Driver for connecting from BI tools (Power BI, Tableau)
  • 39. @davidgiard Jupyter Notebooks on HDInsight • Browser-based interface for working with text, code, equations, plots, graphics, and interactive controls in a single document. • Include preset Spark and Hive contexts (sc and sqlContext)
  • 41. @davidgiard Items of Note About HDInsight • There is no “suspend” on HDInsight clusters • Provision the cluster, do work, then delete the cluster to avoid unnecessary charges • Storage can be decoupled from the cluster and reused across deployments • Can deploy from the portal, but often scripted in practice • Easier/repeatable creation and deletion