Basic Big Data and Hadoop terminology
What projects fit well with Hadoop
Why Hadoop in the cloud is so powerful
Sample end-to-end architecture
See: Data, Hadoop, Hive, Analytics, BI
Do: Data, Hadoop, Hive, Analytics, BI
How this tech solves your business problems
http://smallbitesofbigdata.com | http://bit.ly/BDApr2015
Key Takeaways
Basic Big Data and Hadoop terminology
What projects fit well with Hadoop
Why Hadoop in the cloud is so powerful
Sample end-to-end architecture
See: Data, Hadoop, Hive, Streaming, Analytics, BI
Do: Data, Hadoop, Hive, Streaming, Analytics, BI
How this tech solves your business problems
What is Big Data?
It Is
Scale out, distributed processing
Enables elasticity
Encourages exploration
Faster data ingestion
Lower TCO
Empowers self-service BI and analytics
Rapid time to insight
It Is NOT
A well-defined thing
About volume, size
A replacement for everything
The answer to every problem
What is Hadoop? Conceptual View
It Is
A type of Big Data
Just another data source
A loose collection of open source code
Distributed by many
Handles loosely structured data
Write once, read many
It Is Not
Actually a thing!
The only way to do Big Data
Only about data
Architecture – Use Cloud Building Blocks
Landing zone (Blob Storage or in memory), optimized for write throughput:
- Many small blobs
- Raw/binary format
- Data kept until curated
- Azure Blob Storage if persisted
- Azure Queues & Workers for in memory
Curator, for use-case-specific and general processing:
- Data governance requirements (PII scrub)
- Aggregate for efficient storage
- Publish to real-time consumers and long-term storage (Hadoop)
Persistent storage (Blob Storage), optimized for query efficiency:
- Optimized size (combine blobs)
- Cleansed/masked
- Partitioned
- Well-defined, semi-structured data
HDInsight clusters (Hive, Pig, etc.) read the persisted data via REST; Sqoop feeds reporting/DW, and self-service analytics are available from any device.
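The landing-zone-to-curator flow above can be sketched in a few lines. This is a hypothetical illustration (made-up field layout and values, not the actual curator code): it combines many small raw blobs into one cleansed blob per arrival day, masking IP addresses along the way.

```python
# Hypothetical raw session records: one CSV line per session
# (timestamp, user id, IP address). Real landing-zone blobs would be raw/binary.
small_blobs = [
    "2015-04-01T10:00:00,user1,10.0.0.17\n2015-04-01T10:05:00,user2,10.0.0.42\n",
    "2015-04-02T09:30:00,user3,10.0.1.99\n",
]

def mask_ip(ip):
    """PII scrub: drop the last octet so analysis is only region-granular."""
    return ".".join(ip.split(".")[:3]) + ".x"

def curate(blobs):
    """Combine many small blobs into one blob per day (Hive-style partitions)."""
    partitions = {}
    for blob in blobs:
        for line in blob.splitlines():
            ts, user, ip = line.split(",")
            day = ts.split("T")[0]
            partitions.setdefault(day, []).append(f"{ts},{user},{mask_ip(ip)}")
    return {day: "\n".join(rows) for day, rows in partitions.items()}

curated = curate(small_blobs)
print(sorted(curated))  # one curated blob key per arrival day
```

In the real pipeline this step runs as Pig/MapReduce jobs triggered by worker roles, but the shape of the work is the same: consolidate, scrub, partition.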
Typical Big Data Use Cases
- Smart meter monitoring
- Equipment monitoring
- Advertising analysis
- Life sciences research
- Fraud detection
- Healthcare outcomes
- Weather forecasting
- Natural resource exploration
- Social network analysis
- Churn analysis
- Traffic flow optimization
- Legal discovery
- Telemetry
- IT infrastructure optimization
Hadoop Shines When….
Data exploration, analytics and reporting, new data-driven actionable insights
Rapid iteration
Unknown unknowns
Flexible scaling
Data-driven actions for early competitive advantage or first to market
Low number of direct, concurrent users
Low cost data archival
Relational Database vs. Hadoop Platform: SCALE (storage & processing)
(left: relational database | right: Hadoop platform)
- Schema: required on write | required on read
- Speed: reads are fast | writes are fast
- Governance: standards and structured | loosely structured
- Processing: limited, no data processing | processing coupled with data
- Data types: structured | multi- and unstructured
- Best-fit use: interactive OLAP analytics, complex ACID transactions, operational data store | data discovery, processing unstructured data, massive storage/processing
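The schema row is the key contrast. A small engine-agnostic sketch (illustrative names and data only): a relational store enforces the schema when data is written, while a Hadoop-style store keeps raw lines and applies the schema only when the data is read.

```python
# Schema-on-write (relational style): bad rows are rejected at load time.
def load_relational(rows, table):
    for name, age in rows:
        if not age.isdigit():
            raise ValueError(f"rejected at write time: {name},{age}")
        table.append((name, int(age)))

# Schema-on-read (Hadoop style): store anything, interpret at query time,
# silently skipping rows that do not fit the schema being applied.
def query_schema_on_read(raw_lines):
    for line in raw_lines:
        parts = line.split(",")
        if len(parts) == 2 and parts[1].isdigit():  # schema applied here
            yield parts[0], int(parts[1])

raw = ["alice,30", "totally malformed line", "bob,25"]
print(list(query_schema_on_read(raw)))  # [('alice', 30), ('bob', 25)]
```

This is why loosely structured data lands so easily in Hadoop: nothing is rejected on arrival, and each query decides how to interpret the bytes.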
Microsoft Hadoop Options
Cloud
HDInsight Service
Windows Azure Storage Blob (WASB)
HDP or Cloudera on VMs (Windows or Linux)
Any distro on VMs (Windows or Linux)
Hybrid / On-Premises
Parallel Data Warehouse (PDW) with Polybase
APS/PDW Hadoop Regions
OneBox for Developers
Hortonworks Data Platform (HDP for Windows)
Why Hadoop in the Cloud?
Hadoop
It’s easier
You can concentrate on the analytics
WASB: separation of storage and compute
Shared data, globally accessible
Lowers the cost of discovery & innovation
No commitment as you learn
Cloud in General
Today’s disruptor, tomorrow’s reality
Elasticity, capacity
Less infrastructure and implementation work
Lower TCO
Business Continuity
Operational Agility
WASB: Separation of Storage & Compute
Windows Azure Storage Blob (WASB) = separation of storage and compute
Open source code available to any distro
Simplified data access
Reduced data movement
Faster access to new data
Enables ETL even when a cluster isn’t up = lower TCO
Share data concurrently
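WASB addresses blobs with a URI of the form wasb://&lt;container&gt;@&lt;account&gt;.blob.core.windows.net/&lt;path&gt;. Because that address names the storage account rather than any cluster, the same data can be shared by multiple clusters, or curated with no cluster running at all. A small illustrative parse (the account, container, and path below are made up):

```python
from urllib.parse import urlparse

# The data's address names the storage account, not a cluster:
# any cluster (or none) can reference the same blobs.
uri = "wasb://mycontainer@myaccount.blob.core.windows.net/curated/2015-04-01/part-0000"

parts = urlparse(uri)
container, host = parts.netloc.split("@")
account = host.split(".")[0]
print(account, container, parts.path)
```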
So Far….
Basic Big Data and Hadoop terminology
What projects fit well with Hadoop
Why Hadoop in the cloud is so powerful
Sample end-to-end architecture
Hands-On: Storage, data load, SQL database, Service Bus Event Hub, HDInsight, Hive, AzureML,
Power Query, Power View
Key Takeaways
Basic Big Data and Hadoop terminology
What projects fit well with Hadoop
Why Hadoop in the cloud is so powerful
Sample end-to-end architecture
See: Data, Hadoop, Hive, Streaming, Analytics, BI
Do: Data, Hadoop, Hive, Streaming, Analytics, BI
How this tech solves your business problems
Big Data References
Get started / overview with a free Ebook “Introducing Microsoft Azure HDInsight”
http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx
Architect a solution with the Patterns and Practices guide “Developing big data solutions on
Microsoft Azure HDInsight“
http://blogs.msdn.com/b/masashi_narumoto/archive/2014/06/30/new-release-developing-big-data-solutions-on-microsoft-hdinsight.aspx
The Data Science Laboratory Series is Complete
http://blogs.msdn.com/b/buckwoody/archive/2014/03/24/the-data-science-laboratory-series-is-complete.aspx
Big Data References
Microsoft Big Data http://microsoft.com/bigdata
HDP for Windows http://hortonworks.com/products/hdp-windows/
Hadoop: The Definitive Guide by Tom White
Programming Hive Book by Capriolo, Wampler, Rutherglen
Big Data Learning Resources http://sqlblog.com/blogs/lara_rubbelke/archive/2012/09/10/big-data-learning-resources.aspx
Hurricane Sandy Mash-Up: Hive, SQL Server, PowerPivot & Power View
http://blogs.msdn.com/b/cindygross/archive/2013/01/31/mash-up-hive-sql-server-data-in-powerpivot-amp-power-view-hurricane-sandy-2012.aspx
Twitter Search https://twitter.com/#!/search/%23bigdata
Hive Reference http://hive.apache.org
HDInsight Tutorials http://www.windowsazure.com/en-us/documentation/services/hdinsight/?fb=en-us
Denny Lee http://dennyglee.com/category/bigdata/
Carl Nolan http://blogs.msdn.com/b/carlnol/archive/tags/hadoop+streaming/
Cindy Gross http://tinyurl.com/SmallBitesBigData
Editor's notes
Azure Subscription: http://youtu.be/lSxMtmRE114
Create HDInsight Cluster in Azure Portal http://smallbitesofbigdata.com/archive/2015/02/26/create-hdinsight-cluster-in-azure-portal.aspx
Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.
Consistent: A transaction cannot leave the database in an inconsistent state.
Isolated: Transactions cannot interfere with each other.
Durable: Completed transactions persist, even when servers restart etc.
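A minimal demonstration of the atomic property using Python's built-in sqlite3 (the table and values are illustrative): when a failure occurs mid-transaction, the earlier update in the same transaction is rolled back too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

# Atomic: both updates inside the 'with' block succeed together, or neither
# is applied -- the connection context manager rolls back on any exception.
try:
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass  # the rollback has already happened

print(conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone())
```

Hadoop-style stores generally do not make this guarantee across arbitrary multi-row updates, which is why complex ACID transactions sit on the relational side of the comparison slide.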
Presenter guidance:
Share how we think about the data platform in the cloud. Today, we’ll specifically talk about SQL in a VM (briefly), SQL DB, DocumentDB, HBase on HDInsight, and Tables/Blobs. There are lots of other adjacent services such as Redis Cache, Event Hubs, HDInsight, Azure ML, Data Factory, Stream Analytics that will not be addressed in this deck.
Slide talk track:
The top row is Power BI – you’re making decisions based on data
The middle row is ML, Stream Analytics, HDInsight, and Data Factory – processing and making sense of the data
The bottom row is where you ingest and store data.
With Azure, organizations have access to a whole range of services that allow them to use the right tool for the right job when developing applications.
In the cloud, organizations can collect and manage data in the form in which it’s born and store it in the form that best suits an application’s needs.
They have a very simple architecture.
Xbox consoles send raw data to a landing zone (it may spill to disk/blob storage). They process each small file as it lands, keep it until curation finishes.
They curate the data – scrub out personally identifiable info, aggregate, split as needed (to send subsets of data such as 10 minutes of sliding data or the new users in the last month), combine many small files into a few large files, put into AVRO format (common, well-known SerDes), persist “permanently” to azure blob store.
The data in the permanent store (WASB) is in a few large files, cleansed/masked, partitioned by day, semi-structured.
HDInsight processes the data – analytics, sending to other systems (SQL, RS, PowerPivot, etc.)
Demo (fake/cleansed data)
Show RawStats (view in notepad, Cloud Explorer) = raw binary data in a proprietary xbox format – shown here (cleansed) with comma separators for readability. Each line is a session with a start time, gamerid, IP address, who they interacted with (gamerids separated by hyphens). This is what is in the landing zone – the raw data.
Show RawCurator.pig (view in notepad). Compute/worker roles are watching for the raw data files. They pick them up and use Pig (and other MapReduce) to remove PII, aggregate, split, consolidate, remove the last octet of the IP for per state data…. Data is stored per arrival data – this sets us up for Hive partitions. This is a very simple workflow written by people who didn’t know Hadoop.
Show gamerstats.xlsx. This is the curated data.
Show PowerMap on top of sheet 3 (optionally also sheet 2 for marketing campaign data). This is using Hive/Hive ODBC driver to view new users.
(optional) Show pssnippets: PowerShell to submit jobs
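Per the notes above, each raw record is a session: start time, gamerid, IP address, and the gamerids interacted with (hyphen-separated). A sketch of parsing one such record, with made-up values; the real format is proprietary and binary, shown comma-separated here only for readability, as in the demo.

```python
def parse_session(line):
    """Parse one comma-separated session record (illustrative layout)."""
    start, gamerid, ip, partners = line.split(",")
    return {
        "start": start,
        "gamerid": gamerid,
        # keep only the first three octets: per-state stats without full-IP PII
        "ip_masked": ".".join(ip.split(".")[:3]) + ".0",
        "partners": partners.split("-"),
    }

session = parse_session("2015-04-01T10:00:00,g123,192.168.5.77,g456-g789")
print(session["ip_masked"], session["partners"])
```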
Businesses using Big Data are “making it big”. They are taking advantage of all this ambient data and they’re moving ahead, gaining a foothold in new markets and gaining marketshare in existing markets. Think about how Netflix makes movie recommendations or how Google can predict a flu outbreak before the CDC does.
HDInsight is very focused on the volume and variety problems. We have our RX/Stream Insight and BI stack added in to help address the solution velocity issues.
Why big data in the cloud?
collect data globally
much is already in the cloud
share globally
cross data center HA/DR
cost of hiring, training, retaining hardware personnel
highly flexible, scalable
easily pull in ambient data
It's partly a question of where to spend your resources and how much control you want.
Why Hadoop in the cloud?
You can deploy Hadoop in a traditional on-site datacenter. Some companies, including Microsoft, also offer Hadoop as a cloud-based service. One obvious question is: why use Hadoop in the cloud? Here's why a growing number of organizations are choosing this option.
The cloud saves time and money
Open source doesn't mean free. Deploying Hadoop on-premises still requires servers and skilled Hadoop experts to set up, tune, and maintain them. A cloud service lets you spin up a Hadoop cluster in minutes without up-front costs.
See how Virginia Tech is using Microsoft's cloud instead of spending millions of dollars to establish their own supercomputing center.
The cloud is flexible and scales fast
In the Microsoft Azure cloud, you pay only for the compute and storage you use, when you use it. Spin up a Hadoop cluster, analyze your data, then shut it down to stop the meter.
We quickly spun up the Azure HDInsight cluster and processed six years' worth of data in just a few hours, and then we shut it down… processing the data in the cloud made it very affordable.
–Paul Henderson, National Health Service (U.K.)
The cloud makes you nimble
Create a Hadoop cluster in minutes, and add nodes on demand. The cloud offers organizations immediate time to value.
It was simply so much faster to do this in the cloud with Windows Azure. We were able to implement the solution and start working with data in less than a week.
–Morten Meldgaard, Chr. Hansen