http://smallbitesofbigdata.com http://bit.ly/BDApr2015
Big Data in the Cloud
Cindy Gross – Technical Fellow: Big Data and Cloud
@SQLCindy
cindyg@NealAnalytics.com
http://smallbitesofbigdata.com
SQLCindy
Cindy Gross
Neal Analytics Technical Fellow: Big Data and Cloud
Follow me on Twitter @SQLCindy
Subscribe to my blog: http://smallbitesofbigdata.com
Connect with me on LinkedIn http://www.linkedin.com/in/cindygross
Key Takeaways
Basic Big Data and Hadoop terminology
What projects fit well with Hadoop
Why Hadoop in the cloud is so powerful
Sample end-to-end architecture
See: Data, Hadoop, Hive, Streaming, Analytics, BI
Do: Data, Hadoop, Hive, Streaming, Analytics, BI
How this tech solves your business problems
Your Goals
What are your backgrounds and needs?
What is your Big Data experience?
What questions do you have?
What do you want to know by the end of this talk?
Meet the people around you
Schedule
830a: Breakfast
915a: Intro, Pre-Reqs
930a: The Big Data Landscape
Noon: Lunch
1p: More Big Data
Done before 5p
Pre-Req: Azure Subscription
Trial: http://azure.microsoft.com/en-us/pricing/free-trial/
MSDN Subscription: http://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/
Startup BizSpark: http://azure.microsoft.com/en-us/pricing/member-offers/bizspark-startups/
Classroom: http://www.microsoftazurepass.com/azureu
Pay-As-You-Go or Enterprise Agreement: http://azure.microsoft.com/en-us/pricing/
Pre-Reqs
Azure subscription with available HDInsight cores
Demo file: http://bit.ly/BDApr2015
Download Power Query add-in http://www.microsoft.com/en-us/download/details.aspx?id=39379&CorrelationId=d8002172-0438-4ef5-b0fa-e635f8f17251
Enable PowerPivot and Power View in Excel options – COM add-ins
HOL labs https://github.com/Azure-Readiness/CloudDataCamp “Clone in Desktop” or “Download ZIP” + UNZIP
GUI: Install CloudXplorer http://clumsyleaf.com/products/downloads
Cmd line: Install AzCopy http://azure.microsoft.com/en-us/documentation/articles/storage-use-azcopy/
Install SQL 2014 SSMS http://www.microsoft.com/en-gb/download/details.aspx?id=42299
Today’s slides: http://www.slideshare.net/cindygross1/hadoop-in-the-cloud-montreal-april-2015
What is Big Data?
What do you think Big Data is?
What is Big Data?
It Is
Scale out, distributed processing
Enables elasticity
Encourages exploration
Faster data ingestion
Lower TCO
Empowers self-service BI and analytics
Rapid time to insight
It Is NOT
A well-defined thing
About volume, size
A replacement for everything
The answer to every problem
What is Hadoop? Conceptual View
It Is
A type of Big Data
Just another data source
A loose collection of open source code
Distributed by many
Handles loosely structured data
Write once, read many
It Is Not
Actually a thing!
The only way to do Big Data
Only about data
BASE: Basically Available, Soft State, Eventually Consistent
ACID: Atomic, Consistent, Isolated, Durable
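To make the ACID side of the comparison concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The table, account names, and the `transfer` helper are hypothetical and are not part of the talk or its labs; the point is only that both updates commit together or neither does.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts atomically (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Raising here undoes BOTH updates, not just the first one.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {account name: balance} for the whole table."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so it is rolled back
```

A BASE-style store would typically accept both writes and reconcile later; the relational engine rejects the whole unit of work instead.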
What is Hadoop? Tech View
http://hortonworks.com/hdp/
End to End Architecture
Microsoft Azure Data Services
Data
Capture + manage
Transform + analyze
Visualize + decide
Demo
VIEW THE AZURE PORTALS
HDINSIGHT: ELASTICITY, QUERY
[Diagram: Internet of Things – Business Insights on Microsoft Azure. Components shown: Source Data; Event Hub; Streaming (Real Time); Azure Storage; Queries (HDInsight, SQL Server); Machine Learning, Analytics, and Business Intelligence; Destination Apps + Data]
Architecture – Use Cloud Building Blocks
Landing Zone (Blob Storage or In Memory)
Optimized for write throughput
- Many small blobs
- Raw/binary format
- Data kept until curated
- Azure Blob Storage if persisted
- Azure Queues & Workers for in memory
Curator
Use Case Specific & General Processing
- Data governance requirements (PII scrub)
- Aggregate for efficient storage
- Publish to real-time consumers and long term storage (Hadoop)
Persistent Storage (Blob Storage)
Optimized for query efficiency
- Optimized size (combine blobs)
- Cleansed/masked
- Partitioned
- Well-defined, semi-structured data
HDInsight Clusters (Hive, Pig, etc)
REST, Sqoop
Self-Service Analytics
Reporting / DW
Other
Any Device!
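The curator step (scrub PII, combine many small blobs, partition) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in the demo: the record layout `start_time,gamer_id,ip,peers`, the function names, and the choice to partition on the session start date (standing in for arrival date) are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def mask_ip(ip):
    """Replace the last octet with 0 so the address no longer identifies one device."""
    return ".".join(ip.split(".")[:3]) + ".0"


def curate(raw_blobs):
    """Combine many small raw blobs into one cleansed batch per day.

    raw_blobs: iterable of strings, each holding comma-separated lines of
    start_time,gamer_id,ip,peers (hypothetical layout; peers are hyphen-separated).
    Returns {"YYYY-MM-DD": [curated rows]}, ready for day-partitioned storage.
    """
    partitions = defaultdict(list)
    for blob in raw_blobs:
        for line in blob.splitlines():
            if not line.strip():
                continue
            start_time, _gamer_id, ip, peers = line.split(",")
            day = datetime.fromisoformat(start_time).date().isoformat()
            partitions[day].append({
                "start_time": start_time,
                "ip_prefix": mask_ip(ip),                              # PII scrub
                "peer_count": len(peers.split("-")) if peers else 0,   # aggregate
            })                                                         # gamer_id is dropped
    return dict(partitions)


# Two small "landing zone" blobs with made-up sessions.
raw = [
    "2015-04-01T10:00:00,g1,10.1.2.3,g2-g3\n",
    "2015-04-01T11:30:00,g2,10.1.2.77,\n2015-04-02T09:00:00,g3,192.168.5.9,g1\n",
]
curated = curate(raw)
```

Each key of `curated` corresponds to one partition in persistent storage, which is what makes later day-based queries cheap.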
Now You Do It
CLOUD DATA CAMP LAB 1, LAB 9
CREATE: STORAGE, SQL AZURE DATABASE, STREAMING JOB
DO: LOAD DATA, CREATE SCHEMA, GENERATE AND CONSUME “SENSOR” DATA
THANKS TO LARA RUBBELKE FOR DEMOS!
When to Use Hadoop
Typical Big Data Use Cases
Smart meter monitoring
Equipment monitoring
Advertising analysis
Life sciences research
Fraud detection
Healthcare outcomes
Weather forecasting
Natural resource exploration
Social network analysis
Churn analysis
Traffic flow optimization
Legal discovery
Telemetry
IT infrastructure optimization
Hadoop Shines When….
Data exploration, analytics and reporting, new data-driven actionable insights
Rapid iterating
Unknown unknowns
Flexible scaling
Data driven actions for early competitive advantage or first to market
Low number of direct, concurrent users
Low cost data archival
Hadoop Anti-Patterns….
Replacing a system whose pain points don’t align with Hadoop’s strengths
OLTP needs adequately met by an existing system
Known data with a static schema
Many end users
Interactive response time requirements
Your first Hadoop project + mission critical system
Relational Database vs. Hadoop Platform, by SCALE (storage & processing)
schema: Relational = required on write; Hadoop = required on read
speed: Relational = reads are fast; Hadoop = writes are fast
governance: Relational = standards and structured; Hadoop = loosely structured
processing: Relational = limited, no data processing; Hadoop = processing coupled with data
data types: Relational = structured; Hadoop = multi and unstructured
best fit use: Relational = interactive OLAP analytics, complex ACID transactions, operational data store; Hadoop = data discovery, processing unstructured data, massive storage/processing
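The "required on read" side of the schema row can be illustrated with a small Python sketch. It is not how Hive is implemented; the event format, field names, and `read_with_schema` helper are made up. The idea it shows is that raw lines are stored untouched and a schema is applied only when a question is asked, so different questions can read the same data with different schemas.

```python
import json

# Raw lines land as-is; nothing validates them at write time.
RAW_EVENTS = [
    '{"device": "meter-1", "kwh": 1.5}',
    '{"device": "meter-2", "kwh": "2.25", "firmware": "v2"}',
    'not json at all',
]


def read_with_schema(raw_lines, schema):
    """Apply `schema` (field name -> type) at read time.

    Lines that cannot be parsed or cast are skipped rather than rejected
    when the data was loaded (a simplification chosen for this sketch).
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            continue


# Two different "tables" over the same raw data.
usage = list(read_with_schema(RAW_EVENTS, {"device": str, "kwh": float}))
by_firmware = list(read_with_schema(RAW_EVENTS, {"device": str, "firmware": str}))
```

A schema-on-write system would have forced a decision about `firmware` and the string-typed `kwh` before the first row was stored.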
Now You Do It
CLOUD DATA CAMP LAB 2
CREATE: HDINSIGHT CLUSTER
THANKS TO LARA RUBBELKE FOR DEMOS!
Why Hadoop in the Cloud
Microsoft Hadoop Options
Cloud
HDInsight Service
Windows Azure Storage Blob (WASB)
HDP or Cloudera on VMs (Windows or Linux)
Any distro on VMs (Windows or Linux)
Hybrid / On-Premises
Parallel Data Warehouse (PDW) with Polybase
APS/PDW Hadoop Regions
OneBox for Developers
Hortonworks Data Platform (HDP for Windows)
Why Hadoop in the Cloud?
Why Hadoop in the Cloud?
Hadoop
It’s easier
You can concentrate on the analytics
WASB: separation of storage and compute
Shared data, globally accessible
Lowers the cost of discovery & innovation
No commitment as you learn
Cloud in General
Today’s disruptor, tomorrow’s reality
Elasticity, capacity
Less infrastructure and implementation work
Lower TCO
Business Continuity
Operational Agility
WASB: Separation of Storage & Compute
Windows Azure Storage Blob (WASB) = separation of storage and compute
Open source code available to any distro
Simplified data access
Reduced data movement
Faster access to new data
Enables ETL even when a cluster isn’t up = lower TCO
Share data concurrently
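Because the data lives in blob storage rather than on cluster disks, a cluster only needs an address to reach it. A minimal sketch of building that address in Python, assuming the usual `wasb[s]://<container>@<account>.blob.core.windows.net/<path>` form; the container, account, and path names are hypothetical placeholders.

```python
def wasb_uri(container, account, path="", secure=True):
    """Build a WASB URI for a path inside an Azure Blob storage container.

    `secure=True` selects the TLS variant of the scheme (wasbs).
    """
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical names: any cluster given this URI reads the same curated data.
curated_day = wasb_uri("data", "mystorage", "curated/day=2015-04-01/")
plain = wasb_uri("data", "mystorage", "/raw/", secure=False)
```

Two clusters, or a cluster created next week after this one is deleted, can point at the same URI, which is why data can be loaded and shared even when no cluster is running.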
Why HDInsight
Separation of storage and compute is the default
Varied workloads: Query, Streaming, NoSQL
Elasticity: Node sizes, # of nodes
Committed to openness: Hortonworks, Linux, WASB
Now You Do It
CLOUD DATA CAMP LAB 3
DO: RDP TO HEAD NODE, STRUCTURE/QUERY HIVE WITH HQL
CONNECT: AZUREML, POWER QUERY
THANKS TO LARA RUBBELKE FOR DEMOS!
So Far….
Basic Big Data and Hadoop terminology
What projects fit well with Hadoop
Why Hadoop in the cloud is so powerful
Sample end-to-end architecture
Hands-On: Storage, data load, SQL database, Service Bus Event Hub, HDInsight, Hive, AzureML, Power Query, Power View
Tie It Together
What’s the Goal?
Ask a business question
Find and load data
Explore the data
Iterate
Analyze, Visualize, and/or move the data
Productionalize some, all, or none
Key Takeaways
Basic Big Data and Hadoop terminology
What projects fit well with Hadoop
Why Hadoop in the cloud is so powerful
Sample end-to-end architecture
See: Data, Hadoop, Hive, Streaming, Analytics, BI
Do: Data, Hadoop, Hive, Streaming, Analytics, BI
How this tech solves your business problems
Hadoop in the Cloud
Cindy Gross – Technical Fellow: Big Data and Cloud
@SQLCindy
cindyg@NealAnalytics.com
http://smallbitesofbigdata.com
Big Data References
Get started / overview with a free ebook “Introducing Microsoft Azure HDInsight”
http://blogs.msdn.com/b/microsoft_press/archive/2014/05/27/free-ebook-introducing-microsoft-azure-hdinsight.aspx
Architect a solution with the Patterns and Practices guide “Developing big data solutions on Microsoft Azure HDInsight”
http://blogs.msdn.com/b/masashi_narumoto/archive/2014/06/30/new-release-developing-big-data-solutions-on-microsoft-hdinsight.aspx
The Data Science Laboratory Series is Complete
http://blogs.msdn.com/b/buckwoody/archive/2014/03/24/the-data-science-laboratory-series-is-complete.aspx
Big Data References
Microsoft Big Data http://microsoft.com/bigdata
HDP for Windows http://hortonworks.com/products/hdp-windows/
Hadoop: The Definitive Guide by Tom White
Programming Hive Book by Capriolo, Wampler, Rutherglen
Big Data Learning Resources http://sqlblog.com/blogs/lara_rubbelke/archive/2012/09/10/big-data-learning-resources.aspx
Hurricane Sandy Mash-Up: Hive, SQL Server, PowerPivot & Power View
http://blogs.msdn.com/b/cindygross/archive/2013/01/31/mash-up-hive-sql-server-data-in-powerpivot-amp-power-view-hurricane-sandy-2012.aspx
Twitter Search https://twitter.com/#!/search/%23bigdata
Hive Reference http://hive.apache.org
HDInsight Tutorials http://www.windowsazure.com/en-us/documentation/services/hdinsight/?fb=en-us
Denny Lee http://dennyglee.com/category/bigdata/
Carl Nolan http://blogs.msdn.com/b/carlnol/archive/tags/hadoop+streaming/
Cindy Gross http://tinyurl.com/SmallBitesBigData

Big Data in the Cloud - Montreal April 2015

Editor's Notes

  1. Azure Subscription: http://youtu.be/lSxMtmRE114 Create HDInsight Cluster in Azure Portal http://smallbitesofbigdata.com/archive/2015/02/26/create-hdinsight-cluster-in-azure-portal.aspx
  2. Atomic: Everything in a transaction succeeds or the entire transaction is rolled back. Consistent: A transaction cannot leave the database in an inconsistent state. Isolated: Transactions cannot interfere with each other. Durable: Completed transactions persist, even when servers restart etc.
  3. Presenter guidance: Share how we think about the data platform in the cloud. Today, we’ll specifically talk about SQL in a VM (briefly), SQL DB, DocumentDB, HBase on HDInsight, and Tables/Blobs. There are lots of other adjacent services, such as Redis Cache, Event Hubs, HDInsight, Azure ML, Data Factory, and Stream Analytics, that will not be addressed in this deck. Slide talk track: The top row is Power BI: you’re making decisions based on data. The middle row is ML, Stream Analytics, HDInsight, and Data Factory: processing and making sense of the data. The bottom row is where you ingest and store data. With Azure, organizations have access to a whole range of services that allow them to use the right tool for the right job when developing applications. In the cloud, organizations can collect and manage data in the form in which it’s born and store it in the form that best suits an application’s needs.
  4. They have a very simple architecture. Xbox consoles send raw data to a landing zone (it may spill to disk/blob storage). They process each small file as it lands and keep it until curation finishes. They curate the data: scrub out personally identifiable info, aggregate, split as needed (to send subsets of data such as 10 minutes of sliding data or the new users in the last month), combine many small files into a few large files, put the data into Avro format (a common, well-known SerDe), and persist it “permanently” to Azure Blob storage. The data in the permanent store (WASB) is in a few large files, cleansed/masked, partitioned by day, and semi-structured. HDInsight processes the data: analytics, sending to other systems (SQL, RS, PowerPivot, etc.).
     Demo (fake/cleansed data):
     - Show RawStats (view in Notepad, Cloud Explorer). This is raw binary data in a proprietary Xbox format, shown here (cleansed) with comma separators for readability. Each line is a session with a start time, gamerid, IP address, and who they interacted with (gamerids separated by hyphens). This is what is in the landing zone: the raw data.
     - Show RawCurator.pig (view in Notepad). Compute/worker roles are watching for the raw data files. They pick them up and use Pig (and other MapReduce) to remove PII, aggregate, split, consolidate, and remove the last octet of the IP for per-state data. Data is stored per arrival date, which sets us up for Hive partitions. This is a very simple workflow written by people who didn’t know Hadoop.
     - Show gamerstats.xlsx. This is the curated data. Show PowerMap on top of sheet 3 (optionally also sheet 2 for marketing campaign data). This is using Hive/Hive ODBC driver to view new users.
     - (Optional) Show pssnippets: PowerShell to submit jobs.
  5. Businesses using Big Data are “making it big”. They are taking advantage of all this ambient data and they’re moving ahead, gaining a foothold in new markets and gaining market share in existing markets. Think about how Netflix makes movie recommendations or how Google can predict a flu outbreak before the CDC does. HDInsight is very focused on the volume and variety problems. We have our Rx/StreamInsight and BI stack added in to help address the velocity issues.
  6. http://blogs.msdn.com/b/cindygross/archive/2015/02/25/master-choosing-the-right-project-for-hadoop.aspx
  7. http://blogs.msdn.com/b/cindygross/archive/2015/02/25/master-choosing-the-right-project-for-hadoop.aspx
  8. Create HDInsight Cluster in Azure Portal http://smallbitesofbigdata.com/archive/2015/02/26/create-hdinsight-cluster-in-azure-portal.aspx
  9. Why big data in the cloud? Collect data globally; much is already in the cloud; share globally; cross-data-center HA/DR; cost of hiring, training, and retaining hardware personnel; highly flexible and scalable; easily pull in ambient data. It's partly a question of where to spend your resources and how much control you want.
  10. Why Hadoop in the cloud? You can deploy Hadoop in a traditional on-site datacenter. Some companies, including Microsoft, also offer Hadoop as a cloud-based service. One obvious question is: why use Hadoop in the cloud? Here's why a growing number of organizations are choosing this option.
      The cloud saves time and money. Open source doesn't mean free. Deploying Hadoop on-premises still requires servers and skilled Hadoop experts to set up, tune, and maintain them. A cloud service lets you spin up a Hadoop cluster in minutes without up-front costs. See how Virginia Tech is using Microsoft's cloud instead of spending millions of dollars to establish their own supercomputing center.
      The cloud is flexible and scales fast. In the Microsoft Azure cloud, you pay only for the compute and storage you use, when you use it. Spin up a Hadoop cluster, analyze your data, then shut it down to stop the meter. “We quickly spun up the Azure HDInsight cluster and processed six years' worth of data in just a few hours, and then we shut it down… processing the data in the cloud made it very affordable.” – Paul Henderson, National Health Service (U.K.)
      The cloud makes you nimble. Create a Hadoop cluster in minutes and add nodes on demand. The cloud offers organizations immediate time to value. “It was simply so much faster to do this in the cloud with Windows Azure. We were able to implement the solution and start working with data in less than a week.” – Morten Meldgaard, Chr. Hansen
  11. http://blogs.msdn.com/b/cindygross/archive/2015/02/03/why-wasb-makes-hadoop-on-azure-so-very-cool.aspx
  12. Create HDInsight Cluster in Azure Portal http://smallbitesofbigdata.com/archive/2015/02/26/create-hdinsight-cluster-in-azure-portal.aspx