Big Data Visualisation with Hadoop and PowerPivot

14 likes • 11,108 views

The slides don't cover the demo session, but they are an introduction to Microsoft and big data that people might find interesting.

Technology, Business

Website:
http://www.jenstirrup.com
Twitter: @jenstirrup
Email: Jen.Stirrup@copper-blue.com

Data Scientists
You’re Incredible!
And you are…. People who like data to be correct.

As long as you’re gonna be thinking anyway,
why not think big. (Donald Trump)
Because we can imagine, we are free (Jean-Paul Sartre)
What kind of modern world would we have if
Edison, Greene and Dickson had not developed
cinematic technology before Hitchcock grew
up? (Kevin Kelly, futurist)

The Unknown Unknowns
• That is to say, there are things that we know
we don't know. But there are also unknown
unknowns. There are things we don't know
we don't know. (Donald Rumsfeld)

Data Management Strategy
OLTP
Single-Purpose DW
Multi-Purpose DW
MapReduce
Compute Trend

Increases ad revenue by processing 3.5 billion events per day
Massive Volumes: Processes 464 billion rows per quarter, with average query time under 10 secs.
Measures and ranks online user influence by processing 3 billion signals per day
Cloud Connectivity: Connects across 15 social networks via the cloud for data and API access
Uses sentiment analysis and web analytics for its internal cloud
Real-Time Insight: Improves operational decision making for IT managers and users

What is Hadoop?
“Flexible and Available
Architecture for Large Scale
computation and data processing
on a network of highly available
commodity hardware.”

Hadoop’s Lineage
* Resource: Kerberos Konference (Yahoo) – 2010

HDInsight Ecosystem
• Distributed Storage (HDFS)
• Distributed Processing (MapReduce)
• ODBC (Azure Data Marketplace)
• Windows Azure Storage

Hadoop Capabilities
• Machine Learning
• Graph Processing
• Distributed Compute
• Extract Load Transform
• Predictive Analysis

Why Hadoop?
Open Source Software
Commodity Hardware
= Reduction of Costs for IT

Hadoop vs RDBMS
Apache Hadoop isn’t a substitute for a database
• It is not Relational
• Key Value pairs
• Big Data

Hadoop vs RDBMS
• Hadoop: unstructured / semi-structured data
• RDBMS: structured data
• Hadoop works together with an RDBMS

...Bringing home all this technology, all your data in familiar packages

BIG DATA REQUIRES AN END-TO-END APPROACH
INSIGHT: Self-Service, Collaboration, Corporate Apps, Devices
DATA ENRICHMENT: Discover, Combine, Refine
DATA MANAGEMENT: Relational, Non-relational, Streaming, Analytical

Microsoft Hadoop Vision
Runs on Windows and Azure
• Active Directory
• System Center
• .NET Programmability
Microsoft Data Connectivity
• SQL Server / SQL Parallel Data Warehouse
• Azure Storage / Azure Data Market

Microsoft Hadoop Vision
Microsoft Business Intelligence
• Hive ODBC Connectivity
• BI Tools for Big Data
Collaborate with and Contribute to OSS
• Collaborate with Hortonworks
• Provide improvements and Windows support back to OSS

On Premise
• Comes with:
  • Hadoop command line (shell)
  • Hadoop Status for name node and map-reduce cluster
  • HDInsight Dashboard

On Premise
• On prem: http://www.microsoft.com/bigdata/
• Single node cluster (onebox) install
• C:\hadoop
• Starts local services

On Azure
• On Windows Azure:
http://HadoopOnAzure.com/
• 3 node cluster running as a service in Azure
• Can be used for 5 days
• Provides samples and HDInsight Dashboard
• TAP Program

Agenda
• Big Data – What is it?
• Big Data or Big Hype?
• Big Data, Big Insights with Hadoop

Because we can imagine,
we are free
Jean-Paul Sartre
We have the tools. All we’ve got to
do is imagine what could be. We can
reinvent the present; we can
transform the world around us.
Jason Silva

4. Agenda

5. How did Big Data get Big?

8. Data Scientists You’re Incredible!

10. Examples of Big Data

13. Big Data Takeaways. V is for:

14. Data is Black Gold

17. What does it mean for Enterprises?

18. Agenda

19. Big Data.

23. Hadoop is for Big Data.

27. Hadoop Key Terms

32. Data Knowledge Action HDInsight

33. How can Microsoft help?

36. Big Agenda

38. Data Knowledge Action HDInsight

46. Recap

Editor's notes

Courtesy of Bruno Aziza at @SiSense
Relational databases are pushed to the limit.
Data management techniques haven't scaled.
Traditional systems haven't scaled.
Big data is about complexity as well as scalability.
NoSQL as a paradigm shift.
Hadoop can run and parallelise large-scale batch computations on large amounts of data. However, there is high latency in returning the results, so it is not suitable for low-latency workloads.
What are the features of a Big Data system?
• Robust
• Fault tolerant
• Human fault tolerant
• Data when you need it
• Scalable
• General
• Extensible
• Reduced implementation complexity
• Error handling
• Auditing
No different from a little data solution. Think inserts.
Some things in life are so complicated and abstract that they’re awesome. Eternity, cosmic significance, and the infinite universe are just a few of these awesome, convoluted concepts that have kept us fascinated and confused since the beginning of human consciousness.
Awe: perceptual expansion, such perceptual vastness that you literally have to configure your mental schemata just to accommodate, just to take in the scale of, the experience.
Anthological awakening: realization of the connectedness of all things, and also the continuum from inanimate to animate matter; all of it is nature, all of it is inevitable, all of it is emerging as part of the same evolutionary process.
Physicist Freeman Dyson speaks of a new future where a new generation of artists will write genomes the way that Shakespeare used to write verses.
Courtesy of WIPRO
Teradata and Lynn Langit slide.
We’ve got 7 billion people and 6 billion devices.
90% of the world’s data was created in the last two years alone.
This is not the data that’s kept behind corporate walls. It is unstructured content, most of which didn’t even exist years ago: documents, tweets, images, videos posted to YouTube, data gathered from surveillance cameras. We post, we blog, we share, we tweet, we like or don’t like. We have a voice and we leave a digital trail. And every tweet we send is being followed, monitored, analyzed, acted on. Companies are analyzing social media to find out what you’re thinking, to know what new products and services you want even before you do. A new initiative by the U.N. is using sentiment analysis to help predict civil unrest, job losses, spending reductions and disease outbreaks.
Digital marketing optimisation – golden path analysis, clickthroughs
Digital exploration – discovery, new markets
Machine-generated analytics – logs, real time, telemetry, location, remote sensors
Data retention – archiving
Traditionally: physics experiments, sensor data, satellite data, …
Now: operational logs, customer behavior, social interactions online, …
From terabytes in the 1990s, through petabytes today, to zettabytes in the future
What do we have now? It is like a vacuum tube; slow and expensive.
Why did Big Data get big?
Volume – data comes in one size: large.
Variety – structured and unstructured data.
Veracity – good and bad data.
Velocity – fast moving.
Value – business value.
Unlike real crude oil, data can be re-used. It can be mined for profit.
It needs to be re-shaped in order to be used.
If you don’t have your data, you don’t have anything! You lose your business.
Thanks to @SiSense and Bruno Aziza
If you don’t have your data, you don’t have anything! You lose your business.
Actionable Insight
Predictive Insight
Business Impact
Customer Discernment
Big Data
This is a picture down the center aisle of a shipping container from one of Microsoft’s datacenters. We put ~1800 computers inside one of these containers. Some of us had the privilege of working on the data storage and computational platform that powers Bing. We used 22 of these containers, spanning 40,000 machines, where we stored over 100PB of data. This was three years ago, and now these servers are almost obsolete.
Big Data is in constant motion and growing at an incredible rate, with 90% of the world’s data generated in just the past two years. That's remarkable growth. Technology history has taught us that the one with the most data wins. The empires of data like Twitter, Facebook and Yahoo are all able to capitalize on the notion that data equates to power. More and more companies are using Hadoop to power Big Data analytics and drive revenue and profit.
It’s all about your data.
Some examples of organizations that are delivering new value in the form of revenue growth, cost savings or entirely new business models.
Yahoo - AS with Hive, Klout - AS with Hive (white paper), GE - Hive Analytics

Yahoo! (Gartner BI Excellence Award Winner) is driving growth for existing revenue streams:
• Yahoo! manages a powerful, scalable advertising exchange that includes publishers and advertisers.
• Advertisers want to get the most out of their investment by reaching their targeted audiences effectively and efficiently.
• Yahoo! needs visibility into how consumers are responding to ads along many dimensions (websites, creative, time of day, gender, age, location) to make the exchange work as efficiently and effectively as possible.
• Yahoo! doubled its revenue by allowing campaign managers to “tune” campaign targeting and creative.
• Yahoo! drove an increase in spending from advertisers since they got better performance by advertising through Yahoo!.
• Yahoo! TAO exposed customer segment performance to campaign managers and advertisers for the first time.

Klout is creating new businesses and revenue streams:
• Klout’s mission is to help everyone understand and leverage their influence.
• Klout uses Big Data to unify the social web (consumers, brands, and partners) with social networking and activity, along with data to generate a Klout score and enable analysis, targeting, and social graphs.
• Helps consumers manage their “social brand.”
• Helps brands reach influencers at scale.
• Helps data partners enhance their services (customer loyalty, CRM, media and identity, and marketing). For example, the Palms uses Klout scores in addition to their normal customer rewards program to determine whether or not to upgrade their customers to a better room during their stay.
• The Huffington Post uses Klout to help serve the best curated Twitter content.

Klout Case Study: http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Klout/Data-Services-Firm-Uses-Microsoft-BI-and-Hadoop-to-Boost-Insight-into-Big-Data/710000000129
Case Study on Thailand’s Department of Special Investigations: http://www.microsoft.com/casestudies/Microsoft-SQL-Server-2012-Enterprise/Department-of-Special-Investigation/Thai-Law-Enforcement-Agency-Optimizes-Investigations-with-Big-Data-Solution/710000001175

GE is driving operational efficiencies:
• GE is running several use cases on its Hadoop cluster while incorporating several disparate sources to produce results.
• Along with sentiment analysis, GE is running web analytics on its internal cloud structure and looking at load usage, user analytics, and failure mode analytics.
• GE built a recommendation engine for its intranet that suggests press releases users might be interested in, based on their function, user profiles, and prior visits to its site.
• GE is working with several types of remote monitoring and diagnostic data from energy and wind businesses.
Business Users need data. There is a paradigm shift towards it, despite what the cartoon says.
Processing platform for Big Data processing, using the “Map-Reduce” processing paradigm.
When people talk about Hadoop they are often talking about specific computational patterns including MapReduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault-tolerant and embarrassingly scalable way. Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware. Today you can do this with off-the-shelf hardware running Hadoop. Now, Hadoop doesn’t have a monopoly on “big”, “real time” or “unstructured”, but it does provide some unique capabilities.
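The map-reduce pattern described above can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample documents are assumptions made for the example. A real Hadoop job would distribute the map and reduce tasks across the cluster and read its input from distributed storage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key (done by the framework in a real cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over hypothetical input documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for illustration.
    print(word_count(["big data big insights", "big hype"]))
```

Because each map call sees only one document and each reduce call sees only one key, both phases can in principle run in parallel on separate machines, which is the property the note attributes to Hadoop.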
ACID – Atomicity, Consistency, Isolation, Durability
Assuming that the volumes of data are larger than those conventional relational database infrastructures can cope with, processing options break down broadly into a choice between massively parallel processing architectures (data warehouses or databases such as Greenplum) and Apache Hadoop-based solutions. This choice is often informed by the degree to which one of the other "Vs", variety, comes into play. Typically, data warehousing approaches involve predetermined schemas, suiting a regular and slowly evolving dataset. Apache Hadoop, on the other hand, places no conditions on the structure of the data it can process.
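The contrast above between predetermined schemas and data with no imposed structure is often described as schema-on-write versus schema-on-read. The sketch below illustrates the second idea in plain Python, under stated assumptions: the raw records, the field names (user, action, text) and the helper read_with_schema are all hypothetical, and no Hadoop or Hive API is involved. Raw lines are stored untouched and a structure is applied only when they are read.

```python
import json
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical raw records, stored as-is with varying structure.
RAW_LINES = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "text": "loving the new release", "lang": "en"}',
    "not json at all",
]


def read_with_schema(
    lines: List[str], fields: List[str]
) -> Iterator[Dict[str, Optional[Any]]]:
    """Apply a schema at read time by projecting each record onto `fields`.

    Fields missing from a record become None. Lines that cannot be parsed
    are skipped at read time instead of being rejected when they are stored.
    """
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        yield {name: record.get(name) for name in fields}


if __name__ == "__main__":
    # The same raw data can be read under two different schemas.
    for row in read_with_schema(RAW_LINES, ["user", "action"]):
        print(row)
    for row in read_with_schema(RAW_LINES, ["user", "text"]):
        print(row)
```

The design choice here is that nothing is validated on the way in, so each consumer decides which fields matter. A data warehouse with a predetermined schema would instead reject or transform the second and third records before loading them.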
I see the real breakthrough insights coming through when you take what is the traditional "Business Intelligence" and add more capabilities like machine learning, predictive analysis, statistical analysis, large scale graph processing, pattern mining, trend analysis, economic modeling. All of which today are a reality in Hadoop. The implications of this are quite astounding when you think about it. This is huge.
Big Data, in terms of data volume, variability and velocity at scale, is the first problem. But Big Data solutions and technology by themselves don't solve business objectives. Businesses don't have a Hadoop problem; they have analytics, pattern mining, trend analysis, statistical inference, economic modeling and market regression problems.
Data science starts where utility-class services like Big Data and Hadoop end. The real opportunity is to expose data science to everyone.
As powerful as Hadoop is, today it’s still more of a computer scientist’s or academically-trained analyst’s tool than it is an enterprise analytics product. Hadoop itself is controlled through programming code rather than anything that looks like it was designed for business unit personnel. Hadoop data is often more “raw” and “wild” than data typically fed to data warehouse and OLAP (Online Analytical Processing) systems. This is where I and Microsoft see opportunity. Essentially: wouldn't it be cool if mere mortals could use this stuff and consume insights that come directly from Hadoop?
Microsoft HDInsight enables you to gain insight from virtually any data, connect with the world of data, improve decision making, and enhance the development of the next generation of products and services.
Nearly everyone in your organization can analyze and make more informed decisions with the right tools.
PowerPivot for Microsoft Excel and Power View for SharePoint give nearly all users a view into structured and unstructured data.
With the Hive Add-in for Excel and Hive ODBC Driver, almost anyone in your organization can directly access Hadoop data from end-user tools.
Hadoop simplifies programming for developers with JavaScript for MapReduce jobs. The JavaScript implementation can also reduce your code by up to 10 times compared to Java.
The second thing I want to talk about is Hadoop and how Hadoop is set up to deliver breakthrough insights from your data.
How many of you are familiar with Hadoop? How many of you are using Hadoop for projects today? How many are planning on using Hadoop in the next 12 months? How about in the cloud?
When people talk about Hadoop they are often talking about specific computational patterns including MapReduce, which emerged as a method to process lots of unstructured data on top of a distributed storage system in a highly fault-tolerant and embarrassingly scalable way. Hadoop allows us to store and process large amounts of data on commodity hardware. In the past you would spend large amounts of money on very specialized hardware. Today you can do this with off-the-shelf hardware running Hadoop. Now, Hadoop doesn’t have a monopoly on “big”, “real time” or “unstructured”, but it does provide some unique capabilities.
There are other talks that will go into Big Data and Hadoop so we’ll only do a quick overview of that right now. We’ll spend most of our time on Hive.
Data democracy
Ask the audience first.
