Introduction to Big Data
Joey Li
joeylicc@gmail.com
@joeylicc
joeylicc.wordpress.com
What is Big Data?
Big Data is a collection of data sets so large and complex
that it becomes difficult to process using traditional
database systems.
Big Data Challenges (3Vs)
● Volume: amount of data
● Velocity: speed of data in and out
● Variety: range of data types and sources
Microsoft Solution to Big Data
● Microsoft HDInsight
● Microsoft .NET SDK for Hadoop
● Microsoft ODBC Driver for Hive
● Microsoft Excel (Power View & PowerPivot)
● Microsoft SharePoint (Power View)
Microsoft HDInsight
● 100% Apache Hadoop compatible Big Data
implementation
● Microsoft support of HDInsight on Windows Server and
Windows Azure
● Simplified deployment and ease of manageability with
System Center 2012 or Windows Azure
● Elegant connectivity to Microsoft Office Excel 2013 and
Business Intelligence tools
What is Hadoop?
Apache Hadoop is an open-source software
framework that allows for the distributed processing of
large data sets across clusters of computers using
simple programming models. It is designed to scale up from
single servers to thousands of machines, each offering
local computation and storage.
What is Hadoop? (Cont.)
Hadoop includes two major modules:
1. Hadoop Distributed File System (HDFS)
A distributed file system that provides high-throughput
access to application data
2. Hadoop MapReduce
A programming model for parallel processing of large
data sets
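The MapReduce model described above can be illustrated with a minimal single-process sketch in Python. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration. It only shows the three steps that a real cluster would run in parallel across many machines: map each record to key/value pairs, group the pairs by key, and reduce each group to one result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key (done by the framework between phases)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop processes big data"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` depends only on its own key, both phases can be spread over many machines, which is the property the model relies on.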
Hadoop Architecture
Hadoop Cluster
HDFS Write Operation
HDFS Read Operation
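The write and read slides are diagrams only, so the following is a simplified Python sketch of the general idea rather than a description of those diagrams. It assumes a file is split into fixed-size blocks, each block is copied to several data nodes, and a read succeeds as long as one copy of every block is still reachable. The block size, replication factor, node names, and round-robin placement are illustrative assumptions; real HDFS uses much larger blocks and rack-aware placement.

```python
from typing import Dict, List, Set


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut the file contents into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def read_file(blocks: List[bytes], placement: Dict[int, List[str]], failed: Set[str]) -> bytes:
    """Reassemble the file, requiring at least one live replica per block."""
    parts = []
    for block_id, block in enumerate(blocks):
        live = [node for node in placement[block_id] if node not in failed]
        if not live:
            raise IOError(f"block {block_id} has no live replica")
        parts.append(block)
    return b"".join(parts)


if __name__ == "__main__":
    data = b"introduction to big data"
    nodes = ["dn1", "dn2", "dn3", "dn4"]  # hypothetical data node names
    blocks = split_into_blocks(data, block_size=4)
    placement = place_replicas(len(blocks), nodes, replication=3)
    # Two nodes are down, yet every block still has a reachable copy.
    print(read_file(blocks, placement, failed={"dn1", "dn2"}))
```

With three copies of each block, the sketch tolerates the loss of any two nodes; losing all three nodes that hold the same block makes the read fail.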
MapReduce
Hadoop Ecosystem
Microsoft .NET SDK for Hadoop
● HDInsight Cluster Management
● Hadoop Job Submission
● Customize Map/Reduce Job
● LINQ to Hive
Microsoft ODBC Driver for Hive
● Connect the following tools to Hadoop for
data insight
○ Microsoft Excel (Power View & PowerPivot)
○ Microsoft SharePoint (Power View)
○ Microsoft SQL Server
■ Database Engine
■ Analysis Services
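Any ODBC-capable client can use the same route as the tools listed above. As a rough sketch, the Python snippet below queries Hive through an ODBC data source with the third-party `pyodbc` package. The DSN name `HiveDSN` and the table `word_counts` with columns `word` and `n` are hypothetical placeholders, and the snippet assumes the Hive ODBC driver is installed and a DSN has already been configured. Setting `autocommit=True` is an assumption that is commonly needed because Hive does not offer conventional transactions.

```python
from typing import List


def build_connection_string(dsn: str) -> str:
    """Build an ODBC connection string that refers to a preconfigured DSN."""
    return f"DSN={dsn}"


def top_rows_query(table: str, limit: int) -> str:
    """Build a HiveQL query; the table and column names are hypothetical."""
    if not table.isidentifier():
        raise ValueError("table name must be a plain identifier")
    return f"SELECT word, n FROM {table} ORDER BY n DESC LIMIT {int(limit)}"


def run_query(dsn: str, sql: str) -> List[tuple]:
    """Execute a query through ODBC and return all rows as tuples."""
    import pyodbc  # third-party package: pip install pyodbc

    connection = pyodbc.connect(build_connection_string(dsn), autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return [tuple(row) for row in cursor.fetchall()]
    finally:
        connection.close()


if __name__ == "__main__":
    # Requires a working Hive ODBC DSN named "HiveDSN" (assumed, not from the slides).
    for row in run_query("HiveDSN", top_rows_query("word_counts", 10)):
        print(row)
```

The import of `pyodbc` sits inside `run_query` so that the query-building helpers can be used and checked on a machine without the driver.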
Learning Hadoop
● Get Started with Hadoop@Hortonworks
http://hortonworks.com/get-started/

● Big Data University
http://bigdatauniversity.com/

● Getting Started with Microsoft Big Data
http://www.microsoftvirtualacademy.com/training-courses/getting-startedwith-microsoft-big-data
References
● Big Data@Wikipedia
http://en.wikipedia.org/wiki/Big_data

● Big Data@Microsoft
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/businessintelligence/big-data.aspx

● Hortonworks Data Platform (HDP)
http://hortonworks.com/
References (Cont.)
● Apache Hadoop
http://hadoop.apache.org/

● Apache Hadoop@Wikipedia
http://en.wikipedia.org/wiki/Apache_Hadoop

● Microsoft .NET SDK for Hadoop
http://hadoopsdk.codeplex.com/

● Microsoft ODBC Driver for Hive
http://www.microsoft.com/en-us/download/details.aspx?id=37134
