A Hadoop Primer

sogrady

A simple introduction to Hadoop, given as a talk to the Maine Java Users' Group on February 15, 2011.

1. A Hadoop Primer Feb 2011 10.20.2005

2. http://redmonk.com/public/hadoop.pdf

3. The Background

4. October, 2003

5. December, 2004

6. Map::Reduce

7. Job::Map Reduce::Output

8. Counting Shakespeare

9. The Birth of Hadoop

12. Project Architecture (Source: Running Hadoop On Ubuntu Linux, Michael G. Noll, 8.8.07)

13. Project Traction

14. Employment Potential

15. Hadoop Users

16. Why Hadoop?

17. More Machines = More Faster

18. The reason everyone knows

19. BIG DATA

20. “The big issue is not that everyone will suddenly operate at petabyte scale; a lot of folks do not have that much data. The more important topics are the specifics of the storage and processing infrastructure and what approaches best suit each problem.” - Bradford Cross, Flightcaster/Woven

21. The reason not everyone knows

22. Unstructured Data

23. What Hadoop Is

24. “build Amazon's product search indices”
    “build the recommender system for behavioral targeting”
    “ETL style processing and statistics generation”
    “information extraction & search”
    “searching and analysis of millions of rental bookings”
    “we use Hadoop to summarize of user's tracking data”
    “we use Hadoop to store ad serving logs”
    “the freedom to query the data in an ad-hoc manner”
    “generating web graphs on 100 nodes”
    “we use Hadoop for batch-processing large RDF datasets”
    “facial similarity and recognition across large datasets”
    “We are using Hadoop and Nutch to crawl Blog posts”
    “Used for ETL & data analysis on terascale datasets”
    Source: http://wiki.apache.org/hadoop/PoweredBy

25. What Hadoop Isn't

26. A relational database killer No Yes

27. Beyond Hadoop

28. The Hadoop Ecosystem

29. What We Use Hadoop For

30. Crawling Largeish Unstructured Datasets

31. Like 1.3M StackOverflow Questions

32. Or 1.7M HackerNews Entries

33. Or Years of Apache Log Files

34. How to Get Started

35. We use Cloudera

36. Mostly because it's easy

36. Mostly because it's easy 36