Hadoop, Pig, and Python (PyData NYC 2012)
- 2. Overview
OF THIS SESSION
Why Python on Hadoop?
Fast Hadoop overview
Jython
Python
MrJob
Pig
(How they work, challenges, efficiency, how to start)
- 26. Pig
ON HADOOP
Works with Jython
Not Python
Stream, no types
UDF read stdin
UDF deserialize, no types
Serialize for Pig
Write to stdout
Exceptions
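The streaming steps on this slide (read stdin, deserialize, serialize, write to stdout, handle exceptions) can be sketched as a small Python script. This is a minimal sketch, not the presenter's code. It assumes Pig sends one tab-delimited record per line, which is commonly the default for Pig's STREAM operator. The two-field record layout (a name and a count) and the function names `deserialize`, `process`, `serialize`, and `run` are hypothetical.

```python
import sys


def deserialize(line):
    """Split one tab-delimited line into string fields. Streaming carries no types."""
    return line.rstrip("\n").split("\t")


def process(fields):
    """Hypothetical transform: upper-case the name and double the count."""
    name = fields[0]
    count = int(fields[1])  # every field arrives as a string, so cast explicitly
    return [name.upper(), str(count * 2)]


def serialize(fields):
    """Join the output fields back into one tab-delimited line for Pig."""
    return "\t".join(fields)


def run(instream=sys.stdin, outstream=sys.stdout, errstream=sys.stderr):
    """Read records, transform each one, and report bad records on stderr."""
    for line in instream:
        try:
            outstream.write(serialize(process(deserialize(line))) + "\n")
        except Exception as exc:
            # Assumption: skipping a bad record is acceptable here; a real job
            # might instead fail fast or count the errors.
            errstream.write("bad record %r: %s\n" % (line, exc))


if __name__ == "__main__":
    run()
```

Catching exceptions per record keeps one malformed line from killing the whole task, and writing the message to stderr keeps it out of the data stream that Pig reads back.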
- 33. Hadoop + Python
HARD STUFF: SETUP
Get Hadoop running
Software where it needs to be
Processes communicating
Data available
- 34. Hadoop + Python
HARD STUFF: DEVELOP
Learn
Project structure, modularity
Dev environment like Production
- 35. Hadoop + Python
HARD STUFF: VALIDATE
Syntax check
Packages available
Data readable
Data writable
Without long waits for failure
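The checks listed above can be run locally before a job is submitted, so failures show up in seconds rather than after a long wait on the cluster. The following is a minimal sketch under stated assumptions, not a description of any particular tool. The function names are hypothetical, and the readable/writable checks use local paths as a stand-in for whatever storage (HDFS, S3) the real job uses.

```python
import importlib.util
import os


def syntax_ok(path):
    """Return True if the Python file at `path` compiles without a SyntaxError."""
    with open(path, encoding="utf-8") as handle:
        source = handle.read()
    try:
        compile(source, path, "exec")  # parses only; nothing is executed
        return True
    except SyntaxError:
        return False


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def readable(path):
    """Local stand-in for 'data readable': the input path exists and can be read."""
    return os.access(path, os.R_OK)


def writable(directory):
    """Local stand-in for 'data writable': the output directory can be written to."""
    return os.access(directory, os.W_OK)
```

A hypothetical pre-flight step might call `syntax_ok("udfs.py")`, then `missing_packages(["json"])`, and refuse to launch the job if either check fails.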
- 38. Hadoop + Python
HARD STUFF: DEPLOY
Environments identical
Code correctly deployed
Configuration changes
Non-disruptive
- 40. Hadoop + Python
HARD STUFF: LOGS
Distributed logs hard to make sense of
Hadoop logs hard to understand
Ephemeral clusters lose logs
- 41. Hadoop + Python
HARD STUFF: MORTAR’S APPROACH
Setup: PaaS, pip installation, connectors
Develop: learning, structure, instant dev env
Validate: fast validate
Debug: printf, more coming
Test: Rails-like test suites
Deploy: one-button deploy