Aws Quick Dirty Hadoop Mapreduce Ec2 S3
- 15. HTTP Logs Log file A: (...) FreeTouchScreenNokia5230 (...) (...) GetRidofAllSpeedCameras(...) (...) USManWinsLottery (...) (...) BNPToLaunchElectionManifesto (...) Log file B: (...) FreeTouchScreenNokia5230 (...) (...) BodyLanguageTellsAll (...)
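The job's goal is to merge hit counts for page titles scattered across many log files. A minimal in-memory sketch of that aggregation (the titles below are taken from the slide; in the real job they come from gzipped Wikipedia pagecount dumps):

```python
from collections import Counter

# Stand-ins for the title columns parsed out of log files A and B.
log_a = ["FreeTouchScreenNokia5230", "GetRidofAllSpeedCameras",
         "USManWinsLottery", "BNPToLaunchElectionManifesto"]
log_b = ["FreeTouchScreenNokia5230", "BodyLanguageTellsAll"]

def tally(*logs):
    """Merge per-file title occurrences into one combined count."""
    total = Counter()
    for log in logs:
        total.update(log)
    return total

counts = tally(log_a, log_b)
# "FreeTouchScreenNokia5230" appears in both files, so its count is 2.
```

MapReduce distributes exactly this tallying across machines: mappers emit per-title counts and reducers sum them.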
- 24. Launching a virtual Hadoop Cluster $ elastic-mapreduce --create --name "Wiki log crunch" --alive --num-instances 20 --instance-type c1.medium Created job flow <job flow id> $ ec2din (...)
- 38. Add a step $ elastic-mapreduce --jobflow <jfid> --stream --step-name "Wiki log crunch" --input s3n://dsikar-wikilogs-2009/dec/ --output s3n://dsikar-wikilogs-output/21 --mapper s3n://dsikar-wiki-scripts/wikidictionarymap.pl --reducer s3n://dsikar-wiki-scripts/wikireduce.pl Monitor progress at http://<instance public dns>:9100
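The step above uses Hadoop Streaming, which pipes records through any executable via stdin/stdout. The actual `wikidictionarymap.pl` and `wikireduce.pl` scripts are not shown in the slides; the Python sketch below is an assumption of their likely logic, based on the standard Wikipedia pagecount record format (`<project> <title> <hits> <bytes>`):

```python
from itertools import groupby

def mapper(lines):
    """Emit 'title<TAB>hits' for each well-formed pagecount record."""
    for line in lines:
        fields = line.split()
        if len(fields) == 4:
            yield f"{fields[1]}\t{fields[2]}"

def reducer(pairs):
    """Sum hit counts per title. Input must be sorted by key,
    which Hadoop guarantees between the map and reduce phases."""
    keyed = (p.split("\t") for p in pairs)
    for title, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{title}\t{sum(int(hits) for _, hits in group)}"

# Simulate the shuffle/sort Hadoop performs between the two phases:
records = ["en Foo 3 100", "en Foo 2 50", "en Bar 1 10"]
output = list(reducer(sorted(mapper(records))))
```

In a real streaming job, each script would read `sys.stdin` line by line and print its output; the functions here are structured for testability but implement the same per-key summation.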
- 39. s3cmd # make bucket $ s3cmd mb s3://dsikar-wikilogs # put log files $ s3cmd put pagecounts-200912*.gz s3://dsikar-wikilogs/dec $ s3cmd put pagecounts-201004*.gz s3://dsikar-wikilogs/apr # list log files $ s3cmd ls s3://dsikar-wikilogs/ # put scripts $ s3cmd put *.pl s3://dsikar-wiki-scripts/ # delete log files $ s3cmd del --recursive --force s3://dsikar-wikilogs/ # remove bucket $ s3cmd rb s3://dsikar-wikilogs/
- 44. That's all, folks, and thanks for attending: QUICK AND DIRTY PARALLEL PROCESSING ON THE CLOUD Daniel Sikar
Editor's notes
- So without further ado, let's get this show on the road and run a job concurrently on a few virtual machines.