20. “The big issue is not that everyone will
suddenly operate at petabyte scale; a lot of
folks do not have that much data.
The more important topics are the specifics
of the storage and processing infrastructure
and what approaches best suit each
problem.”
- Bradford Cross, Flightcaster/Woven
24. “build Amazon's product search indices”
“build the recommender system for behavioral targeting”
“ETL style processing and statistics generation”
“information extraction & search”
“searching and analysis of millions of rental bookings”
“we use Hadoop to summarize of user's tracking data”
“we use Hadoop to store ad serving logs”
“the freedom to query the data in an ad-hoc manner”
“generating web graphs on 100 nodes”
“we use Hadoop for batch-processing large RDF datasets”
“facial similarity and recognition across large datasets”
“We are using Hadoop and Nutch to crawl Blog posts”
“Used for ETL & data analysis on terascale datasets”
Source: http://wiki.apache.org/hadoop/PoweredBy