- Apache Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It allows for the reliable storage of petabytes of data and large-scale computations across commodity hardware.
- Apache Hadoop is used widely by internet companies to analyze web server logs, power search engines, and gain insights from large amounts of social and user data. It is also used for machine learning, data mining, and processing audio, video, and text data.
- The future of Apache Hadoop includes making it more accessible and easier to use for enterprises, addressing gaps such as high availability and management, and enabling partners and the community to build on it through open APIs and a modular architecture.
13. Traditional Enterprise Architecture: Data Silos + ETL. Diagram labels: Serving Applications; Unstructured Systems; Traditional ETL & Message Buses; Traditional Data Warehouses, BI & Analytics (EDW, Data Marts, BI / Analytics). Data sources shown: Serving Logs, Social Media, Sensor Data, Text Systems, …
14. Hadoop Enterprise Architecture: Connecting All of Your Big Data. Diagram labels: Serving Applications; Unstructured Systems; Traditional ETL & Message Buses; Traditional Data Warehouses, BI & Analytics (EDW, Data Marts, BI / Analytics); Apache Hadoop: EsTsL (s = Store), Custom Analytics. Data sources shown: Serving Logs, Social Media, Sensor Data, Text Systems, …
15. Hadoop Enterprise Architecture: Connecting All of Your Big Data. Same diagram as slide 14, with two callouts: "80-90% of data produced today is unstructured" and "Gartner predicts 800% data growth over next 5 years".
17. Big Data Platforms: Cost per TB, Adoption. Size of bubble = cost effectiveness of solution. Source:
Introduce self, background. Thanks SDC, attendees, etc.
Tell the inception story: the plan to differentiate Yahoo, recruit talent, and ensure that Y! was not built on a legacy private system. From YST
Look for Analyst/Gartner data on growing cost…..
The data nodes do not have RAID, just JBOD.
Arun to develop
Our commitment to Apache has already changed the market! Ultimately, contributing the code that matters and making it work is the currency in open source.