Presentation for the 4/11/17 Apache Kafka Bay Area, hosted by Uber.
What happens if you take everything that is happening in your company -- every click, every database change, every application log -- and make it all available as a real-time stream of well-structured data? This session will explain how to combine the full Apache Kafka toolkit to accomplish this and shift from batch-oriented data integration and data processing to real-time streams and real-time processing.
We will explain how Kafka's design and implementation enable it to act as a scalable platform for streams of event data. The Kafka Connect API provides scalable, fault-tolerant data import and export, turning Kafka into a hub for all your real-time data and bridging the gap between real-time and batch systems. The Kafka Streams API is a library built right into Kafka that provides the corresponding processing support. Because it is built on Kafka's existing low-level clients, it offers a very low barrier to entry, easy operationalization, and a natural DSL for writing stream processing applications. Together, these components provide everything you need for a data pipeline: storage, import/export, and processing.
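As a small illustration of how little is needed to get data flowing, here is a sketch of a Kafka Connect source configuration in standalone mode. It uses the FileStreamSource connector that ships with Apache Kafka; the file path and topic name are placeholder values for this example.

```properties
# Hypothetical standalone source connector: tail a log file into a Kafka topic.
name=local-file-source
connector.class=FileStreamSource
# Number of parallel tasks; FileStreamSource supports at most one.
tasks.max=1
# Placeholder input file and destination topic.
file=/var/log/app/events.log
topic=app-events
```

Run with `bin/connect-standalone.sh config/connect-standalone.properties <this-file>`; for production use, the same connector configuration can be submitted to Connect's distributed mode for fault tolerance and scaling.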
Finally, we'll describe an architecture for a stream data platform that combines these tools to react to all your inbound streams of data. This architecture requires only tools that ship with Apache Kafka, is lightweight to deploy and manage, and yet can scale to support large organizations with massive data pipelines.