Building a Real-Time Data Pipeline with Spark, Kafka, and Python

2. Douglas Butler, Product Manager

4. Massively parallel, lock-free, fast distributed SQL database; in-memory, on-disk; ACID; JSON and geospatial; transactions and analytics

9. 2-Minute Install

11. A Simple Pipeline

12.
    from pystreamliner.api import Extractor

    class CustomExtractor(Extractor):
        def initialize(self, streaming_context, sql_context, config, interval, logger):
            logger.info("Initialized Extractor")

        def next(self, streaming_context, time, sql_context, config, interval, logger):
            rdd = streaming_context._sc.parallelize([[x] for x in range(10)])
            return sql_context.createDataFrame(rdd, ["number"])
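The extractor on slide 12 can be tried out without a Spark cluster. What follows is a minimal sketch in plain Python, assuming (as the method names and the `interval` argument suggest) that `initialize` runs once and `next` runs once per batch. `FakeExtractor` and `run_batches` are hypothetical names introduced here for illustration and are not part of pystreamliner. Plain lists of dicts stand in for the DataFrame.

```python
import logging


class FakeExtractor:
    """Hypothetical stand-in mirroring the initialize/next shape of CustomExtractor."""

    # Same single column name the slide passes to createDataFrame.
    columns = ["number"]

    def initialize(self, logger):
        logger.info("Initialized Extractor")

    def next(self, logger):
        # Same rows the slide builds with parallelize: one one-column row per integer.
        return [[x] for x in range(10)]


def run_batches(extractor, num_batches, logger):
    """Assumed driver loop: initialize once, then call next once per batch."""
    extractor.initialize(logger)
    batches = []
    for _ in range(num_batches):
        rows = extractor.next(logger)
        # Pair each row with the column names, as a DataFrame schema would.
        batches.append([dict(zip(extractor.columns, row)) for row in rows])
    return batches


logger = logging.getLogger("pipeline_sketch")
batches = run_batches(FakeExtractor(), 3, logger)
print(len(batches), batches[0][:3])
```

Each batch here holds the same ten rows, which matches the slide's extractor: it emits the integers 0 through 9 on every call rather than reading from an external source such as Kafka.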

15. > memsql-ops pip install [package]
    Distributed cluster-wide. Any Python package. Bring your own.

16. Real-time pipeline

17. Q & A time
