During this session we’ll discuss the pros and cons of a new structured streaming data processing model in Spark and a nifty way of enhancing Spark with SnappyData, an open-source framework providing great features for both persistent and in-motion data analysis.
Based on a real-life use case in which we designed and implemented a streaming application that filters, consumes and aggregates tons of events, we will discuss the role of the persistent back-end and stream-processing integration in real-time applications in terms of performance, robustness and scalability of the solution.
15. val resultStream: SchemaDStream = snsc.registerCQ(
  "select application_name, log_type, " +
  "count(source_host) as source " +
  "from applicationStream window (duration 1 seconds, slide 1 seconds) " +
  "group by application_name, log_type"
)
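To make the semantics of that continuous query concrete, here is a toy, plain-Scala re-implementation of what one 1-second tumbling window computes. `LogEvent`, `aggregateWindow` and the field names are illustrative assumptions, not SnappyData's API:

```scala
// Hypothetical event type mirroring the columns used by the CQ.
case class LogEvent(applicationName: String, logType: String, sourceHost: String, ts: Long)

// Toy model of the CQ semantics: within one window, group events by
// (application_name, log_type) and count the source_host values.
def aggregateWindow(events: Seq[LogEvent], windowStart: Long,
                    windowMillis: Long = 1000L): Map[(String, String), Int] =
  events
    .filter(e => e.ts >= windowStart && e.ts < windowStart + windowMillis)
    .groupBy(e => (e.applicationName, e.logType))
    .map { case (key, es) => key -> es.map(_.sourceHost).size }
```

Because the slide duration equals the window duration (1 second each), consecutive windows do not overlap: every event is counted in exactly one result batch.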
(Diagram: data stream filtering → aggregated window; perpetually growing reference data; external access)
16. resultStream.foreachDataFrame(df => {
  val dfApplications = df.groupBy("application_name")
    .agg(sum("log_type"))
  dfApplications.write.insertInto("application_name")
})
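The per-batch step above boils down to a group-and-sum over the rows the CQ emitted. A minimal plain-Scala sketch of one micro-batch, assuming each row carries an application name and a count (the helper `summarizeBatch` is hypothetical, not part of Spark or SnappyData):

```scala
// Toy stand-in for one micro-batch: rows of (application_name, count),
// summed per application the way df.groupBy(...).agg(sum(...)) would.
def summarizeBatch(rows: Seq[(String, Long)]): Map[String, Long] =
  rows.groupBy { case (app, _) => app }
      .map { case (app, rs) => app -> rs.map(_._2).sum }
```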
17. snsc.sql("create table eventTable(application string, event string) using row")

resultStream.foreachDataFrame(df => {
  val result = df.collect()
  ...
  val stmt = conn.prepareStatement(
    "put into eventTable (application, event) values " +
    "(?, ? + nvl((select event from eventTable where application = ?), 0))"
  )
})
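The `put into ... nvl(...)` statement is an upsert that accumulates: if a row for the application already exists, the incoming value is added to the stored one; if not, `nvl` substitutes 0 and a fresh row is inserted. A toy in-memory model of that logic (the `putInto` helper and the `Map`-as-table are illustrative assumptions):

```scala
import scala.collection.mutable

// Toy model of the accumulating upsert: add the incoming count to the
// stored value if the key exists, otherwise start from 0 (as nvl does
// for a missing row) and insert.
def putInto(table: mutable.Map[String, Long], application: String, count: Long): Unit =
  table.update(application, count + table.getOrElse(application, 0L))
```

Doing the accumulation inside a single SQL statement keeps the read-modify-write on the server side, instead of round-tripping the current value to the driver for every key in every batch.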