Architecture matters. That's why today's innovators are taking a hard look at streaming data, an increasingly attractive option that can transform business in several ways: replacing aging data ingestion techniques like ETL; solving long-standing data quality challenges; improving business processes ranging from sales and marketing to logistics and procurement; or any number of activities related to accelerating data warehousing, business intelligence and analytics.
Register for this DM Radio Deep Dive Webinar to learn how streaming data can rejuvenate or supplant traditional data management practices. Host Eric Kavanagh will explain how streaming-first architectures can relieve data engineers of time-consuming, error-prone processes, ideally bidding farewell to those unpleasant batch windows. He'll be joined by Kevin Petrie of Attunity, who will explain (with real-world success stories) why streaming data solutions can keep the business fueled with trusted data in a timely, efficient manner for improved business outcomes.
3. Hardware (network, storage, servers)
Data Sources
Data Staging
Data Volumes
Data Flow
Data Governance
Data Usage
Data Structures
Schema Definition
Ingest Speeds
Data Workloads
Everything Is In Flux
4. The Impact of Parallelism
We used to see a 10x performance
improvement every 6 years; now we
regularly see 1000x (and that's just
an approximation)
6. A Renaissance in Data Engineering Is Underway
- Web giants innovated to solve their own challenges
- Facebook, Google, LinkedIn, Yahoo! and others…
- By open-sourcing software, these companies
changed how the industry operates, how tech is built
- The result is a new world of scale-out software
- Innovations span the spectrum of functionality:
database, analytics, networking, data flow, security
- Paramount among these in terms of significance is
the world of streaming data and supporting tools
8. Competition Demands Digital Transformation
Adopted by the EU, but affects the USA
Behemoths are straws in the wind
- They identified huge opportunities
- Upended entire industries
- Built bulletproof infrastructure
- Deconstructed business processes
- Re-architected processes at scale
- Instilled a data-driven approach
11. Streaming as Primary Method
• Must do it: at or near real-time, partial updates with low cycle time
• Events: financial fraud, stock trading, high frequency sensor to controller
(autonomous vehicles)
• Many different items co-mingled (Internet of Things)
• Older examples: Internet packets; digital sensors; computer high density disks
• Boosts business: faster awareness leads to faster response leads to
improved business (consumer activity monitoring)
• Add flexibility and lower Total Cost of Ownership (if x and if y and if z)
• Avoid committee-itis
• Faster process-analyze-change cycles
• Allow personnel to address more topics in the same time period
13. Technical Issues: High Level
• Latency: built-in delay caused by many factors. Are you able and
willing to invest in finding and removing these factors? Doing so can
carry significant cost. Some latency is inherent in your use case.
• Order misalignment: data arrives or is produced out of order
• Errors: detect and correct (don’t underestimate this). Embedded data
quality problems. Processing logic flaws. How to surgically update.
• Power usage: Higher power use per compute period (mobile)
• Storage space: overhead for parallel processing and for streaming
look-ahead and look-back, which require multiple copies. Content
mgmt. procedures become very important.
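
The order-misalignment bullet above can be illustrated with a small reordering buffer. This is only a minimal sketch, not a method described in the webinar: the class name `ReorderBuffer`, the `max_delay` tolerance, and the policy of setting aside events that arrive too late are all assumptions made for illustration. It also shows why look-back costs storage: events must be held until it is safe to release them.

```python
import heapq
from itertools import count


class ReorderBuffer:
    """Hold events briefly and release them in timestamp order.

    max_delay is a hypothetical tolerance: an event is released once the
    newest timestamp seen so far is at least max_delay ahead of it.
    """

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self._heap = []                      # min-heap ordered by timestamp
        self._tiebreak = count()             # keeps equal timestamps stable
        self._max_seen = float("-inf")       # newest timestamp observed
        self._last_released = float("-inf")  # newest timestamp already emitted
        self.late = []                       # events that arrived too late

    def push(self, timestamp, payload):
        """Add one event and return the events that are now safe to emit."""
        if timestamp < self._last_released:
            # Emitting this event now would break ordering, so set it aside.
            self.late.append((timestamp, payload))
            return []
        heapq.heappush(self._heap, (timestamp, next(self._tiebreak), payload))
        self._max_seen = max(self._max_seen, timestamp)
        watermark = self._max_seen - self.max_delay
        return self._drain(lambda ts: ts <= watermark)

    def flush(self):
        """Release everything still buffered, in timestamp order."""
        return self._drain(lambda ts: True)

    def _drain(self, is_ready):
        ready = []
        while self._heap and is_ready(self._heap[0][0]):
            ts, _, payload = heapq.heappop(self._heap)
            self._last_released = ts
            ready.append((ts, payload))
        return ready
```

A larger `max_delay` tolerates more disorder but adds latency and memory use, which is the same trade-off the latency and storage bullets describe.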
15. Technical Issues: Low Level
• Requires deep expertise (usually hard to find)
• Memory: must contain all data needed per calculation including
lookup codes, cross-references, accumulation arrays
• Memory mgmt. is usually weak in high-level languages like Java, C#
• Cross-stream data exchange: how to truly separate data for full
independence (age-old parallel computing challenge) or how to store
and forward across streams (what is needed, how much, how long)
• Dependencies: state mgmt., server health
• Latency: network slowdowns, security handshakes (e.g. TLS),
database freezes, file contention, cluster IO
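
The store-and-forward questions above (what is needed, how much, how long) can be made concrete with a bounded buffer. This is a minimal sketch under stated assumptions, not an implementation from the webinar: the class name `StoreAndForward` and the parameters `max_items` and `max_age` are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from collections import deque


class StoreAndForward:
    """Hold records from one stream until another stream collects them.

    max_items bounds "how much" is kept; max_age bounds "how long".
    Both names are illustrative assumptions.
    """

    def __init__(self, max_items, max_age):
        self.max_items = max_items
        self.max_age = max_age
        self._items = deque()   # (stored_at, record), oldest first
        self.dropped = 0        # records discarded by either limit

    def store(self, now, record):
        """Keep a record, evicting the oldest one if the buffer is full."""
        self._expire(now)
        if len(self._items) >= self.max_items:
            self._items.popleft()
            self.dropped += 1
        self._items.append((now, record))

    def forward(self, now):
        """Hand over all records that are still fresh, then empty the buffer."""
        self._expire(now)
        out = [record for _, record in self._items]
        self._items.clear()
        return out

    def _expire(self, now):
        # Discard records older than max_age.
        while self._items and now - self._items[0][0] > self.max_age:
            self._items.popleft()
            self.dropped += 1
```

Counting dropped records makes the cost of each limit visible, which helps when tuning how much memory to reserve per stream.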
16. Buyer Advice: Look for the Special Things
• In memory: of course, but look especially for engineering that addresses
known state-of-the-art problems like heap space, garbage collection, swapping
• Integration: both done for you (framework) and well-documented
interface APIs using standard computer languages
• Error reporting: this is critical. Your personnel cost will go up
significantly without this because you will (really!!) experience many
crashes. Need meaningful error messages pointing to the actual problem.
• Demonstrates knowledge and work in next generation: expanded
memory spaces; integrated memory-compute.
• Shows end-to-end use cases more complicated than yours. (Yours
will quickly become more complicated than you expect.)