Redefining Smart Grid Architectural Thinking
Using Stream Computing
Cognizant 20-20 Insights
Executive Summary
After an extended pilot phase, smart meters
have moved into the mainstream for measuring
the performance of a multiplicity of business
functions across the power utilities industry.
Moving forward, the next objective is to create new
ways of handling large data sets for constructing
actionable responses to smart-meter-generated
data, particularly when it comes to processes
such as validation, estimation and evaluation;
demand response; and load management.
As smart meters proliferate across power grids,
energy utilities are dealing with huge packets of
data coursing through their IT networks. More and
more granular data holds the promise of enabling
faster and more informed decision making that
drives operational improvements and, perhaps,
enables consumers to better manage their own
power consumption. To get there, however,
utilities must first conquer growing network
latency challenges caused not only by the huge
profusion of smart-meter-generated data but
also by processing inefficiencies created by their
dependence on more centralized models.
Forward-thinking utilities need more distributed
and virtual complex event processing models that
transform real-time operational data into applied
insights. Creating real-time operational knowledge
can drive better demand response management,
improve quality of service and preempt fraud and
service outages before they inflict reputational
damage. Rethinking their basic information archi-
tecture will help utilities transform their power
grids into adaptive and intelligent infrastructures
that inform continuous improvements in opera-
tional efficiency and business effectiveness.
This white paper explores the challenges and
benefits of Smart Grid creation and offers concrete
thinking on new architectural approaches built
on emerging software standards that more
effectively leverage established forms of stream
computing.1 It examines new thinking on ways to
capture and analyze data generated by smart
meters that can help power utilities achieve new
thresholds of performance over the near- and
long-term, while building better relationships with
consumers. We examine how stream data2 aids
usage forecasts (predicted by converting historic
data coupled with real-time events into opera-
tional KPIs) and identifies anomalies and patterns
in an ever-changing and high-transaction environ-
ment. In our view, when operational data is trans-
ported on a pervasive communication infrastruc-
ture (and coupled with two-way communication
between utilities and consumers) data integration
challenges can be overcome, setting the stage for
a brighter and more energy-efficient future.
Using Cloud Platforms for Smart Meter
Infrastructure
One way to unlock the data treasure trove
enabled by smart meters is by tapping virtual
data processing infrastructure delivered via
cloud computing. Clouds offer the advantages of
scalable and elastic resources to build software
cognizant 20-20 insights | june 2011
infrastructure that supports such dynamic,
always-on applications. But the unique needs of
energy informatics applications also highlight the
challenges of using cloud platforms, such as the
need to support efficient and reliable streaming,
low-latency scheduling and scale-out, as well as
effective data sharing.
Cloud platforms are an intrinsic component in
creating a software architecture to drive more
effective use of Smart Grid applications. The
primary reason: Cloud data centers can accom-
modate the large-scale data interactions that
take place on Smart Grids and are better archi-
tected than centralized systems to process the
huge, persistent flows of data generated across
the utility value chain. Figure 1 shows how this
might work within a power utilities company.
The computational demand for demand-response
applications will be a function of the energy
deficit between supply and demand. This typically
oscillates based on the time of day and prevailing
weather conditions. This translates into a
need for compute-intensive, low-latency response
at midday and limited analysis in off-peak evening
hours. The elastic nature of cloud resources makes
it possible for utilities to avoid costly capital
investment for their peak computation needs.
This results in information sharing on real-time
energy usage and power pricing. As Figure 1
shows, Smart Grid applications that span smart
meters (distributed at the consumer level),
cloud platforms (for data integration by service
providers) and clusters (at energy utilities)
introduce systems heterogeneity, which efficient
streaming is positioned to rationalize.
The need to perform complex processing with
minimal latency over large volumes of data has
led to the evolution of various data processing
paradigms. Industry majors such as IBM, Oracle,
Microsoft and SAP have developed event-oriented
application development approaches to decrease
the latency in processing large volumes of data.
These efforts reveal the following:
• Since smart meters generate interval data that
is time-series in nature, companies need efficient
ways of processing queries incrementally and via
in-memory technologies. They then need a way
to apply the results to their emerging dynamic
business process models.
• Since this buffered data is also stored offline for
static analysis, mining, tracing and back-testing,
companies need a means of managing and
accessing these stores efficiently.
As Smart Grids proliferate, businesses require
greater data availability rates. Companies can no
longer afford to collect all time-series data, load it
into a database and then build database indexes
for query efficiency. Instead, businesses need
Figure 1: Consumers and Smart Meters: Interactions on a Cloud Stream. [Residential and commercial
consumption data streams, weather data and power production data flow into cloud-hosted pattern
recognition, hourly consumption prediction and historian services, with active feedback of pricing and
load curtailment signals returned to consumers.]
these queries to be continuously and incremen-
tally computed and updated as new relevant data
arrives from smart meter sources. This will avoid
the need to re-process existing data. Incremental
computation is necessary to create a low-latency
response to continuously flowing time-series data.
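The incremental-computation idea can be sketched in a few lines of Python (hypothetical code, not tied to any vendor engine): each arriving reading updates a small amount of per-meter state in constant time, rather than triggering a re-scan of the stored time series.

```python
from collections import defaultdict

class IncrementalAverage:
    """Maintains a per-meter running average, updated in O(1) as each
    new interval reading arrives, instead of re-processing all stored
    time-series data on every query."""
    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def update(self, meter_id, usage_kwh):
        self.count[meter_id] += 1
        self.total[meter_id] += usage_kwh
        return self.total[meter_id] / self.count[meter_id]

avg = IncrementalAverage()
avg.update("m1", 1.2)   # running average is now 1.2
avg.update("m1", 1.8)   # running average is now 1.5
```

The same pattern generalizes to sums, counts and variance; the essential point is that query state, not raw data, is what the continuous query maintains.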
Complex event processing (CEP) is a widely used
technique in smart meter data processing, where
data is continuously monitored, verified and acted
upon, given ongoing and changing conditions.
With this approach, data, including the event
streams from multiple sources, is processed based
on a declarative query language. Importantly, all
of this is accomplished with near-zero latency.
Event-Driven Data Processing
Challenges
The key attributes of complex event processing
include:
• Express fundamental query logic: Incorporate
windowed processing and time progress as core
components of query logic.
• Handle erroneous or delayed data: Either delay
processing until it is guaranteed that no late-arriving
events remain (which increases latency), or
aggressively process events and produce tuples as
they arrive.3
• Extensibility: Given the complexity of meter data
and event operations, there is a need to support
custom-built streaming logic as libraries.
• Universal specification: The semantics of query
logic need to be independent of when and how
programmers physically read and understand
events. Application time, rather than system time,
is used to enable a universal time zone.
These attributes ensure that with complex event
processing, query logic is kept generic regarding
how events are read and how their output is inter-
preted. Tuples should follow universal time, which
can be read and processed on any system.
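The application-time requirement can be illustrated with a small, hypothetical Python sketch: tuples are assigned to windows by the timestamp the meter recorded, not by the system time at which they happen to arrive, so late or out-of-order tuples still land in the correct window.

```python
from datetime import datetime, timezone

def window_key(event_ts: datetime, window_minutes: int = 15) -> datetime:
    """Assign a tuple to a tumbling window based on its application
    timestamp (the time recorded by the meter), not the wall-clock
    arrival time at the processing system."""
    minute = (event_ts.minute // window_minutes) * window_minutes
    return event_ts.replace(minute=minute, second=0, microsecond=0)

ts = datetime(2011, 6, 1, 10, 37, 12, tzinfo=timezone.utc)
window_key(ts)  # window starting 2011-06-01 10:30:00 UTC
```

Because the key is derived purely from the tuple's own timestamp, the same tuple maps to the same window on any host, in any time zone.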
Performance Implications
In-stream processing operates on internal state
held in main memory, rather than writing data
back to disk for later processing. With smart meter
data, an event queue is filled to capacity once
the arrival rate is greater than the processing
capability of the system. The metrics used to
manage the data stream are latency, throughput,
correctness and memory usage.
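A minimal, hypothetical sketch of the queue behavior described above: a bounded in-memory queue that sheds load once full, making the trade-off among latency, correctness and memory usage explicit.

```python
from collections import deque

class BoundedEventQueue:
    """Bounded in-memory event queue. When the arrival rate exceeds
    processing capability the queue reaches capacity; here overflow is
    counted and dropped rather than letting memory usage grow without
    bound. Dropping hurts correctness; a deep queue hurts latency."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def offer(self, event) -> bool:
        if len(self.events) >= self.capacity:
            self.dropped += 1      # shed load once capacity is reached
            return False
        self.events.append(event)
        return True

q = BoundedEventQueue(capacity=2)
for e in ("e1", "e2", "e3"):
    q.offer(e)                     # third offer is dropped
```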
Ease of Management
To effectively deploy smart meters and the data
they generate, a number of factors need to be
addressed, including query composability and
ease of deployment over a variety of environ-
ments, such as single servers and clusters. Query
composability requires the ability to “publish”
query results, as well as the ability for Continuous
Query (CQ) to consume results of existing CQs
and streams.
Typical meter streaming queries entail rules such
as:
• Present the top three values every 10 minutes.
• Compute a running average of each sensor value
over the last 20 seconds.
• Filter out sensor readings when the device was in
a maintenance period.
• Illustrate when event "A" was followed by event
"B" within three minutes.
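As an illustration, the last rule, event "A" followed by event "B" within three minutes, might be sketched as follows (hypothetical Python; a real CEP engine would express this declaratively rather than imperatively):

```python
def a_followed_by_b(events, window_seconds=180):
    """Detect occurrences of event 'A' followed by event 'B' within the
    given window. `events` is an iterable of (timestamp_seconds, label)
    pairs in time order; returns the matched (A_time, B_time) pairs."""
    matches = []
    pending_a = None
    for ts, label in events:
        if label == "A":
            pending_a = ts             # remember the most recent A
        elif label == "B" and pending_a is not None:
            if ts - pending_a <= window_seconds:
                matches.append((pending_a, ts))
            pending_a = None           # each A matches at most one B

a_b = a_followed_by_b
a_b([(0, "A"), (100, "B"), (400, "A"), (700, "B")])
# the second pair is 300s apart, outside the 3-minute window
```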
OSIsoft’s PI System provides power utilities
with the leading operation data management
infrastructure for Smart Grid components that
encompass power generation, transmission and
distribution. This software provides capabilities
for energy management, condition-based mainte-
nance, operational performance monitoring, cur-
tailment programs, renewable energy monitoring
and phasor monitoring of transmission lines,
among other functionalities.
OSIsoft MDUS integrates a utility’s meter system
and SAP’s AMI Integration for Utilities to perform
tasks such as billing. It also provides the ability to
integrate meter data with other operational data.
It serves as a real-time data collector, which is
head-end system vendor-independent.
Integration of meter data into business systems
such as billing requires data validation and other
forms of aggregations. OSIsoft has chosen to
leverage CEP to accomplish this task. CEP provides
the scalability required by SAP AMI and utilizes a
SQL-based language for defining the validation
rules. OSIsoft uses Microsoft’s StreamInsight
CEP engine, which enables utilities to define and
implement required meter data validation. This
puts this important facet of regulatory compliance
requirements into the hands of the utility’s IT and
business specialists.
4. cognizant 20-20 insights 4
There are two ways streaming can be adopted in
current meter data systems:
• Server-side streaming: The stream is processed
on the (OSIsoft) PI snapshot and the processed
results are streamed back to the PI server (see
Figure 2).
• Edge processing: In this model, the CQs are
applied at the data source (at the PI interface
level), and only the processed results are stored
(see Figure 3).
Cloud and Adaptive Rate Control
The growing importance for utilities to process
and analyze thousands of meter data streams
suggests that they should
consider the adoption of
cloud platforms to achieve
scalable, latency-sensitive
stream processing. One
approach to consider is
adaptive rate control, which
is the process of restrict-
ing the stream rate to meet
accuracy requirements for
Smart Grid applications.
This approach consumes
less bandwidth and com-
putational overhead within
the cloud for stream
processing. Experimentation with a Smart Grid
stream processing pipeline, modeled using IBM
InfoSphere Streams and deployed on a
Eucalyptus4 private cloud,5 shows 50% bandwidth
savings resulting from adaptive stream rate
control.
Low-latency stream processing is a key com-
ponent of the software architecture required
to support demand-response applications. The
stream processing system ingests smart meter
data arriving from consumers and acts as a first
responder to detect local and global power usage
skews and to alert the utility operator. At 1KB per
event generated each minute, 2TB of data will
stream each day. Processing such large-scale
streams can be compute- and data-intensive;
public or private cloud platforms provide a scal-
able and flexible infrastructure for building such
Smart Grid applications.
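As a back-of-the-envelope check of the figures above (the meter count is our assumption; the text does not state one), 1KB per meter per minute reaches roughly 2TB per day for a fleet on the order of 1.4 million meters:

```python
# Hypothetical sanity check of the cited data volume. The fleet size
# is an assumption chosen to match the ~2TB/day figure.
event_size_kb = 1
events_per_meter_per_day = 24 * 60          # one event per minute
meters = 1_400_000                          # assumed fleet size
daily_kb = event_size_kb * events_per_meter_per_day * meters
daily_tb = daily_kb / 1024**3               # KB -> TB
print(round(daily_tb, 2))                   # ~1.88, i.e. roughly 2TB
```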
However, computational and bandwidth con-
straints at the consumer and utility levels mean
that power usage data streamed at static rates
from smart meters to the utility can either be
at too high a latency to detect usage skews in a
timely manner or at too high a rate to computa-
tionally overwhelm the system. Smart meters
connect to the utility using heterogeneous
networks, ranging from low-bandwidth power line
carriers at ~20Kbps, to ZigBee at ~250Kbps, to 3G
cellular networks at ~2Mbps.
Network bandwidth can thus be a scarce resource
at the consumer end. In the case of smart meters,
traffic can be bursty, since data is sent indepen-
dently, causing instantaneous bandwidth needs
to spike.
In the case of high power demand, meters emit
a large volume of information, which requires a
throttle controller to respond to these events and
control latency.
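A throttle controller of this kind might be sketched as follows (hypothetical Python; the thresholds and reporting intervals are illustrative assumptions, not values from any product): when the processing queue backs up, meters are told to report less often; when it drains, the reporting rate is raised again for fresher data.

```python
def adjust_rate(current_interval_s, queue_depth, low=100, high=1000,
                min_interval_s=60, max_interval_s=900):
    """Adaptive rate control: map the depth of the event queue to a
    new meter reporting interval. Deep queue -> back off (halve the
    stream rate); near-empty queue -> speed up for lower latency."""
    if queue_depth > high:
        return min(current_interval_s * 2, max_interval_s)   # back off
    if queue_depth < low:
        return max(current_interval_s // 2, min_interval_s)  # speed up
    return current_interval_s                                # hold

adjust_rate(120, queue_depth=5000)   # congested: interval doubles to 240s
adjust_rate(240, queue_depth=10)     # idle: interval halves to 120s
```

The doubling/halving policy is one simple choice; a production controller would likely smooth over several measurements to avoid oscillation.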
Applying InfoSphere Streams
IBM InfoSphere Streams is a stream processing
system that continuously analyzes massive
volumes of streaming data for business activity
monitoring and active diagnostics. It consists
of a runtime environment that contains stream
instances running on one or more hosts. Within
InfoSphere is a Stream Processing Application
Declarative Engine (known as SPADE), which is
a stream programming model (executed by the
runtime environment) that supports stream
data sources that continuously generate tuples
containing typed attributes.
Figure 2: Server-side streaming. [A PI interface node connects a foreign device system data source to
the PI server; queries (.NET LINQ) run against a complex event processing engine in which input and
output adapters surround the StreamInsight engine, and processed results are streamed back to the
PI server.]
Figure 3: Edge processing. [The same components as in Figure 2, but the complex event processing
engine and StreamInsight queries are applied at the PI interface node, at the data source, so that only
processed results reach the PI server.]
Figure 5 shows the smart meters present on the
public Internet that generate power usage data
streams accessible over TCP sockets. Here,
InfoSphere Streams runs on a cluster that doesn't
support out-of-box deployment on a cloud
platform. To instantiate a stream processing
environment on a Eucalyptus private cloud, a
customized VM image that supports InfoSphere
Streams must be built. Communication to the stream instance
is activated when the VM instances are online.
This communication, however, is initiated exter-
nally by a SPADE application started on a stream
instance and configured with a list of named
stream instances on specific hosts.
Each smart meter is a stream source whose
tuples have the identity of the smart meter,
power used within a time duration, as well as the
timestamps of the measurement interval.
Additional metadata about the smart meter and
consumer is part of the payload but will be ignored
for the purposes of this discussion. Each tuple
is about 1KB in size. The pipeline first checks if
each individual power usage tuple reports usage
that exceeds a constant per-meter threshold,
Umax, defined by the utility.
Crossing this threshold will trigger a critical
notification to a utility manager. Next, a relative
condition will check whether the user's consump-
tion has increased by more than 25% over his/her
previous consumption. This will trigger a less
critical notification. The pipeline then archives
the tuple into a sink file and proceeds to compute
a running sum of the daily usage by the consumer.
Subsequently, the running average over a
tumbling window is updated. These operations are
performed for each smart meter stream (shaded
brown in Figure 4).
Next, the pipeline aggregates smart meter tuples
across all streams using a tumbling window to
calculate the cumulative consumer energy usage
within a 15-minute time window. This stream
operator (shaded blue in Figure 4) calculates the
total load on the utility. It can be used to alert the
utility manager in case, say, the total consumption
reaches 80%, 90% and >100% of available power
capacity at the utility. Operators shown in dotted
lines (Figure 4) are not part of the application
logic and form the adaptive throttling introduced
next. This core model could be used in demand
response management.
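The per-meter stage of this pipeline can be sketched in hypothetical Python (the threshold value is an illustrative assumption standing in for the utility-defined Umax, and 1.25 encodes the 25% relative-increase rule):

```python
THRESHOLD_KWH = 5.0        # stand-in for the utility's Umax
RELATIVE_INCREASE = 1.25   # 25% above the previous reading

def process_tuple(meter_id, usage, prev_usage, notifications):
    """Per-meter stage of the pipeline: an absolute-threshold check
    that raises a critical notification, then a relative-increase
    check that raises a less critical one."""
    if usage > THRESHOLD_KWH:
        notifications.append((meter_id, "critical"))
    elif prev_usage and usage > RELATIVE_INCREASE * prev_usage:
        notifications.append((meter_id, "warning"))

notes = []
process_tuple("m1", 6.0, 4.0, notes)   # exceeds Umax -> critical
process_tuple("m2", 3.0, 2.0, notes)   # +50% over previous -> warning
process_tuple("m3", 2.0, 2.0, notes)   # unchanged -> no notification
```

In the full pipeline, each tuple would then be archived to a sink file and folded into the running daily sum and the 15-minute tumbling-window averages described above.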
SAP Event Insight
The emergence of smarter grids powered by
stream computing has made clear the need for
more robust processing at the enterprise systems
level. These systems typically struggle to keep
pace with high data volume and a large number
of heterogeneous and widely dispersed data
sources and changing data requirements. This is
being resolved by enterprise software systems
such as mySAP ERP, which have begun to adapt
in-memory processing algorithms for this new
architectural proposition. The result is that SAP
can now deliver an event insight application that
understands the impact of operational events
in real time. In-memory processing not only
brings just-in-time clarity to real-time business
events, but it can also do so with significantly less
effort and with reductions in reporting, opera-
tional and opportunity costs, all of which can
power competitive advantage.
Tracking Energy Consumption

Figure 4: A stream processing pipeline is used to continuously monitor energy usage. Processing
elements in dotted lines show the addition of throttle logic. [In the figure, superscripts denote meter
IDs and subscripts denote time. Each per-meter tuple passes through a condition against the absolute
threshold Umax and a condition against the running average, each of which can raise a notification,
followed by a DB/file store, a running daily sum and the AMI's 15-minute average; an aggregate
operator computes the utility's 15-minute average, and dotted-line elements evaluate network
conditions to increase or decrease the AMI reporting rate.]
Looking Down the Road
We see stream computing as a key element of the
future of work that could be applied broadly by
the power utilities industry. Our view is that its
deployment would minimize network latency and
function as a key component for demand response
management. Moreover, we are planning to inves-
tigate stream computing on the cloud platform.
Our research will appraise the throughput of
a stream processing system and its latency in
processing each tuple as the stream rates adapt.
This approach will help utilities that are adopting
Smart Grids in their mainstream business with
network optimization and intelligent processing,
saving money by automating their demand
response program and load management
processes. Standardizing these processes saves
IT maintenance expense, freeing capital to be
invested in other core business activities.
In a business context, this approach will help
utilities with energy efficiency programs and
grid management. It does this by providing a
mechanism to convert dollars saved by eliminat-
ing inefficient energy generation and distribution
toward more effective asset management.
Figure 5: Architecture of Stream Processing and the Throttle Controller. [Electric and gas AMI streams
from industrial/commercial and residential buildings arrive as data files over TCP/IP into InfoSphere
Streams, where input streams feed the streams-processing stage; a throttle controller sends control
feedback to the AMI sources and a response to the utility.]
Footnotes
1 Stream computing is a high-performance computing approach that analyzes multiple data streams
from many sources, live. Stream computing uses software algorithms to analyze data in real time,
which increases speed and accuracy when dealing with data handling and analysis.
2 Stream data is a sequence of digitally encoded coherent signals (data packets) used to transmit or
receive information.
3 A tuple is an ordered set of energy data values to be processed and is an effective way of representing
events in in-stream computing.
4 Eucalyptus is a software platform for the implementation of private cloud computing on computer
clusters.
5 Private clouds are internal clouds that, according to some vendors, emulate cloud computing on private
networks. These (typically virtualization automation) products offer the ability to host applications or
virtual machines in a company’s own set of hosts. They provide the benefits of utility computing,
such as shared hardware costs, the ability to recover from failure and the ability to scale up or down
depending upon demand.
References
“IBM InfoSphere Streams Version 1.2.1: Programming Model and Language Reference,” IBM, Oct. 4,
2010, http://publib.boulder.ibm.com/infocenter/streams/v1r2/topic/com.ibm.swg.im.infosphere.streams.
product.doc/doc/IBMInfoSphereStreams-LangRef.pdf.
D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J. H. Hwang, W. Lindner, A. Maskey,
A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing and S. B. Zdonik, “The Design of the Borealis Stream Processing
Engine,” Proceedings of the Second Biennial Conference on Innovative Data Systems Research, pp
277-289, January 2005.
D. J. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul and S.
Zdonik. “Aurora: A New Model and Architecture for Data Stream Management,” The VLDB Journal, Vol
12, Issue 2, August 2003.
A. Arasu, S. Babu and J. Widom. “The CQL Continuous Query Language: Semantic Foundations and
Query Execution.” The VLDB Journal, Vol 15, Issue 2, June 2006.
A. M. Ayad, J. F. Naughton. “Static Optimization of Conjunctive Queries with Sliding Windows Over Infinite
Streams,” Proceedings of the International Conference on Management of Data, SIGMOD 2004, ACM.
C. Ballard, D. M. Farrell, M. Lee, P. D. Stone, S. Thibault and S. Tucker, “IBM InfoSphere Streams Harnessing
Data in Motion,” IBM, September 2010.
A. Biem, E. Bouillet, H. Feng, A. Ranganathan, A. Riabov, O. Verscheure, H. Koutsopoulos and C. Moran,
“IBM InfoSphere Streams for Scalable, Real-Time Intelligent Transportation Services,” Proceedings of
the International Conference on Management of Data, SIGMOD 2010, pp 1,093-1,104, ACM.
S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy,
S. Madden, V. Raman, F. Reiss and M. A. Shah, “TelegraphCQ: Continuous Dataflow Processing for an
Uncertain World,” SIGMOD 2003, ACM.
StreamBase, http://www.streambase.com/
“Why IP is the Right Foundation for the Smart Grid,” Cisco Systems, Inc., January 2010.
“The Role of the Internet Protocol (IP) in AMI Networks for Smart Grid,” National Institute of Standards
and Technology, NIST PAP 01, Oct. 24, 2009.
D. Zinn, Q. Hart, B. Ludaescher and Y. Simmhann, “Streaming Satellite Data to Cloud Workflows for
On-Demand Computing of Environmental Products,” Workshop on Workflows in Support of Large-Scale
Science (WORKS), 2010.
A. Arasu, S. Babu and J. Widom, “CQL: A Language for Continuous Queries over Streams
and Relations,” Database Programming Languages, 9th International Workshop, DBPL 2003, Potsdam,
Germany, Sept. 6-8, 2003.
“Pattern Detection with StreamInsight,” Microsoft StreamInsight blog, Sept. 2, 2010, http://tinyurl.
com/2afzbhd.
“InfoSphere Streams,” IBM, http://www.ibm.com/software/data/infosphere/streams