doi:10.1145/2209249.2209268
July 2012 | Vol. 55 | No. 7 | Communications of the ACM
contributed articles

Large-Scale Complex IT Systems

The reductionism behind today’s software-engineering methods breaks down in the face of systems complexity.

By Ian Sommerville, Dave Cliff, Radu Calinescu, Justin Keen, Tim Kelly, Marta Kwiatkowska, John McDermid, and Richard Paige

Key insights

Coalitions of systems, in which the system elements are managed and owned independently, pose challenging new problems for systems engineering.

When reductionism, the fundamental basis of engineering, breaks down, incremental improvements to current engineering techniques are unable to address the challenges of developing, integrating, and deploying large-scale complex IT systems.

Developing complex systems requires a socio-technical perspective involving human, organizational, social, and political factors, as well as technical factors.

On the afternoon of May 6, 2010, the U.S. equity markets experienced an extraordinary upheaval. Over approximately 10 minutes, the Dow Jones Industrial Average dropped more than 600 points, representing the disappearance of approximately $800 billion of market value. The share price of several blue-chip multinational companies fluctuated dramatically; shares that had been at tens of dollars plummeted to a penny in some cases and rocketed to values over $100,000 per share in others. As suddenly as this market downturn occurred, it reversed, so over the next few minutes most of the loss was recovered and share prices returned to levels close to what they had been before the crash.

This event came to be known as the “Flash Crash,” and, in the inquiry report published six months later,7 the trigger event was identified as a single block sale of $4.1 billion of futures contracts executed with uncommon urgency on behalf of a fund-management company. That sale began a complex pattern of interactions between the high-frequency algorithmic trading systems (algos) that buy and sell blocks of financial instruments on incredibly short timescales.

A software bug did not cause the Flash Crash; rather, the interactions of independently managed software systems created conditions unforeseen
(probably unforeseeable) by the owners and developers of the trading systems. Within seconds, the result was a failure in the broader socio-technical markets that increasingly rely on the algos (see the sidebar “Socio-Technical Systems”).

Society depends on complex IT systems created by integrating and orchestrating independently managed systems. The incredible increase in their scale and complexity over the past decade means new software-engineering techniques are needed to help us cope with their inherent complexity. Here, we explain the principal reasons today’s software-engineering methods and tools do not scale, proposing a research and education agenda to help address the inherent problems of large-scale complex IT systems, or LSCITS, engineering.

Coalitions of Systems
The key characteristic of these systems is that they are assembled from other systems that are independently controlled and managed. While there is increasing awareness in the software-engineering community of related issues,10 the most relevant background work comes from systems engineering. Systems engineering focuses on developing systems as a whole, as defined by the International Council for Systems Engineering (http://www.incose.org/): “Systems engineering integrates all the disciplines and specialty groups into a team effort forming a structured development process that proceeds from concept to production to operation. Systems engineering considers both the business and the technical needs of all customers with the goal of providing a quality product that meets the user needs.”

Systems engineering emerged to take a systemwide perspective on complex engineered systems involving structures and electrical and mechanical systems. Almost all systems today are software-intensive, and systems engineers address the challenge of constructing ultra-large-scale software systems.17 The most relevant aspect of systems engineering is work on “system of systems,” or SoS,12 about which Maier said the distinction between SoS and complex monolithic systems is that SoS elements are operationally and managerially independent. Characterizing SoS, he covered a range of systems, from directed (developed for a particular purpose) to virtual (lacking a central management authority or centrally agreed purpose). LSCITS is a type of SoS in which the elements are owned and managed by different organizations. In this classification, the collection of systems that led to the Flash Crash (an LSCITS) would be called a “virtual system of systems.” However, since Maier’s article was published in 1998, the word “virtual” has generally taken on a different meaning, as in virtual machines; consequently, we propose an alternative term that we find more descriptive: “coalition of systems.”

The systems in a coalition of systems work together, sometimes reluctantly, as doing so is in their mutual interest. Coalitions of systems are not explicitly designed but come into existence as different member systems interact according to agreed-upon protocols. As in political coalitions, there might even be hostility between various members, and members enter and leave according to their own interpretation of their best interests.

The interacting algos that led to the Flash Crash represent an example of a coalition of systems, serving the purposes of their owners and cooperating only because they have to. The owners of the individual systems were competing finance companies that were often mutually hostile. Each system jealously guarded its own information and could change without consulting any other system.

Dynamic coalitions of software-intensive systems are a challenge for software engineering. Designing dependability into the coalition is not possible, as there is no overall design authority, nor is it possible to centrally control the behavior of individual systems. The systems in the coalition can change unpredictably or be completely replaced, and the organizations running them might themselves cease to exist. Coalition “design” involves the protocols for communications, and each organization using the coalition orchestrates the constituent systems its own way. However, the designers and managers of each individual system must consider how to make it robust enough to ensure
their organizations are not threatened by failure or undesirable behavior elsewhere in the coalition.

Socio-Technical Systems
Engineers are concerned primarily with building technical systems from hardware and software components and assume system requirements reflect the organizational needs for integration with other systems, compliance, and business processes. Yet systems in use are not simply technical systems but “socio-technical systems.” To reflect the fact they involve evolving and interacting communities that include technical, human, and organizational elements, they are sometimes also called “socio-technical ecosystems,” though the term socio-technical systems is more common.

Socio-technical systems include people and processes, as well as technological systems. The process definitions outline how the system designers intend the system to be used, but, in practice, users interpret and adapt them in unpredictable ways, depending on their education, experience, and culture. Individual and group behavior also depends on organizational rules and regulations, as well as organizational culture, or “the way things are done around here.”

Defining, in isolation, technical software-intensive systems intended to support an organization’s work is an oversimplification that hinders software engineering. So-called system requirements represent the interface between the technical system and the wider socio-technical system, yet requirements are inevitably incomplete, incorrect, and/or out of date. Coalitions of systems cannot operate on this basis. Rather, engineers must recognize that systems that take advantage of the abilities and inventiveness of people will be more effective and resilient.

System Complexity
Complexity stems from the number and type of relationships between the system’s components and between the system and its environment. If a relatively small number of relationships exist between system components and they change relatively slowly over time, then engineers can develop deterministic models of the system and make predictions concerning its properties.

However, when the elements in a system involve many dynamic relationships, complexity is inevitable. Complex systems are nondeterministic, and system characteristics cannot be predicted by analyzing the system’s constituents. Such characteristics
emerge when the whole system is put to use and changes over time, depending on how it is used and on the state of its external environment.

Dynamic relationships are those that change, whether between system elements or between the elements and the system’s environment. For example, a trust relationship is a dynamic relationship; initially, component A might not trust component B, so, following some interchange, A checks that B has performed as expected. Over time, these checks may be reduced in scope as A’s trust in B increases. However, some failure in B may profoundly influence that trust, and, after the failure, even more stringent checks might be introduced.

Complexity stemming from the dynamic relationships between elements in a system depends on the existence and nature of these relationships. Engineers cannot analyze this inherent complexity during system development, as it depends on the system’s dynamic operating environment. Coalitions of systems in which elements are large software systems are always inherently complex. The relationships between the elements of the coalition change because they are not independent of how the systems are used or of the nature of their operating environments. Consequently, the nonfunctional (often even the functional) behavior of coalitions of systems is emergent and impossible to predict completely.

Even when the relationships between system elements are simpler, relatively static, and, in principle, understandable, there may be so many elements and relationships that understanding them is practically impossible. Such complexity is called “epistemic complexity” because it stems from our lack of knowledge of the system rather than from some inherent system characteristic.16 For example, it may be possible in principle to deduce the traceability relationships between requirements and design, but, if appropriate tools are not available, doing so may be practically impossible.

If you do not know enough about all of a system’s components and their relationships, you cannot make predictions about overall behavior, even if the system lacks dynamic relationships between its elements. Epistemic complexity increases with system size; as ever-larger systems are built, they are inevitably more difficult to understand, and their behavior and properties are more difficult to predict. This distinction between inherent and epistemic complexity is important. As discussed in the following section, it is the primary reason new approaches to software engineering are needed.

Reductionism and Software Engineering
In some respects, software engineering has been incredibly successful. Compared to the systems built in the 1970s and 1980s, modern software is much larger, more complex, more reliable, and often developed more quickly. Software products deliver astonishing functionality at relatively low cost.

Software engineering has focused on reducing and managing epistemic complexity, so, where inherent complexity is relatively limited and a single organization controls all system elements, software engineering is highly effective. However, for coalitions of systems with a high degree of inherent complexity, today’s software-engineering techniques are inadequate.

This inadequacy is reflected in the failures that are all too common in large government-funded projects. The software may be delivered late, be more expensive to develop than anticipated, and be inadequate for the needs of its users. An example of such a project was the attempt, from 2000 to 2010, to automate U.K. health records; the project was ultimately abandoned at a cost estimated at $5 billion–$10 billion.19

The fundamental reason today’s software engineering cannot effectively manage inherent complexity is that its basis is in developing individual programs rather than in interacting systems. The consequence is that software-engineering methods are unsuitable for building LSCITS. To appreciate why, we need to examine the essential divide-and-conquer reductionist assumption that is the basis of all modern engineering.
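The trust relationship described under System Complexity (component A checks component B closely at first, relaxes its checks as trust grows, and tightens them beyond the initial level after a failure) can be sketched as a small state model. This is a minimal illustration that assumes trust can be captured as a single number. The class TrustRelationship, the trust scale from -1.0 to 1.0, and the gain, penalty, and min_checking parameters are hypothetical choices, not something the article specifies.

```python
from dataclasses import dataclass


@dataclass
class TrustRelationship:
    """Hypothetical model of component A's dynamic trust in component B.

    trust runs from -1.0 (distrust after failures) to 1.0 (full trust);
    0.0 is the starting point, at which A checks every result B returns.
    The numeric scale and all parameters are illustrative assumptions.
    """
    trust: float = 0.0
    gain: float = 0.2          # share of the remaining headroom gained per success
    penalty: float = 0.5       # extra distrust added each time B is caught failing
    min_checking: float = 0.1  # A never stops checking B entirely

    def checking_level(self) -> float:
        """Relative checking effort A applies to B's results: 1.0 initially,
        lower as trust grows, above 1.0 (stricter than at first) after failures."""
        return max(self.min_checking, 1.0 - self.trust)

    def record_success(self) -> None:
        # B performed as expected, so trust moves part of the way toward 1.0.
        self.trust = min(1.0, self.trust + self.gain * (1.0 - self.trust))

    def record_failure(self) -> None:
        # Accumulated trust is discarded and A moves into distrust,
        # so subsequent checks are more stringent than the initial ones.
        self.trust = max(-1.0, min(self.trust, 0.0) - self.penalty)


if __name__ == "__main__":
    rel = TrustRelationship()
    for passed in [True] * 5 + [False] + [True] * 3:
        if passed:
            rel.record_success()
        else:
            rel.record_failure()
        print(f"trust={rel.trust:+.2f}  checking level={rel.checking_level():.2f}")
```

In this sketch the amount of checking A performs depends on the history of interactions rather than on any property fixed at design time, which illustrates why such relationships are hard to analyze during development.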
Reductionism is a philosophical position that a complex system is no more than the sum of its parts, and that an account of the overall system can be reduced to accounts of individual constituents. From an engineering perspective, this means systems engineers must be able to design a system so it is composed of discrete smaller parts and interfaces allowing the parts to work together. A systems engineer then builds the system elements and integrates them to create the desired overall system.

Researchers generally adopt this reductionist assumption, and their work concerns finding better ways to decompose problems or systems (such as software architecture), better ways to create the parts of the system (such as object-oriented techniques), or better ways to do system integration (such as test-first development). Underlying all software-engineering methods and techniques (see Figure 1) are three reductionist assumptions:

System owners control system development. A reductionist perspective takes the view that an ultimate controller has the authority to take decisions about a system and is therefore able to enforce decisions on, say, how components interact. However, when systems consist of independently owned and managed elements, there is no such owner or controller and no central authority to take or enforce design decisions;

Decisions are rational, driven by technical criteria. Decision making in organizations is profoundly influenced by political considerations, with actors striving to maintain or improve their current positions to avoid losing face. Technical considerations are rarely the most significant factor in large-system decision making; and

The problem is definable, and system boundaries are clear. The nature of “wicked problems”15 is that the “problem” is constantly changing, depending on the perceptions and status of stakeholders. As stakeholder positions change, the boundaries are likewise redefined.

Figure 1. Reductionist assumptions vs. LSCITS reality.
Control. Reductionist assumption: owners of a system control its development. Large-scale complex IT systems reality: no single owner or controller.
Rationality. Reductionist assumption: decisions made rationally, driven by technical criteria. Large-scale complex IT systems reality: decisions driven by political motives.
Problem definition. Reductionist assumption: definable problem and clear system boundaries. Large-scale complex IT systems reality: wicked problem and constantly renegotiated system boundaries.

However, for coalitions of systems, these assumptions never hold true, and many software project “failures,” where software is delivered late and/or over budget, are a consequence of adherence to the reductionist view. To help address inherent complexity, software engineering must look toward the systems, people, and organizations that make up a software system’s environment. We need to represent, analyze, model, and simulate potential operational environments for coalitions of systems to help us understand and manage, so far as possible, the complex relationships in the coalition.

Challenges
Since 2006, initiatives in the U.S. and in Europe have sought to address the engineering of large coalitions of systems. In the U.S., a report by the influential Software Engineering Institute at Carnegie Mellon University (http://www.sei.cmu.edu/) on ultra-large-scale systems (ULSS)13 triggered research leading to creation of the Center for Ultra-Large Scale Software-Intensive Systems, or ULSSIS (http://ulssis.cs.virginia.edu/ULSSIS), a research consortium involving the University of Virginia, Michigan State University, Vanderbilt University, and the University of California, San Diego. In the U.K., the comparable LSCITS Initiative addresses problems of inherent and epistemic complexity in LSCITS, while Hillary Sillitto, a senior systems architect at Thales Land Joint Systems U.K., has proposed ULSS design principles.17

Northrop et al.13 made the point that developing ultra-large-scale systems needs to go beyond incremental improvements to current methods, identifying seven important research areas: human interaction; computational emergence; design; computational engineering; adaptive system infrastructure; adaptable and predictable system quality; and policy, acquisition, and management. The SEI ULSS report suggested it is essential to deploy expertise from a range of disciplines to address these challenges.

We agree the research required is interdisciplinary and that incremental improvement in existing techniques is unable to address the long-term software-engineering challenges of ultra-large-scale systems engineering. However, a weakness of the SEI report was its failure to set out a roadmap outlining how large-scale systems engineering can get from where it is today to the research it proposed.

Software engineers worldwide creating large complex software systems require more immediate, perhaps more incremental, research, driven by the practical problems of complex IT systems engineering. The pragmatic proposals we outline here begin to address some of these problems, aiming for medium-term as well as longer-term impact on LSCITS engineering.

The research topics we propose here might be viewed as part of the roadmap that could lead us from current practice to LSCITS engineering. We see them as a bridge between the short- and medium-term imperative to improve our ability to create coalitions
of systems and the longer-term vision set out in the SEI ULSS report.

Developing coalitions of systems involves engineering individual systems to work in the orchestration, as well as configuration, of a coalition to meet organizational needs. Based on the ideas in the SEI ULSS report and on our own experience in the U.K. LSCITS Initiative, we have identified 10 questions that can help define a research agenda for future LSCITS software engineering:

How can interactions between independent systems be modeled and simulated? To help understand and manage coalitions of systems, LSCITS engineers need dynamic models that are updated in real time with information from the system itself. These models are needed to help make what-if assessments of the consequences of system-change options. This requires new performance- and failure-modeling techniques in which the models adapt automatically in response to system-monitoring data. We do not suggest simulations can be complete or predict all possible problems. However, other engineering disciplines (such as civil and aeronautical engineering) have benefited enormously from simulation, and comparable benefits could be achieved for software engineering.

How can coalitions of systems be monitored? And what are the warning signs problems produce? In the run-up to the Flash Crash, no warning signs indicated the market was tending toward an unstable state. To help avoid transition to an unstable system state, systems engineers need to know the indicators that provide information about the state of the coalition of systems and how those indicators may be used both to provide early warnings of system problems and, if necessary, to switch to safe-mode operating conditions that prevent the possibility of damage.

How can systems be designed to recover from failure? A fundamental principle of software engineering is that systems should be built so they do not fail, leading to development of methods and tools based on fault avoidance, fault detection, and fault tolerance. However, as coalitions of systems are constructed with independently managed elements and negotiated requirements, avoiding failure is increasingly impractical. Indeed, what seems to be a failure for some users may have no effect on others. Because some failures are ambiguous, automated systems cannot cope on their own. Human operators must use information from the system, intervening to enable it to recover from failure. This means understanding the socio-technical processes of failure recovery, the support the operators need, and how to design coalition members to be “good citizens” able to support failure recovery.

How can socio-technical factors be integrated into systems and software-engineering methods? Software- and systems-engineering methods support development of technical systems and generally consider human, social, and organizational issues to be outside the system’s boundary. However, such nontechnical factors significantly affect development, integration, and operation of coalitions of systems. Though a considerable body of work covers socio-technical systems, it has not been industrialized or made accessible to practitioners. Baxter and Sommerville2 surveyed this work and proposed a route to industrial-scale use of socio-technical methods. However, much more research and experience are required before socio-technical analyses are used routinely for complex systems engineering.

To what extent can coalitions of systems be self-managing? Research is needed into self-management so systems are able to detect changes in both their operation and their operational environment and dynamically reconfigure themselves to cope with the changes. The danger is that reconfiguration will create further complexity, so a key requirement is for the techniques to operate in a safe, predictable, auditable way, ensuring self-management does not conflict with “design for recovery.”

How can organizations manage complex, dynamically changing system configurations? Coalitions of systems will be constructed through orchestration and configuration, and desired system configurations will change dynamically in response to load, indicators of system health, unavailability of components, and system-health warnings. Ways are needed to support construction by configuration, to manage configuration changes, and to record changes, including automated changes from the self-management system, in real time, so an audit trail includes the configuration of the coalition at any point in time.

How should the agile engineering of coalitions of systems be supported? The business environment changes quickly in response to economic circumstances, competition, and business reorganization. Likewise, coalitions of systems must be able to change quickly to reflect current business needs. A model of system change that relies on lengthy processes of requirements analysis and approval does not work. Agile methods of programming have been successful for small- to medium-size systems where the dominant activity is systems development. For large complex systems, development processes are often dominated by coordination activities involving multiple stakeholders and engineers who are not colocated. How can agile approaches be effective for “systems development in the large” to support multi-organization global systems development?

How should coalitions of systems be regulated and certified? Many such coalitions represent critical systems, failure of which could threaten individuals, organizations, and national economies. They may have to be certified by regulators checking that, as far as possible, they do not pose a threat to their operators or to the wider systems environment. But certification is expensive. For some safety-critical systems, the cost of certification can exceed the cost of development, and certification costs will increase as systems become larger and more complex. Though certification as practiced today is almost certainly impossible for coalitions of systems, research is urgently needed into incremental and evolutionary certification so our ability to deploy critical complex systems is not curtailed by certification requirements. This issue is social, as well as technical, as societies decide what level of certification is socially and legally acceptable.

How can systems undergo “probabilistic verification”? Today’s techniques of system testing and more formal analysis are based on the assumption that a system involves a definitive specification and that behavior deviating from it is recognized. Coalitions of systems have no such specification, nor is system behavior guaranteed to be deterministic. The key verification issue will not be whether the system is correct but the probability that it satisfies essential properties (such as safety), taking into account its probabilistic, real-time, nondeterministic behavior.8,11

How should shared knowledge in a coalition of systems be represented? We assume the systems in a coalition interact through service interfaces, so the coalition has no overarching controller. Information is encoded in a standards-based representation. The key problem will not be compatibility but understanding what the information exchange really means. This is addressed today on a system-by-system basis through negotiation between system owners to clarify the meaning of shared information. However, if dynamic coalitions are allowed, with systems entering and leaving the coalition, negotiation is not practical. The key is developing a means of sharing the meaning of information, perhaps through ontologies like those proposed by Antoniou and van Harmelen1 involving the semantic Web.

A major problem researchers must address is lack of knowledge of what happens in real systems. High-profile failures (such as the Flash Crash) lead to inquiries, but more must be learned about the practical difficulties faced by developers and operators of coalitions of systems and how to address them as they arise. New ideas, tools, and methods must be supported by long-term empirical studies of the systems and their development processes to provide a solid information base for research and innovation.

The U.K. LSCITS Initiative5 addresses some of these problems, working with partners from the computer, financial services, and health-care industries to develop an understanding of the fundamental systems-engineering problems they face. Key to this work is a long-term engagement with the National Health Information Center to create coalitions of systems to provide external access to and analysis of vast amounts of health and patient data.

Figure 2. Outline structure for master’s course in LSCITS. Systems engineering modules: ultra-large-scale systems, complexity, mathematical modeling, socio-technical systems, systems engineering, systems procurement, systems integration, and systems resilience. Business modules: organizational change, legal and regulatory issues, technology innovation, and program management. Options from computer science, engineering, psychology, and business. Industrial project.

The project is developing practical techniques of socio-technical systems engineering2 and exploring design for failure.18 It has so far developed practical, predictable techniques for autonomic system management3,4 and is investigating the scaling up of agile methods,14 as well as exploring incremental system certification9 and developing techniques for system simulation and modeling.
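The probabilistic-verification question (estimating the probability that a coalition satisfies an essential property, rather than proving it correct) can be made concrete with a simulation-based estimate. The sketch below is an illustration under stated assumptions, not the technique of references 8 and 11. It substitutes plain Monte Carlo estimation, a simple form of what is often called statistical model checking. It uses a two-sided Hoeffding bound to choose how many simulated runs to make, and it checks a toy safety property on a hypothetical coalition model in which several independently managed agents randomly move a shared price. The names hoeffding_sample_size, estimate_property_probability, and toy_coalition_run, and all parameter values, are hypothetical.

```python
import math
import random
from typing import Callable


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Number of independent runs so the estimated probability lies within
    epsilon of the true probability with confidence at least 1 - delta
    (two-sided Hoeffding bound: n >= ln(2/delta) / (2 * epsilon**2))."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_property_probability(
    simulate_run: Callable[[random.Random], bool],
    epsilon: float = 0.05,
    delta: float = 0.05,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(property holds): the fraction of simulated
    runs for which simulate_run returns True."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    n = hoeffding_sample_size(epsilon, delta)
    holds = sum(1 for _ in range(n) if simulate_run(rng))
    return holds / n


def toy_coalition_run(rng: random.Random, steps: int = 50,
                      max_drop: float = 10.0, agents: int = 3) -> bool:
    """Hypothetical coalition model: at each step, several independently managed
    trading agents each move a shared price up or down by one unit.
    Safety property: the price never falls more than max_drop below its start."""
    price = 0.0
    for _ in range(steps):
        price += sum(rng.choice((-1.0, 1.0)) for _ in range(agents))
        if price < -max_drop:
            return False  # property violated on this run
    return True


if __name__ == "__main__":
    p = estimate_property_probability(toy_coalition_run)
    print(f"estimated P(price never drops more than 10 units) = {p:.3f}")
```

With the default delta of 0.05, asking for epsilon=0.01 instead of 0.05 raises the run count from 738 to 18,445, because the bound grows with 1/epsilon squared. The guarantee applies only to the model being simulated. For a real coalition, the hard part is obtaining a simulation faithful enough to its dynamic operating environment.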
Education. To address the practical issues of creating, managing, and operating LSCITS, engineers need knowledge and understanding of the systems and of techniques outside a “normal” software- or systems-engineering education. In the U.K., the LSCITS Initiative provides a new kind of doctoral