Massive Data Analytics and the Cloud
A Revolution in Intelligence Analysis




by
Michael Farber
farber_michael@bah.com
Mike Cameron
cameron_mike@bah.com
Christopher Ellis
ellis_christopher@bah.com
Josh Sullivan, Ph.D.
sullivan_joshua@bah.com

Cloud computing offers an important approach to achieving lasting strategic advantages by rapidly adapting to complex challenges in IT management and data analytics. This paper discusses the business impact and analytic transformation opportunities of cloud computing. Moreover, it highlights the differences between two cloud architectures, Utility Clouds and Data Clouds, with illustrative examples of how Data Clouds are shaping new advances in Intelligence Analysis.

Intelligence Analysis Transformation
Until the advent of the Information Age, the process of Intelligence Analysis was one of obtaining and combining sparse and often denied data to derive intelligence. Analysis needs were well understood, with consistent enemies and smooth data growth patterns. With the advent of the Information Age, the Internet, and rapidly evolving and multi-layered methods of disseminating information, including wikis, blogs, e-mail, virtual worlds, online games, VoIP telephone, digital photos, instant messages (IM), and tweets, the raw data and reflections of data on nearly everyone and their every activity are becoming increasingly available. This availability and constant growth of such vast amounts of disparate data is turning the Intelligence Analysis problem on its head, transforming it from a process of “stitching together sparse data to derive conclusions” to a process of “extracting conclusions from aggregation and distillation of massive data and data reflections.”1

Cloud computing provides new capabilities for performing analysis across all data in an organization. It uses new technical approaches to store, search, mine, and distribute massive amounts of data. Cloud computing allows analysts and decision makers to ask ad-hoc analysis questions of massive volumes of data in very quick and precise ways. New cloud computing technologies are driving analytic transformation in the way organizations store, access, and process massive amounts of disparate data via massively parallel and distributed IT systems. These technologies include Hadoop, MapReduce, BigTable, and other emergent Data Cloud technologies. Historically, the amount of processing power that could be applied constrained the ability to analyze intelligence. Consequently, data representation and processing were restricted by the amount of available processing power. Complex data models and normalization became common, and data was forced into models that did not fully represent all aspects of data. Understanding the emergence of “big data,” data reflection, and massively scalable processing can lead to new insights for Intelligence Analysis, as demonstrated by Google®, Yahoo!®, and Amazon®, which leverage cloud computing and Data Clouds to power their businesses. Booz Allen’s experience with cloud computing ideally positions the firm to support current and future clients in understanding and adopting cloud computing solutions.

Understanding the Differences Between Data Clouds and Utility Clouds
Cloud computing offers a wide range of technologies that support the transformation of data analytics. As outlined by the National Institute of Standards and Technology (NIST), cloud computing is a model “for enabling available, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”2 A number of cloud computing deployment models exist, such as public clouds, private clouds, community clouds, and hybrids. Moreover, two major cloud computing design patterns are emerging: Utility Clouds and Data Clouds. These distinctions are not mutually exclusive; rather,

1 Discussions with US Government Computational Sciences Group.
2 US National Institute of Standards and Technology. http://groups.google.com/group/cloudforum/web/nist-working-definition-of-cloud-computing.



Exhibit 1 | Relationship of Utility Clouds and Data Clouds

[Figure: Cloud Computing branches into two design patterns. Data Cloud: Massively Parallel Compute; Highly Scalable Multi-Dimensional Databases; Distributed Highly Fault-Tolerant Massive Storage. Utility Cloud: Software as a Service (SaaS); Platform as a Service (PaaS); Infrastructure as a Service (IaaS).]

Source: Booz Allen Hamilton




these designs can work cooperatively to provide economies of scale, resiliency, security, scalability, and analytics at world scale. As illustrated in Exhibit 1, fundamentally different mission objectives drive Utility Clouds and Data Clouds. Utility Clouds focus on offering infrastructure, platforms, and software as services that many users consume. These basic building blocks of cloud computing are essential to achieving real solutions at scale. Data Clouds can leverage those utility building blocks to provide data analytics, structured data storage, databases, and massively parallel computation, which allow analysts unprecedented access to mission data and shared analysis algorithms.

Specifically, Utility Clouds focus on providing IT capabilities as a service: for example, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). This service model scales by allowing multiple constituencies to share IT assets, an arrangement called multi-tenancy. Key to enabling multi-tenancy is a security model that separates and protects data and processing. In fact, security (focused on ensuring data segmentation, integrity, and access control) is one of the most critical design drivers of Utility Cloud architectures.

In contrast, the main objective of Data Clouds is the aggregation of massive data. Data Clouds take the architectural approach of dividing processing tasks into smaller units distributed to servers across the cluster, with computing and storage capabilities co-located, allowing highly scaled computation across all of the data in the cloud. The Data Cloud design pattern typifies data-intensive and processing-intensive usage scenarios without regard to concepts used in the Utility Cloud model, such as virtualization and multi-tenancy.

Exhibit 2 contains a comparison of Data Clouds and Utility Clouds. The difference between these two architectures is seen in industry. Amazon is an excellent model for Utility Cloud computing, and Google is an excellent model for Data Cloud computing.
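
The Data Cloud approach described above, in which work is divided into small tasks and each task runs on a server that already stores its data, can be illustrated with a short sketch. This is a simplified greedy assignment written for illustration only; the function name, the dictionary layout, and the notion of "task slots" are assumptions and do not describe any particular vendor's scheduler.

```python
def assign_tasks(block_locations, node_slots):
    """Assign one task per data block, preferring a node that stores the block.

    block_locations: dict of block id -> list of node names holding a copy
    node_slots:      dict of node name -> number of tasks the node can accept
    Returns a dict of node name -> list of block ids (hypothetical layout).
    """
    remaining = dict(node_slots)
    plan = {}
    for block_id in sorted(block_locations):
        holders = block_locations[block_id]
        # Data-local candidates: nodes that hold the block and have a free slot.
        local = [n for n in holders if remaining.get(n, 0) > 0]
        # Otherwise fall back to any node with capacity (data crosses the network).
        pool = local or [n for n, free in remaining.items() if free > 0]
        if not pool:
            raise RuntimeError("no free task slots in the cluster")
        # Pick the candidate with the most free slots; break ties by name.
        node = min(pool, key=lambda n: (-remaining[n], n))
        remaining[node] -= 1
        plan.setdefault(node, []).append(block_id)
    return plan


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n2"], "b3": ["n3"]}
    slots = {"n1": 1, "n2": 1, "n3": 1}
    print(assign_tasks(blocks, slots))
```

In this toy cluster every task lands on a node that already holds its block, so no data has to move. A production scheduler would also weigh rack topology, load, and failures, which this sketch deliberately ignores.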




Cloud Computing Can Solve Problems Previously Out of Reach
More than just a new design pattern, cloud computing comprises a range of technologies that enable new forms of analysis that were previously computationally impossible or too difficult to attempt. These business and mission problems are intractable in the context of traditional IT design and delivery models, and attempts to solve them have often ended in failed or underwhelming results. Cloud computing allows research into these problems, opening new business opportunities and new outreach to clients with large amounts of data.

Much of the current cloud computing discussion focuses on utility computing, economies of scale, and migration of current capabilities to the cloud. The emerging power of the cloud lies in refocusing the conversation on new business opportunities that empower decision makers and analysts to ask previously un-askable questions.

What does the cloud allow us to do that we could not do before? Compute-intensive problems, such as large-scale image processing, sensor data correlation, social network analysis, encryption/decryption, data mining, simulations, and pattern recognition, are strong examples of problems that can be solved in the cloud computing domain.

Consider a social network analysis problem from Facebook®. This problem was impossible to solve before the advent of cloud computing:3

With 400 to 500 million users and billions of page views every day, Facebook accumulates massive amounts of data. One of the challenges it has faced since its early days is developing a scalable way of storing and processing all these bytes since historical data is a major driver of business innovation and the user experience on Facebook. This task can only be accomplished by empowering Facebook’s engineers and analysts with easy to use tools to mine and manipulate large data sets. At some point, there isn’t a bigger server to buy, and the relational database stops scaling.

To drive the business forward, Facebook needed a way to query and correlate roughly 15 terabytes of new social network data each day, in different languages, at different times, often from mixed media (e.g., web, IM, SMS) streaming in from hundreds of different sources. Moreover, Facebook needed a massively parallel processing framework and a way to safely and securely store large volumes of data. Several computationally impossible problems were lingering at Facebook. One of the most interesting of these problems was the Facebook Lexicon, a data analysis program that would first allow a user to select a word and would




Exhibit 2 | Comparison of Data and Utility Clouds

Data Clouds
• Computing architecture for large-scale data processing and analytics
• Designed to operate at trillions of operations/day, petabytes of storage
• Designed for performance, scale, and data processing
• Characterized by run-time data models and simplified development models

Utility Clouds
• Computing services for outsourced IT operations
• Concurrent, independent, multi-tenant user population
• Service offerings such as SaaS, PaaS, and IaaS
• Characterized by data segmentation, hosted applications, low cost of ownership, and elasticity

Source: Booz Allen Hamilton


3 Sarma, Joydeep Sen. “Hadoop.” Facebook, June 5, 2008. http://www.facebook.com/notes.php?id=9445547199#!/note.php?note_id=16121578919.



then scan all available data at Facebook (which grows by 15 terabytes per day), calculate the frequency of the word’s occurrence, and graphically display the information over time. This task was not possible in a traditional IT solution because of the large number of users, size of data, and time to process. But the Data Cloud allows Facebook to leverage more than 8,500 Central Processing Unit (CPU) cores and petabytes of disk space to create rich data analytics on a wide range of business characteristics. The Data Cloud allows analysts and technical leaders at Facebook to rapidly write analytics in the software language of their choice. Subsequently, these analytics run over massive amounts of data, condensing down to small, personalized analysis results. These results can then be stored in a traditional relational database, allowing existing reporting and financial tools to remain unchanged but to still benefit from the computational power of the cloud. Facebook is now investigating a data warehousing layer that rides in the cloud and is capable of making decisions and courses of action based on millions of inputs.

The same capabilities inherent in the Facebook Data Cloud are available to other organizations in their own Data Cloud. Cloud computing is a transformative force addressing size, speed, and scale, with a low cost of entry and very high potential benefits. The first step toward exploring the power of cloud computing is to understand the taxonomy differences between Data Clouds and Utility Clouds.

Data Cloud Design Features
The Data Cloud model has a transformational effect on Intelligence Analysis. Unlike current data warehousing models, the Data Cloud design begins by assuming an organization needs to rapidly store and process massive amounts of chaotic data that is spread across the enterprise, distressed by differing time sequences, and burdened with noise.

Several key attributes of the Data Cloud design pattern are especially notable for Intelligence Analysis and are discussed in the following sections.

Distributed Highly Fault-Tolerant Massive Storage
The foundation of Data Cloud computing is the ability to reliably store and process petabytes of data using non-specialized hardware and networking. In practice, this form of storage has required a new way of thinking about data storage. The Google File System (GFS) and Hadoop Distributed File System (HDFS) are two examples of proven approaches to creating distributed highly fault-tolerant massive storage systems. Several attributes of highly fault-tolerant massive storage systems are key innovations in the Data Cloud design pattern:4

• Is reliable, allowing distributed storage and replication of bytes across networks and hardware assumed to fail at any time
• Allows for massive, world-scale storage that separates metadata from data
• Supports a write-once, sporadic append, read-many usage structure
• Stores very large files, often each greater than 1 terabyte in size
• Allows compute cycles to be easily moved to the data store, instead of moving data to a processor farm.

These attributes are especially important for Intelligence Analysis. First, a large-scale distributed storage system, which leverages readily available commercial hardware, offers a method of replication across many physical sites for data redundancy. Vast amounts of data can be reliably stored across a distributed system without costly storage-network arrays, reducing the risk of storing mission-critical data in one central location. Second, because massive scale is achieved horizontally (by adding hardware), the ability to gauge and predict data growth rates across the enterprise becomes dramatically easier by provisioning hardware at the leading edge of a Data Cloud rather than in independent storage silos for different projects.
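
Three of the storage attributes above (replication across hardware assumed to fail, metadata kept apart from data, and write-once files) can be made concrete with a toy model. The sketch below is an in-memory illustration only; the class name TinyBlockStore, its methods, and the placement rule are assumptions made for this example and are not the GFS or HDFS interfaces.

```python
import hashlib


class TinyBlockStore:
    """Toy model: a metadata map kept apart from replicated data blocks."""

    def __init__(self, node_names, replicas=3, block_size=4):
        self.nodes = {name: {} for name in node_names}  # node -> {block id: bytes}
        self.metadata = {}                              # file -> [(block id, holders)]
        self.replicas = min(replicas, len(node_names))
        self.block_size = block_size

    def write(self, filename, data):
        if filename in self.metadata:
            raise ValueError("files are write-once in this sketch")
        names = sorted(self.nodes)
        entries = []
        for offset in range(0, len(data), self.block_size):
            block_id = f"{filename}#{offset // self.block_size}"
            # Deterministic placement: hash picks a start node, replicas follow it.
            digest = hashlib.sha256(block_id.encode()).hexdigest()
            start = int(digest, 16) % len(names)
            holders = [names[(start + k) % len(names)] for k in range(self.replicas)]
            for node in holders:
                self.nodes[node][block_id] = data[offset:offset + self.block_size]
            entries.append((block_id, holders))
        self.metadata[filename] = entries

    def fail(self, node):
        """Simulate losing a node and every block stored on it."""
        self.nodes.pop(node, None)

    def read(self, filename):
        out = bytearray()
        for block_id, holders in self.metadata[filename]:
            for node in holders:
                chunk = self.nodes.get(node, {}).get(block_id)
                if chunk is not None:
                    out += chunk
                    break
            else:
                raise OSError(f"all replicas of {block_id} are lost")
        return bytes(out)


if __name__ == "__main__":
    store = TinyBlockStore(["a", "b", "c", "d"], replicas=3)
    store.write("report", b"hello data cloud")
    store.fail("a")
    store.fail("b")
    print(store.read("report"))  # still readable from surviving replicas
```

With three replicas spread over four nodes, any two nodes can fail and every block still has a surviving copy. Real systems add re-replication after a failure, checksums, and a durable metadata service, all omitted here.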

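
The Lexicon computation described earlier (scan the data, count a chosen word's occurrences, and report the counts over time) fits naturally on top of block storage of this kind: each block can be counted where it is stored, and only the small per-day tallies need to be combined. The following is a minimal single-machine sketch of that idea. The record layout of (day, text) pairs, the function names, and the simple tokenization are assumptions for illustration and are not Facebook's implementation.

```python
from collections import Counter
from functools import reduce


def map_block(records, word):
    """Map step: count occurrences of `word` per day within one block."""
    target = word.lower()
    counts = Counter()
    for day, text in records:
        tokens = (token.strip(".,!?") for token in text.lower().split())
        hits = sum(1 for token in tokens if token == target)
        if hits:
            counts[day] += hits
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial per-day tallies."""
    return left + right


def lexicon(blocks, word):
    """Return sorted (day, count) pairs for `word` across all blocks."""
    # On a cluster, each map_block call would run on the node holding the block.
    partials = [map_block(block, word) for block in blocks]
    total = reduce(reduce_counts, partials, Counter())
    return sorted(total.items())


if __name__ == "__main__":
    blocks = [
        [("2008-06-01", "Cloud cloud data"), ("2008-06-02", "no match here")],
        [("2008-06-02", "the cloud."), ("2008-06-01", "cloud")],
    ]
    print(lexicon(blocks, "cloud"))
```

Because the per-block tallies are tiny compared with the raw text, the merge step moves very little data, which is the property that lets this kind of analysis scale out by adding servers.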


4 Ghemawat, Sanjay; Gobioff, Howard; and Leung, Shun-Tak. “The Google File System.” Appeared in 19th ACM Symposium on Operating Systems Principles, Lake George, NY, October 2003. http://labs.google.com/papers/gfs.html.


Highly Scalable Multi-Dimensional Databases
The distributed highly fault-tolerant massive storage system lacks the ability to fully represent structured data, providing only the raw data storage required in the Data Cloud. Moving beyond raw data storage to representing structured data requires a highly scalable database system. Traditional relational database systems are the ubiquitous design solution for structured data storage in conventional enterprise applications. Many relational database systems support multi-terabyte tables, relationships, and complex Structured Query Language (SQL) engines, which provide reliable, well-understood data schemas. It is very important to understand that relational database systems are the best solution for many types of problems, especially when data is highly structured and volume is less than 10 terabytes. However, a new class of problem is emerging when dealing with big data of volumes greater than 10 terabytes. Although relational database models are capable of running in a Data Cloud, many current relational database systems fail in the Data Cloud in two important ways. First, many relational database systems cannot scale to support petabytes or greater amounts of data storage, or the database requires highly specialized components to accomplish ever-diminishing increases in scale. Second, and critically important to Intelligence Analysis, is the impedance mismatch that occurs as complex data is normalized into a relational table format.

When data is collected, often the first step is to transform the data, normalize the data, and insert a row into a relational database. Next, users query data based on keywords or pre-loaded search queries and wait for the results to return. Once the results are returned, users sift through them.

Multi-dimensional databases offer a fundamentally different model. The distributed and highly scalable design of multi-dimensional databases means data is stored and searched as billions of rows with billions of columns across hundreds of servers. Even more interesting, these forms of databases do not require schemas or predefined definitions of their structure. When new data arrives in an unfamiliar format, it can be inserted into the database with little or no modification. The database is designed to hold many unique forms of data, allowing a query-time schema to be generated when data is retrieved, instead of an a priori schema defined when the database is first installed. This is extremely valuable because as new data arrives in a format not seen before, instead of lengthy modifications to the database schema or complicated parsing to force the data into a relational model, the data can simply be stored in the native format. By leveraging a multi-dimensional database system that treats all data as bytes, scales to massive sizes, and allows users to ask questions of the bytes at query time, organizations have a new choice. Multi-dimensional databases are in stark contrast to today’s highly normalized relational data model, which includes schemas, relationships, and pre-determined storage formats stored on one or several clustered database servers.

Moreover, the power of multi-dimensional databases allows computations to be pre-computed. For instance, several large Internet content providers mine millions of data points every hour to pre-compute the best advertisements to show users the next time they visit one of the providers’ sites. Instead of pulling a random advertisement or performing a search when the page is requested, these providers pre-compute the best advertisements to show everyone, at any time, on any of their sites based on historical data, advertisement click rates, probability of clicking, and revenue models. Similarly, a large US auto insurance firm leverages multi-dimensional databases to pre-compute the insurance quote for every car in the United States each night. If a caller asks for a quote, the firm can instantly tell the caller a price because it was already computed overnight based on many input sources.5 The highly scalable nature of such databases allows for massive computational power.

There are some major analytic implications here. First, a computing framework such as a Data Cloud that, by design, can scale to handle petabytes of mission data and perform intensive computation across all the data, all the time means new insights and discoveries are now possible by looking at big data in

5 Baker, Stephen. “The Two Flavors of Google.” Businessweek.com, December 13, 2007. http://www.businessweek.com/magazine/content/07_52/b4064000281756.htm.



many different ways. Consider the parallels between Intelligence Analysis and the massive correlation/prediction engines at eHarmony® and Netflix®, which leverage regression algorithms across massive data sets to compute personality, character traits, hidden relationships, and group tendencies. These data-driven analytic systems leverage the attributes of Data Clouds but target specialized behavioral analysis to advance their particular business. Hulu®, a popular online television website, uses the Data Cloud to process many permutations of log files about the shows people are watching.5

Massively Parallel Compute (Analytic Algorithms)
Parallel computing is a well-adopted technology seen in processor cores and software thread-based parallelism. However, massively parallel processing, which leverages thousands of networked commodity servers constrained only by bandwidth, is now the emerging context for the Data Cloud.

If distributed file systems, such as GFS and HDFS, and column-oriented databases are employed to store massive volumes of data, there is then a need to analyze and process this data in an intelligent fashion. In the past, writing parallel code required highly trained developers, complex job coordination, and locking services to ensure nodes did not overwrite each other. Often, each parallel system would develop unique solutions for each of these problems. These and other complexities inhibited the broad adoption of massively parallel processing, meaning that building and supporting the required hardware and software was reserved for dedicated systems.

MapReduce, a framework pioneered by Google, has overcome many of these previous barriers and allows for data-intensive computing while abstracting the details of the Data Cloud away from the developer. This ability allows analysts and developers to quickly create many different parallelized analytic algorithms that leverage the capabilities of the Data Cloud.

1,000 nodes, bringing extensive analytic processing capabilities to users in the enterprise.

Working in tandem with the distributed file system and the multi-dimensional database, the MapReduce framework leverages a master node to divide large jobs into smaller tasks for worker nodes to process. The framework, capable of running on thousands of machines, attempts to maintain a high level of affinity between data and processing, which means the framework intelligently moves the processing close to the data to minimize bandwidth needs. Moving the compute job to the data is easier than moving large amounts of data to a central bank of processors. Moreover, the framework manages extrapolative errors, noticing when a worker in the cloud is taking a long time on one of these tasks or has failed altogether and automatically tasking another node with completing the same task. All these details and abstractions are built into the framework. Developers are able to focus on the analytic value of the jobs they create and no longer worry about the specialized complexities of massively parallel computing. An intelligence analyst is able to write 10 to 20 lines of computer code, and the MapReduce framework will convert it into a massively parallel search, working against petabytes of data across thousands of machines, without requiring the analyst to know or understand any of these technical details.

Tasks such as sorting, data mining, image manipulation, social network analysis, inverted index construction, and machine learning are prime jobs for MapReduce. In another scenario, assume that terabytes of aerial imagery have been collected for intelligence purposes. Even with an algorithm available to detect tanks, planes, or missile silos, the task of finding these weapons could take days if run in a conventional manner. Processing 100 terabytes of imagery on a standard computer takes 11 days. Processing the same amount of data on 1,000 standard computers takes 15 minutes. By incorporating MapReduce, each image or part of an
    Consequently, the same MapReduce job crafted to                                        image becomes its own task and can be examined
    run on a single node can as easily run on a group of                                   in a parallel manner. Distributing this work to many

5 Baker, Stephen. “The Two Flavors of Google.” Businessweek.com, December 13, 2007. http://www.businessweek.com/magazine/content/07_52/b4064000281756.htm.



computers drastically cuts the time for this job and other large tasks, scales the performance linearly by adding commodity hardware, and ensures reliability through data and task replication.

Programmatic Models for Scaling in the Data Cloud
Building applications and the architectures that run in the Data Cloud requires new thinking about scale, elasticity, and resilience. Cloud application architectures follow two key tenets: (1) only use computing resources when needed (elasticity) and (2) survive drastically changing data volumes (scalability). Much of this work is accomplished by designing cloud applications to scale dynamically by processing asynchronously queued events. As a result, any number of jobs can be submitted to the Data Cloud, and those jobs are persistently queued for resilience until completed. The jobs are then removed from a queue and distributed across any number of worker nodes, drawing on resources in the cloud on demand. When the work is complete, these jobs are closed and the resources returned to the cloud.

These features, working in concert, achieve scalability across computers and data volume, elasticity of resource utilization, and resilience for assured operational readiness. These forms of application architecture address the difficulties of massive data processing. The cloud abstracts the complexities of resource provisioning, error handling, parallelization, and scalability, allowing developers to trade sophistication for scale.

The following actions leverage the power of the Data Cloud:6

•	 Processing Pipelines: Document or image processing, video transcoding, indexing data, and data mining

•	 Batch Processing Systems: Log file analysis, nightly automated relationship matching, regression testing, and financial analysis

•	 Unpredictable Content: Web portals or intelligence dissemination systems that vary widely in usage based on time of day, temporal content stores for conferences, or National Special Security Events.

Clearly, applications that can leverage the Data Cloud can take advantage of the ubiquitous infrastructure that already exists within the cloud, elastically drawing on resources as needed based on scale. Moreover, the efficient application of computing resources across the cloud and the rapid ability to decrease processing time through massive parallelization are a transformative force for many stages of Intelligence Analysis.

Conclusion
Cloud computing has the potential to transform how organizations use computing power to create a collaborative foundation of shared analytics, mission-centric operations, and IT management. Challenges to the implementation of cloud computing remain, but the new analytic capabilities of big data, ad-hoc analysis, and massively scalable analytics, combined with the security and financial advantages of switching to a cloud computing environment, are driving research across the cloud ecosystem. Cloud computing technology offers a very important approach to achieving lasting strategic advantage by rapidly adapting to complex challenges in IT management and data analytics.
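
As an illustration of the "10 to 20 lines of computer code" an analyst might write for MapReduce, the following is a minimal sketch of inverted index construction, one of the prime MapReduce jobs named earlier. It is an assumption-laden teaching example: it imitates the map, shuffle, and reduce phases inside a single Python process and does not use the Hadoop or Google APIs. The function names (map_document, shuffle, reduce_postings, build_inverted_index) and the sample reports are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_document(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map step: emit one (word, doc_id) pair per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group values by key. A real framework does this between map and reduce."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_postings(word: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce step: collapse the document IDs for one word into a sorted list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(documents: Dict[str, str]) -> Dict[str, List[str]]:
    """Run map, shuffle, and reduce in one process (no parallelism in this sketch)."""
    mapped = (
        pair
        for doc_id, text in documents.items()
        for pair in map_document(doc_id, text)
    )
    grouped = shuffle(mapped)
    return dict(reduce_postings(word, ids) for word, ids in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample data, invented for illustration only.
    reports = {
        "report-1": "tank sighted near airfield",
        "report-2": "missile silo near airfield",
    }
    index = build_inverted_index(reports)
    print(index["airfield"])  # documents that mention "airfield"
```

Only map_document and reduce_postings carry analytic logic. In a real deployment the framework, not the analyst, would own the grouping step, the distribution of tasks across nodes, and the re-execution of slow or failed tasks.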




6 Varia, Jinesh. Cloud Architectures. Amazon, June 16, 2008. http://jineshvaria.s3.amazonaws.com/public/cloudarchitectures-varia.pdf.
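
The queue-driven pattern described under Programmatic Models for Scaling in the Data Cloud (jobs are queued, pulled by any number of workers, and closed when complete) can be sketched as follows. This is a sketch under stated assumptions, not a cloud implementation: an in-memory queue stands in for a persistent queue service, threads stand in for worker nodes, and run_jobs and its parameters are hypothetical names.

```python
import queue
import threading
from typing import Callable, List


def run_jobs(jobs: List[int], handler: Callable[[int], int], workers: int) -> List[int]:
    """Drain a job queue with a pool of worker threads and collect the results."""
    # Assumption: queue.Queue lives in memory, so it is not persistent the way
    # a cloud queue service would be.
    pending: queue.Queue = queue.Queue()
    results: List[int] = []
    lock = threading.Lock()

    for job in jobs:
        pending.put(job)

    def worker() -> None:
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # no work left, so this worker releases its resources
            try:
                outcome = handler(job)
                with lock:
                    results.append(outcome)
            finally:
                pending.task_done()  # mark the job closed even if it failed

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    pending.join()  # block until every queued job has been closed
    for thread in threads:
        thread.join()
    return sorted(results)
```

Calling run_jobs([1, 2, 3, 4], lambda n: n * n, workers=3) squares four jobs across three workers. Changing the workers argument changes how many workers drain the same queue without touching the job logic, which is the elasticity property the text describes.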



About Booz Allen
Booz Allen Hamilton has been at the forefront of strategy and technology consulting for nearly a century. Today, Booz Allen is a leading provider of management and technology consulting services to the US government in defense, intelligence, and civil markets, and to major corporations, institutions, and not-for-profit organizations. In the commercial sector, the firm focuses on leveraging its existing expertise for clients in the financial services, healthcare, and energy markets, and to international clients in the Middle East. Booz Allen offers clients deep functional knowledge spanning strategy and organization, engineering and operations, technology, and analytics, which it combines with specialized expertise in clients’ mission and domain areas to help solve their toughest problems.

The firm’s management consulting heritage is the basis for its unique collaborative culture and operating model, enabling Booz Allen to anticipate needs and opportunities, rapidly deploy talent and resources, and deliver enduring results. By combining a consultant’s problem-solving orientation with deep technical knowledge and strong execution, Booz Allen helps clients achieve success in their most critical missions, as evidenced by the firm’s many client relationships that span decades. Booz Allen helps shape thinking and prepare for future developments in areas of national importance, including cybersecurity, homeland security, healthcare, and information technology.

Booz Allen is headquartered in McLean, Virginia, employs more than 25,000 people, and had revenue of $5.59 billion for the 12 months ended March 31, 2011. Fortune has named Booz Allen one of its “100 Best Companies to Work For” for seven consecutive years. Working Mother has ranked the firm among its “100 Best Companies for Working Mothers” annually since 1999. More information is available at www.boozallen.com. (NYSE: BAH)


    To learn more about the firm and to download digital versions of this article and other Booz Allen Hamilton
    publications, visit www.boozallen.com.

    Contact Information:
    Michael Farber                   Mike Cameron                   Christopher Ellis              Josh Sullivan, Ph.D.
    Senior Vice President            Principal                      Principal                      Principal
    farber_michael@bah.com           cameron_mike@bah.com           ellis_christopher@bah.com      sullivan_joshua@bah.com
    240-314-5671                     301-543-4432                   301-419-5147                   301-543-4611

Principal Offices
Huntsville, Alabama                    Indianapolis, Indiana                  Philadelphia, Pennsylvania
Sierra Vista, Arizona                  Leavenworth, Kansas                    Charleston, South Carolina
Los Angeles, California                Aberdeen, Maryland                     Houston, Texas
San Diego, California                  Annapolis Junction, Maryland           San Antonio, Texas
San Francisco, California              Hanover, Maryland                      Abu Dhabi, United Arab Emirates
Colorado Springs, Colorado             Lexington Park, Maryland               Alexandria, Virginia
Denver, Colorado                       Linthicum, Maryland                    Arlington, Virginia
District of Columbia                   Rockville, Maryland                    Chantilly, Virginia
Orlando, Florida                       Troy, Michigan                         Charlottesville, Virginia
Pensacola, Florida                     Kansas City, Missouri                  Falls Church, Virginia
Sarasota, Florida                      Omaha, Nebraska                        Herndon, Virginia
Tampa, Florida                         Red Bank, New Jersey                   McLean, Virginia
Atlanta, Georgia                       New York, New York                     Norfolk, Virginia
Honolulu, Hawaii                       Rome, New York                         Stafford, Virginia
O’Fallon, Illinois                     Dayton, Ohio                           Seattle, Washington




The most complete, recent list of offices and their addresses and telephone numbers can be found on
www.boozallen.com


www.boozallen.com                                                                             ©2011 Booz Allen Hamilton Inc.

                                                                                                                  08.250.11

Massive Data Analytics and the Cloud

  • 1. Massive Data Analytics and the Cloud A Revolution in Intelligence Analysis by Michael Farber farber_michael@bah.com Mike Cameron cameron_mike@bah.com Christopher Ellis ellis_christopher@bah.com Josh Sullivan, Ph.D. sullivan_joshua@bah.com
  • 3. Massive Data Analytics and the Cloud: A Revolution in Intelligence Analysis
    Cloud computing offers a very important approach to achieving lasting strategic advantages by rapidly adapting to complex challenges in IT management and data analytics. This paper discusses the business impact and analytic transformation opportunities of cloud computing. Moreover, it highlights the differences between two cloud architectures, Utility Clouds and Data Clouds, with illustrative examples of how Data Clouds are shaping new advances in Intelligence Analysis.
    Intelligence Analysis Transformation
    Until the advent of the Information Age, the process of Intelligence Analysis was one of obtaining and combining sparse and often denied data to derive intelligence. Analysis needs were well understood, with consistent enemies and smooth data growth patterns. With the advent of the Information Age, the Internet, and rapidly evolving and multi-layered methods of disseminating information, including wikis, blogs, e-mail, virtual worlds, online games, VoIP telephone, digital photos, instant messages (IM), and tweets, the raw data and reflections of data on nearly everyone and their every activity are becoming increasingly available. This availability and constant growth of such vast amounts of disparate data are turning the Intelligence Analysis problem on its head, transforming it from a process of “stitching together sparse data to derive conclusions” to a process of “extracting conclusions from aggregation and distillation of massive data and data reflections.”1
    Cloud computing provides new capabilities for performing analysis across all data in an organization. It uses new technical approaches to store, search, mine, and distribute massive amounts of data. Cloud computing allows analysts and decision makers to ask ad-hoc analysis questions of massive volumes of data in very quick and precise ways. New cloud computing technologies are driving analytic transformation in the way organizations store, access, and process massive amounts of disparate data via massively parallel and distributed IT systems. These technologies include Hadoop, MapReduce, BigTable, and other emergent Data Cloud technologies. Historically, the amount of processing power that could be applied constrained the ability to analyze intelligence. Consequently, data representation and processing were restricted by the amount of available processing power. Complex data models and normalization became common, and data was forced into models that did not fully represent all aspects of data. Understanding the emergence of “big data,” data reflection, and massively scalable processing can lead to new insights for Intelligence Analysis, as demonstrated by Google®, Yahoo!®, and Amazon®, which leverage cloud computing and Data Clouds to power their businesses. Booz Allen’s experience with cloud computing ideally positions the firm to support current and future clients in understanding and adopting cloud computing solutions.
    Understanding the Differences Between Data Clouds and Utility Clouds
    Cloud computing offers a wide range of technologies that support the transformation of data analytics. As outlined by the National Institute of Standards and Technology (NIST), cloud computing is a model “for enabling available, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”2 A number of cloud computing deployment models exist, such as public clouds, private clouds, community clouds, and hybrids. Moreover, two major cloud computing design patterns are emerging: Utility Clouds and Data Clouds. These distinctions are not mutually exclusive; rather,
    1 Discussions with US Government Computational Sciences Group.
    2 US National Institute of Standards. http://groups.google.com/group/cloudforum/web/nist-working-definition-of-cloud-computing.
  • 4. Exhibit 1 | Relationship of Utility Clouds and Data Clouds
    [Figure: Cloud Computing shown as two groups. Data Cloud: Massively Parallel Compute; Highly Scalable Multi-Dimensional Databases; Distributed Highly Fault-Tolerant Massive Storage. Utility Cloud: Software as a Service (SaaS); Platform as a Service (PaaS); Infrastructure as a Service (IaaS). Source: Booz Allen Hamilton]
    these designs can work cooperatively to provide economies of scale, resiliency, security, scalability, and analytics at world scale. As illustrated in Exhibit 1, fundamentally different mission objectives drive Utility Clouds and Data Clouds. Utility Clouds focus on offering infrastructure, platforms, and software as services that many users consume. These basic building blocks of cloud computing are essential to achieving real solutions at scale. Data Clouds can leverage those utility building blocks to provide data analytics, structured data storage, databases, and massively parallel computation, which allow analysts unprecedented access to mission data and shared analysis algorithms.
    Specifically, Utility Clouds focus on providing IT capabilities as a service; e.g., Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). This service model scales by allowing multiple constituencies to share IT assets, an arrangement called multi-tenancy. Key to enabling multi-tenancy is a security model that separates and protects data and processing. In fact, security (focused on ensuring data segmentation, integrity, and access control) is one of the most critical design drivers of Utility Cloud architectures.
    In contrast, the main objective of Data Clouds is the aggregation of massive data. Data Clouds have an architectural approach of dividing processing tasks into smaller units distributed to servers across the cluster with co-location of computing and storage capabilities, allowing highly scaled computation across all of the data in the cloud. The Data Cloud design pattern typifies data-intensive and processing-intensive usage scenarios without regard to concepts used in the Utility Cloud model, such as virtualization and multi-tenancy.
    Exhibit 2 contains a comparison of Data Clouds and Utility Clouds. The difference between these two architectures is seen in industry. Amazon is an excellent model for Utility Cloud computing, and Google is an excellent model for Data Cloud computing.
  • 5. Cloud Computing Can Solve Problems Previously Out of Reach
    More than just a new design pattern, cloud computing comprises a range of technologies that enable new forms of analysis that were previously computationally impossible or too difficult to attempt. These business and mission problems are intractable in the context of traditional IT design and delivery models and have often ended in failed or underwhelming results. Cloud computing allows research into these problems, opening new business opportunities and new outreach to clients with large amounts of data.
    Much of the current cloud computing discussion focuses on utility computing, economies of scale, and migration of current capabilities to the cloud. Refocusing the conversation to new business opportunities that empower decision makers and analysts to ask previously un-askable questions is the emerging power of the cloud.
    What does the cloud allow us to do that we could not do before? Compute-intensive problems, such as large-scale image processing, sensor data correlation, social network analysis, encryption/decryption, data mining, simulations, and pattern recognition, are strong examples of problems that can be solved in the cloud computing domain.
    Exhibit 2 | Comparison of Data and Utility Clouds
    Data Clouds:
      • Computing architecture for large-scale data processing and analytics
      • Designed to operate at trillions of operations/day, petabytes of storage
      • Designed for performance, scale, and data processing
      • Characterized by run-time data models and simplified development models
    Utility Clouds:
      • Computing services for outsourced IT operations
      • Concurrent, independent, multi-tenant user population
      • Service offerings such as SaaS, PaaS, and IaaS
      • Characterized by data segmentation, hosted applications, low cost of ownership, and elasticity
    Source: Booz Allen Hamilton
    Consider a social network analysis problem from Facebook®. This problem was impossible to solve before the advent of cloud computing:3
    With 400 to 500 million users and billions of page views every day, Facebook accumulates massive amounts of data. One of the challenges it has faced since its early days is developing a scalable way of storing and processing all these bytes since historical data is a major driver of business innovation and the user experience on Facebook. This task can only be accomplished by empowering Facebook’s engineers and analysts with easy-to-use tools to mine and manipulate large data sets. At some point, there isn’t a bigger server to buy, and the relational database stops scaling. To drive the business forward, Facebook needed a way to query and correlate roughly 15 terabytes of new social network data each day, in different languages, at different times, often from mixed media (e.g., web, IM, SMS) streaming in from hundreds of different sources. Moreover, Facebook needed a massively parallel processing framework and a way to safely and securely store large volumes of data. Several computationally impossible problems were lingering at Facebook. One of the most interesting of these problems was the Facebook Lexicon, a data analysis program that would first allow a user to select a word and would
    3 Sarma, Joydeep Sen. “Hadoop.” Facebook, June 5, 2008. http://www.facebook.com/notes.php?id=9445547199#!/note.php?note_id=16121578919.
  • 6. then scan all available data at Facebook (which grows by 15 terabytes per day), calculate the frequency of the word’s occurrence, and graphically display the information over time. This task was not possible in a traditional IT solution because of the large number of users, size of data, and time to process. But the Data Cloud allows Facebook to leverage more than 8,500 Central Processing Unit (CPU) cores and petabytes of disk space to create rich data analytics on a wide range of business characteristics. The Data Cloud allows analysts and technical leaders at Facebook to rapidly write analytics in the software language of their choice. Subsequently, these analytics run over massive amounts of data, condensing down to small, personalized analysis results. These results can then be stored in a traditional relational database, allowing existing reporting and financial tools to remain unchanged but to still benefit from the computational power of the cloud. Facebook is now investigating a data warehousing layer that rides in the cloud and is capable of making decisions and recommending courses of action based on millions of inputs.
    The same capabilities inherent in the Facebook Data Cloud are available to other organizations in their own Data Cloud. Cloud computing is a transformative force addressing size, speed, and scale, with a low cost of entry and very high potential benefits. The first step toward exploring the power of cloud computing is to understand the taxonomy differences between Data Clouds and Utility Clouds.
    Data Cloud Design Features
    The Data Cloud model offers a transformational effect on Intelligence Analysis. Unlike current data warehousing models, the Data Cloud design begins by assuming an organization needs to rapidly store and process massive amounts of chaotic data that is spread across the enterprise, distressed by differing time sequences, and burdened with noise. Several key attributes of the Data Cloud design pattern are especially notable for Intelligence Analysis and are discussed in the following sections.
    Distributed Highly Fault-Tolerant Massive Storage
    The foundation of Data Cloud computing is the ability to reliably store and process petabytes of data using non-specialized hardware and networking. In practice, this form of storage has required a new way of thinking about data storage. The Google File System (GFS) and Hadoop Distributed File System (HDFS) are two examples of proven approaches to creating distributed highly fault-tolerant massive storage systems. Several attributes of highly fault-tolerant massive storage systems are key innovations in the Data Cloud design pattern.4 Such a system:
      • Is reliable, allowing distributed storage and replication of bytes across networks and hardware assumed to fail at any time
      • Allows for massive, world-scale storage that separates metadata from data
      • Supports a write-once, sporadic append, read-many usage structure
      • Stores very large files, often each greater than 1 terabyte in size
      • Allows compute cycles to be easily moved to the data store, instead of moving data to a processor farm.
    These attributes are especially important for Intelligence Analysis. First, a large-scale distributed storage system, which leverages readily available commercial hardware, offers a method of replication across many physical sites for data redundancy. Vast amounts of data can be reliably stored across a distributed system without costly storage-network arrays, reducing the risk of storing mission-critical data in one central location. Second, because massive scale is achieved horizontally (by adding hardware), gauging and predicting data growth rates across the enterprise becomes dramatically easier: hardware is provisioned at the leading edge of a single Data Cloud rather than in independent storage silos for different projects.
    4 Ghemawat, Sanjay; Gobioff, Howard; and Leung, Shun-Tak. “The Google File System.” Appeared in 19th ACM Symposium on Operating Systems Principles, Lake George, NY, October 2003. http://labs.google.com/papers/gfs.html.
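The storage attributes listed above (replication across unreliable hardware, metadata kept apart from data, write-once files) can be illustrated with a small in-memory sketch. This is not the GFS or HDFS API; the class `ToyCluster`, its fields, the tiny block size, and the round-robin placement are all assumptions made only for illustration, and sporadic appends are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for a replicated block store."""
    nodes: list                      # node names, e.g. ["n1", "n2", "n3"]
    block_size: int = 4              # bytes per block (tiny, for illustration)
    replicas: int = 3                # copies kept of every block
    # Metadata is kept apart from the data: file name -> [(block_id, [nodes])]
    metadata: dict = field(default_factory=dict)
    # Data lives on the nodes: node -> {block_id: bytes}
    storage: dict = field(default_factory=dict)

    def write(self, name: str, data: bytes) -> None:
        """Write-once: split data into blocks and copy each to several nodes."""
        if name in self.metadata:
            raise ValueError(f"{name} already exists (write-once)")
        if self.replicas > len(self.nodes):
            raise ValueError("need at least as many nodes as replicas")
        placement = []
        for start in range(0, len(data), self.block_size):
            index = start // self.block_size
            block_id = f"{name}#{index}"
            # Round-robin placement so the replicas land on distinct nodes.
            holders = [self.nodes[(index + r) % len(self.nodes)]
                       for r in range(self.replicas)]
            block = data[start:start + self.block_size]
            for node in holders:
                self.storage.setdefault(node, {})[block_id] = block
            placement.append((block_id, holders))
        self.metadata[name] = placement

    def read(self, name: str, failed=frozenset()) -> bytes:
        """Reassemble a file, skipping any nodes listed as failed."""
        out = bytearray()
        for block_id, holders in self.metadata[name]:
            alive = [n for n in holders if n not in failed]
            if not alive:
                raise IOError(f"all replicas of {block_id} are unavailable")
            out += self.storage[alive[0]][block_id]
        return bytes(out)


if __name__ == "__main__":
    cluster = ToyCluster(nodes=["n1", "n2", "n3", "n4"])
    cluster.write("report", b"hello world!")
    # Two of the four nodes are down, yet every block still has a live copy.
    print(cluster.read("report", failed={"n1", "n2"}))
```

With three replicas spread over four nodes, any two nodes can fail without losing a block, which is the property the text describes as storing data on hardware "assumed to fail at any time."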
  • 7. Highly Scalable Multi-Dimensional Databases
    The distributed highly fault-tolerant massive storage system lacks the ability to fully represent structured data, providing only the raw data storage required in the Data Cloud. Moving beyond raw data storage to representing structured data requires a highly scalable database system. Traditional relational database systems are the ubiquitous design solution for structured data storage in conventional enterprise applications. Many relational database systems support multi-terabyte tables, relationships, and complex Structured Query Language (SQL) engines, which provide reliable, well-understood data schemas. It is very important to understand that relational database systems are the best solution for many types of problems, especially when data is highly structured and volume is less than 10 terabytes. However, a new class of problem is emerging when dealing with big data of volumes greater than 10 terabytes. Although relational database models are capable of running in a Data Cloud, many current relational database systems fail in the Data Cloud in two important ways. First, many relational database systems cannot scale to support petabytes or greater amounts of data storage, or the database requires highly specialized components to accomplish ever-diminishing increases in scale. Second, and critically important to Intelligence Analysis, is the impedance mismatch that occurs as complex data is normalized into a relational table format.
    When data is collected, often the first step is to transform the data, normalize the data, and insert a row into a relational database. Next, users query data based on keywords or pre-loaded search queries and wait for the results to return. Once returned, users sift through results.
    Multi-dimensional databases offer a fundamentally different model. The distributed and highly scalable design of multi-dimensional databases means data is stored and searched as billions of rows with billions of columns across hundreds of servers. Even more interesting, these forms of databases do not require schemas or predefined definitions of their structure. When new data arrives in an unfamiliar format, it can be inserted into the database with little or no modification. The database is designed to hold many unique forms of data, allowing a query-time schema to be generated when data is retrieved, instead of an a priori schema defined when the database is first installed. This is extremely valuable because as new data arrives in a format not seen before, instead of lengthy modifications to the database schema or complicated parsing to force the data into a relational model, the data can simply be stored in the native format. By leveraging a multi-dimensional database system that treats all data as bytes, scales to massive sizes, and allows users to ask questions of the bytes at query time, organizations have a new choice. Multi-dimensional databases are in stark contrast to today’s highly normalized relational data model, which includes schemas, relationships, and pre-determined storage formats stored on one or several clustered database servers.
    Moreover, the power of multi-dimensional databases allows computations to be pre-computed. For instance, several large Internet content providers mine millions of data points every hour to pre-compute the best advertisements to show users the next time they visit one of the provider’s sites. Instead of pulling a random advertisement or performing a search when the page is requested, these providers pre-compute the best advertisements to show everyone, at any time, on any of their sites based on historical data, advertisement click rates, probability of clicking, and revenue models. Similarly, a large US auto insurance firm leverages multi-dimensional databases to pre-compute the insurance quote for every car in the United States each night. If a caller asks for a quote, the firm can instantly tell the caller a price because it was already computed overnight based on many input sources.5 The highly scalable nature of such databases allows for massive computational power.
    There are some major analytic implications here. First, a computing framework such as a Data Cloud that, by design, can scale to handle petabytes of mission data and perform intensive computation across all the data, all the time, means new insights and discoveries are now possible by looking at big data in
    5 Baker, Stephen. “The Two Flavors of Google.” Businessweek.com, December 13, 2007. http://www.businessweek.com/magazine/content/07_52/b4064000281756.htm.
  • 8. many different ways. Consider the parallels between Intelligence Analysis and the massive correlation/prediction engines at eHarmony® and Netflix®, which leverage regression algorithms across massive data sets to compute personality, character traits, hidden relationships, and group tendencies. These data-driven analytic systems leverage the attributes of Data Clouds but target specialized behavioral analysis to advance their particular business. Hulu®, a popular online television website, uses the Data Cloud to process many permutations of log files about the shows people are watching.5
    Massively Parallel Compute (Analytic Algorithms)
    Parallel computing is a well-adopted technology seen in processor cores and software thread-based parallelism. However, massively parallel processing (leveraging thousands of networked commodity servers constrained only by bandwidth) is now the emerging context for the Data Cloud.
    If distributed file systems, such as GFS and HDFS, and column-oriented databases are employed to store massive volumes of data, there is then a need to analyze and process this data in an intelligent fashion. In the past, writing parallel code required highly trained developers, complex job coordination, and locking services to ensure nodes did not overwrite each other. Often, each parallel system would develop unique solutions for each of these problems. These and other complexities inhibited the broad adoption of massively parallel processing, meaning that building and supporting the required hardware and software was reserved for dedicated systems.
    MapReduce, a framework pioneered by Google, has overcome many of these previous barriers and allows for data-intensive computing while abstracting the details of the Data Cloud away from the developer. This ability allows analysts and developers to quickly create many different parallelized analytic algorithms that leverage the capabilities of the Data Cloud. Consequently, the same MapReduce job crafted to run on a single node can as easily run on a group of 1,000 nodes, bringing extensive analytic processing capabilities to users in the enterprise.
    Working in tandem with the distributed file system and the multi-dimensional database, the MapReduce framework leverages a master node to divide large jobs into smaller tasks for worker nodes to process. The framework, capable of running on thousands of machines, attempts to maintain a high level of affinity between data and processing, which means the framework intelligently moves the processing close to the data to minimize bandwidth needs. Moving the compute job to the data is easier than moving large amounts of data to a central bank of processors. Moreover, the framework handles slow and failed workers: it notices when a worker in the cloud is taking a long time on one of these tasks or has failed altogether and automatically tasks another node with completing the same task. All these details and abstractions are built into the framework. Developers are able to focus on the analytic value of the jobs they create and no longer worry about the specialized complexities of massively parallel computing. An intelligence analyst is able to write 10 to 20 lines of computer code, and the MapReduce framework will convert it into a massively parallel search (working against petabytes of data across thousands of machines) without requiring the analyst to know or understand any of these technical details.
    Tasks such as sorting, data mining, image manipulation, social network analysis, inverted index construction, and machine learning are prime jobs for MapReduce. In another scenario, assume that terabytes of aerial imagery have been collected for intelligence purposes. Even with an algorithm available to detect tanks, planes, or missile silos, the task of finding these weapons could take days if run in a conventional manner. Processing 100 terabytes of imagery on a standard computer takes 11 days. Processing the same amount of data on 1,000 standard computers takes 15 minutes. (Eleven days is about 15,840 minutes, so an ideal even split across 1,000 machines would take roughly 16 minutes, consistent with that figure.) By incorporating MapReduce, each image or part of an image becomes its own task and can be examined in a parallel manner. Distributing this work to many
    5 Baker, Stephen. “The Two Flavors of Google.” Businessweek.com, December 13, 2007. http://www.businessweek.com/magazine/content/07_52/b4064000281756.htm.
  • 9. computers drastically cuts the time for this job and other large tasks, scales the performance linearly by adding commodity hardware, and ensures reliability through data and task replication.
    Programmatic Models for Scaling in the Data Cloud
    Building applications and the architectures that run in the Data Cloud requires new thinking about scale, elasticity, and resilience. Cloud application architectures follow two key tenets: (1) only use computing resources when needed (elasticity) and (2) survive drastically changing data volumes (scalability). Much of this work is accomplished by designing cloud applications to scale dynamically by processing asynchronously queued events. As a result, any number of jobs can be submitted to the Data Cloud, and those jobs are persistently queued for resilience until completed. The jobs are then removed from a queue and distributed across any number of worker nodes, drawing on resources in the cloud on demand. When the work is complete, these jobs are closed and the resources returned to the cloud.
    These features, working in concert, achieve scalability across computers and data volume, elasticity of resource utilization, and resilience for assured operational readiness. These forms of application architecture address the difficulties of massive data processing. The cloud abstracts the complexities of resource provisioning, error handling, parallelization, and scalability, allowing developers to trade sophistication for scale. The following kinds of applications leverage the power of the Data Cloud:6
      • Processing Pipelines: Document or image processing, video transcoding, indexing data, and data mining
      • Batch Processing Systems: Log file analysis, nightly automated relationship matching, regression testing, and financial analysis
      • Unpredictable Content: Web portals or intelligence dissemination systems that vary widely in usage based on time of day, temporal content stores for conferences, or National Special Security Events.
    Clearly, applications that can leverage the Data Cloud can take advantage of the ubiquitous infrastructure that already exists within the cloud, elastically drawing on resources as needed based on scale. Moreover, the efficient application of computing resources across the cloud and the ability to rapidly decrease processing time through massive parallelization are a transformative force for many stages of Intelligence Analysis.
    Conclusion
    Cloud computing has the potential to transform how organizations use computing power to create a collaborative foundation of shared analytics, mission-centric operations, and IT management. Challenges to the implementation of cloud computing remain, but the new analytic capabilities of big data, ad-hoc analysis, and massively scalable analytics, combined with the security and financial advantages of switching to a cloud computing environment, are driving research across the cloud ecosystem. Cloud computing technology offers a very important approach to achieving lasting strategic advantage by rapidly adapting to complex challenges in IT management and data analytics.
    6 Varia, Jinesh. Cloud Architectures. Amazon, June 16, 2008. http://jineshvaria.s3.amazonaws.com/public/cloudarchitectures-varia.pdf.
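The queued-job model described under Programmatic Models for Scaling, together with the map and reduce division of labor from the MapReduce discussion, can be sketched on a single machine. The sketch below is only an analogy: threads stand in for worker nodes, an in-process `queue.Queue` stands in for a persistent job queue, and the names `map_task`, `reduce_counts`, and `run_job` are hypothetical. It counts word frequencies, loosely in the spirit of the Lexicon example, and does not reproduce any real framework's API.

```python
import queue
import threading
from collections import Counter


def map_task(chunk: str) -> Counter:
    """Map step: count word occurrences in one chunk of text."""
    return Counter(chunk.lower().split())


def reduce_counts(partials: list) -> Counter:
    """Reduce step: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_job(chunks: list, num_workers: int = 4) -> Counter:
    """Queue every chunk as a task, let workers drain the queue, then reduce."""
    tasks = queue.Queue()
    for chunk in chunks:
        tasks.put(chunk)

    partials = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                chunk = tasks.get_nowait()
            except queue.Empty:
                return  # no work left: this worker's resources are released
            result = map_task(chunk)
            with lock:  # guard the shared result list
                partials.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return reduce_counts(partials)


if __name__ == "__main__":
    sample = ["cloud data cloud", "data analytics", "cloud analytics data"]
    print(run_job(sample, num_workers=2)["cloud"])  # prints 3
```

The caller writes only `map_task` and `reduce_counts`; how many workers run and which worker takes which chunk is decided by `run_job`. Changing `num_workers` changes the degree of parallelism without touching the analytic code, which is the separation of concerns the text attributes to the framework. Failure handling and re-tasking of slow workers are deliberately left out.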
  • 10. About Booz Allen
    Booz Allen Hamilton has been at the forefront of strategy and technology consulting for nearly a century. Today, Booz Allen is a leading provider of management and technology consulting services to the US government in defense, intelligence, and civil markets, and to major corporations, institutions, and not-for-profit organizations. In the commercial sector, the firm focuses on leveraging its existing expertise for clients in the financial services, healthcare, and energy markets, and to international clients in the Middle East. Booz Allen offers clients deep functional knowledge spanning strategy and organization, engineering and operations, technology, and analytics, which it combines with specialized expertise in clients’ mission and domain areas to help solve their toughest problems.
    The firm’s management consulting heritage is the basis for its unique collaborative culture and operating model, enabling Booz Allen to anticipate needs and opportunities, rapidly deploy talent and resources, and deliver enduring results. By combining a consultant’s problem-solving orientation with deep technical knowledge and strong execution, Booz Allen helps clients achieve success in their most critical missions, as evidenced by the firm’s many client relationships that span decades. Booz Allen helps shape thinking and prepare for future developments in areas of national importance, including cybersecurity, homeland security, healthcare, and information technology.
    Booz Allen is headquartered in McLean, Virginia, employs more than 25,000 people, and had revenue of $5.59 billion for the 12 months ended March 31, 2011. Fortune has named Booz Allen one of its “100 Best Companies to Work For” for seven consecutive years. Working Mother has ranked the firm among its “100 Best Companies for Working Mothers” annually since 1999. More information is available at www.boozallen.com. (NYSE: BAH)
    To learn more about the firm and to download digital versions of this article and other Booz Allen Hamilton publications, visit www.boozallen.com.
    Contact Information:
      • Michael Farber, Senior Vice President, farber_michael@bah.com, 240-314-5671
      • Mike Cameron, Principal, cameron_mike@bah.com, 301-543-4432
      • Christopher Ellis, Principal, ellis_christopher@bah.com, 301-419-5147
      • Josh Sullivan, Ph.D., Principal, sullivan_joshua@bah.com, 301-543-4611
  • 12. Principal Offices
    Huntsville, Alabama; Sierra Vista, Arizona; Los Angeles, California; San Diego, California; San Francisco, California; Colorado Springs, Colorado; Denver, Colorado; District of Columbia; Orlando, Florida; Pensacola, Florida; Sarasota, Florida; Tampa, Florida; Atlanta, Georgia; Honolulu, Hawaii; O’Fallon, Illinois; Indianapolis, Indiana; Leavenworth, Kansas; Aberdeen, Maryland; Annapolis Junction, Maryland; Hanover, Maryland; Lexington Park, Maryland; Linthicum, Maryland; Rockville, Maryland; Troy, Michigan; Kansas City, Missouri; Omaha, Nebraska; Red Bank, New Jersey; New York, New York; Rome, New York; Dayton, Ohio; Philadelphia, Pennsylvania; Charleston, South Carolina; Houston, Texas; San Antonio, Texas; Abu Dhabi, United Arab Emirates; Alexandria, Virginia; Arlington, Virginia; Chantilly, Virginia; Charlottesville, Virginia; Falls Church, Virginia; Herndon, Virginia; McLean, Virginia; Norfolk, Virginia; Stafford, Virginia; Seattle, Washington
    The most complete, recent list of offices and their addresses and telephone numbers can be found on www.boozallen.com.
    ©2011 Booz Allen Hamilton Inc. 08.250.11