Open Source SOA in the Cloud: Data Analytics in the Cloud
Tom Plunkett   TomPlunkett@vt.edu
Michael Sick   michael.sick@serenesoftware.com


           SOA World 2009
Overview
Data Analytics in the Cloud

Introductions
  • Who are we?
  • Baselines & definitions
Opportunity
  • Targeted Use Cases
  • Technical convergence & opportunities
  • Commercial opportunities & drivers
Technology & Standards
  • State of current technology
  • Commercial & FOSS solutions
  • Hadoop Focus
Challenges
  • Challenges to Meet Target Use Cases
  • Economic challenges & the role of “free”
  • Wide scale challenges in Cloud and data analytics
Questions
  • Questions
  • Contacts

This work is licensed under a Creative Commons Attribution 3.0 United States License.
Tom Plunkett & Michael Sick

Data Analytics in the Cloud: Introductions

Tom Plunkett
• Extensive Federal Government Experience
• Java and SOA Certifications
• Patents
• Teach OOP and Java for Virginia Tech

Michael Sick
• Commercial & Federal Enterprise Architect
• Owner: Serene Software Inc. – EA Services Firm
• Clients include: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture
• Fascinated by technology, 15 years running

Serene Software
• Serene is a boutique consulting company focusing on delivery of Enterprise Architecture services and solutions
• Service Areas
  – Cloud Computing
  – IT Governance
  – IT Strategy
  – IT Cost Containment
  – Service Oriented Architectures (SOA)
  – IT Solution Selection
  – IT Audit & Analysis
• Experience includes: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture, …
• Founded in 2003 (privately held, no debt) and headquartered in Jacksonville, FL

Draft NIST Definition of Cloud Computing

A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction

Essential Characteristics
• On-demand self-service
• Ubiquitous network access
• Location independent resource pooling
• Rapid elasticity
• Measured Service

Delivery Models
• Cloud Software as a Service (SaaS)
• Cloud Platform as a Service (PaaS)
• Cloud Infrastructure as a Service (IaaS)

Deployment Models
• Private cloud
• Community cloud
• Public cloud
• Hybrid cloud

Source: Draft NIST Definition of Cloud Computing, 06/2009

OSI Open Source Definition
• Free Redistribution
• Source Code
• Derived Works
• Integrity of The Author's Source Code
• No Discrimination Against Persons or Groups
• No Discrimination Against Fields of Endeavor
• Distribution of License
• License Must Not Be Specific to a Product
• License Must Not Restrict Other Software
• License Must Be Technology-Neutral

Source: http://www.opensource.org/docs/osd

The Open Group SOA Definition

Service-Oriented Architecture (SOA) is an architectural style that supports service orientation

Service orientation is a way of thinking in terms of services and service-based development and the outcomes of services

Source: http://www.opengroup.org/projects/soa/doc.tpl?gdid=10632

Data Clouds & Data Grids – What's the difference?

The terms Data Cloud and Data Grid are often used interchangeably; we make the following distinctions

Data Grids
• Grid computing system optimized to share large amounts of distributed data
• Focus on technical capabilities
• Often combined with computational grid computing systems
• Data often moved to compute grid for use
• Often oriented towards highly structured scientific data computing applications

Data Clouds
• Focuses on perception of infinite storage, computing capacity
• Focus on cost, virtualization & flexible capacity
• Enables scale-up/scale-down economics
• Data moved rarely, locality is a key feature
• Clouds thus far focusing on column oriented, massively scalable data stores

Sources: Wikipedia & [Grossman 1]

Definition: Mashups

Web-available resource that combines data/functions from two or more external resources

The idea of mashup efforts is to reduce the cost of producing and consuming resources

Integration should be fast and easy

Often focuses on widely available formats/protocols like RSS or Atom over HTTP
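
As a rough illustration of the mashup idea, the sketch below combines items from two RSS documents into one sorted list. The feed contents, the example.com links, and the helper names parse_items and mashup are hypothetical and not from the talk; a real mashup would fetch the feeds over HTTP rather than embed them as strings.

```python
import xml.etree.ElementTree as ET

# Two tiny, made-up RSS 2.0 documents standing in for external resources.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Hadoop release</title><link>http://example.com/a1</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Cloud pricing update</title><link>http://example.com/b1</link></item>
<item><title>SOA governance notes</title><link>http://example.com/b2</link></item>
</channel></rss>"""


def parse_items(rss_text, source):
    """Extract (source, title, link) tuples from one RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (source, item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


def mashup(feeds):
    """Combine items from several feeds into one list sorted by title."""
    combined = []
    for source, text in feeds.items():
        combined.extend(parse_items(text, source))
    return sorted(combined, key=lambda entry: entry[1])


merged = mashup({"A": FEED_A, "B": FEED_B})
for source, title, link in merged:
    print(source, title, link)
```

Because both inputs use the same widely available format, the combining step stays a few lines long, which is the low integration cost the definition aims for.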





Data Analytics in the Cloud: Opportunities

Use Case: Cloud Data Analytical Tools for Intelligence Community Field Analyst

Problem Statement: Analytical tools are obsolete on deployment; field analysts need timely, configurable data analytics. How does cloud-based DA meet the needs of IC analysts?

Customer Problem
• Traditional business intelligence tools require years to develop
• Field Analysts confront situations which are rapidly changing
• Petabytes of data require analysis

Cloud Analytical Tools Solution
• Recomposable Cloud Computing Data Analytical Tools
  – Apache Hadoop
  – Mashups
  – Service-Oriented Architecture

Customer Value
• Enabling field analysts to quickly build the analytical tool they need to analyze petabytes of data

Why the “Buzzword” Soup? Convergence of Capabilities

[Diagram labels: Free Open Source Software (FOSS), Virtualization, Cloud Computing, SaaS, Data Analytics, Mashups]

Convergence of capabilities
New opportunities in breadth and depth of DA services
• Big Data: Cloud disk and data storage engines make petabyte environments available to new clients
• Value Based Billing: Heavy use of FOSS in the cloud reduces costs directly & indirectly
• Capacity Scaling: Scaling up/down of capacity in pay-go fashion makes DA available to wider audience
• Composable UIs: Capability to assemble DA results into various interfaces

Early Data Analytic Cloud Consumers/Providers

Cloud DA Opportunities

Profile                               Types                      Example Companies
Internet Scale Service Providers      Big Internet Companies     Yahoo, Amazon – can build DA on inf.
                                      SaaS Companies             Force.com – DA & Warehousing to SBA’s
                                      Social Platforms           Facebook – sell DA access to anon. user info
Large data-centric Traditional Co’s   Insurers                   BCBS – private clouds across consortium
                                      Healthcare & Biotech       Kaiser Permanente – common DA services
                                      Rating Agencies            S & P – open DA cloud to customers
Government Organizations              Intelligence Community     CIA – private org-wide Cloud
                                      Defense Managed Services   DISA – offer DA to .mil clients
                                      Healthcare                 SSA – offer DA to fraud prevention analysts
DAaaS Providers                       DAaaS Infrastructure       Cloudera – managed Hadoop instances
                                      SMB DAaaS Provider         ?? – managed DAaaS, simplified, low cost

Data Analytics in the Cloud: Technology & Standards

Google MapReduce

Algorithm for solving distributed problems using a divide and conquer approach with a cluster of nodes

The master node Maps the input into smaller sub-problems and distributes the work to the cluster. A worker node may further map the work for a further cluster of nodes. The worker nodes then process the smaller problems and return the answers to the master node

The master node then Reduces the set of answers into the answer to the original problem
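
The map and reduce steps described above can be sketched as a single-process word count. This is only an illustration under stated assumptions: the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical, the "worker nodes" are simulated by a plain loop, and nothing here is Google's or Hadoop's actual API.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the intermediate values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single answer."""
    return key, sum(values)


def word_count(chunks):
    # In a real cluster each chunk would go to a different worker node;
    # here the "workers" run one after another in a single process.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


counts = word_count(["the cloud stores data", "the grid moves data"])
print(counts)
```

Each call to map_phase depends only on its own chunk, which is what lets the sub-problems be handed to separate nodes; the grouping step in between is an addition of this sketch that makes the per-key reduce explicit.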



Introductions



                                                                                                                             Opportunity




                                            Apache Hadoop
                                                                                                            Data Analytics   Technology &
                                                                                                            in the Cloud     Standards



                                                                                                                             Challenges



                                                                                                                             Questions




Unit of measure




                           Open source implementation of the MapReduce algorithm

                           Hadoop can store and process petabytes of data

                           Subprojects include HBase, Chukwa, Hive, Pig, and ZooKeeper

                           Yahoo (more than 100,000 CPUs across more than 25,000 computers
                           running Hadoop) and other companies use Hadoop extensively




                          As-Is Hadoop Simplified Reference Architecture



                           [Diagram. Labeled components: Apache Hadoop, ZooKeeper, Chukwa, HBase,
                           Pig, Hive, ETL, Business Intelligence, Structured Data, Unstructured Data]



                                   Apache Hadoop Sub-projects
Hadoop Sub-project          Capabilities                                  Example Companies

Chukwa                      • Data collection system for monitoring and   • Yahoo
                              analyzing large distributed systems

HBase                       • Similar to Google’s BigTable                • Yahoo
                            • Distributed database for structured data
                            • Multi-dimensional sorted map

Hive                        • Data warehouse infrastructure for large     • Facebook
                              datasets
                            • Hive QL query language

Pig                         • High-level language for data analysis       • Yahoo
                            • Compiler for Map-Reduce programs

ZooKeeper                   • Configuration, naming, distributed          • Yahoo
                              synchronization, and group services
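
The HBase row describes a "multi-dimensional sorted map". A rough way to picture that data structure is a map keyed by (row, column, timestamp) whose keys are kept in sorted order, so that the newest value of a cell and contiguous row ranges can be found quickly. The sketch below is a toy, in-memory illustration only: the class `SortedMultiMap`, its methods, and the sample row and column names are hypothetical and are not the HBase API.

```python
import bisect


class SortedMultiMap:
    """Toy map keyed by (row, column, timestamp), with keys kept sorted."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key -> stored value

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so the newest version of a cell sorts first.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_latest(self, row, column):
        """Return the newest value for a cell, or None if it is absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


if __name__ == "__main__":
    table = SortedMultiMap()
    table.put("user1", "info:name", 1, "Ann")
    table.put("user1", "info:name", 2, "Anna")
    table.put("user2", "info:name", 1, "Bob")
    print(table.get_latest("user1", "info:name"))
    print(list(table.scan_rows("user1", "user2")))
```

Because the keys are sorted, a range scan only touches neighbouring entries, which is the property that makes this layout attractive for very large tables.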
                                     Data Analytics in the Cloud: Challenges
                    To-Be Simplified Hadoop Architecture
                    [Diagram. Labeled components: Apache Hadoop, HBase, Chukwa, ZooKeeper,
                    Query Language, Algorithm Library, REST API, SOAP API, Pig, Hive, ETL,
                    Business Intelligence, Structured Data, Unstructured Data]
                                              Key Challenges
Emerging Challenges

Area                Challenge                    Details
Infrastructure      Hardware                     Speed of rack interconnects, multi-core
                    Parallelization              Core platform, data analytic components
                    Node Affinity                Make use of super nodes, XML I/O, encryption/decryption
Adoption            Cost                         “Brutally efficient” pricing, FOSS advantages
                    Cost Models                  Accurate, open models of CapEx and OpEx costs
                    Migration Pain               Full warehouse migration, ETL
Administration      Ease of Admin.               Parallel current RDBMS and warehouse administration
                    Debugging                    Distributed debugging, integration with provider
                    Flexible Provisioning        Multi-level provisioning: company, department, individual
                    System Reporting             Reporting, audit trails, view to DA system
Input & Analysis    ETL Integration              Interface and metadata optimized for ETL loading
                    Intuitive APIs               Declarative & programmatic, cross-language
                    Product Integration          BI, applications (SAP, Oracle Financial, Lawson)
Output              Data Visualization           Viewing & drill-down of very large data sets
                    Intuitive APIs               Declarative & programmatic, cross-language
                    Mashups/Dynamics             Easy discovery of data, functions & workflows
                          Solutions: Projected & In-Progress
Emerging Challenges

Area                Challenge                    Projected / In-Progress Solution
Infrastructure      Hardware                     Interconnect costs dropping, hardware maturing
                    Parallelization              Platforms advance, market for components
                    Node Affinity                Discovery of capability, affinity into Hadoop, …
Adoption            Cost                         FOSS’s game to lose: a small difference × a lot = a lot
                    Cost Models                  Industry-standard ROI/IRR models for cloud computing
                    Migration Pain               Migration toolkits for traditional DW products
Administration      Ease of Admin.               Integrated & extended admin packages
                    Debugging                    Commercial distributed debugging
                    Flexible Provisioning        Multi-level provisioning: company, department, individual
                    System Reporting             Reporting, audit trails, view to DA system
Input & Analysis    ETL Integration              ETL interface, support of popular packages
                    Intuitive APIs               SQL-like interface in core, language bindings
                    Product Integration          Third-party adaptors, IWay et al.
Output              Data Visualization           Modeling, metadata, traceability, and new UIs
                    Intuitive APIs               SQL-like interface in core, language bindings
                    Mashups/Dynamics             Generic datatypes, discovery services
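
The Cost row's point, that a small per-unit difference multiplied by a large volume becomes a large total, can be made concrete with a little arithmetic. The figures below are purely hypothetical (a one-cent difference per node-hour across 1,000 nodes) and are not taken from the presentation; the function name `annual_savings_dollars` is likewise an assumption for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_savings_dollars(diff_cents_per_node_hour, nodes):
    """Total yearly difference, in dollars, for a per-node-hour price gap."""
    # Work in integer cents to avoid floating-point rounding surprises.
    total_cents = diff_cents_per_node_hour * nodes * HOURS_PER_YEAR
    return total_cents / 100


if __name__ == "__main__":
    # Hypothetical inputs: 1 cent per node-hour, 1,000 nodes running all year.
    print(annual_savings_dollars(1, 1000))
```

With these assumed inputs the one-cent gap adds up to 87,600 dollars per year, which is the sense in which a small difference at cloud scale matters.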
                                     Data Analytics in the Cloud: Questions
                            Questions & Contact Information


       Principal Architect / Partner             Cloud Computing Architect
       Michael A. Sick                           Tom Plunkett
       888.777.1847                              888.777.1847
       michael.sick@serenesoftware.com           TomPlunkett@vt.edu

       Address                                   Address
       Serene Software                           Serene Software
       116 19th Ave. North, Suite 503            116 19th Ave. North, Suite 503
       Jacksonville Beach, FL                    Jacksonville Beach, FL
       URL: www.serenesoftware.com               URL: www.serenesoftware.com


Data Analytics In The Cloud Soa World

  • 1. Open Source SOA in the Cloud: Data Analytics in the Cloud Tom Plunkett TomPlunkett@vt.edu Michael Sick michael.sick@serenesoftware.com SOA World 2009
  • 2. Overview Unit of measure • Who are we? Introductions • Baselines & definitions • Targeted Use Cases Opportunity • Technical convergence & opportunities • Commercial opportunities & drivers • State of current technology Data Analytics Technology & • Commercial & FOSS solutions in the Cloud Standards • Hadoop Focus • Challenges to Meet Target Use Cases Challenges • Economic challenges & the role of “free” • Wide scale challenges in Cloud and data analytics • Questions Questions • Contacts * Footnote Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 2 License
  • 3. Introductions Data Analytics in the Cloud: Data Analytics in the Cloud Opportunity Technology & Standards Introductions Challenges Questions Unit of measure Introductions Opportunity Data Analytics Technology & in the Cloud Standards Challenges Questions * Footnote Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 3 License
  • 4. Introductions Opportunity Tom Plunkett Data Analytics Technology & in the Cloud Standards Challenges Questions Unit of measure Extensive Federal Government Experience Java and SOA Certifications Patents Teach OOP and Java for Virginia Tech * Footnote Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 4 License
  • 5. Introductions Opportunity Michael Sick Data Analytics Technology & in the Cloud Standards Challenges Questions Unit of measure Commercial & Federal Enterprise Architect Owner: Serene Software Inc. – EA Services Firm Clients include: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture Fascinated by technology -15 years running * Footnote Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 5 License
  • 6. Introductions Opportunity Serene Software Data Analytics Technology & in the Cloud Standards Challenges Questions Unit of measure • Serene is a boutique consulting company focusing on delivery of Enterprise Architecture services and solutions • Service Areas – Cloud Computing – IT Governance – IT Strategy – IT Cost Containment – Service Oriented Architectures (SOA) – IT Solution Selection – IT Audit & Analysis • Experience includes: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture, … • Founded in 2003 (privately held, no debt) and headquartered in Jacksonville, FL * Footnote Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 6 License
  • 7. Introductions Opportunity Draft NIST Definition of Cloud Computing Data Analytics Technology & in the Cloud Standards Challenges Questions Unit of measure for enabling convenient, on-demand network access to a shared pool A model of configurable computing resources that can be rapidly provisioned and relea- sed with minimal management effort or service provider interaction Essential Characteristics Delivery Models Deployment Models • On-demand self-service • Cloud Software as a Service • Private cloud (SaaS) • Ubiquitous network access • Community cloud • Cloud Platform as a Service • Location independent • Public cloud (PaaS) resource pooling • Hybrid cloud • Cloud Infrastructure as a • Rapid elasticity Service (IaaS) • Measured Service * Footnote Source: Draft NIST Definition of Cloud Computing, 06/2009 Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 7 License
  • 8. Introductions Opportunity OSI Open Source Definition Data Analytics Technology & in the Cloud Standards Challenges Questions Unit of measure Free Redistribution Source Code Derived Works Integrity of The Author's Source Code No Discrimination Against Persons or Groups No Discrimination Against Fields of Endeavor Distribution of License License Must Not Be Specific to a Product License Must Not Restrict Other Software License Must Be Technology-Neutral * Footnote Source: http://www.opensource.org/docs/osd Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 8 License
  • 9. Introductions Opportunity The Open Group SOA Definition Data Analytics Technology & in the Cloud Standards Challenges Questions Unit of measure Service-Oriented Architecture (SOA) is an architectural style that supports service orientation Service orientation is a way of thinking in terms of services and service-based development and the outcomes of services * Footnote Source: http://www.opengroup.org/projects/soa/doc.tpl?gdid=10632 Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 9 License
  • 10. Introductions Data Clouds & Data Grids – What‘s the Data Analytics in the Cloud Opportunity Technology & Standards difference? Challenges Questions Unit of measure Often Data Clouds & Data Grids are used inter- changeably, we make the following distinctions Data Grids Data Clouds • Grid computing system optimized to share • Focuses on perception of infinite storage, large amounts of distributed data computing capacity • Focus on technical capabilities • Focus on cost, virtualization & flexible capacity • Often combined with computational grid computing systems • Enables scale-up/scale-down economics • Data often moved to compute grid for use • Data moved rarely, locality is a key feature • Often oriented towards highly structured • Clouds thus far focusing on column scientific data computing applications oriented, massively scalable data stores * Footnote Sources: Wikipedia & [Grossman 1] Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 10 License
  • 11. Introductions Opportunity Definition: Mashups Data Analytics Technology & in the Cloud Standards Challenges Questions Unit of measure Web available resource that combines data/functions from two or more external resources Idea of mashup efforts is to reduce the cost of producing and consuming resources Integration should be fast, easy Often focuses on widely available formats/protocols like RSS or Atom over HTTP * Footnote Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 11 License
  • 12. Introductions Data Analytics in the Cloud: Data Analytics in the Cloud Opportunity Technology & Standards Opportunities Challenges Questions Unit of measure Introductions Opportunity Data Analytics Technology & in the Cloud Standards Challenges Questions * Footnote Source: Source This work is licensed under a Creative Tom Plunkett & Michael Sick Commons Attribution 3.0 United States 12 License
  • 13. Use Case: Cloud Data Analytical Tools for Intelligence Community Field Analyst
    Problem Statement: Analytical tools are obsolete on deployment; field analysts need timely, configurable data analytics. How does cloud-based DA meet the needs of IC analysts?
    Customer Problem
      • Traditional business intelligence tools require years to develop
      • Field analysts confront situations which are rapidly changing
      • Petabytes of data require analysis
    Cloud Analytical Tools Solution
      • Recomposable Cloud Computing Data Analytical Tools
          – Apache Hadoop
          – Mashups
          – Service-Oriented Architecture
    Customer Value
      • Enabling field analysts to quickly build the analytical tool they need to analyze petabytes of data
  • 14. Why the "Buzzword" Soup? Convergence of Capabilities
    [Diagram labels: Free Open Source Software (FOSS), Virtualization, SaaS, Cloud Computing, Mashups, Data Analytics]
    Convergence of capabilities: new opportunities in breadth and depth of DA services
      • Big Data: Cloud disk and data storage engines make petabyte environments available to new clients
      • Value Based Billing: Heavy use of FOSS in the cloud reduces costs directly & indirectly
      • Capacity Scaling: Scaling up/down of capacity in pay-go fashion makes DA available to a wider audience
      • Composable UIs: Capability to assemble DA results into various interfaces
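    The capacity-scaling point can be made concrete with a small worked example. This is a sketch under stated assumptions: the hourly demand profile and the price of 10 cents per node-hour are invented for illustration and do not come from the slides or from any provider's price list. The function names are hypothetical as well.

```python
def fixed_capacity_cost(demand, cents_per_node_hour):
    """Cost when enough nodes for the peak hour are kept running every hour."""
    return max(demand) * len(demand) * cents_per_node_hour


def pay_as_you_go_cost(demand, cents_per_node_hour):
    """Cost when capacity is scaled up and down to match each hour's demand."""
    return sum(demand) * cents_per_node_hour


# Hypothetical workload: a short analytics burst in an otherwise quiet period.
hourly_nodes_needed = [2, 2, 2, 40, 40, 2, 2, 2]
PRICE_CENTS = 10  # assumed price per node-hour, in cents

fixed = fixed_capacity_cost(hourly_nodes_needed, PRICE_CENTS)
elastic = pay_as_you_go_cost(hourly_nodes_needed, PRICE_CENTS)
print(f"fixed: {fixed} cents, pay-go: {elastic} cents")
```

    With these assumed numbers the bursty workload costs 3200 cents when provisioned for its peak and 920 cents when billed pay-go. The gap depends entirely on how spiky the demand is.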
  • 15. Early Data Analytic Cloud Consumers/Providers
    Cloud DA opportunities, by profile, type, and example companies:
    Internet Scale Service Providers
      • Big Internet Companies: Yahoo, Amazon (can build DA on inf. services)
      • SaaS Companies: Force.com (DA & warehousing to SBA's)
      • Social Platforms: Facebook (sell DA access to anon. user info)
    Large Data-centric Traditional Co's
      • Insurers: BCBS (private clouds across consortium services)
      • Healthcare & Biotech: Kaiser Permanente (common DA services)
      • Rating Agencies: S & P (open DA cloud to customers)
    Government Organizations
      • Intelligence Community: CIA (private org-wide cloud services)
      • Defense Managed Services: DISA (offer DA to .mil clients)
      • Healthcare Services: SSA (offer DA to fraud prevention analysts)
    DAaaS Providers
      • DAaaS Infrastructure: Cloudera (managed Hadoop instances)
      • SMB DAaaS Provider: ?? (managed DAaaS, simplified, low cost)
  • 16. Data Analytics in the Cloud: Technology & Standards
  • 17. Google MapReduce
      • Programming model for computing distributed problems using a divide-and-conquer approach with a cluster of nodes
      • Map: the master node divides the input into smaller sub-problems and distributes the work to the cluster. A worker node may further map the work for a further cluster of nodes.
      • The worker nodes then process the smaller problems and return the answers to the master node
      • Reduce: the master node then combines the set of answers into the answer to the original problem
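    The map and reduce steps described above can be sketched in a single process. This is a minimal illustration, not Google's or Hadoop's implementation: the word-count task and the function names `map_phase`, `shuffle`, `reduce_phase`, and `map_reduce` are assumptions chosen for the example, and the "worker nodes" run sequentially rather than on a cluster. Grouping by key before reducing is what lets real systems spread the reduce work across workers as well.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: turn one input chunk into a list of (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key so that each key can be reduced independently."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single answer."""
    return key, sum(values)


def map_reduce(chunks):
    # On a cluster each chunk would be handed to a different worker node;
    # here the chunks are simply processed one after another.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


word_counts = map_reduce(["the cloud stores data", "the grid moves data"])
print(word_counts)
```

    Because each chunk is mapped independently and each key is reduced independently, both phases can be parallelized without changing the functions themselves.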
  • 18. Apache Hadoop
      • Open source implementation of the MapReduce algorithms
      • Hadoop can store and process petabytes of data
      • Subprojects include HBase, Chukwa, Hive, Pig, and ZooKeeper
      • Yahoo (more than 100,000 CPUs in >25,000 computers running Hadoop) and other companies make extensive use of Hadoop
  • 19. As-Is Hadoop Simplified Reference Architecture
    [Diagram labels: Structured Data, Unstructured Data, Chukwa, HBase, Apache Hadoop, ZooKeeper, ETL, Pig, Hive, Business Intelligence]
  • 20. Apache Hadoop Sub-projects
    Sub-project, capabilities, and example company:
    Chukwa (example company: Yahoo)
      • Data collection system for monitoring and analyzing large distributed systems
    HBase (example company: Yahoo)
      • Similar to Google's BigTable
      • Distributed database for structured data
      • Multi-dimensional sorted map
    Hive (example company: Facebook)
      • Data warehouse infrastructure for large datasets
      • Hive QL query language
    Pig (example company: Yahoo)
      • High-level language for data analysis
      • Compiler for Map-Reduce programs
    ZooKeeper (example company: Yahoo)
      • Configuration, naming, distributed synchronization, and group services
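    The "multi-dimensional sorted map" idea behind HBase and BigTable can be sketched with an in-memory toy. This is an illustration of the data model only, assuming cells keyed by row, column, and timestamp; the class `SortedCellMap` and its methods are hypothetical and are not the HBase client API.

```python
import bisect


class SortedCellMap:
    """Toy map from (row, column, timestamp) to value, kept in sorted key order."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> stored value

    def put(self, row, column, timestamp, value):
        # Negating the timestamp makes the newest version sort first.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value for a cell, or None if the cell is absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


table = SortedCellMap()
table.put("row2", "info:name", 1, "bob")
table.put("row1", "info:name", 1, "alice")
table.put("row1", "info:name", 2, "alicia")

latest = table.get("row1", "info:name")
row1_cells = list(table.scan("row1", "row2"))
print(latest, row1_cells)
```

    Keeping keys sorted by row is what makes range scans over adjacent rows cheap, which is the property that column-oriented stores of this kind rely on.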
  • 21. Data Analytics in the Cloud: Challenges
  • 22. To-Be Simplified Hadoop Architecture
    [Diagram labels: Structured Data, Unstructured Data, Chukwa, HBase, Apache Hadoop, ZooKeeper, Pig, Hive, ETL, Algorithm Library, Query Language, REST API, SOAP API, Business Intelligence]
  • 23. Key Challenges
    Emerging challenges, by area:
    Infrastructure
      • Hardware: Speed of rack interconnects, multi-core
      • Parallelization: Core platform, data analytic components
      • Node Affinity: Make use of super nodes, XML i/o, en/de-crypt
    Adoption
      • Cost: "Brutally efficient" pricing, FOSS advantages
      • Cost Models: Accurate, open models of CapEx, OpEx costs
      • Migration Pain: Full warehouse migration, ETL
    Administration
      • Ease of Admin.: Parallel current RDBMS, warehouse admin
      • Debugging: Distributed debugging, integration w/ provider
      • Flexible Provisioning: Multi-level provisioning (co., dept, individual)
      • System Reporting: Reporting, audit trails, view to DA system
    Input & Analysis
      • ETL Integration: Interface, metadata optimized for ETL loading
      • Intuitive APIs: Declarative & programmatic, cross-language
      • Product Integration: BI, applications (SAP, Oracle Financial, Lawson)
    Output
      • Data Visualization: Viewing & drill-down of very large data sets
      • Intuitive APIs: Declarative & programmatic, cross-language
      • Mashups/Dynamics: Easy discovery of data & functions & workflows
  • 24. Solutions: Projected & In-Progress
    Responses to the emerging challenges, by area:
    Infrastructure
      • Hardware: Interconnect $$ dropping, hardware maturing
      • Parallelization: Platforms advance, market for components
      • Node Affinity: Discovery of capability, affinity into Hadoop, …
    Adoption
      • Cost: FOSS's game to lose; small diff * a lot = a lot
      • Cost Models: Industry-standard ROI/IRR models for CC
      • Migration Pain: Migration toolkits for traditional DW products
    Administration
      • Ease of Admin.: Integrated & extended admin packages
      • Debugging: Commercial distributed debugging
      • Flexible Provisioning: Multi-level provisioning (co., dept, individual)
      • System Reporting: Reporting, audit trails, view to DA system
    Input & Analysis
      • ETL Integration: ETL interface, support of popular packages
      • Intuitive APIs: SQL-like interface in core, language bindings
      • Product Integration: 3rd party adaptors, IWay et al
    Output
      • Data Visualization: Modeling, meta-data, traceability, and new UIs
      • Intuitive APIs: SQL-like interface in core, language bindings
      • Mashups/Dynamics: Generic datatypes, discovery services
  • 25. Data Analytics in the Cloud: Questions
  • 26. Questions? & Contact Information
    Principal Architect / Partner
      Michael A. Sick
      888.777.1847
      michael.sick@serenesoftware.com
      Address: Serene Software, 116 19th Ave. North, Suite 503, Jacksonville Beach, FL
      URL: www.serenesoftware.com
    Cloud Computing Architect
      Tom Plunkett
      888.777.1847
      TomPlunkett@vt.edu
      Address: Serene Software, 116 19th Ave. North, Suite 503, Jacksonville Beach, FL
      URL: www.serenesoftware.com