STKI Summit 2012 Infra v7 – Major Trends – Paradigm Shifts
1. Trends in Infrastructure: Paradigm Shifts
Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand.
STKI Summit 2012
Pini Cohen, VP and Senior Analyst
2. What do we do?
Pini Cohen’s work Copyright STKI@2012
Do not remove source or attribution from any slide or graph
3. Agenda
Major paradigm shifts
Development and SOA
ESM BSM CMDB
DBMS and DATA
Platforms – Servers
Clients
Storage
Source: http://astonguild.org.uk/files/NEW_MENU_FRONT_RGB%5B1%5D.jpg
4. Major paradigm shifts -mini agenda
• Why don’t we see a change when it is coming?
• Big Data and programming models
• The changing end-user device ecosystem
• Infrastructure as Code and DEVOPS
Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
5. Managers’ Dilemma
• Bingo! My product is a mainstream product (quartiles 2 and 3).
• Now, should I invest in quartile 1 or 4?
• Most managers will invest in quartile 4
[Chart: percentage of customers vs. quality required by customers; the quality customers require improves gradually, and a new product category enters from below.]
Source of pic: http://www.buat-nadlan.com/2011/11/blog-post_3065.html
6. Prof. Clayton Christensen: Disruptive Innovation Model
Remember Digital Equipment Corporation (DEC). “Underdogs become mainstream faster than we think.” Change towards what look like “non-mature” areas is crucial.
[Chart: disruptive trajectory between times T1 and T2.]
7. Last year’s theme was “The Gap”
8. Major paradigm shifts-mini agenda
• Why don’t we see a change when it is coming?
• Big Data and programming models
• The changing end-user device ecosystem
• Infrastructure as Code and Devops
Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
9. Big Data Definition – 4 V’s (or more…)
• Volume – tens of TBs and more (15-20TB+)
• Velocity – the speed at which data is added – 10M items
per hour and more – and the speed at which the data needs
to be processed
• Variety – different types of data – structured &
unstructured. In many cases deals with internet of things,
social media, but also with voice, video, etc.
• Variability - able to cope with new attributes and changing
data types – without interrupting the analytical process
(without “import-export”)
• Other optional V’s – validity, volatility, viscosity (resistance to flow), etc.
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
10. The origins of the 3V’s:
• 2001 research by Doug Laney from META Group (now part of
Gartner):
11. “Big Data” theme – main current usage:
• “Big Data” is just marketing jargon. – Doug Laney, Gartner
Source: http://www.computerweekly.com/blogs/cwdn/2011/11/datas-main-drivers-volume-velocity-variety-and-variability.html
Source: http://winnbadisa.com/wp-content/uploads/2011/12/marketing-career-cloud.jpg
• STKI: doing something significantly different from
what you’ve done until now
12. Big Data at work:
• Orbitz Worldwide has collected 750 terabytes of
unstructured data on their consumers’ behavior – detailed
information from customer online visits and browsing
sessions. Using Hadoop, models have been developed
intended to improve search results and tailor the user
experience based on everything from location, interest in
family travel versus solo travel, and even the kind of device
being used to explore travel options.
• The result? To date, a 7% increase in interaction rate, 37%
growth in session stickiness, and a net 2.6% increase in
booking-path engagement.
Source: http://www.deloitte.com/assets/Dcom-UnitedStates/Local%20Assets/Documents/us_cons_techtrends2012_013112.pdf
13. Example network flow data (possible use – Cyber)
• A huge amount of flow data
• Long-term collection of flow data
Flow data in our campus network (/16 prefix):
# of Routers | 1 Day | 1 Month | 1 Year
1 | 1.2 GB | 13 GB | 156 GB
5 | 6 GB | 65 GB | 780 GB
10 | 12 GB | 130 GB | 1.5 TB
200 | 240 GB | 2.6 TB | 30 TB
• Short-term period of flow data
• Massive flow data from anomaly traffic data of Internet worm and DDoS
• Cluster file system and cloud computing platform
• Google’s programming model, MapReduce, big table [8]
• Open-source system, Hadoop [9]
Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt STKI modifications
14. DW appliances will be discussed later
Teradata | EMC Greenplum | Oracle Exadata | Microsoft Parallel Data Warehouse
Source: http://www.asugnews.com/2011/09/06/inside-saps-product-naming-strategies/
15. Several parts of the paradigm change – elements and concepts
• Storing data for analytics (mainly):
• HDFS – Hadoop Distributed File System
• MapReduce – a programming method, mainly for analytics
• Other “add-ons”: Pig, Hive, JAQL (IBM)
• Storing and retrieving data – DBMS:
• NoSQL DBMS (not only SQL):
• Cassandra
• MongoDB
• CouchDB
• HBase
• New ways of manipulating and analyzing all kinds of data.
Example – how do you get a specific lead from the Facebook status
“I wish I could see Messi next month in London”? Not
discussed in this presentation (see Einat’s presentation).
New algorithms.
16. Who Uses Hadoop?
• Amazon/A9
• AOL
• Facebook
• Fox Interactive Media
• Netflix
• New York Times
• PowerSet (now Microsoft)
• Quantcast
• Rackspace/Mailtrust
• Veoh
• Yahoo!
More at http://wiki.apache.org/hadoop/PoweredBy
17. Who Uses Cassandra?
• Facebook
• Digg
• Despegar
• Ooyala
• Imagini
• SimpleGeo
• Rackspace
• Shazam
• SoftwareProjects
18. Big Data technologies (Hadoop etc.) vs. traditional IT
Traditional IT | Big Data
Centralized storage | Local storage
Brand redundant servers | Cheap white-box HW
Standard infrastructure and virtual servers | Is standardization needed (at the HW level)?! No server virtualization
Well-established backup and DRP procedures | Why do I need backup? How do I tackle DRP (compute clusters that are stretched over locations)?
Traditional vendors | Open-source solutions
Mature products and procedures | In a new patch for specific issues it is sometimes written “not implemented yet”
Traditional programming, SQL | A different kind of programming (map-reduce), no joins

Will Big Data infrastructure be part of the existing infrastructure, or will it be developed as a new domain?
19. The Basic Concept –the internet
• Think Distributed
• Think Parallel
Source: http://retedeicittadini.it/wp-content/uploads/2011/02/network-distributed.gif
Source: http://www.catonmat.net/blog/mit-introduction-to-algorithms-
20. New type of scale:
• Hadoop:
• Up to 4,000 machines in a cluster
• Up to 20 PB in a cluster
• Currently, traditional IT technologies cannot handle this
kind of scale.
• This scale comes with a cost!
Source: http://www.techsangam.com/wp-content/uploads/2012/01/i_love_scalability_mug.jpg
21. Brewer's (CAP) Theorem
• It is impossible for a distributed computer system to
simultaneously provide all three of the following
guarantees:
• Consistency (all nodes see the same data at the same time)
• Availability (node failures do not prevent survivors from
continuing to operate)
• Partition Tolerance (the system continues to operate despite
arbitrary partitioning and message loss)
Source: Scalebase STKI modifications
Professor Eric A. Brewer
22. Dealing With CAP
• Drop Consistency
• Welcome to the “Eventually Consistent” term.
• In the end, everything will work out just fine – and sometimes
this is a good enough solution
• When no updates occur for a long period of time, eventually all
updates will propagate through the system and all the nodes will
be consistent
• For a given accepted update and a given node, eventually either
the update reaches the node or the node is removed from service
• Known as BASE (Basically Available, Soft state, Eventual
consistency), as opposed to ACID
Source: Scalebase
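As a hedged illustration of “eventually consistent” (this sketch is ours, not from the slide; all names are made up), here is a minimal Python model of last-write-wins replicas: once updates stop and the replicas finish gossiping, they all converge to the same value.

    # Minimal sketch of eventual consistency via last-write-wins timestamps.
    class Replica:
        def __init__(self):
            self.value, self.ts = None, 0

        def apply(self, value, ts):
            # Accept an update only if it is newer than what we already hold.
            if ts > self.ts:
                self.value, self.ts = value, ts

    replicas = [Replica() for _ in range(3)]
    replicas[0].apply("v1", 1)   # concurrent writes land on different nodes
    replicas[2].apply("v2", 2)   # (soft state: nodes disagree for a while)

    # Anti-entropy: replicas gossip until quiescent; everyone converges
    # to the update with the highest timestamp ("v2").
    for _ in range(2):
        for a in replicas:
            for b in replicas:
                b.apply(a.value, a.ts)

    assert all(r.value == "v2" for r in replicas)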
23. Hadoop
• Apache Hadoop is a software framework that supports
data-intensive distributed applications
• It enables applications to work with thousands of nodes
and petabytes of data.
• Hadoop was inspired by Google's MapReduce and Google
File System (GFS) papers
• Contains (basically):
• HDFS – Hadoop Distributed File System
• MapReduce programming model
24. HDFS – Hadoop Distributed File System
• Parallel
• Distributed on commodity elements
• Throughput over latency
• Reliable and self healing
• For large scale – typical file is gigabytes to terabytes (for
one file!)
• Applications need a write-once-read-many access
model (mainly analytics)
25. HDFS motivation
• What if you needed to write a program that distributes
data across commodity HW (PCs or servers)? You would need
to take care of:
• Where the data is located
• How to distribute data between the nodes
• How many times you want to replicate the data
• How to insert, select and update data
• What to do if one node or more fails
• How to add a node or take out a node
• Managing and monitoring the environment
• The Hadoop Distributed File System does all this for you!
26. HDFS: Hadoop Distributed File System
• The client requests metadata about a file from the namenode
• Data is served directly from the datanodes
[Diagram: the application asks the HDFS client for a file; the client sends (file name, block id) to the HDFS namenode, which holds the file namespace (e.g. /user/css534/input, block 3df2) and returns (block id, block location); the client then requests (block id, byte range) from an HDFS datanode and receives the block data; each datanode stores blocks on a Linux local file system, and the namenode exchanges instructions and state with the datanodes.]
source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
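To make the read path above concrete, here is a hedged Python-style sketch of the two-step protocol: metadata from the namenode, block bytes directly from a datanode. The namenode/datanode objects and their methods are hypothetical stand-ins, not the real HDFS client API.

    # Hypothetical sketch of an HDFS read path (not the real client API).
    def hdfs_read(namenode, path):
        data = b""
        # Step 1: ask the namenode (metadata only) which blocks make up
        # the file and where their replicas live.
        for block_id, locations in namenode.get_block_locations(path):
            # Step 2: stream the block bytes directly from a datanode;
            # the namenode never sits on the data path.
            for datanode in locations:
                try:
                    data += datanode.read_block(block_id)
                    break                 # got this block, move on
                except IOError:
                    continue              # replica down, try the next one
        return data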
27. Datanode Blockreports
File “part-0” will be replicated twice and saved in blocks 1
and 3 (the file is big, so it has to be divided into 2 blocks).
Block 1 is on data nodes A and C.
source: http://www.google.co.il/url?sa=t&rct=j&q=Rob+Jordan++Chris+Livdahl+hadoop+filetype%3Apptx&source=web&cd=1&ved=0CCIQFjAA
28. HDFS basic limitations
• Namenode is single point of failure
• Write-once model
• There are plans to support appending writes
• A namespace with an extremely large number of files
exceeds Namenode’s capacity to maintain
• Cannot be mounted by an existing OS
• Getting data in and out is tedious
• HDFS does not implement / support user quotas / access
permissions
• Data balancing schemes
• No periodic checkpoints
29. Map Reduce programming model
• At its most basic: it brings the program to the data
• Contains two elements:
• Map: this part of the job is performed in parallel and
asynchronously by each node
• Reduce: gathers the results from the relevant nodes
• In more detail:
• Map: returns (writes to a temp file) a list containing zero or more
(k, v) pairs
• The output can have a different key from the input
• Outputs can share the same key
• Reduce: returns a new list of reduced output from the input
30. MapReduce motivation
• What if you needed to write a program that processes data
that sits on distributed computers?
• You would need to write a distributed program that:
• Finds where the data is located
• Works on each node and then combines the results from all
nodes
• Decides where (on the local node) and how (in what format) to
write the intermediate results
• Detects when the jobs of all participating nodes have concluded
and then starts the “aggregation” part
• Decides what to do if a job is stuck (restart the job or turn to
another node to perform the same job)
• Hadoop MapReduce is the framework for you!
31. MapReduce example:
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
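The pseudocode above follows Google’s MapReduce paper. As a sanity check, here is a minimal runnable Python equivalent that simulates the map, shuffle, and reduce phases in a single process, using the same input as the dataflow slides that follow (the distribution and fault tolerance that Hadoop adds are omitted):

    from collections import defaultdict

    def map_phase(doc_name, contents):
        # map(key, value): emit (word, 1) for every word in the document.
        for word in contents.split():
            yield word, 1

    def reduce_phase(word, counts):
        # reduce(key, values): sum the partial counts for one word.
        return word, sum(counts)

    docs = {"block1": "Hello World Bye World",
            "block2": "Hello Hadoop Goodbye Hadoop"}

    # Shuffle: group intermediate pairs by key, as the framework would.
    groups = defaultdict(list)
    for name, text in docs.items():
        for word, count in map_phase(name, text):
            groups[word].append(count)

    print(dict(reduce_phase(w, c) for w, c in groups.items()))
    # {'Hello': 2, 'World': 2, 'Bye': 1, 'Hadoop': 2, 'Goodbye': 1}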
32. Dataflow in Hadoop
[Diagram: a client submits a Word Count job to the master, which schedules map and reduce tasks across the nodes; all elements run on standard HW.]
Source: Haifa Labs IBM
33. Dataflow in Hadoop
[Diagram: HDFS block 1 (“Hello World Bye World”) and block 2 (“Hello Hadoop Goodbye Hadoop”) are each read by a map task; the maps emit per-word counts (Hello, World, Bye, Hadoop, Goodbye), which are partitioned across the reduce tasks.]
Source: Haifa Labs IBM
34. Dataflow in Hadoop
[Diagram: each map task writes its intermediate results to the local file system and reports “Finished + Location” to the master, which passes those locations on to the reduce tasks.]
Source: Haifa Labs IBM
35. Dataflow in Hadoop
[Diagram: the reduce tasks pull the intermediate map output from each node’s local file system over HTTP GET.]
Source: Haifa Labs IBM
36. Dataflow in Hadoop
[Diagram: the reduce tasks write the final answer to HDFS: Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2.]
Source: Haifa Labs IBM
37. Example: Flow Analysis Map/Reduce
• Read text flow files (flow records with dst-port and octet fields)
• Run map tasks:
• Read each line (validation check)
• Parse the flow data
• Save the result as (key, value) pairs into temporary files –
e.g. dst port 53 with octet values 64 and 128
• Run reduce tasks:
• Read the temporary files as (key, list[value])
• Run the sum process – e.g. port 53: 64 + 128 = 192
• Write the results to a file
Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt
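A hedged Python rendering of the same job (the record format here is invented for illustration): map parses each flow line into (dst_port, octets) pairs, and reduce sums the octets per port.

    from collections import defaultdict

    # Hypothetical flow records: "src dst dst_port octets" per line.
    flow_lines = ["10.0.0.1 10.0.0.9 53 64",
                  "10.0.0.2 10.0.0.9 53 128",
                  "10.0.0.3 10.0.0.9 80 512"]

    def map_flow(line):
        fields = line.split()
        if len(fields) == 4:                       # validation check
            yield int(fields[2]), int(fields[3])   # (dst_port, octets)

    shuffle = defaultdict(list)
    for line in flow_lines:
        for port, octets in map_flow(line):
            shuffle[port].append(octets)

    # Reduce: total octets per destination port.
    totals = {port: sum(vals) for port, vals in shuffle.items()}
    print(totals)   # {53: 192, 80: 512}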
38. Components of Cluster Node
[Diagram: software stack of a cluster node, bottom to top:]
• Hardware (CPU, HDD, memory, NIC)
• Operating system (Linux)
• Java Virtual Machine
• Hadoop: the cluster file system (HDFS) and the MapReduce library
• flow-tools
• Flow analysis map/reduce jobs
• Flow file input processor
Source: www.caida.org/workshops/.../wide-casfi1004_wkang.ppt
39. MapReduce helpers: Hive, Pig
• Make life easier – translate a friendlier language into MapReduce

Feature | Hive | Pig
Language | SQL-like | PigLatin
Schemas/Types | Yes (explicit) | Yes (implicit)
Partitions | Yes | No
Server | Optional (Thrift) | No
User Defined Functions (UDF) | Yes (Java) | Yes (Java)
Custom Serializer/Deserializer | Yes | Yes
DFS Direct Access | Yes (implicit) | Yes (explicit)
Streaming | Yes | Yes
Web Interface | Yes | No
JDBC/ODBC | Yes (limited) | No
40. Hive: MapReduce helper:
• Code Example:
• hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
• hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
• hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
• hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' SELECT a.invites, a.pokes FROM profiles a;
• hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(*) FROM invites a WHERE a.ds='2008-08-15';
• hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
• hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
41. NoSQL DBMS: storing and retrieving data
• Key/Value
• A big hash table
• Examples: Voldemort, Amazon’s Dynamo
• Big Table
• Big table, column families
• Examples: Hbase, Cassandra
• Document based
• Collections of collections
• Examples: CouchDB, MongoDB
• Graph databases
• Based on graph theory
• Examples: Neo4J
• Each solves a different problem
Source: Scalebase
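As a rough, assumed illustration (ours, not from the slide), the four models differ mainly in the shape of the value stored under each key:

    # Rough shapes of the four NoSQL data models (illustrative only).

    # Key/Value: one opaque value per key (Voldemort, Dynamo style).
    kv = {"user:42": b"serialized blob"}

    # Big Table / column family: key -> {column: value} (HBase, Cassandra).
    cf = {"user:42": {"name": "Ada", "city": "London"}}

    # Document: key -> nested document (CouchDB, MongoDB style).
    doc = {"user:42": {"name": "Ada", "orders": [{"sku": "X1", "qty": 2}]}}

    # Graph: nodes plus typed edges (Neo4J style), here as adjacency lists.
    graph = {"Ada": [("FRIEND_OF", "Bob")], "Bob": []}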
42. Pros/Cons
• Pros:
• Performance
• BigData
• Most solutions are open source
• Data is replicated to nodes and is therefore fault-tolerant
(partitioning)
• Don't require a schema
• Can scale up and down
• Cons:
• Code change
• No framework support
• Not ACID
• Ecosystem (BI, backup)
• There is always a database at the backend
• Some APIs are just too simple
Source: Scalebase
43. There are some NoSQL projects out there…
Source: NoSQL Databases: Providing Extreme Scale and Flexibility By Matthew D. Sarrel
44. NoSQL Market Forecast 2011-2015
http://www.marketresearchmedia.com/2010/11/11/nosql-market/
45. Apache Cassandra
• Cassandra is a highly scalable, eventually
consistent, distributed, structured key-value
store
• Child of Google’s BigTable and Amazon’s
Dynamo
• Peer-to-peer architecture. All nodes are equal
Source: ids.snu.ac.kr/w/images/1/18/2011SS-03.ppt
• Cassandra’s replication factor (RF) is the total
number of nodes onto which the data will be
placed. RF of at least 2 is highly recommended,
keeping in mind that your effective number of
nodes is (N total nodes / RF).
• CQL (Cassandra Query Language) command line
• Time stamp for each value written
46. Consistent Hashing
• Partitioning uses consistent hashing (to choose the first node where data is placed), based on an MD5 distributed-hash-table algorithm
• Keys hash to a point on a fixed circular space
• The ring is partitioned into a set of ordered slots; servers and keys are hashed over these slots
• Nodes take positions on the circle:
• A, B, and D exist
• B is responsible for the A–B range (for replication factor = 2, the default)
• D is responsible for the B–D range
• A is responsible for the D–A range
• C joins; B and D split their ranges
• C takes over the B–C range from D
Source: http://www.intertech.com/resource/usergroup/NoSQL.ppt
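A minimal Python sketch of the scheme above, assuming MD5 hashing onto a ring and a clockwise walk for replica placement (simplified: real Cassandra adds pluggable partitioners and richer placement strategies):

    import bisect, hashlib

    def ring_pos(name):
        # Hash a key or node name to a point on the MD5 ring.
        return int(hashlib.md5(name.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes):
            self.points = sorted((ring_pos(n), n) for n in nodes)

        def replicas(self, key, rf=2):
            # Walk clockwise from the key's position, taking rf nodes.
            i = bisect.bisect(self.points, (ring_pos(key),))
            return [self.points[(i + k) % len(self.points)][1]
                    for k in range(rf)]

    ring = Ring(["A", "B", "D"])
    owners = ring.replicas("row-17")     # first node plus its successor
    # When "C" joins, only the keys in the range it takes over move:
    ring = Ring(["A", "B", "C", "D"])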
47. Write operation
Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt
48. Cassandra’s tunable consistency (write)
Level | Behavior
ANY | Ensure that the write has been written to at least 1 node, including HintedHandoff recipients.
ONE | Ensure that the write has been written to at least 1 replica's commit log and memory table before responding to the client.
TWO | Ensure that the write has been written to at least 2 replicas before responding to the client.
THREE | Ensure that the write has been written to at least 3 replicas before responding to the client.
QUORUM | Ensure that the write has been written to N/2 + 1 replicas before responding to the client.
LOCAL_QUORUM | Ensure that the write has been written to <ReplicationFactor>/2 + 1 nodes within the local datacenter (requires NetworkTopologyStrategy).
EACH_QUORUM | Ensure that the write has been written to <ReplicationFactor>/2 + 1 nodes in each datacenter (requires NetworkTopologyStrategy).
ALL | Ensure that the write is written to all N replicas before responding to the client. Any unresponsive replicas will fail the operation.
Source: wiki
49. Cassandra’s tunable consistency – read
Level | Behavior
ANY | Not supported. You probably want ONE instead.
ONE | Will return the record returned by the first replica to respond. A consistency check is always done in a background thread to fix any consistency issues when ConsistencyLevel.ONE is used, so subsequent calls will have correct data even if the initial read got an older value (this is called ReadRepair).
TWO | Will query 2 replicas and return the record with the most recent timestamp. Again, the remaining replicas will be checked in the background.
THREE | Will query 3 replicas and return the record with the most recent timestamp.
QUORUM | Will query all replicas and return the record with the most recent timestamp once at least a majority of replicas (N/2 + 1) have reported. Again, the remaining replicas will be checked in the background.
LOCAL_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within the local datacenter have replied.
EACH_QUORUM | Returns the record with the most recent timestamp once a majority of replicas within each datacenter have replied.
ALL | Will query all replicas and return the record with the most recent timestamp once all replicas have replied. Any unresponsive replicas will fail the operation.
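The practical rule behind both tables: with N replicas, a read touching R replicas and a write touching W replicas are guaranteed to overlap (and thus see the latest write) whenever R + W > N. A quick sketch of that arithmetic:

    def is_strongly_consistent(n, r, w):
        # Read and write sets must share at least one replica.
        return r + w > n

    n = 3                        # replication factor
    quorum = n // 2 + 1          # 2 of 3
    print(is_strongly_consistent(n, quorum, quorum))  # True:  QUORUM/QUORUM
    print(is_strongly_consistent(n, 1, 1))            # False: ONE/ONE is eventual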
Source: wiki
50. Cassandra’s data model structure
Think of Cassandra as row-oriented.
[Diagram: a keyspace (with settings such as the partitioner) contains column families (with settings such as the comparator and type [Std]); each column holds a name, a value, and a clock.]
Source: http://assets.en.oreilly.com/1/event/51/Scaling%20Web%20Applications%20with%20Cassandra%20Presentation.ppt
51. Data Model – a “flexible” schema!
ColumnFamily: Rockets
Key 1: name = Rocket-Powered Roller Skates; toon = Ready, Set, Zoom; inventoryQty = 5; brakes = false
Key 2: name = Little Giant Do-It-Yourself Rocket-Sled Kit; toon = Beep Prepared; inventoryQty = 4; brakes = false
Key 3: name = Acme Jet Propelled Unicycle; toon = Hot Rod and Reel; inventoryQty = 1; wheels = 1
Source: http://wenku.baidu.com/view/6e254321482fb4daa58d4b87.html
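In code terms, each row is effectively its own map of columns, so two rows of the same column family need not share columns. A hedged Python rendering of the table above:

    # The Rockets column family as per-row column maps (illustrative).
    rockets = {
        1: {"name": "Rocket-Powered Roller Skates",
            "toon": "Ready, Set, Zoom", "inventoryQty": 5, "brakes": False},
        2: {"name": "Little Giant Do-It-Yourself Rocket-Sled Kit",
            "toon": "Beep Prepared", "inventoryQty": 4, "brakes": False},
        # Row 3 has a "wheels" column and no "brakes" column: the schema
        # can vary row by row without any ALTER TABLE.
        3: {"name": "Acme Jet Propelled Unicycle",
            "toon": "Hot Rod and Reel", "inventoryQty": 1, "wheels": 1},
    }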
52. Cassandra’s CQL – Cassandra Query Language
• SQL-like. Examples:
• CREATE KEYSPACE test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;
• CREATE INDEX ON users (birth_date);
• SELECT * FROM users WHERE state='UT' AND birth_date > 1970;
• However:
• No Joins
• No UPDATES/DELETES
53. NoSQL benchmark – for scale!
Source: research.yahoo.com/files/ycsb-v4.pdf
54. Can we live with NoSQL limitations?
• Facebook has dropped Cassandra
• “..we found Cassandra's eventual consistency model to be a
difficult pattern to reconcile for our new Messages
infrastructure”
• Facebook has selected HBase (a columnar DBMS).
http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919
55. What about other NoSQL DBMS?
• MongoDB
• Hbase
• CouchDB
• Maybe next session….
56. Big Data potential implications on IT
• Will traditional RDBMS become obsolete? Surely not!
• Several areas are Big Data zones by definition – internet
marketing, cyber, DW, etc.
• How well can we live with “eventually consistent”, which in
most cases means a 1–2 minute delay?!
• Can we decide that all batch data can live well on Big Data
technologies?
• Will we see in the end (10 years from now) that only a small
portion of data still resides on RDBMS and most of the data
resides on Big Data technologies?!
57. Example of big data technology: SPLUNK
• Splunk is a traditional IT vendor whose product has been
based on MapReduce since 2009
58. Another aspect of Big Data - IBM Watson wins in Jeopardy
59. DeepQA: the technology & architecture behind Watson
[Architecture diagram: the DeepQA pipeline. Question and Topic Analysis feeds Question Decomposition; Hypothesis Generation (Primary Search plus Candidate Answer Generation, drawing on Answer Sources) produces initial candidate answers; Hypothesis and Evidence Scoring (Evidence Retrieval plus Deep Evidence Scoring, drawing on Evidence Sources) scores them; Synthesis and Final Confidence Merging & Ranking, guided by learned models that help combine and weigh the evidence, produce the final Answer & Confidence.]
60. Where did it acquire knowledge?
Three types of domain data: documents (articles, books), training and test question sets with answer keys, and NLP resources (vocabularies, taxonomies, ontologies).

Source | Raw size
Wikipedia | 17 GB
Time, Inc. | 2.0 GB
New York Times | 7.4 GB
Encarta | 0.3 GB
Oxford University | 0.11 GB
Internet Movie Database | 0.1 GB
IBM Dictionary | 0.01 GB
J! Archive / YAGO / dbPedia … | XXX
Total raw content | 70 GB
Preprocessed content | 500 GB
61. IBM’s Watson possible implications
If the computer understands my speech, why do I need a
keyboard?
If the computer can talk, why do I need a screen?
If the computer understands semantics and can act with its
own reasoning – why do you need me?!
62. Major paradigm shifts -mini agenda
• Why don’t we see a change when it is coming?
• Big Data and programming models
• The changing end-user device ecosystem
• Infrastructure as Code and DevOps
Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
63. Mega-trend #1 of 21st century
CONSUMERIZATION:
empowerment of people collaborating via
connected mobile devices
64. User Interface Revolution – Touch / Sound(Voice) / Move Era
65. 2012: Sound/Voice is in
66. 2012: Face recognition is in
67. Desktop and Mobile ecosystems begin to converge
“BYOD: bring your own device” –
employees asserting control over the technology they use for work.
4 devices per employee?!
68. Four screens of convergence: TV, PC, mobile and in-car
• We want to be connected 24x7
• Each of these screens is useful during our
day, and each is connected to the 'cloud'
• IT should allow us to use the same
business and entertainment applications
on all of them (IT supports ALL)
69. Can IT support all devices ?
• Employees will use as many
computers and mobile devices as
they wish.
• Automatically keep their data in
sync, with a backup copy.
• Solutions should be enterprise class:
• secure
• reliable
• maintainable
• integrated with critical back-office
systems
70. What about Productivity Software for non-wintel machines?
[Slide labels: Office 2015, ARM, W8]
71. Israel (expected end 2012):
Wintel: Q4 2011 compared to Q4 2010
Desktop PCs: -25%; Notebooks: -35%
72. Client/server v2
[Diagram: four generations of client computing, driven by advances/cost in (1) communications/networking, (2) processor/storage, and (3) power/battery:]
• Terminals V1 – always connected; data and processing @server; I/O only at the local device
• Client/Server V1 – 2 types of applications: (1) off-line: processing and storage local; (2) always connected: data and processing @server, GUI++ @client
• WEB/Browser client (Terminals V2) – 2 types of applications: (1) off-line: processing and storage local; (2) always connected: browser-based applications
• Client/Server V2 – (1) most apps work on/off line; (2) most of the time connected; (3) uses cloud/local applications
Picture Source: http://sthvcarringtonmedia.blogspot.com/2011/02/emotions.html
73. Windows on ARM
Source: http://lenzfire.com/2011/12/future-of-pc-is-soon-to-be-woa-windows-on-arm-than-to-wintel-85094/ (WOA – Windows on ARM)

Feature | Windows 8 x86/64 | Windows 8 on ARM
Device branding | Such devices would be branded as x86/64 ones | These would be branded as ARM
Old Windows 7 things | Everything that runs on Windows 7 would run on these platforms | Only selected things would be runnable
Virtualization | Yes, if the hardware supports it | Not supported
Turn on/off options | Yes, on all devices | No, devices would keep running in Connected Standby power mode
App development | Yes, many tools are available | Yes, but only with selected tools, which are not yet available
Availability | All the sources from which Windows 7 is available, e.g. online, DVD/CD, and PCs | Would be available only on ARM devices; no DVDs or online availability
Driver availability | From the respective company's site, DVD/CDs, and through Windows Update | Only through Windows Update
Maintenance (e.g. updates and other fixes) | Through Windows disks and Windows Update | Only through Windows Update
Uniqueness | Any source would run on a wide variety of devices | Each source is unique to a unique device
74. Microsoft is fighting back
Win8 tablets/phones are:
• Easier to manage/secure from an enterprise perspective
• Easier to synchronize with enterprise data
• Easier to enable for enterprise applications (on Intel-based devices)
• Microsoft hopes to “Bring Your Enterprise to Home” (BYEH)
However:
• Microsoft starts from scratch in these markets
• The “influencers” are already heavy users, mainly of “stylish Apple”
• There are strong forces within Microsoft to enable business applications on other platforms (Office on iPad, Android, …)
Will Microsoft's “hidden” dream – “IT enabling only Microsoft tablets and phones to access mail and enterprise apps” – come true?!
75. A new era. We had it before:
Source: http://www.socialtechpop.com/2010/10/old-vs-new-trends-in-social-media/
76. And the new era will look like :
Source: http://www.mobilemag.com/2011/01/06/samsungs-hybrid-sliding-pc-7-series-tabletnotebook-thingy/
Computing as we know it today – change at the device/UX level
and change at the application level:
mobility
77. New Era: IT can no longer dictate a single device
• It looks like the dominance of Microsoft-on-Intel with C/S or WEB
apps is over!
• The new general purpose application architecture will support:
• Data stored in a cloud and in local devices (appropriate formats per
each device).
• Data synchronization with conflict resolution between data instances
• Continuous transaction processing between different devices =
mobility
• Different interfaces to the same application (mainly APPS but also
browser based)
• Application code is native or hybrid for each device
• Offline work (read with update)
• Automatic SW update
• Voice
• Face recognition
• AI reasoning
78. Major paradigm shifts -mini agenda
• Why don’t we see a change when it is coming?
• Big Data and programming models
• The changing end-user device ecosystem
• Infrastructure as Code and DevOps
Source: http://www.b2binbound.com/blog/?Tag=paradigm%20shift
79. Infrastructure as code
• Treat your infrastructure as code:
• Analyze/Design
• Develop (the automation scripts)
• Prepare the Build
• Test
• Deploy the Build
• That means – no more manual configurations
• Automatic testing – not only for the apps level
• Also – make sure that whatever is not in the build will not be
installed
• Is that possible in the current landscape?!
80. Some SW definitions:
• Software build - the process of converting source code files
into standalone software artifact(s) that can be run on a
computer. One of the most important steps of a software
build is the compilation process where source code files
are converted into executable code.
• Build automation is the act of automating a wide variety of
tasks that software developers do in their day-to-day
activities including things like:
• compiling computer source code into binary code
• packaging binary code
• running tests
• deployment to production systems
Source: Wiki STKI modifications
81. Infrastructure as code
• This will enable frequent changes in production
• 180% change from current “versions” policy!
Source: wiki
82. Opscode - Chef
• With Chef, you write abstract definitions as source code to describe
how you want each part of your infrastructure to be built, and then
apply those descriptions to individual servers.
• The result is a fully automated infrastructure: when a new server
comes on line, the only thing you have to do is tell Chef what role it
should play in your architecture.
Source: opscode
83. Opscode’s Chef
• The Chef agent assures that the desired configuration is
installed!
• All install files and scripts are located in a central repository
(Chef Server), stored in CouchDB
• Traces what was successful and what was not
• Documentation of everything
• Major components: Cookbooks, Recipes, Knife, Shef
• Pull model (cannot control exactly when components are
installed)
• Ruby scripting language
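Chef recipes themselves are written in Ruby; as a language-neutral sketch of the underlying idea (declare the desired state, let an agent converge to it idempotently), consider this hypothetical Python fragment. Everything here, from the resource format to the paths, is made up for illustration.

    import os

    # Hypothetical desired-state resources, in the spirit of a Chef recipe.
    desired = [
        {"type": "file", "path": "/tmp/motd-demo", "content": "managed\n"},
    ]

    def converge(resources):
        for r in resources:
            if r["type"] == "file":
                # Idempotent: act only when actual state differs from desired.
                current = None
                if os.path.exists(r["path"]):
                    with open(r["path"]) as f:
                        current = f.read()
                if current != r["content"]:
                    with open(r["path"], "w") as f:
                        f.write(r["content"])
                    print("updated", r["path"])   # trace what changed

    converge(desired)   # running it a second time changes nothing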
84. Devops – Development and Operations
• Addresses the conflict between Development and
Operations:
• Development is paid for change
• For Operations, change is the enemy!
• “Wall of Confusion” - combination of conflicting
motivations, processes, and tooling
Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
85. Devops – Development from Mars, Operations from Venus
• Development and Operations are in different organization
entities and use different tools
Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
86. Deployment/Release time is trouble time
• Development kicks things off by "tossing" a software release
"over the wall" to Operations.
• Operations also hand-edit configuration files to reflect the
production environment, which is significantly different from
the Development and QA environments.
• At best they are duplicating work that was already done in
previous environments, at worst they are about to introduce
or uncover new bugs.
Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
87. Devops – new state of mind
Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
88. Devops aims at:
Source: http://dev2ops.org/blog/2010/2/22/what-is-devops.html
• DevOps enables the benefits of Agile development to be
felt at the organizational level. DevOps does this by
allowing for fast and responsive, yet stable, operations that
can be kept in sync with the pace of innovation coming out
of the development process.
http://en.wikipedia.org/wiki/File:Devops.png
89. DevOps Addresses Challenges
• DevOps is an operational approach that automates system
configuration and management.
• To manage cloud systems, customers
• Need to manage servers as groups
• Must respond to rapid infrastructure changes
• Have repeatable automated deployments
90. Striving towards Devops state of mind:
• Measurement and incentives to change culture - metrics
based on joint performance
• Unified processes
• Unified tooling
91. Devops Measurement
• Resource Utilization – how resources are allocated and how efficiently
they are used. Usually we're talking about people, but other kinds of
resources can fall into this bucket as well.
• How much time do developers and administrators spend on build and deployment activity?
• How much productivity is lost to problems and bottlenecks? What is the ripple effect of that?
• What's the ratio of ad-hoc change or service-recovery activity to planned change?
• What's the cost of moving a unit of change through your lifecycle?
• What's the mean time to diagnose a service outage? Mean time to repair?
• What was the true cost of each build or deployment problem (resource and schedule impact)?
• What percentage of Development-driven changes require Operations to edit/change procedures or edit/change automation?
• How much management time is spent dealing with build and deployment problems or change-management overhead?
• Can Development and QA successfully deploy their own environments? How long does it take per deployment?
• How much of your team's time is spent recreating and maintaining software infrastructure that already exists elsewhere?
Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
92. Devops Measurement
• Operations Throughput – the volume and rate at which change
moves through your development-to-operations pipeline.
• How long does it take to get a release from development, through testing, and into production?
• How much of that is actual testing time, deployment time, handoff time, or waiting?
• How many releases can you successfully deploy per period?
• How many successful individual change requests can your operations team handle per period?
• Are any build and deployment activities the rate-limiting step of your application lifecycle? How does that limit impact your business?
• How many simultaneous changes can your team safely handle?
• What is the business' perceived “wait time” from code completion to production deployment of a feature?
Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
93. Devops Measurement
• Agility – this looks at how quickly and efficiently your IT
operations can react to changes in the needs of your business.
• How quickly can you scale capacity up or down to meet changing business demands?
• What's the change-management overhead associated with increasing/decreasing capacity? What's the risk?
• How quickly, and at what cost, could you adapt your build and deployment systems to automate any new applications or acquired business lines?
• What would it cost you to handle an x% growth in the number of applications or business lines (direct resource assignment plus any attention drain from other staff)?
• Could your IT operations handle an x% growth in the number of applications or business lines (i.e., could it even be done)?
Source: http://dev2ops.org/blog/2010/1/21/how-to-measure-the-impact-of-it-operations-on-your-business.html
94. Architecture Concepts related to Devops
• Devops is related to several technology
architectures and guidelines:
• Build an application “as stateless” and “as shared-nothing”
as possible
• Try to have as little “technical debt” as possible (bugs
that are in production, patches that are not installed,
unsupported SW/HW, etc.)
• Build an application with the ability to “turn off” some
of its functionality while on air
• Extend transaction versions rather than modifying or
updating transactions (enables rollback and working
concurrently with several versions)
95. Devops tools:
Source: http://doc36.controltier.org/wiki/File:ProvisioningToolchain.png
96. Devops vs. Private Cloud?
• In many aspects the objectives of Devops and Private Cloud
are overlapping
• Automation is at the core of both Private Cloud and Devops
Source: http://www.pistoncloud.com/2012/01/devops-and-private-cloud-sitting-in-a-tree/
97. Some input from last year's presentation
• Public cloud
Source: IDC https://www.eiseverywhere.com/file_uploads/7e2edb16ed28a2123cd21508f87be8b2_ITR_Boston_2011_Public_and_Private_Cloud_Track_RickVillars_IDC.pdf
98. Summary – Major paradigm shifts
• Remember Digital Equipment Corporation (DEC). “Underdogs
become mainstream faster than we think.” Change is crucial
• Embrace Big Data experiments
• Embrace Devops concepts – metrics, processes, and tools.
Start with metrics
• Devops tools might be our current configuration and CMDB tools
• Embrace at least one SAAS application now (email, service desk,
HR, ERP, CRM, etc.). Also IAAS, PAAS.
• Standardization with processes
[Diagram labels: Technologies, Processes, Standardization]
99. STKI Round Tables
• Lots of useful information – use it !
100. STKI Round Tables
101. We will present data on products and vendors:
1. Israeli vendor ratings – the state of the current market, focused on the
enterprise market (not SMB)
X – market penetration (sales + installed base + clients'
perspective)
Y – X plus localization, support, development center, number
and kind of integrators, etc.
Worldwide leaders are marked, based on their global positioning
Vendors to watch: are only just entering the Israeli market or
making a big change, so they can't be positioned but should be
watched
Represents the current Israeli market, and not necessarily what we
recommend to our clients
2. Products and selected resellers / implementers
The location within the list is random
102. We will present data on products and vendors (cont.)
3. Selected installations of products – projects in different stages:
production, implementation, after decision, …
4. Service providers that are used by users. I asked users “which
SI do you use in this category?” and counted the results.
5. Analysis by international and Israeli analysts
This complete information (1 to 5) should be used together,
combined with the specific circumstances of each case, when
making a decision.
This subjective chart is the result of our
objective research
105. Agenda
Major paradigm shifts
Development and SOA
ESM BSM CMDB
DBMS and DATA
Platforms – Servers
Clients
Storage
Source: http://astonguild.org.uk/files/NEW_MENU_FRONT_RGB%5B1%5D.jpg
Editor's notes
DeepQA generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning, and Reasoning algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence. Watson – the computer system we developed to play Jeopardy! – is based on the DeepQA software architecture. Here is a look at the DeepQA architecture; this is like looking inside the brain of the Watson system from about 30,000 feet up.
Remember, the intended meaning of natural language is ambiguous, tacit, and highly contextual. The computer needs to consider many possible meanings, attempting to find the evidence and inference paths that are most confidently supported by the data. So, the primary computational principle supported by the DeepQA architecture is to assume and pursue multiple interpretations of the question, to generate many plausible answers or hypotheses, and to collect and evaluate many different competing evidence paths that might support or refute those hypotheses. Each component in the system adds assumptions about what the question might mean, what the content means, what the answer might be, or why it might be correct. DeepQA is implemented as an extensible architecture and was designed at the outset to support interoperability. For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community. Over 100 different algorithms, implemented as UIMA components, were integrated into this architecture to build Watson.
In the first step, Question and Category Analysis, parsing algorithms decompose the question into its grammatical components. Other algorithms identify and tag specific semantic entities like names, places, or dates. In particular, the type of thing being asked for, if it is indicated at all, is identified. We call this the LAT, or Lexical Answer Type, like "FISH", "CHARACTER", or "COUNTRY". In Query Decomposition, different assumptions are made about whether and how the question might be decomposed into sub-questions. The original question and each identified sub-part follow parallel paths through the system. In Hypothesis Generation, DeepQA performs a variety of very broad searches for each of several interpretations of the question. Note that Watson, to compete on Jeopardy!, is not connected to the internet. These searches are performed over a combination of unstructured data (natural-language documents) and structured data (the databases and knowledge bases fed to Watson during training). The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is very little confidence in these possible answers, since little intelligence has yet been applied to understanding the content that might relate to the question; the focus is on generating a broad set of hypotheses – or, for this application, what we call "candidate answers". To implement this step for Watson we integrated and advanced multiple open-source text and knowledge-base search components.
After candidate generation, DeepQA performs Soft Filtering, where it makes parameterized judgments about which and how many candidate answers are most likely worth investing more computation in, given specific constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and speed, Soft Filtering uses different lightweight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is. In contrast, if this were a hard filter, the candidates falling below the threshold would be eliminated from consideration entirely at this point.
In Hypothesis & Evidence Scoring, the candidate answers are first scored independently of any additional evidence by deeper analysis algorithms. This may, for example, include typing algorithms, which produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step – for example Country, Agent, Character, City, Slogan, Book, etc. Many of these algorithms may fire, using different resources and techniques, to come up with a score. What is the likelihood that "Washington", for example, refers to a "General", a "Capital", a "State", a "Mountain", a "Father", or a "Founder"? For each candidate answer, many pieces of additional evidence are searched for. Each piece of evidence is subjected to further algorithms that deeply analyze the evidentiary passages and score the likelihood that the passage supports or refutes the correctness of the candidate answer. These algorithms may consider variations in grammatical structure, word usage, and meaning.
In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms fire; they apply methods for inferring a coherent final answer from the constituent elements derived from the question's sub-parts. Finally, arriving at the last step, Final Merging and Ranking, are many possible answers, each paired with many pieces of evidence and each scored by many algorithms producing hundreds of feature scores, all giving some evidence for the correctness of each candidate answer. Trained models are applied to weigh the relative importance of these feature scores. These models are trained with machine-learning methods to predict, based on past performance, how best to combine all these scores to produce final, single confidence numbers for each candidate answer and the final ranking of all candidates. The answer with the strongest confidence is Watson's final answer, and Watson tries to buzz in provided that top answer's confidence is above a certain threshold.
The DeepQA system defers commitments and carries possibilities through the entire process while searching for increasingly broad contextual evidence and more credible inferences to support the most likely candidate answers. All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence, and score evidence are loosely coupled but work holistically by virtue of DeepQA's pervasive machine-learning infrastructure. No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, and they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, one month later, contribute only 2% to overall performance due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained and retrained. DeepQA is a complex system architecture designed to deal extensibly with the challenges of natural-language-processing applications and to adapt to new domains of knowledge. The Jeopardy! challenge greatly inspired its design and implementation for the Watson system.