The Return of Big Iron?
Ben Stopford
Distinguished Engineer
RBS Markets
Much diversity
What does this mean?
• A change in what customers (we) value
• The mainstream is not serving customers
(us) sufficiently
The Database field has problems
We Lose: Joe Hellerstein (Berkeley) 2001
“Databases are commoditised and cornered
to slow-moving, evolving, structure
intensive, applications that require schema
evolution.” …
“The internet companies are lost and we will
remain in the doldrums of the enterprise
space.” …
“As databases are black boxes which
require a lot of coaxing to get maximum
performance”
His question: how do we win
them back?
These new technologies also
caused frustration
Backlash (2009)
Not novel (dates back to the 80’s)
Physical level not the logical level (messy?)
Incompatible with tooling
Lack of integrity (referential) & ACID
MR is brute force, ignoring indexing and skew
All points are reasonable
And they proved it too!
“A Comparison of Approaches to Large-Scale
Data Analysis” – SIGMOD 2009
• Vertica vs. DBMS-X vs. Hadoop
• Vertica up to 7x faster than
Hadoop over benchmarks
Databases faster
than Hadoop
But possibly missed the point?
Databases were traditionally
designed to keep data safe
NoSQL grew from a need to scale
It’s more than just scale, they
facilitate different practices
A Better Fit
They better match the way software is
engineered today.
– Iterative development
– Fast feedback
– Frequent releases
Is NoSQL a Disruptive Technology?
Christensen’s observation:
Market leaders are displaced when markets
shift in ways that the incumbent leaders are
not prepared for.
Aside: MongoDB
• Impressive trajectory
• Slightly crappy product (from a traditional
database standpoint)
• Most closely related to relational DB (of
the NoSQLs)
• Plays to the agile mindset
Yet the NoSQL market is relatively
small
• Currently around $600 but projected to
grow strongly
• Database and systems management
market is worth around $34 billion
Key Point

There is more to NoSQL than just
scale: it sits better with the way we
build software today
We have new building blocks to
play with!
My Problem
• Sprawling application space, built over
many years, grouped into both vertical and
horizontal silos
• Duplication of effort
• Data corruption & preventative measures
• Consolidation is costly, time consuming
and technically challenging.
Traditional solutions
(in chronological order)
– Messaging
– SOA
– Enterprise Data Warehouse
– Data virtualisation
Bringing data, applications, people
together is hard
A popular choice is an EDW
EDW pattern is workable, but tough
– As soon as you take a ‘view’ on what the
shape of the data is, it becomes harder to
change.
• Leave ‘taking a view’ to the last responsible
moment

– Multifaceted: Shape, diversity of source,
diversity of population, temporal change
Harder to do iteratively
Is this the only way?
The Google Approach
MapReduce
Google Filesystem
BigTable
Tenzing
Megastore
F1
Dremel
Spanner
And just one code base!
So no enterprise schema secret
society!
The eBay Approach
The Partial-Schematic
Approach
Often termed Clobs & Cracking
Problems with solidifying a
schematic representation
• Risk of throwing information away, keeping
only what you think you need.
– OK if you create data
– Bad if you got data from elsewhere

• Data tends to be poly-structured in
programs and on the wire
• Early-binding slows down development
But schemas are good
• They guarantee a contract
• That contract spans the whole dataset
– Similar to static typing in programming
languages.
Compromise positions
• Query schema can be a subset of data
schema.
• Use schemaless databases to capture
diversity early and evolve it as you build.
Common solutions today use
multiple technologies
MapReduce
Data Warehouse
?
Key Value Store
In-Memory / OLTP Database
We use a late-bound schema,
sitting over a schemaless store
Structured
Standardisation
Layer
Raw Data
Late Bound Schema
Evolutionary Approach
• Late-binding makes consolidation
incremental
– Schematic representation delivered at the ‘last
responsible moment’ (schema on demand)
– A trade in this model has 4 mandatory nodes. A
fully modeled trade has around 800.

• The system of record is raw data, not our
‘view’ of it
• No schema migration! But this comes at a
price.
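A minimal sketch of the late-bound idea, assuming raw trades arrive as plain dictionaries. The store keeps whatever it is given, and a view is applied only when data is read. The names `raw_store`, `TRADE_VIEW` and the field names are invented for illustration and do not come from the system described in the talk.

```python
from typing import Any, Callable, Dict, List

# The system of record: every document is kept exactly as it was received.
raw_store: List[Dict[str, Any]] = []


def record(doc: Dict[str, Any]) -> None:
    """Append a raw document; nothing is validated or thrown away on write."""
    raw_store.append(doc)


# A late-bound 'view': output field name -> function that extracts it from a
# raw document. The field names here are hypothetical.
TRADE_VIEW: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "trade_id": lambda d: d["id"],
    "notional": lambda d: float(d.get("notional", 0)),
    "currency": lambda d: d.get("ccy", "UNKNOWN"),
}


def read(view: Dict[str, Callable[[Dict[str, Any]], Any]]) -> List[Dict[str, Any]]:
    """Bind the schema at read time by projecting each raw document through the view."""
    return [{name: extract(doc) for name, extract in view.items()} for doc in raw_store]


record({"id": "T1", "notional": "1000000", "ccy": "GBP", "desk": "rates"})
record({"id": "T2", "notional": 250, "extra": {"nested": True}})

rows = read(TRADE_VIEW)
```

Changing `TRADE_VIEW` needs no migration because the raw documents are untouched. One plausible part of the price is that a malformed document only fails when something reads it, not when it is written.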
Scaling
Key-based access always scales
But queries (without the sharding key)
always broadcast
As query complexity increases so does
the overhead
Coarse-grained shards
Data replicas provide hardware isolation
Scaling
• Key-based sharding is only sufficient for very
simple workloads
• Coarse-grained shards help (but suffer
from skew)
• Replication provides useful, if expensive,
hardware isolation
• Workload management is less useful in
my experience
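The difference between keyed access and a broadcast query can be sketched in a few lines. This is a toy, in-process illustration, assuming hash sharding over four dictionaries; `shard_for`, `get_by_key` and `query` are hypothetical names, not an API from any product mentioned in the talk.

```python
import zlib
from typing import Callable, Dict, List, Optional, Tuple

NUM_SHARDS = 4
shards: List[Dict[str, dict]] = [{} for _ in range(NUM_SHARDS)]


def shard_for(key: str) -> int:
    # crc32 is stable across runs, unlike the built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def put(key: str, value: dict) -> None:
    shards[shard_for(key)][key] = value


def get_by_key(key: str) -> Tuple[Optional[dict], int]:
    """Keyed read: returns the value and the number of shards touched (always 1)."""
    return shards[shard_for(key)].get(key), 1


def query(predicate: Callable[[dict], bool]) -> Tuple[List[dict], int]:
    """Query without the sharding key: every shard has to be asked."""
    results: List[dict] = []
    touched = 0
    for shard in shards:
        touched += 1
        results.extend(v for v in shard.values() if predicate(v))
    return results, touched


for i in range(10):
    put(f"T{i}", {"id": f"T{i}", "desk": "rates" if i % 2 == 0 else "fx"})

value, keyed_fanout = get_by_key("T3")
rates_trades, query_fanout = query(lambda t: t["desk"] == "rates")
```

Adding shards leaves `keyed_fanout` at 1 but grows `query_fanout` linearly, which is why key-based sharding alone only suits simple workloads.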
Weak consistency forces the
problem onto the developer
Particularly bad for banks!
Scaling two phase commit is hard to
do efficiently
• Requires distributed lock/clock/counter
• Requires synchronisation of all readers &
writers
Alternatives to traditional 2PC
• MVCC over explicit locking
• Timestamp based strong consistency
– E.g. Granola

• Optimistic concurrency control
– Leverage short running transactions (avoid
cross-network transactions)
– Tolerate different temporal viewpoints to
reduce synchronization costs.
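Optimistic concurrency control can be sketched as a version check on write. This is a single-process illustration only, assuming each key carries an integer version; `Store` and `VersionConflict` are hypothetical names, and a real store would have to make the check-and-set atomic.

```python
from typing import Any, Dict, Tuple


class VersionConflict(Exception):
    """Raised when a writer's view of a key is stale."""


class Store:
    def __init__(self) -> None:
        # key -> (version, value); version 0 means "never written".
        self._data: Dict[str, Tuple[int, Any]] = {}

    def read(self, key: str) -> Tuple[int, Any]:
        return self._data.get(key, (0, None))

    def write(self, key: str, expected_version: int, value: Any) -> int:
        """Write only if nobody else has written since expected_version was read."""
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current_version}")
        self._data[key] = (current_version + 1, value)
        return current_version + 1


store = Store()
store.write("position", 0, 100)

# Two writers read the same version without taking a lock.
version_a, value_a = store.read("position")
version_b, value_b = store.read("position")

store.write("position", version_a, value_a + 10)  # first writer wins

conflict_seen = False
try:
    store.write("position", version_b, value_b - 10)  # second writer is stale
except VersionConflict:
    conflict_seen = True
    # Retry against the latest version instead of blocking up front.
    version_b, value_b = store.read("position")
    store.write("position", version_b, value_b - 10)
```

The cost moves from locking on every transaction to retrying on the (hopefully rare) conflict, which fits the short-running transactions the slide recommends.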
Immutable Data
• Safety
• ‘As was’ view
• Sits well with MVCC
• Efficiency problems
• Gaining popularity (e.g. Datomic)
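A rough sketch of what an immutable store with an ‘as was’ view might look like, assuming writes carry a strictly increasing integer timestamp per key. `ImmutableStore` and its methods are illustrative names, not Datomic's API.

```python
import bisect
from collections import defaultdict
from typing import Any, DefaultDict, List, Optional, Tuple


class ImmutableStore:
    def __init__(self) -> None:
        # key -> list of (timestamp, value), appended in timestamp order.
        self._versions: DefaultDict[str, List[Tuple[int, Any]]] = defaultdict(list)

    def put(self, key: str, timestamp: int, value: Any) -> None:
        """Append a new version; earlier versions are never changed or removed."""
        history = self._versions[key]
        if history and timestamp <= history[-1][0]:
            raise ValueError("timestamps must increase for a given key")
        history.append((timestamp, value))

    def get_as_of(self, key: str, timestamp: int) -> Optional[Any]:
        """Return the value as it was at the given time, or None if not yet written."""
        history = self._versions.get(key, [])
        timestamps = [t for t, _ in history]
        position = bisect.bisect_right(timestamps, timestamp)
        return history[position - 1][1] if position else None


trades = ImmutableStore()
trades.put("T1", 1, "NEW")
trades.put("T1", 5, "AMENDED")

as_was = trades.get_as_of("T1", 3)
latest = trades.get_as_of("T1", 5)
```

The efficiency problem is visible even here: every amendment adds a version and nothing is ever reclaimed.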
Use joins to avoid ‘over aggregating’

Joins are ok, so long as they are
– Local
– via a unique key

Trader
Party
Trade
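One way to keep a join both local and keyed is sketched below. It assumes, and this is an assumption rather than something the slide states, that the small Party dataset is copied to every shard while trades are partitioned. `Shard`, `party_id` and the sample data are hypothetical.

```python
from typing import Dict, List


class Shard:
    def __init__(self, parties: Dict[str, dict]) -> None:
        self.trades: List[dict] = []
        # Assumption: each shard holds its own full copy of the party data.
        self.parties: Dict[str, dict] = dict(parties)

    def join_trades_to_parties(self) -> List[dict]:
        """Join on the shard itself: one dictionary lookup per trade, no network hop."""
        return [
            {**trade, "party_name": self.parties[trade["party_id"]]["name"]}
            for trade in self.trades
        ]


parties = {"P1": {"name": "Acme"}, "P2": {"name": "Globex"}}
shard_list = [Shard(parties), Shard(parties)]

trades = [
    {"id": "T1", "party_id": "P1"},
    {"id": "T2", "party_id": "P2"},
    {"id": "T3", "party_id": "P1"},
]
for position, trade in enumerate(trades):
    shard_list[position % len(shard_list)].trades.append(trade)

joined = [row for shard in shard_list for row in shard.join_trades_to_parties()]
```

Because `party_id` is unique, each trade matches exactly one party, so the join cannot fan out or duplicate rows.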
Memory/Disk Tradeoff
• Memory only (possibly overplayed)
• Pinned indexes (generally good idea if you
can afford the RAM)
• Disk resident (best general purpose
solution and for very large datasets)
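The pinned-index option can be illustrated with a toy store that keeps only a key-to-offset map in RAM and leaves the records on disk. This is a sketch under simple assumptions (an append-only log file, one JSON record per line); `DiskStoreWithPinnedIndex` is an invented name.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


class DiskStoreWithPinnedIndex:
    def __init__(self, path: str) -> None:
        self._path = path
        self._index: Dict[str, int] = {}  # key -> byte offset, held in memory
        open(path, "wb").close()  # start with an empty log file

    def put(self, key: str, value: Any) -> None:
        """Append the record to disk and point the in-memory index at it."""
        line = (json.dumps({"key": key, "value": value}) + "\n").encode("utf-8")
        offset = os.path.getsize(self._path)  # the new record starts at the current end
        with open(self._path, "ab") as log:
            log.write(line)
        self._index[key] = offset

    def get(self, key: str) -> Optional[Any]:
        """One index lookup in RAM, then one seek and read on disk."""
        offset = self._index.get(key)
        if offset is None:
            return None
        with open(self._path, "rb") as log:
            log.seek(offset)
            return json.loads(log.readline())["value"]


tmp_dir = tempfile.mkdtemp()
store = DiskStoreWithPinnedIndex(os.path.join(tmp_dir, "records.log"))
store.put("T1", {"notional": 100})
store.put("T2", {"notional": 200})
store.put("T1", {"notional": 150})  # newer record appended; the index now points at it
```

RAM use grows with the number of keys rather than the size of the data, which is the trade the slide is pointing at.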
Balance flexibility and complexity
Operational (real time / MR)
Object/SQL
Standardisation
Raw Data
Relational
Analytics
Supple at the front, more rigid at the back

Raw Access -> Operational Access -> Analytic Access
Untyped -> Object/SQL -> Reporting
Looser -> Tighter
Broad Data Coverage -> Narrow Data Coverage
Narrow Query -> Comprehensive Query
Principles
• Record everything
• Grow a schema, don’t do it upfront
• Avoid using a ‘view’ as your system of record.
• Differentiate between sourced data (out of
your control) and generated data (in your
control).
• Use automated replication (for isolation) as
well as sharding (for scale)
• Leverage asynchronicity to reduce
transaction overheads
Consolidation
means more trust,
fewer impedance
mismatches and
managing tighter
couplings
Target architectures are starting to
look more like large applications
of cloud enabled services than
heterogeneous application
conglomerates
Are we going back to the mainframe?
Thanks

http://www.benstopford.com

Big iron 2 (published)

  • 1. The Return of Big Iron? Ben Stopford Distinguished Engineer RBS Markets
  • 3. What does this mean? • A change in what customers (we) value • The mainstream is not serving customers (us) sufficiently
  • 4. The Database field has problems
  • 5. We Lose: Joe Hellerstein (Berkeley) 2001 “Databases are commoditised and cornered to slow-moving, evolving, structure intensive, applications that require schema evolution.“ … “The internet companies are lost and we will remain in the doldrums of the enterprise space.” … “As databases are black boxes which require a lot of coaxing to get maximum performance”
  • 6. His question was how to win them back?
  • 7. These new technologies also caused frustration
  • 8. Backlash (2009) Not novel (dates back to the 80’s) Physical level not the logical level (messy?) Incompatible with tooling Lack of integrity (referential) & ACID MR is brute force ignoring indexing, scew
  • 9. All points are reasonable
  • 10. And they proved it too! “A comparison of Approaches to Large Scale Data Analysis” – Sigmod 2009 • Vertica vs. DBMSX vs. Hadoop • Vertica up to 7 x faster than Hadoop over benchmarks Databases faster than Hadoop
  • 11. But possibly missed the point?
  • 13. NoSQL grew from a need to scale
  • 14.
  • 15. It’s more than just scale, they facilitate different practices
  • 16. A Better Fit They better match the way software is engineered today. – Iterative development – Fast feedback – Frequent releases
  • 17. Is NoSQL a Disruptive Technology? Christensen’s observation: Market leaders are displaced when markets shift in ways that the incumbent leaders are not prepared for.
  • 18. Aside: MongoDB • Impressive trajectory • Slightly crappy product (from a traditional database standpoint) • Most closely related to relational DB (of the NoSQLs) • Plays to the agile mindset
  • 19. Yet the NoSQL market is relatively small • Currently around $600 but projected to grow strongly • Database and systems management market is worth around $34billion
  • 20. Key Point There is more to NoSQL than just scale, it sits better with the way we build software today
  • 21. We have new building blocks to play with!
  • 22. My Problem • Sprawling application space, built over many years, grouped into both vertical and horizontal silos • Duplication of effort • Data corruption & preventative measures • Consolidation is costly, time consuming and technically challenging.
  • 23. Traditional solutions (in chronological order) – Messaging – SOA – Enterprise Data Warehouse – Data virtualisation
  • 24. Bringing data, applications, people together is hard
  • 25. A popular choice is an EDW
  • 26. EDW pattern is workable, but tough – As soon as you take a ‘view’ on what the shape of the data is, it becomes harder to change. • Leave ‘taking a view’ to the last responsible moment – Multifaceted: shape, diversity of source, diversity of population, temporal change
  • 27. Harder to do iteratively
  • 28. Is this the only way?
  • 29. The Google Approach MapReduce Google Filesystem BigTable Tenzing Megastore F1 Dremel Spanner
  • 30. And just one code base! So no enterprise schema secret society!
  • 33. Problems with solidifying a schematic representation • Risk of throwing information away, keeping only what you think you need. – OK if you create data – Bad if you got data from elsewhere • Data tends to be poly-structured in programs and on the wire • Early-binding slows down development
  • 34. But schemas are good • They guarantee a contract • That contract spans the whole dataset – Similar to static typing in programming languages.
  • 35. Compromise positions • Query schema can be a subset of data schema. • Use schemaless databases to capture diversity early and evolve it as you build.
  • 36. Common solutions today use multiple technologies: MapReduce, Data Warehouse, ?, Key Value Store, In-Memory/OLTP Database
  • 37. We use a late-bound schema, sitting over a schemaless store Structured Standardisation Layer Raw Data Late Bound Schema
  • 38. Evolutionary Approach • Late-binding makes consolidation incremental – Schematic representation delivered at the ‘last responsible moment’ (schema on demand) – A trade in this model has 4 mandatory nodes. A fully modeled trade has around 800. • The system of record is raw data, not our ‘view’ of it • No schema migration! But this comes at a price.
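The late-binding idea on slides 35 to 38 can be sketched in a few lines. This is a minimal illustration, not the system described in the talk: the record layout, the `TRADE_VIEW` schema and the `bind` helper are all hypothetical. Raw data stays the system of record, and a query schema covering only a subset of fields is applied at read time.

```python
import json

# Hypothetical raw records, stored exactly as received (the system of record).
RAW_STORE = [
    json.dumps({"id": "T1", "trader": "alice", "party": "ACME",
                "notional": 1000000, "extra": {"desk": "rates"}}),
    json.dumps({"id": "T2", "trader": "bob", "party": "Globex",
                "notional": "250000"}),
]

# Hypothetical query schema: field name -> converter.
# It deliberately covers only a subset of what the raw data holds.
TRADE_VIEW = {"id": str, "trader": str, "notional": float}


def bind(raw, schema):
    """Apply a schema at read time; fail if a mandatory field is absent."""
    doc = json.loads(raw)
    missing = [field for field in schema if field not in doc]
    if missing:
        raise KeyError(f"missing mandatory fields: {missing}")
    return {field: convert(doc[field]) for field, convert in schema.items()}


# The view is derived on demand; RAW_STORE itself is never rewritten.
views = [bind(raw, TRADE_VIEW) for raw in RAW_STORE]
```

Adding a field to `TRADE_VIEW` later needs no migration of the stored data, which is the trade-off the slide points at: the cost moves from write time to read time.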
  • 40. Key based access always scales Client
  • 41. But queries (without the sharding key) always broadcast Client
  • 42. As query complexity increases so does the overhead Client
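Slides 40 to 42 contrast key-based access with queries that lack the sharding key. A rough sketch of that difference follows, assuming a toy in-memory store; `ShardedStore` and its `shards_touched` counter are hypothetical names used only to make the fan-out visible.

```python
import zlib


class ShardedStore:
    """Toy hash-sharded store that counts how many shards each call touches."""

    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]
        self.shards_touched = 0

    def _shard_for(self, key):
        # Deterministic hash of the sharding key picks exactly one shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, record):
        self.shards[self._shard_for(key)][key] = record

    def get(self, key):
        # Key-based access: one shard, regardless of cluster size.
        self.shards_touched += 1
        return self.shards[self._shard_for(key)].get(key)

    def find(self, predicate):
        # No sharding key in the query: every shard must be asked.
        results = []
        for shard in self.shards:
            self.shards_touched += 1
            results.extend(r for r in shard.values() if predicate(r))
        return results


store = ShardedStore(n_shards=4)
for i in range(10):
    party = "ACME" if i % 2 == 0 else "Globex"
    store.put(f"T{i}", {"id": f"T{i}", "party": party})

store.shards_touched = 0
by_key = store.get("T3")                                  # touches 1 shard
by_party = store.find(lambda r: r["party"] == "ACME")     # touches all 4
```

Doubling the shard count leaves the cost of `get` unchanged but doubles the fan-out of `find`, which is why broadcast queries limit how far key-based sharding alone can scale.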
  • 44. Data Replicas provide hardware isolation Client
  • 45. Scaling • Key-based sharding is only sufficient for very simple workloads • Coarse-grained shards help (but suffer from skew) • Replication provides useful, if expensive, hardware isolation • Workload management is less useful in my experience
  • 46. Weak consistency forces the problem onto the developer Particularly bad for banks!
  • 47. Scaling two-phase commit is hard to do efficiently • Requires distributed lock/clock/counter • Requires synchronisation of all readers & writers
  • 48. Alternatives to traditional 2PC • MVCC over explicit locking • Timestamp based strong consistency – E.g. Granola • Optimistic concurrency control – Leverage short running transactions (avoid cross-network transactions) – Tolerate different temporal viewpoints to reduce synchronization costs.
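Of the alternatives listed on slide 48, optimistic concurrency control is the easiest to show in miniature. The sketch below is a single-process illustration under assumed names (`VersionedStore`, `update_with_retry`); a real distributed store would need an atomic compare-and-set on the server side. Writers take no locks: they read a version, compute, and commit only if the version is unchanged.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Version 0 with value None means "never written".
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1


def update_with_retry(store, key, fn, max_attempts=3):
    """Short read-compute-write transaction, retried on conflict."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        try:
            return store.write(key, version, fn(value))
        except VersionConflict:
            continue  # someone else committed first; re-read and try again
    raise VersionConflict(f"{key}: gave up after {max_attempts} attempts")


store = VersionedStore()
store.write("position", 0, 100)                        # first write, version 1
new_version = update_with_retry(store, "position", lambda v: v + 10)
```

The approach works well when transactions are short and conflicts are rare, which matches the slide's advice to avoid long, cross-network transactions.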
  • 49. Immutable Data • Safety • ‘As was’ view • Sits well with MVCC • Efficiency problems • Gaining popularity (e.g. Datomic)
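The ‘as was’ view on slide 49 follows directly from never overwriting data. A minimal sketch, assuming an append-only log keyed by record with integer timestamps; `ImmutableLog` and `as_of` are hypothetical names, not the API of Datomic or any other product.

```python
import bisect


class ImmutableLog:
    """Append-only store: old versions are kept, never updated in place."""

    def __init__(self):
        self._history = {}  # key -> (sorted timestamps, values)

    def append(self, key, timestamp, value):
        times, values = self._history.setdefault(key, ([], []))
        if times and timestamp <= times[-1]:
            raise ValueError("timestamps must strictly increase per key")
        times.append(timestamp)
        values.append(value)

    def as_of(self, key, timestamp):
        """Return the value as it was at the given time, or None."""
        times, values = self._history.get(key, ([], []))
        i = bisect.bisect_right(times, timestamp)
        return values[i - 1] if i else None


log = ImmutableLog()
log.append("T1", 10, {"status": "new"})
log.append("T1", 20, {"status": "amended"})

before_amend = log.as_of("T1", 15)   # state between the two writes
latest = log.as_of("T1", 25)         # state after the amendment
```

The efficiency problem the slide mentions is visible here too: history grows without bound, so a real system needs compaction or tiered storage.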
  • 50. Use joins to avoid ‘over aggregating’ Joins are ok, so long as they are – Local – via a unique key Trader Party Trade
  • 51. Memory/Disk Tradeoff • Memory only (possibly overplayed) • Pinned indexes (generally good idea if you can afford the RAM) • Disk resident (best general purpose solution and for very large datasets)
  • 52. Balance flexibility and complexity Operational (real time / MR) Object/SQL Standardisation Raw Data Relational Analytics
  • 53. Supple at the front, more rigid at the back Raw Access Operational Access Analytic Access D Looser Tighter L M Untyped Object/SQL Reporting Broad Data Coverage Narrow Data Coverage Narrow Query Comprehensive Query
  • 54. Principles • Record everything • Grow a schema, don’t do it upfront • Avoid using a ‘view’ as your system of record. • Differentiate between sourced data (out of your control) and generated data (in your control). • Use automated replication (for isolation) as well as sharding (for scale) • Leverage asynchronicity to reduce transaction overheads
  • 55. Consolidation means more trust, fewer impedance mismatches and managing tighter couplings
  • 56. Target architectures are starting to look more like large applications of cloud enabled services than heterogeneous application conglomerates
  • 57. Are we going back to the mainframe?

Editor's notes

  1. Think about the systems you built five or ten years ago. Who was involved in the building of a new system in the early 2000s? Who used a relational DB? Who seriously considered using anything else?
  2. Retrospective
  3. (no schema or high level languages)
  4. Companies that grew up around technology.