SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Downloaden Sie, um offline zu lesen
You got schema in my JSON!
Evolving schemas in a schemaless world

!
Philipp Fehre

!
Technical Evangelist at Couchbase
From prototype to production and beyond
“Create an available, scalable search index”

–The Spec
Least downtime possible
Scale horizontally
Not the primary data source
Welcome to NoSQL Land
Data in —> Index
Query —> IDs out
module DB

STORE = {
keyOne: [1, 2],
keyTwo: [1, 3, 4]
}
!

def self.find_all keys
keys.map { |k| STORE.fetch(k.to_sym, []) }
end
end
def get_search_results keys
results = DB::find_all(keys)
results.flatten.uniq
end
!

get_search_results ["keyOne", “keyTwo"]
!

# => [1, 2, 3, 4]
“Add options for dynamic filtering.”

–The Spec
Data in —> Index + Filter Data
Query —> IDs out
How to evolve the data?
Still in the prototyping phase
Drop and rebuild!
module DB

STORE = {
keyOne: [{id:
{id:
keyTwo: [{id:
{id:
{id:
}

1,
2,
1,
3,
4,

prop:
prop:
prop:
prop:
prop:

"bar"},
"foo"}],
"bar"},
"bar"},
"bar"}]

!
def self.find_all keys, prop
keys.map { |k|
STORE.fetch(k.to_sym, []).map { |e|
e if e[:prop] == prop
}.compact
}

# => [[{:id=>1, :prop=>”bar"}], …]
end
end
1 != {id: 1, …}
Your code only understands one format
Your code should only understand one format
Implicit Schema
def get_search_results keys, prop = "bar"
results = DB::find_all(keys, prop)
results.flatten.uniq.map { |e| e[:id] }
end
!

get_search_results ["keyOne", “keyTwo"]
!

# => [1, 3, 4]
Time to go live
Time to improve
Part of the concept validation
“Don’t hit the main datastore for preview”

–The Spec
Data in —> Index + Filter Data + Cached Data
Query —> JSON out
How to evolve the data, now?
We can’t rebuild the dataset
In SQL this is done via migrations
class AddFields < ActiveRecord::Migration

def up
change_table :quotas do |t|
t.column :free, :integer
end
Quota.all do |qota|
quota.free = calculate_free(quota)
quota.save
end
end
end
Crawling all data is slow
class AddFields < ActiveRecord::Migration
def up
change_table :quotas do |t|
t.column :free, :integer
end

execute "COMMIT"
Quota.all do |qota|
quota.free = calculate_free(quota)
quota.save
end
end
end
During Migration the implicit Schema broken
Migrate data up “on the fly”
Migrate data down “on the fly”
Shorten the time until the new data is usable
Versioning the data
module DB

STORE = {
keyOne: [{id:
{id:
keyTwo: [{id:
{id:
{id:
}

1,
2,
1,
3,
4,

prop:
prop:
prop:
prop:
prop:

"bar", vsn: 2, free: 20 },
"foo", vsn: 2, free: 10 }],
"bar"},
"bar"},
"bar"}]

!
def self.find_all keys, prop
keys.map { |k|
STORE.fetch(k.to_sym, []).map { |e|
e if e[:prop] == prop
}.compact
}
end
end
def is_version2? data; data[:vsn] == 2; end
def get_free id; 20; end
def save data; data; end
!

def transfrom_to_v2 data
return data if is_version2?(data)
!

data[:vsn] = 2
data[:free] = calculate_free(data[:id])
save data
end
def get_search_results keys
results = DB::find_all(keys, "bar")
results.flatten
.map { |data| transfrom_to_v2 data }
.uniq
end
!

get_search_results ["keyOne", “keyTwo"]
!

# =>
[{:id=>1, :prop=>"bar", :vsn=>2, :free=>20},
{:id=>3, :prop=>"bar", :vsn=>2, :free=>20},
{:id=>4, :prop=>"bar", :vsn=>2, :free=>20}]
DRY it up: Deprecator
https://github.com/sideshowcoder/deprecator
class Thing
def initialize *args
args.each do |k, v|
self.instance_variable_set "@#{k}", v
end
@version = 0 unless @version
end
attr_accessor :version

!

include Deprecator::Versioning
ensure_version 2, :upgrade_to
!

def upgrade_to expected_version
# handle the version upgrade
save
end
!
def save
# save back to the store
end

end
Lessons learned
Painless data changes are important
Schemaless does not mean no Schema at all
With freedom comes more responsibility
Agile Code needs Agile Data
Couchbase
NoSQL Database
!

Any Questions?
Philipp Fehre
!
github.com/sideshowcoder
twitter.com/ischi
sideshowcoder.com
Talks to listen to
!

•

Schemalessness: http://cloud.dzone.com/articles/martinfowler-schemalessness

•

Introduction to NoSQL: http://www.youtube.com/watch?
v=qI_g07C_Q5I

Weitere ähnliche Inhalte

Was ist angesagt?

Barcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop PresentationBarcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop Presentation
Norberto Leite
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
Cloudera, Inc.
 

Was ist angesagt? (19)

Supercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim TkachenkoSupercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
Supercharge your Analytics with ClickHouse, v.2. By Vadim Tkachenko
 
From Overnight to Always On @ Jfokus 2019
From Overnight to Always On @ Jfokus 2019From Overnight to Always On @ Jfokus 2019
From Overnight to Always On @ Jfokus 2019
 
How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...
How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...
How to build analytics for 100bn logs a month with ClickHouse. By Vadim Tkach...
 
Building an Observability platform with ClickHouse
Building an Observability platform with ClickHouseBuilding an Observability platform with ClickHouse
Building an Observability platform with ClickHouse
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
 
Polyglot ClickHouse -- ClickHouse SF Meetup Sept 10
Polyglot ClickHouse -- ClickHouse SF Meetup Sept 10Polyglot ClickHouse -- ClickHouse SF Meetup Sept 10
Polyglot ClickHouse -- ClickHouse SF Meetup Sept 10
 
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander ZaitsevClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
ClickHouse Analytical DBMS: Introduction and Case Studies, by Alexander Zaitsev
 
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience SharingClickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab
 
AWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL QueriesAWS Athena vs. Google BigQuery for interactive SQL Queries
AWS Athena vs. Google BigQuery for interactive SQL Queries
 
Data file handling in python binary & csv files
Data file handling in python binary & csv filesData file handling in python binary & csv files
Data file handling in python binary & csv files
 
Clickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek VavrusaClickhouse at Cloudflare. By Marek Vavrusa
Clickhouse at Cloudflare. By Marek Vavrusa
 
Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
 Adventures in Observability: How in-house ClickHouse deployment enabled Inst... Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
Adventures in Observability: How in-house ClickHouse deployment enabled Inst...
 
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLON
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLONPaul Dix (Founder InfluxDB) - Organising Metrics at #DOXLON
Paul Dix (Founder InfluxDB) - Organising Metrics at #DOXLON
 
RedisConf18 - Redis and Elasticsearch
RedisConf18 - Redis and ElasticsearchRedisConf18 - Redis and Elasticsearch
RedisConf18 - Redis and Elasticsearch
 
InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
InfluxDB 1.0 - Optimizing InfluxDB by Sam DillardInfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
InfluxDB 1.0 - Optimizing InfluxDB by Sam Dillard
 
Data Warehouse on Kubernetes: lessons from Clickhouse Operator
Data Warehouse on Kubernetes: lessons from Clickhouse OperatorData Warehouse on Kubernetes: lessons from Clickhouse Operator
Data Warehouse on Kubernetes: lessons from Clickhouse Operator
 
Barcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop PresentationBarcelona MUG MongoDB + Hadoop Presentation
Barcelona MUG MongoDB + Hadoop Presentation
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 

Ähnlich wie You got schema in my json

AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
Paul Chao
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
MapR Technologies
 

Ähnlich wie You got schema in my json (20)

Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...
Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...
Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...
 
Webinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationWebinar: Index Tuning and Evaluation
Webinar: Index Tuning and Evaluation
 
Drilling into Data with Apache Drill
Drilling into Data with Apache DrillDrilling into Data with Apache Drill
Drilling into Data with Apache Drill
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowImporting Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflow
 
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
Simon Elliston Ball – When to NoSQL and When to Know SQL - NoSQL matters Barc...
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Database connectivity in python
Database connectivity in pythonDatabase connectivity in python
Database connectivity in python
 
Working with Graphs _python.pptx
Working with Graphs _python.pptxWorking with Graphs _python.pptx
Working with Graphs _python.pptx
 
Mongodb intro
Mongodb introMongodb intro
Mongodb intro
 
More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...More Data, More Problems: Evolving big data machine learning pipelines with S...
More Data, More Problems: Evolving big data machine learning pipelines with S...
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
 
Shrug2017 arcpy data_and_you
Shrug2017 arcpy data_and_youShrug2017 arcpy data_and_you
Shrug2017 arcpy data_and_you
 
Slick: Bringing Scala’s Powerful Features to Your Database Access
Slick: Bringing Scala’s Powerful Features to Your Database Access Slick: Bringing Scala’s Powerful Features to Your Database Access
Slick: Bringing Scala’s Powerful Features to Your Database Access
 
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
 
Streaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLStreaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQL
 
Be A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineBe A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data Pipeline
 

Mehr von Philipp Fehre

Node.js and couchbase Full Stack JSON - Munich NoSQL
Node.js and couchbase   Full Stack JSON - Munich NoSQLNode.js and couchbase   Full Stack JSON - Munich NoSQL
Node.js and couchbase Full Stack JSON - Munich NoSQL
Philipp Fehre
 

Mehr von Philipp Fehre (19)

node.js and native code extensions by example
node.js and native code extensions by examplenode.js and native code extensions by example
node.js and native code extensions by example
 
Jruby a Pi and a database
Jruby a Pi and a databaseJruby a Pi and a database
Jruby a Pi and a database
 
Couchbase Mobile on Android
Couchbase Mobile on AndroidCouchbase Mobile on Android
Couchbase Mobile on Android
 
From 0 to syncing
From 0 to syncingFrom 0 to syncing
From 0 to syncing
 
Node.js and couchbase Full Stack JSON - Munich NoSQL
Node.js and couchbase   Full Stack JSON - Munich NoSQLNode.js and couchbase   Full Stack JSON - Munich NoSQL
Node.js and couchbase Full Stack JSON - Munich NoSQL
 
What is new in Riak 2.0
What is new in Riak 2.0What is new in Riak 2.0
What is new in Riak 2.0
 
Ember background basics
Ember background basicsEmber background basics
Ember background basics
 
Ember learn from Riak Control
Ember learn from Riak ControlEmber learn from Riak Control
Ember learn from Riak Control
 
Testing tdd jasmine
Testing tdd jasmineTesting tdd jasmine
Testing tdd jasmine
 
Testing tdd dom
Testing tdd domTesting tdd dom
Testing tdd dom
 
Something about node basics
Something about node basicsSomething about node basics
Something about node basics
 
A little more advanced node
A little more advanced nodeA little more advanced node
A little more advanced node
 
Something about node in the realworld
Something about node in the realworldSomething about node in the realworld
Something about node in the realworld
 
Riak Intro at Munich Node.js
Riak Intro at Munich Node.jsRiak Intro at Munich Node.js
Riak Intro at Munich Node.js
 
PUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY RiakPUT Knowledge BUCKET Brain KEY Riak
PUT Knowledge BUCKET Brain KEY Riak
 
Campfire bot lightning talk
Campfire bot lightning talkCampfire bot lightning talk
Campfire bot lightning talk
 
Lighting fast rails with zeus
Lighting fast rails with zeusLighting fast rails with zeus
Lighting fast rails with zeus
 
JavaScript frontend testing from failure to good to great
JavaScript frontend testing from failure to good to greatJavaScript frontend testing from failure to good to great
JavaScript frontend testing from failure to good to great
 
Network with node
Network with nodeNetwork with node
Network with node
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

You got schema in my json