Creating a Single View: Data Design and Loading Strategies
1. Enterprise Architect, MongoDB
Buzz Moschetti
buzz.moschetti@mongodb.com
#ConferenceHashTag
Creating a Single View Part 2:
Data Design & Loading
Strategies
2. Who Is Talking To You?
• Yes, I use “Buzz” on my business cards
• Former Investment Bank Chief Architect at
JPMorganChase and Bear Stearns before that
• Over 27 years of designing and building systems
• Big and small
• Super-specialized to broadly useful in any vertical
• “Traditional” to completely disruptive
• Advocate of language leverage and strong factoring
• Inventor of perl DBI/DBD
• Still programming – using emacs, of course
3. What Is He Going To Talk About?
Historic Challenges
New Strategy for Success
Technical examples and tips
Overview &
Data Analysis
Data Design &
Loading
Strategies
Securing Your
Deployment
Creating A Single View
Part
1
Part
2
Part
3
5. It’s 2014: Why is this still hard to
do?
• Business / Technical / Information Challenges
• Missteps in evolution of data transfer technology
A X
6. We wish this “just worked”
A
Query objects from A
with great performance
Query objects from B
with great performance
X
Query objects from
merged A and B with
great performance
B
7. …but Beware The Blue Arrow!
A X
• Extracting many tables into many files
• Some tables require more than one file to capture representation
• Encoding/formatting clever tricks
• Reconciliation
• Different extracts for different consumers
• Different extracts for different versions of data to same consumer
8. Loss of fidelity exposed
class Product {
String productName;
List<Features> ff;
Date introDate;
List<Date>
versDates;
int[] unitBundles;
//…
}
widget1,,3,,good texture,retains value,,,20142304,102.3,201401
widget2,XS,6,,,,not fragile,,,20132304,73,87653
widget3,XT,,,4,,dense,shiny,mysterious,,,19990304,73,87653,,
widget4,,,3,4,,,,,,20040101,,999999,,
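The fidelity loss is easy to demonstrate. This minimal Python sketch parses one row taken from the extract above; everything comes back as flat, positional strings, so the feature list, the date type, and the integer array exist only in out-of-band documentation.

```python
import csv
import io

# One row from the slide's extract file
row_text = "widget2,XS,6,,,,not fragile,,,20132304,73,87653"

# csv hands back a flat list of strings; which columns were the
# feature array, which were dates, and why some are empty is
# knowledge the file format cannot carry.
row = next(csv.reader(io.StringIO(row_text)))
print(row[0])   # widget2
print(row[2])   # the string '6', not the int 6
```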
9. What happened to XML?
class Product {
String productName;
List<Features> ff;
Date introDate;
List<Date>
versDates;
int[] unitBundles;
//…
}
<product>
<name>widget1</name>
<features>
<feature>
<text>good texture</text>
<type>A</type>
</feature>
</features>
<introDate>20140204</introDate>
<versDates>
<versDate>20100103</versDate>
<versDate>20100601</versDate>
</versDates>
<unitBundles>1,3,9</unitBun…
10. XML: Created More Issues Than
Solved
<product>
<name>widget1</name>
<features>
<feature>
<text>good texture</text>
<type>A</type>
</feature>
</features>
<introDate>20140204</introDate>
<versDates>
<versDate>20100103</versDate>
<versDate>20100601</versDate>
</versDates>
<unitBundles>1,3,9</unitBun…
• No native handling of
arrays
• Attribute vs. nested tag
rules/conventions widely
variable
• Generic parsing (DOM)
yields a tree of Nodes of
Strings – not very friendly
• SAX is fast but too low
level
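The "tree of Nodes of Strings" complaint can be sketched in a few lines of Python, using element names borrowed from the earlier XML example: even the comma-packed "array" comes back as one string the consumer must split and convert by hand.

```python
from xml.dom.minidom import parseString

doc = parseString(
    "<product><name>widget1</name>"
    "<unitBundles>1,3,9</unitBundles></product>")

# DOM hands back generic nodes whose payloads are all text;
# typing and list structure are the consumer's problem.
text = doc.getElementsByTagName("unitBundles")[0].firstChild.data
bundles = [int(x) for x in text.split(",")]
print(bundles)  # [1, 3, 9]
```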
11. … and it eventually became this
<p name="widget1" ftxt1="good texture" ftyp1="A" idt="20140203" …
<p name="widget2" ftxt1="not fragile" ftyp1="A" idt="20110117" …
<p name="widget3" ftxt1="dense" idt="20140203" …
<p name="widget4" idt="20140203" versD="20130403,20130104,20100605" …
• Short, cryptic, conflated tag names
• Everything is a string attribute
• Mix of flattened arrays and delimited strings
• Irony: org.xml.sax.Attributes easier to deal with than rest of
DOM
12. Schema Change Challenges:
Multiplied & Concentrated!
X
Alter table(s)
split() more data
A
Alter table(s)
Extract more data
LOE = x1
Alter table(s)
split() more data
Alter table(s)
split() more data
B
Alter table(s)
Extract more
data
LOE = x2
C
Alter table(s)
Extract more
data
LOE = x3
LOE = xn
Total LOE = Σ (i = 1..n) xi + f(n)
where f() is nonlinear wrt n
13. SLAs & Security: Tough to
Combine
A
B
User 1 entitled to see X
User 2 entitled to see Y
User 1 entitled to see Z
User 2 entitled to see V
X
Entitlements managed per-system/per-application here….
…are lost in the
low-fidelity transfer
of data….
…and have to be
reconstituted here
…somehow…
16. Overall Strategy For Success
• Let the source systems’ entities drive the
data design, not the physical database
• Capture data in full fidelity
• Perform cross-ref and additional logic at the
single point of view, not in transit
17. Don’t forget the power of the API
class Product {
String productName;
List<Features> ff;
Date introDate;
List<Date> versDates;
int[] unitBundles;
//…
}
If you can, avoid files altogether!
Haskell
18. But if you are creating files: emit
JSON
class Product {
String productName;
List<Features> ff;
Date introDate;
List<Date> versDates;
int[] unitBundles;
//…
}
{
"name": "widget1",
"features": [
{ "text": "good texture",
"type": "A" }
],
"introDate": "20140204",
"versDates": [
"20100103", "20100601"
],
"unitBundles": [1,3,7,9]
// …
}
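A minimal Python sketch of that emission path, using the field names from the slide: build a plain map, then let a standard library serialize it; never hand-assemble JSON strings.

```python
import json

# Mirror of the Product class as a plain dict (values illustrative)
product = {
    "name": "widget1",
    "features": [{"text": "good texture", "type": "A"}],
    "introDate": "20140204",
    "versDates": ["20100103", "20100601"],
    "unitBundles": [1, 3, 7, 9],
}

# One compact JSON document; arrays and nesting survive intact
line = json.dumps(product)
print(line)
```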
19. Let The Feeding System Express
itself
A
B
C
{ "name": "widget1",
"features": [
{ "text": "good texture",
"type": "A" }
]
}
{ "myColors": ["red", "blue"],
"myFloats": [ 3.14159, 2.71828 ],
"nest": { "as": { "deep": true }}}
{ "myBlob": { "$binary": "aGVsbG8K"},
"myDate": { "$date": "20130405" }
}
21. The Joy (and value) of mongoDB
A
Alter table(s)
Extract more
data
LOE = .25x1
B
Alter table(s)
Extract more data
LOE = .25x2
C
Alter table(s)
Extract more data
LOE = .25x3
LOE = O(1)
23. Helpful Hint: Use the APIs
lastDID = None
contact = None
curs.execute("select A.did, A.fullname, B.number from contact A "
             "left outer join phones B on A.did = B.did order by A.did")
for q in curs.fetchall():
    if q[0] != lastDID:
        if lastDID is not None:
            coll.insert(contact)
        contact = {"did": q[0], "name": q[1]}
        lastDID = q[0]
    if q[2] is not None:
        if 'phones' not in contact:
            contact['phones'] = []
        contact['phones'].append({"number": q[2]})
if lastDID is not None:
    coll.insert(contact)
{
"did": "D159308",
"phones": [
{"number": "1-666-444-3333"},
{"number": "1-999-444-3333"},
{"number": "1-999-444-9999"}
],
"name": "Buzz"
}
24. Helpful Hint: Declare Types
Use mongoDB conventions for dates and binary data:
{"dateA": {"$date": "2014-05-16T09:42:57.112-0000"}}
{"dateB": {"$date": 1400617865438}}
{"someBlob": { "$binary": "YmxhIGJsYSBibGE=",
"$type": "00" }}
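These conventions are easy to generate from a feeder program. Here is a small Python sketch (the helper names are mine, not a library API) that wraps a datetime and a blob in the $date / $binary structures shown above.

```python
import base64
import json
from datetime import datetime, timezone

def mongo_date(dt):
    # {"$date": millis-since-epoch} form, as on the slide
    return {"$date": int(dt.timestamp() * 1000)}

def mongo_binary(raw):
    # {"$binary": ..., "$type": "00"}; payload is base64 text
    return {"$binary": base64.b64encode(raw).decode("ascii"),
            "$type": "00"}

doc = {
    "dateB": mongo_date(datetime(2014, 5, 16, tzinfo=timezone.utc)),
    "someBlob": mongo_binary(b"bla bla bla"),
}
print(json.dumps(doc))
```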
25. Helpful Hint: Keep the file flexible
Use CR-delimited JSON:
{ "name": "buzz", "locale": "NY"}
{ "name": "steve", "locale": "UK"}
{ "name": "john", "locale": "NY"}
…instead of a giant array:
records = [
{ "name": "buzz", "locale": "NY"},
{ "name": "steve", "locale": "UK"},
{ "name": "john", "locale": "NY"},
]
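A quick Python sketch of both sides of that contract, using an in-memory buffer as a stand-in for the file: the writer emits one compact document per line, and the reader streams line by line with a constant memory footprint.

```python
import io
import json

people = [
    {"name": "buzz", "locale": "NY"},
    {"name": "steve", "locale": "UK"},
    {"name": "john", "locale": "NY"},
]

buf = io.StringIO()                     # stand-in for an open file
for p in people:
    buf.write(json.dumps(p) + "\n")     # one line per doc, no pretty-printing

# The reader never needs the whole file in memory
buf.seek(0)
loaded = [json.loads(line) for line in buf]
```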
30. Now that we have the data…
You’re well on your way to a single view
consolidation…but first:
– Data Work
• Cross-reference important keys
• Potential scrubbing/cleansing
– Software Stack Work
33. Build THIS!
http://yourcompany/yourapp
Data Access Layer
Object Construction Layer
Basic Functional Layer
Portal Functional Layer
GUI adapter Layer
Web Service Layer
Other Regular
Performance
Applications
Higher Performance
Applications
Special
Generic Applications
34. What Is Happening Next?
Access Control
Data Protection
Auditing
Overview &
Data Analysis
Data Design &
Loading
Strategies
Creating A Single View
Part
1
Part
2
Securing Your
Deployment
Part
3
And why are we doing it at all? Federation? Managed QoS? Because traditional RDBMS dynamics make it difficult to serve a number of access patterns well
The single most important part of this that will make you successful is the simplest – and is part of the mongoDB data environment
ETL fabric: fidelity of data is typically lowest-common-denominator (LCD)
CSV still carries the day because easy to make and technically parse (but difficult to change or express things)
XML / XSD “too hard” to technically make, parse/consume, and harder still to create consistent list/array conventions
Anecdote about getting screwed by the arrow
The arrow is disingenuous!
This is LOSS OF FIDELITY
Most people use an ORM to get from DB to good objects – and mongoDB has a story around that too!
But for the moment, assume we use it.
XML was supposed to be The Thing.
No one runs schema validation in production because of performance
Schemas became too complicated anyway…..
JAXB, JAXP are compile-time bound
XML set us back about 10 years
Leads to this: Can you please just send me a CSV again?
Changes to data in source system imply DB schema upgrade in data hub – with X source systems, this starts to become unscalable
Hub Data storage scalability
In summary: traditionally, common data hubs are harder to manage than the sum of their source systems – which themselves are not so easy to manage!
Remember this formula; we’ll see how we improve upon this in just a bit.
Data entitlement implicit to system access
Fast-moving businesses cannot be held up by naturally slower-moving ones
(Andreas will cover this in greater detail later)
Knowing legacy problems and experience, here are the 3 things that work.
Don’t think about transferring tables; think about transferring products, logs, trades, customers
Cross ref at the SPOV. Especially as the number of feeders grows large, you’ll want to concentrate and control enrichment instead of having potentially dozens of scripts and utils getting involved in the flow. This also vastly simplifies a necessary evil: reconciliation.
----- Meeting Notes (5/19/14 13:31) -----
A zillion APIs.
This does not necessarily mean REALTIME. We can do realtime with “microbatching”. We can do EOD batch with a filefree API. It’s all about how producer and consumer agree to capture the data – we’ll see more about this context later in the presentation.
Our most successful customers do this
or use microbatching.
If direct connect isn’t your bag, feel free to create a web service: but pass JSON to that web service.
JSON is the new leader in highly interoperable, ASCII structured data format
ASCII interop is critical so GPB, Avro, and other formats are out.
Better than XML because
Strings, numbers, maps, and arrays natively supported
Simpler data model (no attributes or unnested content)
Easier to programmatically construct
(Much!) better than CSV because
Rich detail is preserved
Content can be expanded later without struggling with “comma hell”
Warning: JSON does NOT have Date or binary (BLOB) types! We’ll come back to a strategy on that….
WRT actually creating JSON, there are all sorts of options including frameworks that use annotations on your POJOs
BUT: My recommendation observes software engineering 101: have the feeder program build a Map, then use any one of the JSON parser/generators like Jackson to emit it
The Basic Rules:
Let feeder systems drive the data design
Do not dilute, format, or otherwise mess with the data
Schema Design: An entire session could be devoted to schema design. In general,
always embed 1:1
embed “co-lo” 1:n (vectors of bespoke results, contact and phone numbers)
use foreign keys to link 1:n where n is shared by others
use foreign keys for n:n
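The four rules can be illustrated with a pair of hypothetical documents (all names and IDs here are invented for this sketch, not from the deck):

```python
# 1:1 data and 1:n data owned solely by this contact are embedded;
# a 1:n relationship whose "n" side is shared with other documents
# is expressed as a foreign key instead. Names/IDs are hypothetical.
contact = {
    "did": "D159308",
    "name": "Buzz",
    "address": {"city": "NY"},        # 1:1 -> embed
    "phones": [                       # co-lo 1:n -> embed the array
        {"number": "1-666-444-3333", "type": "mobile"},
    ],
    "employerId": "ORG-42",           # shared 1:n -> foreign key
}
employer = {"_id": "ORG-42", "orgName": "Acme"}
```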
JUST ADD IT.
Not talking about doubles turning into lists of dates – but there’s a hint coming up around versioning that could help there too.
If you do this even halfway right, it may be the last feed infra you need to create for this consolidated view.
MUCH easier to update JSON feed handler for new data
Essentially constant time to ingest new or changed data!
No silver bullet or magic about processing the data – but you are no longer wrestling with the database!
Build the rich structure!
You have to do this anyway to produce a JSON file so if you can, go the extra distance and just directly insert the content.
Don’t worry about transactions; you should be using batchID which we’ll get to in a moment.
mongoDB does not extend JSON per se. Rather, within the JSON spec, we have a structural-naming convention that allows us to clearly hint at the true intended type of the string value.
These are natively grok’d by mongoimport, BTW.
By CR delimited we mean no pretty-printing of the JSON.
The computer doesn’t care if it’s pretty or not and
Packing everything on one line allows you to:
Easily read it with a BufferedReader / fread
Easily grep it; standard Unix utils work nicely too
Same format as mongoimport and mongoexport
Does not force large memory footprint on loader
and you can use jq!
We have 100,000 items.
Goal: How many mobile phones are explicitly marked as do-not-call?
Challenge: single person per “greppable” line and phones is an array.
In these 2 lines, there are 5 phones.
Also phones.type is not the same as .type, so grepping for “mobile” leads to peril and very often wrong results
.phones select phones element from doc
But we still have it as an ARRAY
[] “flattens” out the array to be a set of documents! (just like $unwind in the mongoDB agg framework)
jq operations are very rich. You can redact/replace fields, add brand new fields to output, etc.
The –c option produces CR-delimited JSON
JSON compresses very well (like one FIFTH the space) so go ahead and gzip -9 the JSON and decompress on the fly into jq!
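The counting task in these notes can also be sketched in Python against the same newline-delimited JSON (the doNotCall field name is assumed here, not from the deck): unwind each phones array, as jq's .phones[] or the aggregation framework's $unwind would, and filter on the nested type, which a raw grep for "mobile" cannot do safely.

```python
import json

# Two newline-delimited JSON records in the notes' scenario;
# the doNotCall field name is hypothetical.
lines = [
    '{"name": "buzz", "phones": [{"number": "1-666-444-3333", '
    '"type": "mobile", "doNotCall": true}, '
    '{"number": "1-999-444-3333", "type": "home"}]}',
    '{"name": "steve", "phones": [{"number": "1-555-111-2222", '
    '"type": "mobile", "doNotCall": false}]}',
]

# Unwind each phones array, then count only mobile numbers
# explicitly marked do-not-call
count = sum(
    1
    for line in lines
    for phone in json.loads(line).get("phones", [])
    if phone.get("type") == "mobile" and phone.get("doNotCall") is True
)
print(count)  # 1
```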
Don’t be afraid to make mistakes – for the same reason we explored on slide 21.
Context is an identifier for a set of data: ABC123
Dates are dangerous
For global systems, two (or more!) local dates possible.
System processing date can be misleading
Context has additional benefits
Easy to associate other information with context ID like functional ID
Single View of Customer does not mean Single Technical visualization of Customer thru GUI!!