In this first post, we’ll be taking an in-depth look at the backend stack for data processing here at Oscar and how it continues to evolve. This is a lengthy discussion, so it will come in three parts over the next few weeks.
Building an engineering program at a new tech company is an exciting and often tricky business. There are so many amazing new technologies out there, so many battle-tested and reliable services, and so many tools that could be right for the job if you just make a choice and tweak it until it works for you. It’s fun to try new things and experiment, and it’s rewarding to build rock-solid services. At Oscar, we’re all about maintaining a high degree of curiosity and technical exploration, while relying on proven methods and technologies. Does that sound consistent to you? It’s not!
So, how do you reconcile conflicting engineering impulses while still having a good time? By making sure you’re addressing the stuff that matters first and foremost. Let’s establish some foundational premises from which arguments can proceed. Those premises will be the properties our systems must achieve by whatever means we come up with. We’ll get back to the conflict in a later post.
Data Acquisition
First, what are the properties we need? Remember, we’re talking about backend data processes alone here. Let’s start with one of the first things we had to do in engineering - connecting partner feeds. The insurance business is a complicated space, and we’re currently handling multiple data feeds from about ten different partner companies. As of this writing, we’re receiving about 60 individual data feeds, and that number is likely to explode over time. Some of the feeds are trivial, requiring us to simply copy new data into our storage systems and leave it there, but that’s the exception. One of our feeds is an ASCII database transaction log representing hundreds of tables from a remote Cobol database. Many are fixed-width text files or CSV that must be parsed according to a schema of byte offsets and data types. For good measure, we also see EDI and HL7.
Whatever the format, the mission remains the same: to create a unified and up-to-date view of the data for users - both external and internal. And of course, don’t let a single bit get misplaced! This data really matters. It affects the lives of our customers and if we mishandle it, we can create inconvenience or even add significant stress at a time when focusing on medical treatment should be top priority. Basically, we shouldn’t ever screw up our data, and when it gets screwed up (yes, it will happen), we must be able to recover fast. Now we have our properties - whatever our choices, our feed systems must:
● Respect order.
● Break on failure.
● Be idempotent.
● Be low-latency.
● Maintain privacy.
That’s already a fair number of properties. If we have several competing approaches, they all have to maintain the same general behavior, and that’s more work.
1. Oscar Engineering part 1
2. In the beginning, there were data streams
● Our objective seemed simple at first: to create a unified and up-to-date view of incoming data for all users, without misplacing a single bit.
● It started small, just a few vendors to connect. Maybe 5 important feeds.
● 5 became 10, then grew to 50.
● Formats became more varied, as did the properties of the feeds themselves.
● Achieving the goal was going to require thought, but first, let’s look at the data.
3. Fixed-Width (ASCII byte ranges)
● ASCII blobs: schemas define character ranges and data types.
● Example: a transaction log from a Cobol DB, with 254 tables in a single feed.
● Aggressive parsing required! Often, data types and NULL constraints are the only obvious clues indicating data integrity.

Sample record:
00226345user.ix1 00000074
2013102112110400CHANGED
00000074IT8208IV*Z IT8208
IT8208 INFO| 100000.00
100000.00 100000.00
A20050101INITLD20120720IT8208
215 91515 91414 11114 7 0 9
814 7 1 0111414 0 010 7 0
51514 01415 41515 01415 0 714
0 01015 11114 51514 41515
0151414 0 1 3 1 3 1 3 1 3 1 3
1 3 1 3 1 3 1 3 1
3SASSSASSSASSNORM
0225DEMO
AHAUSSLER@SS-HEALTHCARE.COM
18991231
NORMNORM
A00819870000271400000002C00025
89
C00272560000050000000087
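The byte-range parsing described above can be sketched with plain string slicing. The field names, offsets, and widths below are hypothetical, not an actual Oscar schema:

```python
# Schema-driven fixed-width parsing: each field is (name, offset, width, type).
# These particular fields are made up for illustration.
SCHEMA = [
    ("record_len", 0, 8, int),
    ("table_name", 8, 16, str),
    ("txn_id", 24, 8, int),
]

def parse_record(line: str) -> dict:
    out = {}
    for name, offset, width, typ in SCHEMA:
        raw = line[offset:offset + width].strip()
        # An empty field is often the only hint of NULL in these feeds.
        out[name] = typ(raw) if raw else None
    return out
```

In practice each table in the transaction log gets its own schema, and a type-conversion failure should halt processing rather than be skipped.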
4. HL7 (Health Level Seven)
● Since 1989 (v2); XML as of 2005 (v3).
● Targets clinical and admin data interchange among hospitals.
● The core reference model has been called an incoherent standard (Smith & Ceusters, 2006).

Sample HL7 v3 message:
<POLB_IN224200 ITSVersion="XML_1.0"
xmlns="urn:hl7-org:v3"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<id root="2.16.840.1.113883.19.1122.7"
extension="CNTRL-3456"/>
<creationTime value="200202150930-0400"/>
<versionCode code="2006-05"/>
<interactionId
root="2.16.840.1.113883.1.6"
extension="POLB_IN224200"/>
<processingCode code="P"/>
<processingModeCode nullFlavor="OTH"/>
<acceptAckCode code="ER"/>
<receiver typeCode="RCV">
<device classCode="DEV"
determinerCode="INSTANCE">
<id extension="GHH LAB"
root="2.16.840.1.113883.19.1122.1"/>
<asLocatedEntity classCode="LOCE">
…
</POLB_IN224200>
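Once a message is XML (HL7 v3), the standard library can handle the first pass. A minimal sketch that pulls two header fields out of a message like the sample above; the helper name is ours, and real HL7 handling needs far more validation:

```python
# Namespace-aware extraction of HL7 v3 header fields with the stdlib.
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}

def read_header(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    msg_id = root.find("hl7:id", NS)
    creation = root.find("hl7:creationTime", NS)
    return {
        "control_id": msg_id.get("extension"),
        "created": creation.get("value"),
    }
```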
5. EDI (Electronic Data Interchange)
● Since 1971.
● Elements, segments, loops, and hierarchy: like XML, the schema is up to the designers. Unlike XML, structure must be derived at parse time.
● Context-sensitive grammars! You need a parse stack to understand all EDI document types.

Sample interchange:
ISA*00* *00* *12*ABCCOM *01*99999999*101127*1719*U*00400*000003438*0*P*~
GS*PO*4405197800*999999999*20101127*1719*1421*X*004010VICS~
ST*834*0179~
BGN*00*1*20050315*110650****~
REF*38*SAMPLE_POLICY_NUMBER~
DTP*303*D8*20080321~
N1*P5*COMPAN_NAME*FI*000000000~
INS*Y*18*030*20*A
REF*0F*SUBSCRIBER_NUMBER~
NM1*IL*1*JOHNDOE*R***34*1*0000000~
PER*IP**HP*2138051111~
N3*123 SAMPLE RD~
N4*CITY*ST*12345~
DMG*D8*19690101*F~
HD*030~
DTP*348*D8*20080101~
REF*1L*INDIV_POLICY_NO~
SE*16*0179~
GE*1*1421~
IEA*1*000003438
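Full EDI parsing needs a parse stack for loops, but the first tokenizing stage is straightforward. A sketch assuming the common `~` segment terminator and `*` element separator; real X12 code must read both delimiters from the ISA header rather than hard-code them:

```python
# First stage of EDI parsing: split an interchange into segments, then
# each segment into elements. No loop/hierarchy handling here.
def tokenize(edi: str, seg_term: str = "~", elem_sep: str = "*"):
    segments = [s.strip() for s in edi.split(seg_term) if s.strip()]
    return [seg.split(elem_sep) for seg in segments]
```

The segment ID (first element of each segment) is what a later pass would push and pop on the parse stack to recover the document’s loop structure.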
6. Easy, right?
● Get a feed, write a parser, model it, store it, try to do it right, then deploy it. 50 times! In 50 ways? Sure, as long as we stay organized.
7. Insurance is hard. Let’s simplify it. Complexity is the Enemy!
● Every feed has its own characteristics.
● Every project seems to want its own solution.
● It’s so easy at first just to implement things. Then comes The Mess.
● Can we identify common properties needed by our systems, and guarantee that those properties are satisfied in reaching a fundamental goal?
8. To keep our data correct, we need to ensure some fundamental system properties
● Respect order. Process things out of order and you corrupt your data.
● Break on failure. Never write bad data or continue in an abnormal state.
● Be idempotent. Mid-process or mid-transaction errors are to be expected.
● Control latency. Integrate data as quickly as possible.
● And last but far from least: maintain privacy. Our jobs often handle user data, and they must handle private data with great care.
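Respect order, break on failure, and be idempotent can be sketched together as one loop contract. `apply_txn` and the transaction shape here are hypothetical stand-ins for a real feed job:

```python
# Process transactions strictly in ID order; halt (never skip) on a gap.
# apply_txn must be atomic so a crash mid-transaction can be safely retried.
def process(transactions, last_applied_id, apply_txn):
    for txn in transactions:
        if txn["id"] != last_applied_id + 1:
            # Respect order + break on failure: a gap means missing data.
            raise RuntimeError(f"gap before txn {txn['id']}: halting")
        apply_txn(txn)
        last_applied_id = txn["id"]
    return last_applied_id
```

Returning the last applied ID is what makes a retry idempotent: the caller persists it and resumes from the next transaction.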
9. Let’s not forget some important higher-order properties
● Judgment is required here, in abundance. These goals are more nebulous, but of critical importance to a growing engineering team.
● Implement consistently. Avoid The Mess. Consolidate best practices.
● Deploy consistently. Every new deploy style carries a constant increase in operational complexity.
10. oscaretl: a framework for transactional safety
● Parse and model each feed as a custom step, but let the framework handle the common properties.
● Data streams in which records are not independent (often the case) benefit from being handled as transaction logs.
● Monotonically increasing transaction IDs are very useful. Try to derive them if they don’t exist naturally in the data.
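One way to derive a monotonic transaction ID when a feed lacks one is to build it from attributes that are themselves monotonic, such as file arrival sequence plus line number. The packing scheme below is illustrative, not the framework’s actual encoding:

```python
# Derive a sortable transaction ID from (file sequence, line number).
# Assumes files arrive in a known order and have fewer than 1M lines each.
def derive_txn_id(file_seq: int, line_no: int) -> int:
    return file_seq * 1_000_000 + line_no
```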
11. Factoring out the common elements
● Parsers and schemas are custom, but data formatting, safe writes, and safe execution can be factored out.
● Common schema types can be re-used.
12. What have we achieved?
● Strict ordering. Processing will halt on missing data.
● Idempotence. Careful state binding allows us to resume where we left off.
● Break on failure. Processing will halt on error.
● A good start, but we need more: at this point, a good runtime model and a job scheduler.