This document discusses data integrity in Java applications. It defines data integrity and explains how constraints can be enforced throughout an application's architecture. Transactions and exclusive access are discussed as ways to maintain integrity. Constraints are classified by type, and options for implementing them in distributed systems are presented. True integrity is difficult to achieve, especially in distributed systems, where compensation mechanisms may be needed.
1. On the integrity of data in Java Applications
Lucas Jellema (AMIS)
NLJUG JFall 2013
6th November 2013, Nijkerk, The Netherlands
2. Agenda
• What is integrity?
• Enforcing data constraints
– throughout the application architecture
• Transactions
• Exclusive Access to …
• The Distributed World
3. Definition of Integrity
• Truth
– Nothing but the truth
• The Only Truth
• [Degree of] success or completeness of actions is known
12. Record (Type) level rules
• Program should be Kids when age < 18; and Management or Developer when age > 18
• Using JavaScript
– when either field changes (handle nulls)
– on submit of the entire record
• Using Bean Validation: custom type validator
– in either web-tier or JPA
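The record-level rule above could be sketched in plain Java as follows. This is a minimal illustration, not the presenter's code: the class and field names are hypothetical, an age of exactly 18 is treated as adult (the slide leaves that case open), and incomplete records are accepted so that required-field checks stay separate. With Bean Validation, the same logic would typically sit in the `isValid` method of a class-level `ConstraintValidator`.

```java
// Hypothetical record type for illustration; the field names are assumptions.
final class AttendeeRecord {
    final Integer age;     // may be null while the record is incomplete
    final String program;  // may be null while the record is incomplete

    AttendeeRecord(Integer age, String program) {
        this.age = age;
        this.program = program;
    }
}

final class ProgramAgeRule {
    // Returns true when the combination of age and program is allowed.
    static boolean isValid(AttendeeRecord r) {
        // Handle nulls: an incomplete record is not a violation of this rule.
        if (r == null || r.age == null || r.program == null) {
            return true;
        }
        if (r.age < 18) {
            return "Kids".equals(r.program);
        }
        // Assumption: age 18 and above counts as adult.
        return "Management".equals(r.program) || "Developer".equals(r.program);
    }
}
```

Because the rule looks at two fields of one record, it has to be re-evaluated whenever either field changes, which is why it is attached to the type rather than to a single attribute.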
15. Validation Implementation options & considerations
• Mobile Client: Native
• Client (pure HTML 5 & JavaScript): Native HTML 5; JavaScript
• Client (JSF based HTML 5 & JavaScript): Native HTML 5; JavaScript
• Web Tier
– JavaServer Faces: Custom; JSF Validator; Bean Validation
– RESTful Services: Custom; Bean Validation
• Business Tier (POJO Domain Model, EJB, JPA): Custom; Bean Validation
• RDBMS
16. But wait – there is more!
• More User Interfaces
• More Attendee Instances
• More Entities & More types of Constraints
• More Users, Sessions, and Transactions
• More Nodes in the Middle Tier Cluster
• More Data Stores
18. Multiple-Instances-of-Single-Entity constraints
• Constraints that cover multiple same type objects/instances
– Attendee's Registration Id is unique
– No more than 5 conference attendees from the same company
– Not more than two sessions by the same speaker
– At most one session scheduled per room per slot
– Only one keynote session in a slot
– Sessions from up to a maximum of three tracks can be scheduled in the same room
19. Inter entity constraints
• Attendees can only attend one hands-on session during the conference
• A person cannot attend another session in a slot in which the session (s)he is speaker of is scheduled
• No more planned session attendances are allowed than the capacity of the room in which the session is scheduled to take place
• If the room capacity is smaller than 100, then no more than 2 people from the same company may sign up for it
• Attendees from Amsterdam cannot attend sessions in room 010
• Common challenge:
– Many data change events can lead to constraint violation
20. Event Analysis for Inter Entity Constraint
• No more planned session attendances are allowed than the capacity of the room in which the session is scheduled to take place
• Events that can violate the constraint:
– Attendance: Create, Update (session reference)
– Session: Update (room reference)
– Room: Update (capacity [decrease])
21. Constraint classification
• Based on event-analysis (when can the constraint get violated) we discern these categories of constraints
– Attribute
– Tuple
– Entity
– Inter Entity
• Each category has its own implementation methods, options and considerations
– E.g. Multi record instance rules cannot meaningfully be enforced in client/web-tier
23. We are not 'Sans Famille' (without family)
[Diagram: application architecture with Mobile Client; Client (pure HTML 5 & JavaScript); Client (JSF based HTML 5 & JavaScript); Web Tier (JavaServer Faces, RESTful Services); Business Tier (POJO Domain Model, EJB, JPA); RDBMS]
24. Multiple clients for Data Source
[Diagram: two Java application stacks (clients; Web Tier with JavaServer Faces and RESTful Services; Business Tier with POJO Domain Model, EJB, JPA) plus .NET, ESB, Batch and DBA/Application Admin, all accessing the same RDBMS]
25. Integrity Enforcement in the Persistent Store
• All data is available
• Persistent store is the final stop: the buck stops here
– Any alternative data manipulation (channel) has to go to the persistent store
– Mobile, Batch, DBA, ESB
• Built-in (native) mechanisms for constraint enforcement
– Productive development, proven robustness, scalable performance
– For example: Column Type, PK/UK, FK, Check; trigger
• Transactions
• Enforcing integrity is an integral part of persisting data
– Without final validation, persistent store cannot take responsibility for integrity
27. Implementation Consideration for Multiple-Entity-Instance rule
• Implementation – how and where?
– Is the entire set of data available?
– Is all associated info available?
– Is the data set stable?
– Can the constraint elegantly be implemented (natively? good framework support?)
– Are all data access paths covered?
28. Implementing Multi-Instance constraint '5 max per company'
Register New Attendee – method A
- Ensure L2 Cache is up to date in terms of Attendees (fetch all attendees into cache)
- Inspect the collection of attendees for same company
- Persist Attendee if collection does not hold 5 (or more)
Register New Attendee – method B
- Select count of attendees in same company from the Data Store
- Inspect the long value
- Persist Attendee if long is < 5
[Diagram: POJO Domain Model; Business Tier; JPA with L2 Cache (Attendees); data store (Attendees)]
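Method B can be sketched as below. This is a simplified illustration under stated assumptions: `AttendeeStore` is a hypothetical in-memory stand-in for the data store, where a real facade would run a JPA count query and call `EntityManager.persist`; all class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the data store (assumption, not JPA).
final class AttendeeStore {
    private final List<String> companies = new ArrayList<>();

    // "Select count of attendees in same company from the Data Store"
    long countByCompany(String company) {
        return companies.stream().filter(company::equals).count();
    }

    void persist(String company) {
        companies.add(company);
    }
}

final class RegistrationFacade {
    static final int MAX_PER_COMPANY = 5;
    private final AttendeeStore store;

    RegistrationFacade(AttendeeStore store) {
        this.store = store;
    }

    // Method B: select count, inspect the long value, persist if it is < 5.
    boolean register(String company) {
        long count = store.countByCompany(company);
        if (count >= MAX_PER_COMPANY) {
            return false; // constraint would be violated, do not persist
        }
        store.persist(company);
        return true;
    }
}
```

In a single thread with every change posted immediately, this check holds; the sixth registration for one company is refused.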
30. Max 5 per Company – Flaws in JPA Enforcement
• Persist does not [always] 'post to database'
– When more than one attendee is added in a transaction, prior ones are not counted when the latter are validated
[Diagram: Thread 1 calls the Facade (Business Tier, JPA) on the POJO Domain Model: select count, persist, select count, persist, commit against Attendees]
31. JPA Facade enforcement in a multi-threaded world
[Diagram: Session A and Session B (HTML 5 & JavaScript clients) reach the Web Tier; Thread 1 and Thread 2 each call the Facade (Business Tier, JPA) on the POJO Domain Model: select count, persist, commit against Attendees]
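The interleaving in this diagram can be reproduced with a small simulation. It is a sketch under assumptions: an in-memory list stands in for the Attendees table, and a `CyclicBarrier` forces both threads to finish their count before either one persists, which is the unlucky timing that bypasses the check. All names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;

final class RaceDemo {
    static final int MAX_PER_COMPANY = 5;

    // Two sessions register an attendee for the same company without a lock.
    // Returns the number of attendees stored afterwards.
    static int runTwoUnsynchronizedRegistrations() {
        // Four attendees of the company are already registered.
        List<String> attendees = new CopyOnWriteArrayList<>(
                List.of("ACME", "ACME", "ACME", "ACME"));
        CyclicBarrier bothHaveCounted = new CyclicBarrier(2);

        Runnable register = () -> {
            long count = attendees.stream().filter("ACME"::equals).count(); // select count
            try {
                bothHaveCounted.await(); // wait until the other thread has counted too
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
            if (count < MAX_PER_COMPANY) {
                attendees.add("ACME"); // persist and commit
            }
        };

        Thread t1 = new Thread(register);
        Thread t2 = new Thread(register);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return attendees.size();
    }
}
```

Both threads see a count of 4, both pass the check, and the list ends up with six attendees for one company even though each thread validated the rule.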
32. Transactions
• Logically consistent set of data manipulations
– Atomic units of work
– Succeed or fail together
– Any changes inside a transaction are invisible to other sessions/transactions until the transaction completes (commits)
– Note: during a transaction, constraints may be violated; the only thing that matters is the state at commit [time]
– Transaction ends with successful commit or rollback – in both cases, transaction-owned locks are released
• ACID (in RDBMS)
– vs BASE (in NoSQL: soft state, eventual consistency - hopefully)
• Note: post vs. commit with RDBMS
– Post means do [all] data manipulation (insert, update, delete) but do not commit [yet]
– Only upon commit are changes persisted and published
34. Fine grained locking
• Attendees: Unique Key UK1 on (FirstName, LastName)
• Transaction 1: insert … ('John','Doe',…) – lock on UK1_JOHN_DOE
• Transaction 2: insert … ('Jane','Doe',…); update <JANE> set firstname = 'John'; commit
35. JPA Facade enforcement: Exclusive Constraint Checking
[Diagram: Session A and Session B (HTML 5 & JavaScript clients) reach the Web Tier; Thread 1 calls the Facade: take lock, select count, persist, commit; Thread 2: take lock…, select count, rollback; a LockMgr holds the lock ATT_MAX; Business Tier, JPA, Attendees]
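The exclusive check in this diagram might look like the sketch below. It assumes a hypothetical in-JVM `LockMgr` that hands out one `ReentrantLock` per constraint name and key; making the `ATT_MAX` lock specific to a company is a choice made for this example, in the spirit of the fine-grained locking slide. An in-memory list again stands in for the Attendees table.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-JVM lock manager: one lock per constraint name plus key.
final class LockMgr {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    ReentrantLock lockFor(String constraint, String key) {
        return locks.computeIfAbsent(constraint + ":" + key, k -> new ReentrantLock());
    }
}

final class LockedRegistrationFacade {
    static final int MAX_PER_COMPANY = 5;
    private final LockMgr lockMgr = new LockMgr();
    private final List<String> attendees = new CopyOnWriteArrayList<>();

    boolean register(String company) {
        ReentrantLock lock = lockMgr.lockFor("ATT_MAX", company);
        lock.lock(); // take lock: only one validation per company at a time
        try {
            if (countFor(company) >= MAX_PER_COMPANY) {
                return false; // rollback
            }
            attendees.add(company); // persist
            return true;            // commit
        } finally {
            lock.unlock(); // released when the "transaction" ends
        }
    }

    long countFor(String company) {
        return attendees.stream().filter(company::equals).count();
    }

    // Usage example: many threads register for one company at the same time.
    static long registerConcurrently(int threads, String company) {
        LockedRegistrationFacade facade = new LockedRegistrationFacade();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> facade.register(company));
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return facade.countFor(company);
    }
}
```

The lock covers the whole sequence of count, decision and persist, so a second thread only counts after the first has committed. This protects a single JVM only; other nodes and other data access paths still bypass it.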
36. Distributed or Global Transaction
• One logical unit of work - involving data manipulations in multiple resources (global transaction composed of local transactions)
[Diagram: clients, Web Tier (JavaServer Faces, RESTful Services) and Business Tier (POJO Domain Model, EJB) working with two RDBMSs, JMS, and an ERP reached through JCA]
37. Implementation for Distributed Transaction
• Typical approach: two-phase commit
– Each resource locks and validates – then reports OK or NOK back to the transaction overseer
– When all resources have indicated OK, then phase two: all resources commit and release locks
– When one or more resources signal NOK, then phase two: all resources roll back/undo changes and release locks
• With regard to integrity:
– With a distributed transaction, the integrity for each participant is handled as before; this will result in 'constraint-locks' in multiple separate resources
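The two phases can be sketched as a tiny coordinator. This is a teaching sketch, not a JTA or XA implementation: the `Participant` interface and the recording test double are hypothetical, and failure handling of the coordinator itself (timeouts, crashes between phases) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant contract; real XA resources expose a richer API.
interface Participant {
    boolean prepare();  // phase one: lock and validate, report OK (true) or NOK (false)
    void commit();      // phase two when all voted OK: commit and release locks
    void rollback();    // phase two when any voted NOK: undo changes and release locks
}

final class TwoPhaseCoordinator {
    // Returns true when the global transaction committed.
    static boolean run(List<? extends Participant> participants) {
        boolean allOk = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allOk = false; // keep asking the others so every resource reports
            }
        }
        for (Participant p : participants) {
            if (allOk) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allOk;
    }
}

// Test double that records what the coordinator asked of it.
final class RecordingParticipant implements Participant {
    final List<String> log = new ArrayList<>();
    private final boolean vote;

    RecordingParticipant(boolean vote) {
        this.vote = vote;
    }

    public boolean prepare() { log.add("prepare"); return vote; }
    public void commit() { log.add("commit"); }
    public void rollback() { log.add("rollback"); }
}
```

Each participant holds its own locks between `prepare` and phase two, which is where the 'constraint-locks' in multiple separate resources come from.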
38. Distributed (aka global) transaction within a JVM
• Java EE containers (and various non-EE JTA implementations) support global (distributed) transactions within a JVM
– JTA (JSR-907) – based on X/Open XA architecture
• Key elements are the Transaction Monitor (the container) and the Resource Managers (JDBC, EJB, JMS, JCA)
• One non-XA resource can participate (file system, email, …) in a global transaction:
– All XA-resources perform Phase One
– The non-XA resource does its thing
– Upon success of the non-XA resource: others perform Phase Two by committing
– Upon failure of the non-XA resource: others roll back
39. Distributed transactions across/outside containers
[Diagram: "Step 2: Payment" added to the application architecture (clients; Web Tier with JavaServer Faces and RESTful Services; Business Tier with POJO Domain Model, EJB, JPA; RDBMS)]
40. Distributed transactions across/outside containers
• Transaction involving remote containers, Web Services, File System or any stateless transaction participant
• There is no actual common, shared vehicle (like a global XA transaction)
– There is not really a coordinated two-phase commit
• Transaction consists of
– Any resource does its thing – lock, validate, commit (or rollback), report back
– If all resources report success: great, done
– If one resource reports failure, then all other resources should perform 'compensation' – i.e. rollback/undo effects of a committed transaction
[Diagram: a container-local transaction on a local Enterprise Resource, with commit calls to two Remote/Stateless Enterprise Resources and a compensate call to a resource that has already committed]
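The commit-then-compensate flow described on this slide could be sketched as follows. The `CompensableResource` interface and the recording double are hypothetical, and calling the resources one after another and compensating in reverse order is a design choice of this example rather than something the slide prescribes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical contract for a remote or stateless enterprise resource.
interface CompensableResource {
    boolean commit();   // does its thing: lock, validate, commit; false reports failure
    void compensate();  // undoes the effects of an earlier successful commit
}

final class CompensatingInvoker {
    // Returns true when every resource committed; otherwise compensates
    // the resources that had already committed and returns false.
    static boolean run(List<? extends CompensableResource> resources) {
        Deque<CompensableResource> committed = new ArrayDeque<>();
        for (CompensableResource r : resources) {
            if (r.commit()) {
                committed.push(r);
            } else {
                while (!committed.isEmpty()) {
                    committed.pop().compensate(); // most recent commit first
                }
                return false;
            }
        }
        return true;
    }
}

// Test double that records the calls it receives.
final class RecordingResource implements CompensableResource {
    final List<String> log = new ArrayList<>();
    private final boolean succeeds;

    RecordingResource(boolean succeeds) {
        this.succeeds = succeeds;
    }

    public boolean commit() { log.add("commit"); return succeeds; }
    public void compensate() { log.add("compensate"); }
}
```

Unlike a rollback, `compensate` runs after the first commit has already been published, so other sessions may have seen or used the data in the meantime.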
41. Compensation
• How to implement a compensation mechanism?
• How long after the commit can compensation be requested?
• What is the state of the enterprise resource between commit and the compensation expiry time?
• Should the invoker notify the resource that compensation is no longer required (so the 'logical locks'/'temporary state' can be updated)?
– i.e. the global distributed transaction has successfully completed
[Diagram: commit and compensate calls on an Enterprise Resource]
42. RESTful "transaction" is a distributed transaction
[Diagram: Client; Resource A, Resource B, Resource C; Domain Model/JPA Cache]
43. RESTful "transaction" is a distributed transaction
[Diagram: Client; Resource A, Resource B, Resource C; Domain Model/JPA]
44. Distributed Constraints
• Constraints that involve data collections in multiple enterprise resources
[Diagram: clients, Web Tier (JavaServer Faces, RESTful Services) and Business Tier (POJO Domain Model, EJB) with Table X in one RDBMS and Table Y in another, plus JMS and an ERP reached through JCA]
45. Distributed Constraints
• Not more than three attendees (resource A) from the same company may attend a session (resource B)
– Insert/Update Attendance requires validation – as does update of Attendee.company
[Diagram: clients and an ESB reach two Java EE stacks (Web Tier, Business Tier), one holding ATTENDEES and one holding ATTENDANCES; both use a Distributed Lock Manager with the lock MAX_3_COMP_ATT]
46. Java global (distributed) lock managers
• Within JVM: SynchronousQueue
• Across JVMs: Apache ZooKeeper, Hazelcast, Oracle Coherence, …
[Diagram: three JVMs]
47. Summary
• Which level of integrity is required?
• Change of data potentially undermines integrity
– Data change is trigger for constraint validation
• Exclusive lock on multi-record validation
– released when transaction commits
• Ensure that all data access paths are covered
– Not all data manipulations may come through the Java middle tier
• Transactions may include multiple enterprise resources
– That may not be able to participate in a distributed transaction and have to support a compensation mechanism
• True integrity and real robustness are very hard to achieve
– Much harder than is commonly assumed
Alternative UI on top of same data service. Request manipulation. Increasingly complex constraints => consider (also) server side validation. Using Bean Validation (1.0 JSR 303 Java EE 6, 1.1 JSR 349 Java EE 7). Validation hooks in framework. Custom code.
And Web Application running against it. Numeric Data Type. Last Name required and no more than 30 characters. Check Constraint: Age >= 6. Check <18 & kids or > 18 and != kids. UK: one session per attendee per slot. UK: one keynote session per slot.
Alternative: No more planned session attendances are allowed than the capacity of the room in which the session is scheduled to take place.
Violating events:
– Insert of attendance (for session) (update of attendance not allowed, delete cannot violate constraint)
– Update of designated room (of session)
– Change of room capacity (of room)
Let's focus on first violating event: insert of attendance. Create post insert trigger that calls function that counts number of session attendances and compares with session's room capacity. Raise exception when number > capacity. Note: function can also be called from middle tier after posting data.
If after post (statement & trigger fires) and before commit something similar is done in a second session (new attendance, validation) and subsequently both transactions commit – the end result is invalid.
Locking is required!
– Lock down entire Database – no change, no integrity violation!
– Lock Attendance table: too prohibitive
– Also lock Rooms and Sessions table to prevent change of session room assignment and room capacity
– More fine grained: Lock on SESSION_ATTENDANCE_ROOM_CAPACITY_CONSTRAINT_<RoomIdentifier><SessionId>
For any transaction trying to perform a data manipulation that potentially violates the SESSION_ATTENDANCE_ROOM_CAPACITY_CONSTRAINT for a certain session in a certain room, validation/enforcement is required; the fine grained lock needs to be acquired; if it cannot be, somebody else is validating this rule for this room/session combination. When done, the changes are committed and the session can proceed – with all committed changes from the other session.
Against JPA. Single thread – constraint enforced. Without lock – two threads – constraint by-passed. Transaction part 1. In Database? Using triggers. Same problem! DB trick MV. Statement level plus fine grained lock. Demo UK.
Suppose UI does not support Update of a SessionAttendance. If person wants to switch from one session to another in a certain slot, (s)he has to create a new one and remove the existing one. In the right order?! Demonstrate. Both steps are DML statements. Constraint typically enforced at statement level. However, sometimes they can be made deferred [to transaction commit time].
Implementation from previous slide. Discuss synchronization: other session (i.e. thread) also manipulated the collection; does the constraint evaluation cover the complete set of data? Depends on timing. To be sure: use synchronized access. To the collection – safe, but slow. To something else? – safe and much more scalable and elegant.
UK on FirstName, LastName. Update existing person to John Smith (S1, T1). Update same person in different session (either name or another attribute) => runs into lock. Insert new person John Smith (S2, T2). Runs into a lock! A new Record blocked by a Lock? What is the lock on? Locking on a logical 'semaphore' can be done in a very fine-grained way. For example: {EMP_NAME_UK1, John, Smith}
Example with conference: Register as Conference Attendee & Payment. Conference registration should only succeed when payment is complete. Payment should not be done when registration fails on some business rule.
Common with Web Services and BPEL. Not trivial at all! If resources are known to participate in this type of transaction, they could have a logical state on records (staging, reserved, …) that requires confirmation (or compensation) to become truly applied or undone. (WS-Transaction is an attempt)