3. Introduction Relational Database Clinical Annotation data CHCDB CHCDBWEB Conclusions
• 1500 liver tumor samples
• Malignant (HCC) and benign (HCA) tumors
• Normal Tissue
Existing Data
• Clinical Annotations of malignant tumors (4D)
• Excel files which contain :
• Clinical Annotations of malignant & benign tumors
• Other annotations (mutations, clinical studies, etc.)
• Tissue extractions listings (concentrations / quantities)
Thomas Burguiere (INSERM Unité 674) CHCDB & CHCDBWEB May 5th, 2011 4 / 35
Problems
• Clinical Annotations of malignant tumors can only be accessed on a
single machine
• Redundant data among different files
• Duplicated files on different machines
→ Discrepancies between different files
→ Cross-checking data between the different data sources is cumbersome
Principle
Relational Database : Software
• Relational Database Management System : software which contains
and organizes data (Oracle™, MySQL, DB2™, SQL Server™, etc.)
• Client Server architecture :
• Server software, which manages data, installed on a single machine
• Client software, which queries the server, installed on any machine used to
consult the database
Principle
Client Server architecture
Principle
Relational Database : Data
• The data is stored in a set of tables
• A set of constraints can be defined on the data contained in the
tables
• Tables can be linked to one another by logical links : referential
integrity constraints (foreign keys)
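As an illustration of tables, constraints and integrity constraints, here is a minimal sketch. It uses SQLite (Python's built-in sqlite3 module) in place of a server RDBMS such as MySQL, and the table and column names (tissue, extraction, tissue_type, concentration) are hypothetical. The sketch shows two kinds of rejection: a CHECK constraint refuses an unknown tissue type, and a foreign key refuses an extraction that points to a tissue that does not exist.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for MySQL so the example is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE tissue (
        tissue_id   TEXT PRIMARY KEY,
        tissue_type TEXT NOT NULL CHECK (tissue_type IN ('HCC', 'HCA', 'Normal'))
    )""")
conn.execute("""
    CREATE TABLE extraction (
        extraction_id INTEGER PRIMARY KEY,
        tissue_id     TEXT NOT NULL REFERENCES tissue(tissue_id),  -- integrity constraint
        concentration REAL CHECK (concentration >= 0)
    )""")

conn.execute("INSERT INTO tissue VALUES ('T001', 'HCC')")
conn.execute("INSERT INTO extraction (tissue_id, concentration) VALUES ('T001', 120.5)")

def try_insert(sql):
    """Run an INSERT and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"

# Unknown tissue type: violates the CHECK constraint
print(try_insert("INSERT INTO tissue VALUES ('T002', 'Unknown')"))
# Extraction for a tissue that does not exist: violates the foreign key
print(try_insert("INSERT INTO extraction (tissue_id, concentration) VALUES ('T999', 10)"))
```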
Benefits
Relational Database : Benefits
• Data centralisation on the server side
• Constraints make it possible, in some instances, to avoid data inconsistencies
→ Consistent data
• Efficient : tables containing millions of rows can be easily manipulated
• Querying a correctly structured database allows one to cross-check
data very rapidly*
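The following minimal sketch shows the kind of cross-check that becomes a single query once the data is centralized. It assumes the same annotation (sex) has been loaded from two hypothetical sources, e.g. the 4D base and one Excel file. SQLite stands in for MySQL, and the table names and values are invented for the example.

```python
import sqlite3

# Hypothetical tables holding the same annotation from two sources
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_4d    (tissue_id TEXT PRIMARY KEY, sex TEXT);
    CREATE TABLE source_excel (tissue_id TEXT PRIMARY KEY, sex TEXT);
    INSERT INTO source_4d    VALUES ('T1', 'M'), ('T2', 'F'), ('T3', 'M');
    INSERT INTO source_excel VALUES ('T1', 'M'), ('T2', 'M'), ('T4', 'F');
""")

# Tissues whose annotation differs between the two sources
discrepancies = conn.execute("""
    SELECT a.tissue_id, a.sex AS sex_4d, b.sex AS sex_excel
    FROM source_4d AS a INNER JOIN source_excel AS b
      ON a.tissue_id = b.tissue_id
    WHERE a.sex <> b.sex
""").fetchall()

# Tissues present in the 4D source but missing from the Excel source
missing = conn.execute("""
    SELECT a.tissue_id
    FROM source_4d AS a LEFT JOIN source_excel AS b
      ON a.tissue_id = b.tissue_id
    WHERE b.tissue_id IS NULL
""").fetchall()

print(discrepancies)  # [('T2', 'F', 'M')]
print(missing)        # [('T3',)]
```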
Interface
A database requires a graphical interface
• Data manipulation in a database is done exclusively with queries,
written in SQL (Structured Query Language)
• SELECT table1.tissueID, table1.TumorType,
         table1.Sex, table1.Age, table2.Steatosis,
         table2.nb_adenomas, table2.CRP
  FROM table1 INNER JOIN table2
    ON (table1.tissueID = table2.tissueID)
  WHERE table1.TissueID = 'CHC358T';
• Powerful language, albeit counterintuitive
→ A graphical interface must be associated with the database
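To see what a join query such as "SELECT … FROM table1 INNER JOIN table2 ON (…) WHERE …" returns, here is a minimal sketch that runs it against two tiny tables. SQLite stands in for MySQL. The table contents, the tissue IDs T001/T002, and the column name nb_adenomas are placeholders, not real annotations. The tissue ID is passed as a bound parameter (?) rather than written into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal versions of table1 and table2; values are placeholders
conn.executescript("""
    CREATE TABLE table1 (tissueID TEXT PRIMARY KEY, TumorType TEXT, Sex TEXT, Age INTEGER);
    CREATE TABLE table2 (tissueID TEXT PRIMARY KEY, Steatosis TEXT, nb_adenomas INTEGER, CRP REAL);
    INSERT INTO table1 VALUES ('T001', 'HCA', 'F', 35), ('T002', 'HCC', 'M', 61);
    INSERT INTO table2 VALUES ('T001', 'yes', 3, 4.2), ('T002', 'no', 0, 1.1);
""")

# Join the two tables on the tissue identifier, keep a single tissue
QUERY = """
    SELECT table1.tissueID, table1.TumorType, table1.Sex, table1.Age,
           table2.Steatosis, table2.nb_adenomas, table2.CRP
    FROM table1 INNER JOIN table2
      ON (table1.tissueID = table2.tissueID)
    WHERE table1.tissueID = ?
"""
rows = conn.execute(QUERY, ("T001",)).fetchall()
print(rows)  # [('T001', 'HCA', 'F', 35, 'yes', 3, 4.2)]
```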
Interface
Graphical interface : principle
Mechanism
1. The interface receives instructions from the user and transforms them
into SQL queries sent to the server
2. The server receives the SQL queries and sends back the results
3. The interface receives the results from the server and displays them
to the user
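The three steps of the mechanism can be sketched as a single function. This is a minimal illustration, not the CHCDBWEB code. The function name handle_request and the tissue table are hypothetical, and SQLite stands in for the MySQL server.

```python
import sqlite3

def handle_request(conn, tissue_id):
    """Hypothetical interface handler following the three steps of the mechanism."""
    # Step 1: turn the user's instruction ("show tissue X") into an SQL query.
    # The bound parameter (?) keeps user input out of the SQL text itself.
    sql = "SELECT tissue_id, tissue_type FROM tissue WHERE tissue_id = ?"
    # Step 2: the database executes the query and sends back the rows.
    rows = conn.execute(sql, (tissue_id,)).fetchall()
    # Step 3: format the results for display to the user.
    if not rows:
        return f"No tissue found with ID {tissue_id}"
    return "\n".join(f"{tid}: {ttype}" for tid, ttype in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tissue (tissue_id TEXT PRIMARY KEY, tissue_type TEXT)")
conn.execute("INSERT INTO tissue VALUES ('T001', 'HCC')")
print(handle_request(conn, "T001"))  # T001: HCC
print(handle_request(conn, "T999"))  # No tissue found with ID T999
```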
Interface types
• 2 types of interface : desktop program or web interface
• In our case, we decided to develop a web interface
Interface
Web interface
• Software installed on a single machine : the web server
• Accessing the interface only requires a web browser
→ Avoids installation and maintenance issues on the client machines
→ Avoids OS compatibility issues (Mac, Windows, Linux, etc.)
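The following minimal sketch shows how a web interface works: it runs only on the web server machine, and clients only need a browser. It assumes Python's standard WSGI interface rather than whatever stack CHCDBWEB actually uses, and SQLite stands in for MySQL. The application function app, the tissue table, and the ?type=… query parameter are hypothetical.

```python
import sqlite3
from html import escape
from urllib.parse import parse_qs

# Hypothetical data store (SQLite standing in for the MySQL server)
DB = sqlite3.connect(":memory:")
DB.execute("CREATE TABLE tissue (tissue_id TEXT PRIMARY KEY, tissue_type TEXT)")
DB.executemany("INSERT INTO tissue VALUES (?, ?)",
               [("T001", "HCC"), ("T002", "HCA"), ("T003", "HCC")])

def app(environ, start_response):
    """Hypothetical WSGI application: returns an HTML list of tissues,
    optionally filtered by a ?type=... query-string parameter."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    tissue_type = params.get("type", [None])[0]
    sql = "SELECT tissue_id, tissue_type FROM tissue"
    args = ()
    if tissue_type:
        sql += " WHERE tissue_type = ?"
        args = (tissue_type,)
    rows = DB.execute(sql + " ORDER BY tissue_id", args).fetchall()
    # Escape database values before inserting them into HTML
    items = "".join(f"<li>{escape(tid)} ({escape(ttype)})</li>" for tid, ttype in rows)
    body = f"<html><body><ul>{items}</ul></body></html>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves the application on port 8000. Any browser on the network can then display the list, with no client-side installation.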
Interface
Web client / server architecture
Specificities
Specific features of clinical annotation data
Specificities
• New variables are frequently added
• Data regarding the same variable can be input differently, depending
on sample provenance and type (malignant or benign tumor)
Consequences in a database
• Frequent addition of new columns or sub-tables
• Tables contain many columns, with sparsely filled rows
→ Constant maintenance of the database
Clinical annotation data must be stored in a specific database structure :
the E.A.V. structure
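Here is a minimal sketch of the problem with a conventional "one column per variable" table. Each new variable requires a schema change, and the new columns stay mostly empty. SQLite stands in for MySQL. The table annotation_wide, its columns, and its values are hypothetical.

```python
import sqlite3

# Hypothetical "one column per variable" design for clinical annotations
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation_wide (tissue_id TEXT PRIMARY KEY, sex TEXT, age INTEGER)")
conn.executemany("INSERT INTO annotation_wide (tissue_id, sex, age) VALUES (?, ?, ?)",
                 [("T001", "F", 35), ("T002", "M", 61), ("T003", "M", 58)])

# Each newly collected variable means altering the table structure...
conn.execute("ALTER TABLE annotation_wide ADD COLUMN nb_adenomas INTEGER")
conn.execute("ALTER TABLE annotation_wide ADD COLUMN steatosis TEXT")
# ...and the new columns are filled for only some tissues
conn.execute("UPDATE annotation_wide SET nb_adenomas = 3, steatosis = 'yes' WHERE tissue_id = 'T001'")

# Count empty (NULL) cells among the annotation columns (all columns except tissue_id);
# column names come from the table definition itself, not from user input
columns = [row[1] for row in conn.execute("PRAGMA table_info(annotation_wide)")][1:]
total = conn.execute("SELECT COUNT(*) FROM annotation_wide").fetchone()[0] * len(columns)
nulls = sum(conn.execute(f"SELECT COUNT(*) FROM annotation_wide WHERE {c} IS NULL").fetchone()[0]
            for c in columns)
print(f"{nulls}/{total} annotation cells are empty")  # 4/12 annotation cells are empty
```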
E.A.V.
Principle
• E.A.V. = Entity Attribute Value
• An E.A.V. is a subset of tables in a relational database, with a
specific organization
• This data organization is particularly suitable for clinical annotation
data
• In the E.A.V., all clinical annotation data is stored in a single
three-column table :
• Entity : contains the identifier of the entity for which an annotation is
stored (in our case, an entity is a tissue)
• Attribute : contains the identifier (e.g. the name) of the annotation variable
• Value : contains the value of the annotation, for a given entity
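The Entity / Attribute / Value table can be sketched as follows, together with the usual way of turning it back into one row per tissue for display: conditional aggregation with MAX(CASE …) and GROUP BY. This is a minimal sketch under simplifying assumptions. SQLite stands in for MySQL, all values are stored as text, and the attribute names and values are placeholders. A fuller design would typically add a separate table describing each attribute and its data type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical E.A.V. table: one row per (tissue, variable) pair
conn.execute("""
    CREATE TABLE annotation (
        entity    TEXT NOT NULL,   -- tissue identifier
        attribute TEXT NOT NULL,   -- annotation variable name
        value     TEXT,            -- annotation value, stored as text here
        PRIMARY KEY (entity, attribute)
    )""")
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", [
    ("T001", "Sex", "F"), ("T001", "Age", "35"), ("T001", "nb_adenomas", "3"),
    ("T002", "Sex", "M"), ("T002", "Age", "61"),
])

# A new variable needs no schema change: it is just a new row
conn.execute("INSERT INTO annotation VALUES ('T002', 'CRP', '1.1')")

# Pivot back to one row per tissue (conditional aggregation)
rows = conn.execute("""
    SELECT entity,
           MAX(CASE WHEN attribute = 'Sex'         THEN value END) AS Sex,
           MAX(CASE WHEN attribute = 'Age'         THEN value END) AS Age,
           MAX(CASE WHEN attribute = 'nb_adenomas' THEN value END) AS nb_adenomas,
           MAX(CASE WHEN attribute = 'CRP'         THEN value END) AS CRP
    FROM annotation
    GROUP BY entity
    ORDER BY entity
""").fetchall()
for row in rows:
    print(row)
# ('T001', 'F', '35', '3', None)
# ('T002', 'M', '61', None, '1.1')
```

Missing annotations simply have no row, so the sparsity problem of wide tables disappears. The trade-off is that displaying annotations side by side requires a pivot query like the one above.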
CHCDB
Software
• R.D.B.M.S. : MySQL
• Open source & free
• Most widely used open-source R.D.B.M.S.
,! Actively maintained
→ Many maintenance and development tools available
• The machine hosting the R.D.B.M.S. has yet to be bought
Data
• CHCDB’s tables fall into one of three categories
• Tissue listings
• Clinical annotation data, in the E.A.V. structure
• Extraction data
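One possible layout for the three categories of tables is sketched below. This is a hypothetical sketch, not CHCDB's actual schema: every table and column name is an assumption. The sketch is run against SQLite here, although CHCDB itself uses MySQL. The clinical annotation part follows the E.A.V. pattern, with an extra attribute table so that annotation variables can be managed (added, modified, deleted) as data.

```python
import sqlite3

# Hypothetical table layout, one group per category of CHCDB tables
SCHEMA = """
-- 1. Tissue listings
CREATE TABLE tissue (
    tissue_id   TEXT PRIMARY KEY,
    tissue_type TEXT NOT NULL            -- e.g. 'HCC', 'HCA', 'Normal'
);
-- 2. Clinical annotation data (E.A.V.)
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,   -- annotation variable name
    data_type    TEXT NOT NULL           -- e.g. 'integer', 'text'
);
CREATE TABLE annotation (
    tissue_id    TEXT    NOT NULL REFERENCES tissue(tissue_id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    value        TEXT,
    PRIMARY KEY (tissue_id, attribute_id)
);
-- 3. Extraction data
CREATE TABLE extraction (
    extraction_id INTEGER PRIMARY KEY,
    tissue_id     TEXT NOT NULL REFERENCES tissue(tissue_id),
    concentration REAL,                  -- units not specified in the source
    quantity      REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['annotation', 'attribute', 'extraction', 'tissue']
```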
A web interface
Characteristics
• Installed on the server hosting the R.D.B.M.S.
• Can be reached from any machine on the CEPH network
Features
• Consultation and modification of clinical annotations for a given tissue
• Listing of tissues and their annotations
• Listing of tissue extractions
• Management (add/modify/delete) of annotation variables
• Batch import of annotations
• Batch import of extraction data
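A batch import of annotations could take a spreadsheet exported as CSV (one row per tissue, one column per variable) and turn each non-empty cell into one E.A.V. row. This is a minimal sketch under that assumption. The function name import_annotations, the tissue_id column header, and the sample data are hypothetical, and SQLite stands in for MySQL. INSERT OR REPLACE is SQLite syntax; MySQL offers REPLACE INTO or INSERT … ON DUPLICATE KEY UPDATE for the same purpose.

```python
import csv
import io
import sqlite3

def import_annotations(conn, csv_file):
    """Hypothetical batch import: each non-empty cell becomes one E.A.V. row.
    Existing values for the same (tissue, variable) pair are overwritten."""
    count = 0
    for record in csv.DictReader(csv_file):
        tissue_id = record.pop("tissue_id")
        for attribute, value in record.items():
            # Skip extra unnamed fields (key None), missing fields and empty cells
            if attribute is None or value is None or not value.strip():
                continue
            conn.execute(
                "INSERT OR REPLACE INTO annotation (entity, attribute, value) VALUES (?, ?, ?)",
                (tissue_id, attribute, value.strip()))
            count += 1
    conn.commit()
    return count

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (entity TEXT, attribute TEXT, value TEXT, "
             "PRIMARY KEY (entity, attribute))")
data = io.StringIO("tissue_id,Sex,Age,CRP\nT001,F,35,\nT002,M,61,1.1\n")
n = import_annotations(conn, data)
print(n)  # 5 (the empty CRP cell of T001 is not stored)
```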
Missing features in CHCDBWEB
• The tissue management interface is not yet complete
• The batch import interface for annotations and extraction data is missing
CHCDB
• Defining a starting set of variables
• Importing existing data into CHCDB
Material
• Acquiring and configuring the machine which will host the database and
the web server
CHCDB and CHCDBWEB should enter the production phase in June 2011.