2. TMRA 2008: OpenSpace session: Dense Topic Maps
2008-10-17
Use Case: Annotate DNA
Life science analyzes DNA
this produces multiple gigabytes of data per day
for each base, the following information is produced:
type: Is it A, C, G, or T?
probability: How probable is it that the base is what we believe it is?
Current custom format: 5 bytes per DNA base
type: 1 byte
probability: 4 bytes
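A minimal sketch of this 5-byte record in Python. The slides do not spell out the field encoding, so two assumptions are made here: the type is stored as its ASCII letter, and the probability as a big-endian IEEE 754 single-precision float. The example on the DTM slide (0x54 followed by 0x3f7f2863) is consistent with both. The names BASE_RECORD and pack_base are illustrative only.

```python
import struct

# Assumed layout: 1-byte ASCII base type + 4-byte big-endian float32.
# ">" disables padding, so the record is exactly 5 bytes.
BASE_RECORD = struct.Struct(">cf")


def pack_base(base_type: str, probability: float) -> bytes:
    """Pack one DNA base into the 5-byte custom format (hypothetical helper)."""
    if len(base_type) != 1 or base_type not in "ACGT":
        raise ValueError(f"unknown base type: {base_type!r}")
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability!r}")
    return BASE_RECORD.pack(base_type.encode("ascii"), probability)


record = pack_base("T", 0.996710)
print(len(record), record.hex())
```

Note that the offset of a base is not stored in the record; in this sketch it is implied by the record's position in the stream.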
Xuân Baldauf <xuan--dtm--2008--tmra.de@baldauf.org> 2 of 7
as Topic Maps
How to represent a DNA base which
is a Thymine base
with a probability of 99.6710%
is at offset 938457
as
XTM?
CTM?
as CTM
base938457
base_info(T,0.996710,938457).
42 bytes per DNA base
bloat factor: 8.4
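The bloat factor is the size of a representation divided by the size of the 5-byte custom record. A small sketch of that arithmetic, using only the byte counts quoted on the slides (the function name bloat_factor is an illustrative choice):

```python
DENSE_BYTES_PER_BASE = 5  # custom format: 1 byte type + 4 bytes probability


def bloat_factor(bytes_per_base: int, baseline: int = DENSE_BYTES_PER_BASE) -> float:
    """Size of a representation relative to the dense baseline."""
    return bytes_per_base / baseline


ctm_factor = bloat_factor(42)  # CTM byte count as quoted on the slide
print(ctm_factor)
```

At multiple gigabytes of dense data per day, this factor applies to every gigabyte produced.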
FYI: CTM template
def base_info($base,$basetype,$probability,$offset)
  $base
    offset: $offset.
  type-instance(type: $basetype,instance: $base) ~ [
    probability: $probability
  ]
end
as DTM
0x543f7f2863
0x54 = "T"
0x3f7f2863 = 0.996710
5 bytes per DNA base
bloat factor: 1.0
How is this a Topic Map?
Define a Dense Topic Map Format specification for a
particular dense format
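One way to picture such a specification is as a decoding rule that maps each dense record back to the statements the CTM form would make. The sketch below is an illustration under assumptions, not the proposed DTM specification itself: it assumes big-endian float32 probabilities, ASCII type letters, and that a base's offset is implied by its record position plus a starting offset. The names decode_record and to_ctm are hypothetical.

```python
import struct

# Assumed record layout: 1-byte ASCII type + 4-byte big-endian float32.
RECORD = struct.Struct(">cf")


def decode_record(data: bytes, index: int, first_offset: int = 0):
    """Return (base type, probability, offset) for the record at `index`."""
    raw_type, probability = RECORD.unpack_from(data, index * RECORD.size)
    return raw_type.decode("ascii"), probability, first_offset + index


def to_ctm(base_type: str, probability: float, offset: int) -> str:
    """Render the decoded record as the CTM template invocation from the slides."""
    return f"base{offset}\nbase_info({base_type},{probability:.6f},{offset})."


dense = bytes.fromhex("543f7f2863")
base_type, probability, offset = decode_record(dense, 0, first_offset=938457)
print(to_ctm(base_type, probability, offset))
```

The dense bytes stay as they are on disk; the topic map view is produced only when a consumer asks for it.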
Why?
good-to-perfect compression
allows a migration path from many custom data
formats to Topic Maps
allows a (limited) migration path to many custom
data formats from Topic Maps