4. Ad Hoc Data in Biology
format-version: 1.0
date: 11:11:2005 14:24
auto-generated-by: DAG-Edit 1.419 rev 3
default-namespace: gene_ontology
subsetdef: goslim_goa "GOA and proteome slim"

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria including the mitochondrial genome into daughter cells after mitosis or meiosis mediated by interactions between mitochondria and the cytoskeleton." [PMID:10873824,PMID:11389764, SGD:mcc]
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
www.geneontology.org
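The following minimal Python sketch makes the layout of this fragment concrete. It splits OBO-style text into a header and `[Term]` stanzas of `tag: value` lines. The function name `parse_obo` is hypothetical. The sketch assumes one tag per line, with the tag separated from its value by the first colon. It ignores much of the real OBO format, including escapes, trailing qualifiers and other stanza types.

```python
from collections import defaultdict


def parse_obo(text):
    """Split OBO-style text into a header dict and a list of [Term] stanzas.

    Each tag maps to a list of values, because tags such as is_a may repeat.
    Simplified sketch: no escapes, no trailing {qualifiers}, no other stanza types.
    """
    header = defaultdict(list)
    terms = []
    current = header  # tags before the first [Term] belong to the header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "[Term]":
            current = defaultdict(list)  # start a new stanza
            terms.append(current)
            continue
        # Split on the FIRST colon only: values like "GO:0000001" contain colons.
        tag, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        tag, value = tag.strip(), value.strip()
        if tag == "is_a":
            # Keep the target id; drop the human-readable "! name" comment.
            value = value.split("!", 1)[0].strip()
        current[tag].append(value)
    return dict(header), [dict(t) for t in terms]


sample = """format-version: 1.0
default-namespace: gene_ontology

[Term]
id: GO:0000001
name: mitochondrion inheritance
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
"""

header, terms = parse_obo(sample)
print(terms[0]["is_a"])  # ['GO:0048308', 'GO:0048311']
```

Splitting on the first colon only matters here, because identifiers such as GO:0000001 and the timestamp in the date tag contain colons of their own.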
5. Ad Hoc Data in Finance
HA 00000000 START OF TEST CYCLE
aA 00000001 BXYZ U1AB0000040000100B0000004200
HL 00000002 START OF OPEN INTEREST
d 00000003 FZYX G1AB0000030000300000
HM 00000004 END OF OPEN INTEREST
HE 00000005 START OF SUMMARY
f 00000006 NYZX B1QB00052000120000070000B000050000000520000 00490000005100+00000100B00000005300000052500000535000
HF 00000007 END OF SUMMARY
k 00000008 LYXW B1KB0000065G0000009900100000001000020000
HB 00000009 END OF TEST CYCLE
www.opradata.com
6. Ad Hoc Data from Web Server Logs (CLF)
207.136.97.49 - - [15/Oct/1997:18:46:51 -0700] "GET /tk/p.txt HTTP/1.0" 200 30
tj62.aol.com - - [16/Oct/1997:14:32:22 -0700] "POST /scpt/dd@grp.org/confirm HTTP/1.0" 200 941
234.200.68.71 - - [15/Oct/1997:18:53:33 -0700] "GET /tr/img/gift.gif HTTP/1.0" 200 409
240.142.174.15 - - [15/Oct/1997:18:39:25 -0700] "GET /tr/img/wool.gif HTTP/1.0" 404 178
188.168.121.58 - - [16/Oct/1997:12:59:35 -0700] "GET / HTTP/1.0" 200 3082
214.201.210.19 ekf - [17/Oct/1997:10:08:23 -0700] "GET /img/new.gif HTTP/1.0" 304 -
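A hand-written parser for these lines might look like the minimal Python sketch below. It assumes the standard CLF field order: host, identity, user, [date], "request", status and bytes. It treats `-` as a missing value. The names `parse_clf`, `Entry` and `CLF_LINE` are hypothetical. Rather than stopping at the first malformed line, the parser collects bad lines so they can be examined later.

```python
import re
from typing import NamedTuple, Optional

# CLF: host ident authuser [date] "request" status bytes
CLF_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


class Entry(NamedTuple):
    host: str
    ident: Optional[str]  # None when logged as "-"
    user: Optional[str]   # None when logged as "-"
    date: str
    request: str
    status: int
    size: Optional[int]   # None when logged as "-" (no body sent)


def _dash(value):
    """CLF writes '-' for missing fields; map that to None."""
    return None if value == "-" else value


def parse_clf(lines):
    """Return (entries, errors); errors holds (line_number, raw_line) pairs."""
    entries, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        m = CLF_LINE.match(line.strip())
        if m is None:
            errors.append((lineno, line))  # keep the bad line for inspection
            continue
        size = _dash(m["size"])
        entries.append(Entry(
            host=m["host"],
            ident=_dash(m["ident"]),
            user=_dash(m["user"]),
            date=m["date"],
            request=m["request"],
            status=int(m["status"]),
            size=None if size is None else int(size),
        ))
    return entries, errors


log = [
    '207.136.97.49 - - [15/Oct/1997:18:46:51 -0700] "GET /tk/p.txt HTTP/1.0" 200 30',
    '214.201.210.19 ekf - [17/Oct/1997:10:08:23 -0700] "GET /img/new.gif HTTP/1.0" 304 -',
    # Hypothetical malformed line (request truncated, no closing quote):
    'tj62.aol.com - - [16/Oct/1997:14:32:22 -0700] "POST /scpt/dd@grp.org/confirm',
]
entries, errors = parse_clf(log)
```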
34. Base Types and Pairs
122Joe|Wright|45|95|79
n/aEd|Wood|10|47|31
124Chris|Nolan|80|93|85
125Tim|Burton|30|82|71
126George|Lucas|32|62|40
Description: int * stringUntil('|') * char (int matches 125, stringUntil('|') matches Tim, char matches |)
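The description on this slide can be read as a sequence of small parsers. The following Python sketch models that reading under simplifying assumptions:

- Each parser takes the input and a position. It returns a representation, a list of errors for that component, and the new position.
- On an error, a parser records a message and carries on instead of raising.

The names `p_int`, `p_string_until`, `p_char` and `p_pair` are hypothetical. Consuming nothing after a failed `int` is one possible recovery choice, not necessarily what PADS or DDC does.

```python
import re

_INT = re.compile(r"-?\d+")


def p_int(data, pos):
    """Base type int: an optionally signed run of decimal digits."""
    m = _INT.match(data, pos)
    if m is None:
        return None, ["expected int"], pos  # report, consume nothing, continue
    return int(m.group()), [], m.end()


def p_string_until(stop):
    """Base type stringUntil(stop): everything up to (not including) stop."""
    def parse(data, pos):
        end = data.find(stop, pos)
        if end == -1:
            end = len(data)
        return data[pos:end], [], end
    return parse


def p_char(data, pos):
    """Base type char: any single character."""
    if pos < len(data):
        return data[pos], [], pos + 1
    return None, ["unexpected end of input"], pos


def p_pair(*parsers):
    """t1 * t2 * ...: run parsers left to right (n-ary shorthand for nested pairs).

    The rep is a tuple of component reps; the error list has one slot per
    component, so the error report mirrors the shape of the data.
    """
    def parse(data, pos):
        reps, errs = [], []
        for p in parsers:
            rep, err, pos = p(data, pos)
            reps.append(rep)
            errs.append(err)
        return tuple(reps), errs, pos
    return parse


# int * stringUntil('|') * char
prefix = p_pair(p_int, p_string_until("|"), p_char)

print(prefix("125Tim|Burton|30|82|71", 0))
# ((125, 'Tim', '|'), [[], [], []], 7)
print(prefix("n/aEd|Wood|10|47|31", 0))
# ((None, 'n/aEd', '|'), [['expected int'], [], []], 6)
```

The second call shows the point of the design. The bad ID is reported in its own error slot, while the rest of the prefix still parses.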
35. Base Types and Pairs
13C Programming31Types and Programming Languages20Twenty Years of PLDI36Modern Compiler Implementation in ML27Elements of ML Programming
Description (of the prefix "13C Programming"): length : int . stringFW(length)
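The dependent pair can be sketched the same way, except that the value produced by the first parser is used to build the second. The Python below is a minimal, self-contained illustration that assumes `stringFW(n)` reads exactly `n` characters. The names `p_sigma` and `p_books` and the error strings are hypothetical. Skipping the body when the length is missing is an assumption about recovery.

```python
import re

_NAT = re.compile(r"\d+")


def p_int(data, pos):
    """Base type int (unsigned here, since it is used as a length)."""
    m = _NAT.match(data, pos)
    if m is None:
        return None, ["expected int"], pos
    return int(m.group()), [], m.end()


def p_string_fw(n):
    """stringFW(n): a fixed-width string of exactly n characters."""
    def parse(data, pos):
        chunk = data[pos:pos + n]
        errs = [] if len(chunk) == n else [f"expected {n} chars, got {len(chunk)}"]
        return chunk, errs, pos + len(chunk)
    return parse


def p_sigma(first, make_second):
    """Dependent pair (Sigma x:t. t'): the rep of t selects the parser for t'."""
    def parse(data, pos):
        x, err1, pos = first(data, pos)
        if x is None:
            # No usable length, so the body cannot be delimited.
            return (None, None), [err1, ["skipped"]], pos
        y, err2, pos = make_second(x)(data, pos)
        return (x, y), [err1, err2], pos
    return parse


# length : int . stringFW(length)
book = p_sigma(p_int, p_string_fw)


def p_books(data):
    """Parse consecutive length-prefixed titles until input runs out or stalls."""
    pos, out = 0, []
    while pos < len(data):
        rep, errs, new_pos = book(data, pos)
        out.append((rep, errs))
        if new_pos == pos:
            break  # no progress: stop instead of looping forever
        pos = new_pos
    return out


shelf = "13C Programming31Types and Programming Languages20Twenty Years of PLDI"
for rep, errs in p_books(shelf):
    print(rep, errs)
```

The key difference from a plain pair is in `p_sigma`: the second parser does not exist until the first value is known.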
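40. Semantics Overview
Diagram: a description $t$ is interpreted as a parser $[\![t]\!]$, which maps data (0100100100...) to a representation of type $[\![t]\!]_{\mathrm{rep}}$ and a parse descriptor of type $[\![t]\!]_{\mathrm{pd}}$.
Interpretations of t:
$[\![\{x{:}t \mid e\}]\!]_{\mathrm{rep}} = [\![t]\!]_{\mathrm{rep}} + [\![t]\!]_{\mathrm{rep}}$
$[\![\Sigma x{:}t.\,t']\!]_{\mathrm{rep}} = [\![t]\!]_{\mathrm{rep}} \times [\![t']\!]_{\mathrm{rep}}$
$[\![\{x{:}t \mid e\}]\!]_{\mathrm{pd}} = \mathrm{hdr} \times [\![t]\!]_{\mathrm{pd}}$
$[\![\Sigma x{:}t.\,t']\!]_{\mathrm{pd}} = \mathrm{hdr} \times [\![t]\!]_{\mathrm{pd}} \times [\![t']\!]_{\mathrm{pd}}$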
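The following minimal Python sketch shows what the rep and pd equations for constrained types say. It uses `int` as the underlying type and a hypothetical constraint. The rep is a tagged sum, `Ok(v)` or `Err(v)`, and both variants carry a value of the underlying rep type, matching the sum [t]rep + [t]rep. The pd is a header paired with the pd of t. Several details are assumptions for illustration:

- the class names and error codes;
- the default value 0 for an unparsable int;
- the choice to tag an inner parse error as `Err`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Ok:
    """Left injection: the value parsed and satisfied the constraint."""
    value: Any


@dataclass
class Err:
    """Right injection: the value failed to parse or violated the constraint."""
    value: Any


@dataclass
class Hdr:
    nerr: int     # number of errors detected
    errcode: str  # hypothetical codes: "ok", "bad_int", "constraint"


@dataclass
class ConstraintPD:
    hdr: Hdr    # header for the constrained type itself
    inner: Hdr  # [t]pd; this sketch assumes t is a base type whose pd is a bare Hdr


def p_int(data, pos):
    """Base type int; on failure returns the default rep 0 and an error header."""
    end = pos
    while end < len(data) and data[end] in "0123456789":
        end += 1
    if end == pos:
        return 0, Hdr(1, "bad_int"), pos
    return int(data[pos:end]), Hdr(0, "ok"), end


def p_constrained(parse_t, pred):
    """{x:t | e}: parse t, then evaluate the constraint e on the result."""
    def parse(data, pos):
        v, inner, pos = parse_t(data, pos)
        if inner.nerr > 0:
            return Err(v), ConstraintPD(Hdr(inner.nerr, inner.errcode), inner), pos
        if not pred(v):
            return Err(v), ConstraintPD(Hdr(1, "constraint"), inner), pos
        return Ok(v), ConstraintPD(Hdr(0, "ok"), inner), pos
    return parse


# Hypothetical constraint: {x:int | x < 256}
byte = p_constrained(p_int, lambda x: x < 256)
print(byte("95", 0))
print(byte("300", 0))
```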
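41. Type Correctness
Diagram: description $t$, parser $[\![t]\!]$, data (0100100100...), representation $[\![t]\!]_{\mathrm{rep}}$, parse descriptor $[\![t]\!]_{\mathrm{pd}}$ (interpretations of t).
Theorem: $[\![t]\!] : \mathrm{data} \to [\![t]\!]_{\mathrm{rep}} \times [\![t]\!]_{\mathrm{pd}}$, that is, the parser generated from $t$ maps data to a representation of type $[\![t]\!]_{\mathrm{rep}}$ paired with a parse descriptor of type $[\![t]\!]_{\mathrm{pd}}$.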
Speaker Notes
In today's electronic age, incredible amounts of data are stored in databases and XML: everything from your hotel reservations to your employment records and your credit card transactions. Because databases and XML are standardized, there is a lot of support for data in these formats: schema languages, data browsers, query languages, books, vendor support, consultants, and more. All of this makes dealing with such data easier.
Unfortunately, there is also a lot of important data in ad hoc formats that lack such tools, documentation, and consultants. In general, this data is not free text but semi-structured. Still, it is not as structured as XML, and its syntax differs from the syntaxes usually employed for programming languages.
This data fragment comes from the Gene Ontology project. It describes gene products and links products that are known to be related. It might help answer questions such as: What is a particular gene, or collection of genes, used for?
This fragment comes from the financial markets. The Options Price Reporting Authority (OPRA) provides last-sale information and current options quotations. This particular example describes a simple test transaction.
A record of requests sent to a web server.
From computer networking: a hex dump of DNS data, in a mixed ASCII/binary format.
Unlike standard formats, ad hoc data presents a number of challenges. The data arrives as is: we have no chance to influence the way it is produced. Errors: it is not enough to filter them out; we often want to examine them, because … Errors don't invalidate the surrounding data. Documentation may or may not exist, and if it does exist, it is often out of date. Fields get hijacked for other purposes, and the specification evolves. Buggy data seems to be a fact of life: machines start outputting wrong data, humans make data-entry mistakes, and data goes missing. How to respond to errors is very application-specific: halt, discard, or repair. Errors can be the most interesting part of the data. They can signal that a machine is experiencing problems, which is crucial for monitoring applications, or that two machines are failing to communicate.
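Because the right response is application-specific, one way to structure it is to keep parsing separate from the policy applied afterwards. The Python sketch below assumes each parsed record arrives as a pair of its representation and the error count taken from its parse descriptor. `Policy`, `apply_policy` and `ParseHalted` are hypothetical names. Whatever the policy, the sketch also returns the indices of erroneous records, since those may be the most interesting part of the data.

```python
from enum import Enum


class Policy(Enum):
    HALT = "halt"        # stop at the first erroneous record
    DISCARD = "discard"  # drop erroneous records, keep the rest
    REPAIR = "repair"    # replace erroneous records with a repaired value


class ParseHalted(Exception):
    pass


def apply_policy(records, policy, repair=None):
    """records: iterable of (rep, nerr) pairs.

    Returns (kept_reps, error_indices). Error indices are always collected,
    because the errors themselves are often worth examining.
    """
    kept, error_indices = [], []
    for i, (rep, nerr) in enumerate(records):
        if nerr == 0:
            kept.append(rep)
            continue
        error_indices.append(i)
        if policy is Policy.HALT:
            raise ParseHalted(f"record {i} has {nerr} error(s)")
        if policy is Policy.REPAIR:
            if repair is None:
                raise ValueError("REPAIR policy needs a repair function")
            kept.append(repair(rep))
        # Policy.DISCARD: nothing to add; move on to the next record
    return kept, error_indices


# Hypothetical bowler IDs; the second record's ID failed to parse.
records = [(122, 0), (None, 1), (124, 0)]
print(apply_policy(records, Policy.DISCARD))                      # ([122, 124], [1])
print(apply_policy(records, Policy.REPAIR, repair=lambda r: -1))  # ([122, -1, 124], [1])
```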
While different in many ways, … share a basic insight: describe data with types. Descriptions include physical layout as well as semantic constraints. From a description, the tools automatically generate type declarations and parsers for use in a host language. PADS also produces a metadata object called a parse descriptor, which describes the errors encountered in the data. The parse descriptor mirrors the structure of the data, reporting for every element the errors encountered while parsing it. Transition: summary; how do these languages relate, and how do we build the next 700?
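The idea that the parse descriptor mirrors the data can be sketched as a tree of descriptors, with one node per element, plus a walk that reports where errors occurred. This minimal Python illustration assumes a record with named fields. The `PD` class, its `errcode` strings and `error_paths` are hypothetical. A real parse descriptor may record more, such as where in the input an error occurred. Here `nerr` is computed on demand rather than stored in a header.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PD:
    """One node per element of the data; children mirror sub-elements."""
    errcode: str = "ok"  # this element's own error, if any
    children: Dict[str, "PD"] = field(default_factory=dict)

    @property
    def nerr(self) -> int:
        """Errors in this element plus all of its sub-elements."""
        own = 0 if self.errcode == "ok" else 1
        return own + sum(child.nerr for child in self.children.values())


def error_paths(pd: PD, path: Tuple[str, ...] = ()) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (path, errcode) for every element whose own parse failed."""
    found = []
    if pd.errcode != "ok":
        found.append((path, pd.errcode))
    for name, child in pd.children.items():
        found.extend(error_paths(child, path + (name,)))
    return found


# A hypothetical pd for the bowler row "n/aEd|Wood|10|47|31":
row_pd = PD(children={
    "id": PD("bad_int"), "first": PD(), "last": PD(),
    "min": PD(), "max": PD(), "avg": PD(),
})
print(row_pd.nerr)          # 1
print(error_paths(row_pd))  # [(('id',), 'bad_int')]
```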
Our work is the first attempt to find a semantics well-suited to explaining current and future data description languages.
To address these challenges, researchers have begun to develop high-level languages for describing and processing ad hoc data. PacketTypes describes binary data associated with networking protocols. DataScript describes binary data such as Java jar files and ELF object files. Erlang binaries: like the others, but for Erlang. PADS is a general-purpose DDL, in contrast to these domain-specific DDLs. It supports assorted data encodings and features robust error handling and tool generation.
To address these questions, we provide the first semantics of DDLs. DDC is itself a DDL, but a low-level one, so it can explain… We formally specify the meaning of DDC with a denotational semantics.
Next, I’ll show you some of the calculus by stepping through a few examples.
But in this case, the value described by the first type can be referenced in the second.
A list of bowlers: ID, name, and minimum, maximum, and average scores.
Ad hoc listing of books on my shelf.
Not in conventional dependent type systems.
We also have a denotational semantics. Unlike conventional semantics, which give one interpretation, ours gives three: each type is interpreted as a parser and also as two types in the host language. The first is the type of the representation, and the second is the type of the corresponding parse descriptor.
Type correctness is important, but not enough: the parse descriptor must also accurately report the errors in the *rep*.
We focus on encodings of PADS, DataScript, and PacketTypes.
The implementation did not maintain the desired invariant.
Is this slide necessary?
Either drop this or let people read it. Need better transition here. Now that we’ve seen some of the calculus, I’ll give an overview of the semantics. There are multiple components to the semantics. Note: I do not bother to explain this slide.
Give a picture of parsing … first. Perhaps these slides (including the theorems) should script a picture, highlighting different parts of it. Afterwards, give the Sigma code as one concrete example. Second, we have the parsing semantics themselves, in which we express the parsing behavior of a given type T as a…
The output is not of an arbitrary type; it corresponds closely to the type from which the parser is generated. Abstract this slide and the next.