1. Stein Markup 1.1
Markup Languages
Yaakov J. Stein
Chief Scientist
RAD Data Communications
2. What do I do?
I digest, edit and produce documents:
business letters
email
meeting summaries
proposals
reports
requirement specifications
project plans
web pages
research articles
review articles
books
3. What do others do?
Pretty much the same
US corporations produce >100 billion documents per year
90% of a modern institution’s information is in documents
>50% of a typical corporation’s effort involves documents
That’s why word processing software
was expected to bring efficiency increases
But it didn’t!
4. Word processing?
PROs
makes nicer looking documents
expedites document sharing during creation
CONs
typically 30% of effort on format and reformat
doesn’t increase information accessibility
doesn’t facilitate information mining
5. Databases?
The natural alternative to documents is the database
PROs
increase information accessibility
facilitate information mining
CONs
not human readable
format inflexible
6. The solution
What we really want is to write unconstrained text
but still have information retrieval!
Method 1: Automatic text analysis
AI program analyzes text
Recognizes document structure, sentence syntax
Performs gisting, facilitates information mining
A complete solution is equivalent to passing the Turing test
Method 2: Manual markup
Document author responsible for marking
Clarifies document structure
Enables automated retrieval of selected information
Suggests presentation format
7. Why is text analysis hard?
The man cried FIRE!
The man cried FIRE the gun!
The man cried FIRE the gun maker!
8. Are MLs computer languages?
There are many different types of computer languages:
procedural languages
for (n=0; n<10; n++)
    if (n>5) printf("markup languages are fun!\n");
graphic languages
newpath
0 0 moveto 0 1 lineto 1 1 lineto 1 0 lineto
closepath fill
database languages
SELECT book FROM biblio WHERE subject='DSP' AND author='STEIN';
logical languages
useful(dsp). useful(hardware). fun(dsp). fun(web).
interesting(X) :- useful(X), fun(X).
?- interesting(X).
9. They are!
Markup languages do not instruct the computer directly,
as procedural languages do;
rather, they instruct it indirectly,
as logical languages do
They do this by using:
elements
attributes
entities
text
<BOOK SUBJECT="dsp">
<TITLE FORMAT="short">DSP-CSP</TITLE>
<AUTHOR>J. Stein</AUTHOR>
This is a great book!
&standard-disclaimer;
</BOOK>
(the constructs in angle brackets are tags)
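As a rough illustration of the four ingredients listed above, the following Python sketch parses a well-formed XML rendering of the BOOK example with the standard library's xml.etree.ElementTree. The DOCTYPE wrapper and the replacement text of the standard-disclaimer entity are assumptions added so that the fragment parses; neither appears on the slide.

```python
import xml.etree.ElementTree as ET

# Assumption: the entity's replacement text is invented for illustration.
BOOK_XML = """<!DOCTYPE BOOK [
  <!ENTITY standard-disclaimer "Opinions expressed are my own.">
]>
<BOOK SUBJECT="dsp">
  <TITLE FORMAT="short">DSP-CSP</TITLE>
  <AUTHOR>J. Stein</AUTHOR>
  This is a great book!
  &standard-disclaimer;
</BOOK>"""


def summarize(xml_text):
    """Return (elements, free_text) found in the marked-up fragment."""
    root = ET.fromstring(xml_text)
    # One (name, attributes, leading text) triple per element.
    elements = [
        (elem.tag, dict(elem.attrib), (elem.text or "").strip())
        for elem in root.iter()
    ]
    # Text that follows a child element is stored as that child's "tail";
    # the parser has already expanded the entity reference by this point.
    author = root.find("AUTHOR")
    free_text = " ".join((author.tail or "").split())
    return elements, free_text


if __name__ == "__main__":
    elements, free_text = summarize(BOOK_XML)
    for tag, attributes, text in elements:
        print(tag, attributes, repr(text))
    print("free text:", free_text)
```

The program never says how to find the title; it only asks the parser for whatever the markup declares, which is the sense in which markup instructs the computer indirectly.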
10. Some markup element functions
Structural
– Clarifies document structure
– Delineates document parts
Descriptive (informative)
– Indicates the type of information
– Facilitates information retrieval
Presentational (display)
– Presents information in nice format
– Helps human readability
Referential (links, applications)
– Provides hypertext links
– Launches applications
11. Structural Markup
<HEADING>September 1, 2000</HEADING>
<GREETING>Dear Prof. Stein, </GREETING>
<BODY>
I would like to tell you how much I enjoyed reading your new text
“Digital Signal Processing, A Computer Science Perspective”.
I hope we will be able to meet at the next conference.
</BODY>
<SIGNATURE>
Sincerely,
Dee Espy
</SIGNATURE>
12. Descriptive Markup
<DATE>September 1, 2000</DATE>
Dear <PERSON>Prof. Stein,</PERSON>
I would like to tell you how much I enjoyed reading your new text
<BOOK>
“Digital Signal Processing, A Computer Science Perspective”.
</BOOK>
I hope we will be able to meet at the next <EVENT>conference.</EVENT>
Sincerely,
<PERSON>Dee Espy</PERSON>
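To make the retrieval benefit of descriptive markup concrete, here is a minimal Python sketch that pulls selected information out of the letter above. It assumes the letter is wrapped in a single root element (called LETTER here, a name not used on the slide) so that it parses as XML, and the helper name retrieve is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: a LETTER root element is added so the fragment is well-formed.
LETTER = """<LETTER>
<DATE>September 1, 2000</DATE>
Dear <PERSON>Prof. Stein,</PERSON>
I would like to tell you how much I enjoyed reading your new text
<BOOK>"Digital Signal Processing, A Computer Science Perspective".</BOOK>
I hope we will be able to meet at the next <EVENT>conference.</EVENT>
Sincerely,
<PERSON>Dee Espy</PERSON>
</LETTER>"""


def retrieve(xml_text, tag):
    """Return the text of every element named `tag`, in document order."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.iter(tag):
        text = " ".join((element.text or "").split())
        # Drop the trailing punctuation that the author left inside the tag.
        results.append(text.rstrip(",."))
    return results


if __name__ == "__main__":
    print(retrieve(LETTER, "PERSON"))
    print(retrieve(LETTER, "DATE"))
    print(retrieve(LETTER, "EVENT"))
```

The same questions ("who is mentioned?", "which event?") cannot be answered from the structural or presentational versions of the letter without analyzing the text itself.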
13. Presentational Markup
<RIGHT-JUSTIFY>September 1, 2000</RIGHT-JUSTIFY>
<BOLD>Dear Prof. Stein,</BOLD>
I would like to tell you how much I enjoyed reading your new text
<UNDERLINE>
“Digital Signal Processing, A Computer Science Perspective”.
</UNDERLINE>
I hope we will be able to meet at the next
<BLINK>conference.</BLINK>
Sincerely,
<IMAGE SRC="deesignature.jpg" ALIGN="left">
<FONT FACE="Times-Roman">Dee Espy</FONT>
14. Relational Markup
<today xlink:form="simple" href="date" actuate="auto">
Dear Prof. Stein,
I would like to tell you how much I enjoyed reading your new text
<A HREF="www.amazon.com/exec/obidos/ASIN/04712954">
“Digital Signal Processing, A Computer Science Perspective”.
</A>
I hope we will be able to meet at the next
<A HREF="conference">conference.</A>
Sincerely,
<IMAGE SRC="dee-signature.jpg" ALIGN="left">
<A HREF="mailto:dee@dee-epsy.net">Dee Espy</A>
15. Generalized Markup Language
William Tunnicliffe, Stanley Rice [1960s]
(independently) invent idea of structural markup language
Problem: need different ML for each type of document
(letter, report, article, book, etc)
Charles Goldfarb, Edward Mosher, Raymond Lorie (IBM) [1973]
invent Generalized Markup Language (GML)
Solution: use metalanguage
Document Type Definition (DTD) defines tags
IBM marked up 90% of its documents with GML
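The metalanguage idea can be sketched in a few lines of Python: one generic checker, plus a separate definition for each type of document. This is only a toy stand-in for a DTD, not GML or SGML syntax; the dictionary format, the LETTER_TYPE definition and the function name conforms are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document type for letters:
# element name -> set of element names allowed directly inside it.
LETTER_TYPE = {
    "LETTER": {"HEADING", "GREETING", "BODY", "SIGNATURE"},
    "HEADING": set(),
    "GREETING": set(),
    "BODY": set(),
    "SIGNATURE": set(),
}


def conforms(xml_text, document_type):
    """Return True if every element and its children are allowed by the type."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag not in document_type:
            return False  # tag is not defined for this type of document
        allowed_children = document_type[element.tag]
        if any(child.tag not in allowed_children for child in element):
            return False  # tag is defined, but not permitted in this position
    return True


if __name__ == "__main__":
    good = "<LETTER><GREETING>Dear Prof. Stein,</GREETING><BODY>Thanks!</BODY></LETTER>"
    bad = "<LETTER><BODY>Thanks! <BLINK>Really!</BLINK></BODY></LETTER>"
    print(conforms(good, LETTER_TYPE))
    print(conforms(bad, LETTER_TYPE))
```

Supporting reports or books then means writing another definition, not another checker, which is the point of using a metalanguage.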
16. With GML structure is evident
Library
  Novels
  Journals
  Textbooks
    Algebraic zoology
    Botanical history
    Computer poetry
    DSP
      DSP-CSP
        Title
          Full: Digital Signal Processing: a Computer Science Perspective
          Short: DSPCSP
        Author
          Name: Jonathan (Y) Stein
          Association: RAD Data Comm.
        Publication
          Publisher: John Wiley
          Year: 2000
          Location: New York
          ISBN: 04712954
      DSP just for fun
    Elementary QED
17. Standard Generalized Markup Language
Problems with GML:
– No validating parser
– Not portable (between computer systems)
Solution:
SGML
ANSI [1978]
ISO/IEC 8879 [1986] (Intl Org for Standardization / Intl Electrotechnical Commission)
JTC1/SC34/WG1 (WG 1 of SubCommittee 34 of Joint Technical Committee 1)
For presentation:
Document Style Semantics and Specification Language (DSSSL)
18. SGML - cont.
If SGML is so good, why doesn’t anyone use it?
Complexity
– base standard >500 pages
– SGML is a metalanguage
– writing DTD is complex programming
– marked up text is hard to read
– DSSSL adds to complexity
Inflexibility - requires absolute conformity
– assumes only one correct way to mark up
– constrains author to dictated structure
– not good at capturing author’s structure
19. HyperText Markup Language
CERN (particle physics institute in Switzerland) was an early Internet adopter
Used extensively for collaboration (articles have long author lists)
Major problems with format incompatibility
– only straight ASCII worked reliably
Tim Berners-Lee (computer specialist) defined requirements
simplicity (couldn’t expect physicists to use SGML)
freedom (didn’t need validation, let browser ignore bad markup)
needed hypertext links (including to documents over Internet)
presentational markup (papers must look nice - authors are used to TeX)
Solution: HTML - a specific application of SGML (not metalanguage)
20. HTML versions
HTML 1.0 (1989) Berners-Lee original CERN version
hypertext, images, head+body structure, presentational markup
HTML 2.0 (1994) IETF standard - RFC 1866
added lists, forms, etc.
HTML 3.2 (1997) W3C recommendation (incorporates Netscape extensions)
added tables, applets, super/sub-scripts
HTML 4.0 (1997) W3C recommendation (and similar ISO/IEC 15445)
minimizes presentational markup
XHTML 1.0 (2000) present W3C recommendation
reformulates HTML in XML
21. HTML document structure
<HTML>
<HEAD>
global definitions such as
<TITLE>Web page title</TITLE>
</HEAD>
<BODY>
marked-up text
</BODY>
</HTML>
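The head/body split above can be observed with the standard library's html.parser. The sketch below is an illustration only: the class name SectionSplitter and the function split_page are hypothetical, and the parser is deliberately tolerant (it raises no error on unknown or unbalanced tags), in the spirit of the "let browser ignore bad markup" requirement.

```python
from html.parser import HTMLParser

PAGE = """<HTML>
<HEAD>
<TITLE>Web page title</TITLE>
</HEAD>
<BODY>
marked-up text
</BODY>
</HTML>"""


class SectionSplitter(HTMLParser):
    """Collect the text found inside TITLE and inside BODY."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.title = ""
        self.body = ""

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)  # html.parser lower-cases tag names

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop up to and including the matching open tag.
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if "title" in self.open_tags:
            self.title += data
        elif "body" in self.open_tags:
            self.body += data


def split_page(html_text):
    """Return (title, body text) of an HTML page."""
    parser = SectionSplitter()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.body.strip()


if __name__ == "__main__":
    print(split_page(PAGE))
```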
23. Problems with HTML
Presentational aspects have predominated
<B> bold text </B>
<BLINK> blinking text </BLINK>
<FONT COLOR="red"> red text </FONT>
Practically no descriptive markup
Search engines are reduced to flat text search
Search by topic only through keywords or portals
Not extensible
Can’t add new tags
Unknown tags ignored
Links are relatively simple
Usually user action is required (except IMG)
Only full document (with offset) linkable
Link management is a logistical nightmare
24. Not everything is HTML
Due to HTML limitations other tools are also used:
Multimedia extensions
– (dynamic) gif, jpg, …
– streaming audio
Common Gateway Interface
– generate HTML on-the-fly
– Perl, C, …
Server Push - Server Pull
Javascript
Java
25. eXtensible Markup Language
Simplified (best parts of) SGML (subset of features)
Flexible content management tool
W3C recommendation(s)
Extensible - can add new elements (even without DTD)
Easy to create special purpose languages (with DTD/SCHEMA)
Includes HTML-like hypertext links
– and extensions (XLink, XPointer)
The future of the web!
26. XML - an Example
<?xml version="1.0" standalone="yes"?>
<bibliography>
<book isbn="04712954">
<title>Digital Signal Processing: a Computer Science Perspective</title>
<author>Jonathan (Y) Stein</author>
<publisher>John Wiley and Sons</publisher>
</book>
<article>
<title>False Alarm Reduction for ASR and OCR</title>
<author>Yaakov Stein</author>
<proceedings>Tenth AICVNN Symposium</proceedings>
<pages>195-200</pages>
</article>
...
</bibliography>
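A short Python sketch of reading such a file with xml.etree.ElementTree follows. It uses the two entries shown above (without the elided remainder), assumes the isbn attribute value is quoted as XML requires, and the function name list_entries is hypothetical.

```python
import xml.etree.ElementTree as ET

BIBLIOGRAPHY = """<?xml version="1.0" standalone="yes"?>
<bibliography>
  <book isbn="04712954">
    <title>Digital Signal Processing: a Computer Science Perspective</title>
    <author>Jonathan (Y) Stein</author>
    <publisher>John Wiley and Sons</publisher>
  </book>
  <article>
    <title>False Alarm Reduction for ASR and OCR</title>
    <author>Yaakov Stein</author>
    <proceedings>Tenth AICVNN Symposium</proceedings>
    <pages>195-200</pages>
  </article>
</bibliography>"""


def list_entries(xml_text):
    """Return one dictionary per bibliography entry, in document order."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root:
        entries.append({
            "kind": entry.tag,                  # element name: book, article, ...
            "title": entry.findtext("title"),   # text of the child element
            "author": entry.findtext("author"),
            "isbn": entry.get("isbn"),          # attribute; None when absent
        })
    return entries


if __name__ == "__main__":
    for entry in list_entries(BIBLIOGRAPHY):
        print(entry["kind"], "|", entry["author"], "|", entry["title"])
```

Because the element names are the author's own (book, article, proceedings), the reader needs no knowledge of the file beyond those names.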
27. What can we do with an XML file?
Check if well-formed
Check if valid (against DTD or schema)
Display “as-is” in browser
Parse in special-purpose program (SAX, DOM)
Process (XSL) to XML, HTML, etc.
Display after processing
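Two of these steps, the well-formedness check and event-driven (SAX) parsing, can be sketched with Python's standard xml.sax module. The handler class TagCounter and the function count_tags are hypothetical names; validation against a DTD or schema and XSL processing are not covered, since the standard library does not provide them.

```python
import xml.sax
from collections import Counter


class TagCounter(xml.sax.ContentHandler):
    """SAX handler that counts how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        # Called by the parser once for every opening tag it encounters.
        self.counts[name] += 1


def count_tags(xml_text):
    """Return element counts, or None if the text is not well-formed."""
    handler = TagCounter()
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except xml.sax.SAXParseException:
        return None
    return handler.counts


if __name__ == "__main__":
    quoted = '<bibliography><book isbn="04712954"><title>T</title></book></bibliography>'
    unquoted = '<bibliography><book isbn=04712954><title>T</title></book></bibliography>'
    print(count_tags(quoted))
    print(count_tags(unquoted))
```

The second document differs only in its unquoted attribute value, which is enough to make it not well-formed.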
28. Wireless Markup Language
Markup language element of Wireless Application Protocol
WAP forum (1997)
– Ericsson, Motorola, Nokia, Unwired Planet (phone.com)
– bring Internet to cellular phone users
– re-use fundamental Internet concepts (TCP/IP, http, html, javascript)
but adapted to lower bandwidth
smaller screen
limited input facilities
limited computational resources
– applications scale across transport options (GSM, TDMA, CDMA, 3G)
and device types (mobile phones, personal assistants)
29. WML Philosophy
Defined using XML
Transported in compressed binary (for BW reduction)
Applications are modeled as decks of cards
Features:
Actions (OK, navigation, help) can be performed
Hyperlinks (like in HTML)
String variables
Timers
wbmp images (B&W)
Select boxes, forms (for input)
wmlscript (like javascript)
30. WML structure
<?xml version="1.0"?>
<!DOCTYPE wml …>
<wml>
<card>
<p>
text
</p>
<p>
text
</p>
</card>
<card>
...
</card>
</wml>
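The deck-of-cards structure can also be generated programmatically. The Python sketch below builds the wml/card/p nesting shown above with xml.etree.ElementTree; the function name build_deck and the use of an id attribute on each card are assumptions for illustration, and the XML declaration and DOCTYPE line are left out because the slide elides the DOCTYPE details.

```python
import xml.etree.ElementTree as ET


def build_deck(cards):
    """Serialize a deck; `cards` maps a card id to its list of paragraphs."""
    wml = ET.Element("wml")
    for card_id, paragraphs in cards.items():
        # Assumption: each card is labeled with an id attribute.
        card = ET.SubElement(wml, "card", id=card_id)
        for text in paragraphs:
            ET.SubElement(card, "p").text = text
    return ET.tostring(wml, encoding="unicode")


if __name__ == "__main__":
    deck = build_deck({
        "first": ["text", "text"],
        "second": ["more text"],
    })
    print(deck)
```

Sending several cards in one deck lets the phone move between them without another round trip over the low-bandwidth link.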
31. Some WML elements
<p> </p>                                text
<a href=...> </a>                       hyperlink (anchor)
<do> </do>                              action
<go href=.../>                          goto wml page
<timer>                                 trigger event (units = tenths of a second)
<input/>                                input user text
<prev/>                                 return to previous page
$(…)                                    value of variable
<img src=… />                           display image
<postfield name=… value=…/>             pass name/value pair to the server
<select> <option> <option> </select>    select box
32. Some more markup languages
VML = Vector (graphics) Markup Language
VoiceXML
SSML = Speech Synthesis Markup Language
CPML = Call Policy Markup Language
DSML = Directory Services Markup Language
MathML = Mathematical Markup Language
CML = Chemical Markup Language
AML = Astronomical Markup Language
LegalXML
BSML = Bioinformatic Sequence Markup Language
GedML = Genealogical Data Markup Language
FinXML = Financial market Markup Language
ChessML
SDML = Signed Document Markup Language
RELML = Real Estate Listing Markup Language
etc. etc. etc. ...
33. Examples
HTML
– html examples
XML
– xml-file xsl-file xml
VML
– vml-file
WML (get M3gate emulator)
– wml examples