1. Lecture 4: Texts and Models
Prof. Alvarado
MDST 3703/7703
11 September 2012
2. Review
• Posting “Hello, World!”
– Put file in the public_html directory of your UVA
Home Directory
– Create a post and insert a link to this file
– Categorize as: 09.06: (S) HTML
• If you cannot get to your home directory, try
uploading to
http://homedir.virginia.edu
3. Some Quick Corrections
• Digital text is not necessary
– It’s an open question (i.e. do we have to have it?)
• Nelson did not conceive of “trails,” Bush did
• HTML is not the “first big idea” in the liberal arts;
hypertext is (according to me)
• The idea that “text shapes knowledge” is not
ancient, but relatively new
– Media determinism is a 20th century perspective
– Although Plato notes the effects of literacy in the Phaedrus
• Not everything can be translated into HTML
– i.e. HTML is not the richest framework for digital
representation
4. Your Questions and Observations
• Is commercialization killing creativity?
– What is the relationship between how the web is
organized economically and how it shapes
expression? EFFECT OF SOCIAL ORGANIZATION
• What happens if the associations that
someone makes are “off” and illogical to
others?
– Does it loosen the way logical connections can be
made and argued? EFFECT ON LOGIC
5. Your Questions and Observations
• Computers in general still heavily rely on a
hierarchical structure
– To what extent has rationalization occurred with the
invention of hypertext?
• Do things lose value and meaning in exchange
for digital coding?
– What is the effect of digitization on value?
• Hypertexts and links online can be distracting
– Non-linear thinking or mindless surfing?
6. Your Questions and Observations
• People are trying to create the same exact
classroom experience online that exists in the
physical classroom, which is impossible
– We need to rethink and restructure the online
learning experience as a new and unique learning
experience
• How can we keep hypertext from altering us
too much?
• The beauty and the risk of an open source web
7. Practical Questions
• How can an HTML webpage on your own computer
be found by the search bar but not be on the web?
– Your browser lives on your machine
– The protocol name tells it where to look
• I wondered if the picture from my computer would
still show up if I opened the page from another
computer?
• It is interesting to see how one little thing out of
place can ruin the entire code
– Computers are stupid in that way
• Why should coders learn HTML?
– HTML is an interface language that can be easily generated
from print statements in your code
8. What is HTML?
• HTML is not a programming language
– Programming languages express IF … THEN logic
– But it is code that obeys a syntax & gets interpreted
– And it is produced and consumed by programs
• HTML is a very general interface language
• HTML is written in XML, which we discuss
today
– Technically called “XHTML”
– The original version was written in SGML
9. In general, don’t conflate HTML with
hypertext or with digital
representation in general
10. HTML is a language that
generates a species of hypertext
which is, in turn, a species of
digital representation
19. These are all examples of
traditional texts
They exhibit “latent hypertext”
20. Landow
• The concept of hypertext parallels
poststructuralist views of text
– Barthes, Foucault, Derrida, Kristeva, et al.
• In this view, a text is not, and has never
been, a bounded, closed thing
– it is a network of signifiers that connect meanings
across time and space …
21. Digital humanists have been
concerned with encoding historical
texts since at least 1949
22. Father Busa
• Creator of the Index Thomisticus
• Saw the computer as a solution to indexing
the works of Aquinas in 1949
– 13,000,000 words
– “in” took 4 years
• Solution:
– Lemmatization
– Variations tagged as
instances of a type
23. The complete works of Aquinas will be typed onto
punch cards; the machines will then work through
the words and produce a systematic index of every
word St. Thomas used, together with the number
of times it appears, where it appears, and the six
words immediately preceding and following each
appearance (to give the context). This will take the
machines 8,125 hours; the same job would be
likely to take one man a lifetime.
Time Magazine, 1956, “Religion: Sacred: Electronics”
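The index described in the quotation (every word, how often it appears, where, and six words of context on each side), combined with lemmatization, can be sketched as follows. This is a minimal illustration, not Busa's actual procedure: the lemma table is a tiny hand-made assumption, and the sample tokens are invented for the example rather than quoted from Aquinas.

```python
from collections import defaultdict

# Hypothetical, hand-made lemma table: each variant form maps to its type.
LEMMAS = {"est": "sum", "sunt": "sum", "homines": "homo", "animalia": "animal"}

def concordance(words, window=6):
    """Index every word under its lemma, with its position and context."""
    index = defaultdict(list)
    for pos, word in enumerate(words):
        form = word.lower()
        lemma = LEMMAS.get(form, form)  # unknown forms stand for themselves
        before = words[max(0, pos - window):pos]
        after = words[pos + 1:pos + 1 + window]
        index[lemma].append((pos, before, after))
    return index

# Invented token sequence, used only to exercise the function.
sample = "homo est animal et homines sunt animalia".split()
index = concordance(sample)

# Number of appearances per lemma.
counts = {lemma: len(hits) for lemma, hits in index.items()}
print(counts)
```

Grouping variants under one lemma is what lets "est" and "sunt" be counted as instances of a single type.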
24. So, what is text?
Let’s look at some material
examples
29. Visual Signifiers
• Small caps
• Indentation
• Alignment
• Italics
• Space
All used to signify elements of text
30. Documents have three Levels:
Content, Structure, Style
• Content
– TEXT, images, video clips, etc.
• Structure
– The organization of content into units (elements)
and logical relationships (e.g. reading order)
• Style
– Screen and print layout
– Fonts, colors, etc.
31. Descriptive markup languages allow
us to define structure of documents
for computational purposes
Theoretically, they do not specify
layout or content
35. Document Elements and Structures
Play
– Act +
  • Scene +
    – Line +
Book
– Chapter +
  • Verse +
Letter
– Heading
  • Return Address
  • Date
  • Recipient Info
    – Name
    – Title
    – Address
– Content
  • Salutation
  • Paragraph +
  • Closing
38. What is XML?
• Stands for eXtensible Markup Language
– Actually invented after the web
– A simplification of SGML, the language used to create
HTML
– It specifies a set of rules for creating specialized markup
languages such as HTML and TEI
• It is a simplified version of SGML
– Standard Generalized Markup Language
• SGML was invented in the early 1970s to wrest the
control of documents from computer people who
were taking over industries like law and accounting
40. XML looks like this
Notice how the element names reference units, not layout or style
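As a stand-in for the kind of sample this slide refers to, here is a small made-up XML document, loaded with Python's standard `xml.etree.ElementTree` module. The element names (`poem`, `title`, `stanza`, `line`) are illustrative assumptions; the point is only that they name units of text and say nothing about fonts or position.

```python
import xml.etree.ElementTree as ET

# Made-up sample: element names describe units of text, not layout or style.
SAMPLE = """<poem>
  <title>An Example</title>
  <stanza>
    <line>First line of the stanza</line>
    <line>Second line of the stanza</line>
  </stanza>
</poem>"""

root = ET.fromstring(SAMPLE)

# List the element names in document order.
element_names = [el.tag for el in root.iter()]
print(element_names)
```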
42. XML Premises
1. All documents are composed of elements.
2. Elements contain content.
3. Elements have no layout.
4. Elements are hierarchically ordered.
5. Elements are to be indicated by “markup” –
tags that define the beginning and end of an
element
43. XML Markup Rules
• Tags signify structural elements
• Three kinds of tag
– Start and End, e.g. <p> and </p>
– Singleton, e.g. <br />
• Start and singleton tags can have attributes
– Simple key/value pairs
– <div class="stanza" style="color:red;">
• Basic rules
– All attributes must be quoted
– All tags must nest (no overlaps!)
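The two basic rules can be checked mechanically. The sketch below uses the parser in Python's standard library, which rejects markup that breaks either rule; the helper name `is_well_formed` and the three sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Quoted attribute, properly nested tags, one singleton tag.
nested = '<div class="stanza"><p>One<br /></p></div>'
# The <p> element overlaps the end of the <div> element.
overlapping = '<div class="stanza"><p>One</div></p>'
# The attribute value is not quoted.
unquoted = '<div class=stanza><p>One</p></div>'

for sample in (nested, overlapping, unquoted):
    print(is_well_formed(sample), sample)
```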
45. XML also provides Document Types
• A Document Type Definition (DTD) defines a
set of tags and rules for using them
– Specifies elements, attributes, and possible
combinations
– E.g. in HTML, the ol and ul elements must contain li
elements
• A DTD is just one kind of schema system used
by XML
• Schemas express data models of/for texts
– TEI is a powerful way of describing primary source
materials for scholars
• Documents that use a schema properly are
called “valid”
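To make the idea of a rule concrete, the sketch below checks one constraint by hand: every child of an `ol` or `ul` element must be an `li`. This is not a DTD validator (Python's standard library does not validate against DTDs); it is a hand-written illustration, and the function name and sample strings are assumptions.

```python
import xml.etree.ElementTree as ET

LIST_ELEMENTS = {"ol", "ul"}

def list_rule_violations(markup):
    """Return (list tag, offending child tag) pairs for non-li children."""
    root = ET.fromstring(markup)
    problems = []
    for el in root.iter():
        if el.tag in LIST_ELEMENTS:
            for child in el:
                if child.tag != "li":
                    problems.append((el.tag, child.tag))
    return problems

ok = "<body><ul><li>one</li><li>two</li></ul></body>"
bad = "<body><ol><li>one</li><p>stray</p></ol></body>"

print(list_rule_violations(ok))
print(list_rule_violations(bad))
```

Both samples are well-formed XML; only the first also follows the rule. That is the difference between a document that merely parses and one that a schema would accept.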
46. Originally, DTDs defined “genres”
like business letter or mortgage form
They were later used to define more
abstract models of textual content
47. XML is used everywhere
• HTML
– E.g. Embed codes
• TEI (Text Encoding Initiative)
• RSS
• Civilization IV
• Playlists (e.g. XSPF or “spiff”)
• Google Maps (KML)
48. A Look Again at HTML
• aka XHTML
– And now becoming HTML5
• An instance of XML (formerly SGML)
• An interface language
• Language of the World Wide Web
• Defined by a DTD that prescribes a specific
set of elements and relations
50. Basic Elements with associated Tags
Element         Tags                                          Attributes
Paragraph       <p> ... </p>
Numbered List   <ol> <li> ... </li> </ol>
Bulleted List   <ul> <li> ... </li> </ul>
Table           <table> <tr> <td> ... </td> </tr> </table>
Anchor          <a> ... </a>                                  href, target
Image           <img/>                                        src, border
Object          <object> ... </object>
51. The Text Encoding Initiative created
TEI to mark up scholarly documents
Mainly primary sources such as
books and manuscripts
52. TEI
• The dominant language used to encode
scholarly text
• This room was the location of
UVa’s EText Center
– World famous for text encoding
– Now part of the library and catalog
• Scholars create their own schemas to match
what they are interested in
53. Examples
• The TEI Header
– http://tbe.kantl.be/TBE/examples/TBED02v00.htm
• TEI Prose
– http://tbe.kantl.be/TBE/examples/TBED03v00.htm
• Find others at the TEI By Example Project
– http://tbe.kantl.be/TBE/
55. OHCO
• XML (and therefore HTML and TEI) implies
a certain theory of text
– A text is an OHCO
• OHCO
– Ordered Hierarchy of Content Objects
• An OHCO is a kind of tree
– Elements follow each other in sequences
– Elements can contain other elements
57. OHCO allows for easy processing
• Every element has a precise address in the text
– E.g. HTML/body/p[1]
• Texts can be described in the language of
kinship
– Ancestors, parents, siblings, children, etc.
• Texts can be restructured and manipulated by
known patterns and algorithms
– Traversing
– Pruning
– Cross-referencing
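A minimal sketch of these operations with Python's standard `xml.etree.ElementTree`, on a made-up two-paragraph document. The path `body/p[1]` is written relative to the root `html` element, which is how this library addresses elements; the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document.
DOC = """<html>
  <body>
    <p>First paragraph</p>
    <p>Second paragraph</p>
  </body>
</html>"""

root = ET.fromstring(DOC)

# Address: the first <p> inside <body>, relative to the root <html>.
first_p = root.find("body/p[1]")

# Kinship: ElementTree keeps no parent pointers, so build a child-to-parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}
siblings = [el for el in parent_of[first_p] if el is not first_p]

# Traversing: visit every element in document order.
order = [el.tag for el in root.iter()]

# Pruning: remove the second paragraph from its parent.
body = root.find("body")
body.remove(siblings[0])
remaining = [p.text for p in body.findall("p")]
print(order, remaining)
```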
60. Pages and
Paragraphs
Two common structures
that overlap
61. Solution 1: Split Elements
<page n="2">
...
<p id="foo">His good looks and his rank had one fair
claim on his attachment, since to them he must have owed a
wife</p>
</page>
<page n="3">
<p id="bar" prev_id="foo"> a very superior character to
anything deserved by his own.</p>
...
</page>
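With split elements, the page hierarchy is primary and the paragraph has to be reassembled by following the `prev_id` links. A minimal sketch, assuming the pages sit inside a wrapper element (here called `text`, a name invented for this example) and that each fragment appears after the fragment it continues:

```python
import xml.etree.ElementTree as ET

# The <text> wrapper is an assumption; the pages and paragraphs follow the slide.
DOC = """<text>
  <page n="2">
    <p id="foo">His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife</p>
  </page>
  <page n="3">
    <p id="bar" prev_id="foo"> a very superior character to anything deserved by his own.</p>
  </page>
</text>"""

def logical_paragraphs(root):
    """Join paragraph fragments that are chained together by prev_id."""
    joined = {}   # id of the first fragment -> full paragraph text
    head_of = {}  # fragment id -> id of the first fragment in its chain
    for p in root.iter("p"):
        pid, prev = p.get("id"), p.get("prev_id")
        head = head_of[prev] if prev else pid
        head_of[pid] = head
        joined[head] = joined.get(head, "") + (p.text or "")
    return joined

root = ET.fromstring(DOC)
paragraphs = logical_paragraphs(root)
print(paragraphs["foo"])
```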
62. Solution 2: Use “Milestones”
<p>His good looks and his rank had one fair claim on
his attachment, since to them he must have owed a
wife <pb n="3" /> a very superior character to
anything deserved by his own.</p>
One structure gets backgrounded
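With milestones, the paragraph stays whole and the page structure has to be recovered from the empty `pb` markers. A minimal sketch: the starting page number "2" is an assumption (the milestone only says where page 3 begins), and the function handles only `pb` elements that are direct children of the paragraph.

```python
import xml.etree.ElementTree as ET

P = ('<p>His good looks and his rank had one fair claim on his attachment, '
     'since to them he must have owed a wife <pb n="3" /> a very superior '
     'character to anything deserved by his own.</p>')

def text_by_page(p, first_page):
    """Split a paragraph's text at its <pb> milestones, keyed by page number."""
    pages = {first_page: p.text or ""}
    for child in p:
        if child.tag == "pb":
            # Text that follows a milestone is stored as that element's tail.
            pages[child.get("n")] = child.tail or ""
    return pages

para = ET.fromstring(P)
pages = text_by_page(para, first_page="2")
for number, text in pages.items():
    print(number, text.strip())
```

The page is the backgrounded structure here: it exists only as a marker inside the paragraph, so recovering page contents takes extra work like this.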
69. McCarty
• A different use of markup
– From document description to interpretation
– Creative “misuse”
• Reverse engineering a “grammar” of
personification from a markup strategy
– Thickness = description (of text)
– Depth = explanation (of text by reference to grammar)
• Is forced to use tables in collaboration with
markup
72. A Proposed Model
• Texts are not documents
– Documents are media, Texts are messages
• Texts and documents are part of a system
comprised of “levels”
– They are effectively archaeology sites with
stratigraphic layers
– Erasures are like cities building on top of each other
• Each level of the system is described by an
appropriate set of tools
– Document structures → XML
– Textual structures, embedded ontologies → Tables
73. Basic Levels
• Document
– Physical objects (paper)
– Logical objects (defined by space, style, punctuation,
etc.)
– Style and layout (also defined by space, color, etc.)
– Can have superimposed versions
• Text
– Sequences of characters
– Grammatical features
– Figures and poetic features
– Etc.
Editor’s Notes
Text becomes reducible to its elements
Basic feature of the medium