Lecture 4: Texts and Models

        Prof. Alvarado
      MDST 3703/7703
      11 September 2012
Review
• Posting “Hello, World!”
  – Put file in the public_html directory of your UVA
    Home Directory
  – Create a post and insert a link to this file
  – Categorize as: 09.06: (S) HTML
• If you cannot get to your home directory, try
  uploading to
  http://homedir.virginia.edu
Some Quick Corrections
• Digital text is not necessary
   – It’s an open question (i.e. do we have to have it?)
• Nelson did not conceive of “trails,” Bush did
• HTML is not the “first big idea” in the liberal arts;
  hypertext is (according to me)
• The idea that “text shapes knowledge” is not
  ancient, but relatively new
   – Media determinism is a 20th century perspective
   – Although Plato notes the effects of literacy in the Phaedrus
• Not everything can be translated into HTML
   – i.e. HTML is not the richest framework for digital
     representation
Your Questions and Observations
• Is commercialization killing creativity?
  – What is the relationship between how the web is
    organized economically and how it shapes
    expression? → EFFECT OF SOCIAL ORGANIZATION
• What happens if the associations that
  someone makes are “off” and illogical to
  others?
  – Does it loosen the way logical connections can be
    made and argued? → EFFECT ON LOGIC
Your Questions and Observations
• Computers in general still heavily rely on a
  hierarchical structure
  – To what extent has rationalization occurred with the
    invention of hypertext?
• Do things lose value and meaning in exchange
  for digital coding?
  – What is the effect of digitization on value?
• Hypertexts and links online can be distracting
  – Non-linear thinking or mindless surfing?
Your Questions and Observations
• People are trying to create the same exact
  classroom experience online that exists in the
  physical classroom, which is impossible
  – We need to rethink and restructure the online
    learning experience as a new and unique learning
    experience
• How can we keep hypertext from altering us
  too much?
• The beauty and the risk of an open source web
Practical Questions
• How can an HTML webpage on your own computer
  be found by the search bar but not be on the web?
  – Your browser lives on your machine
  – The protocol name tells it where to look
• I wondered if the picture from my computer would
  still show up if I opened the page from another
  computer?
• It is interesting to see how one little thing out of
  place can ruin the entire code
   – Computers are stupid in that way
• Why should coders learn HTML?
   – HTML is an interface language that can be easily generated
     from print statements in your code
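As a rough illustration of the last point, here is a minimal Python sketch of a program that produces HTML by building and printing strings. The function name render_list and the sample items are hypothetical, chosen only for this example.

```python
from html import escape

def render_list(title, items):
    """Return a tiny HTML page as a string: a heading plus a bulleted list."""
    lines = ["<html>", "<body>", f"<h1>{escape(title)}</h1>", "<ul>"]
    for item in items:
        # escape() turns characters like & and < into HTML entities
        lines.append(f"  <li>{escape(item)}</li>")
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines)

page = render_list("Hello, World!", ["Texts", "Models", "Bread & butter"])
print(page)
```

The program knows nothing about browsers; it only emits text that follows HTML syntax, which is what "interface language" means here.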
What is HTML?
• HTML is not a programming language
  – Programming languages express IF … THEN logic
  – But it is code that obeys a syntax & gets interpreted
  – And it is produced and consumed by programs
• HTML is a very general interface language
• HTML is written in XML, which we discuss
  today
  – Technically called “XHTML”
  – The original version was written in SGML
In general, don’t conflate HTML with
       hypertext or with digital
      representation in general
HTML is a language that
generates a species of hypertext
 which is, in turn, a species of
    digital representation
A provisional
   taxonomy
Is hypertext new?
[Study Bible]
[Talmud]
1 = Mishna, the first major transcription of the oral law
2 = Gemara, analytical discussions
3 = Rashi, glossary
4 = Tosefos, additions
5 = Hananel, comments
6 = Eye of Justice, legal decisions
8 = Light of the Bible, references to Biblical quotations
9 = Bach's Annotations
10 = Gra's Annotations
[Charrette]
[The Wasteland]
[Critical Edition]
[OED]
These are all examples of
       traditional texts
They exhibit “latent hypertext”
Landow
• The concept of hypertext parallels
  poststructuralist views of text
  – Barthes, Foucault, Derrida, Kristeva, et al.
• In this view, a text is not, and has never
  been, a bounded, closed thing
  – it is a network of signifiers that connect meanings
    across time and space …
Digital humanists have been
concerned with encoding historical
     texts since at least 1949
Father Busa
• Creator of the Index Thomisticus
• Saw the computer as a solution to indexing
  the works of Aquinas in 1949
  – 13,000,000 words
  – “in” took 4 years
• Solution:
  – Lemmatization
  – Variations tagged as
    instances of a type
The complete works of Aquinas will be typed onto
punch cards; the machines will then work through
the words and produce a systematic index of every
word St. Thomas used, together with the number
of times it appears, where it appears, and the six
words immediately preceding and following each
appearance (to give the context). This will take the
machines 8,125 hours; the same job would be
likely to take one man a lifetime.

   Time Magazine, 1956, “Religion: Sacred: Electronics”
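The index described in the quotation (every word, how often it appears, where, and the words around it) is what is now usually called a concordance. Below is a minimal Python sketch of the idea, not Busa's actual procedure. The sample sentence, the function name concordance, and the simple tokenizing pattern are assumptions made for illustration, and lemmatization is left out.

```python
import re
from collections import defaultdict

def concordance(text, window=6):
    """Map each word to a list of (position, left context, right context)."""
    words = re.findall(r"[a-z']+", text.lower())
    index = defaultdict(list)
    for pos, word in enumerate(words):
        left = words[max(0, pos - window):pos]      # words before this one
        right = words[pos + 1:pos + 1 + window]     # words after this one
        index[word].append((pos, left, right))
    return index

sample = "In the beginning was the word, and the word was in the text."
index = concordance(sample, window=2)
for pos, left, right in index["word"]:
    print(pos, " ".join(left), "[word]", " ".join(right))
```

The number of entries stored under a word is its frequency, and each entry records where it occurs and its context. Grouping variant forms under one headword (lemmatization) would need an extra lookup step that this sketch omits.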
So, what is text?

Let’s look at some material
         examples
page o’ text
Real world text
comes packaged in
documents
A document is a
material artifact


How is text
conveyed in
a document?
What is text?
Visual Signifiers
•   Small caps
•   Indentation
•   Alignment
•   Italics
•   Space


All used to signify elements of text
Documents have three Levels:
        Content, Structure, Style
• Content
  – TEXT, images, video clips, etc.
• Structure
  – The organization of content into units (elements)
    and logical relationships (e.g. reading order)
• Style
  – Screen and print layout
  – Fonts, colors, etc.
Descriptive markup languages allow
us to define structure of documents
    for computational purposes

 Theoretically, they do not specify
        layout or content
[PDF, Procedural Markup]




In contrast to procedural markup like PDF
So, how are docs structured?
Hierarchically …




(theoretically)
Document Elements and Structures
Play
  – Act +
       • Scene +
          – Line +

Book
  – Chapter +
       • Verse +

Letter
  – Heading
       • Return Address
       • Date
       • Recipient Info
          – Name
          – Title
          – Address
  – Content
       • Salutation
       • Paragraph +
       • Closing
These are all “trees”
XML is a markup
  language
What is XML?
• Stands for eXtensible Markup Language
   – Actually invented after the web
   – A simplification of SGML, the language used to create
     HTML
   – It specifies a set of rules for creating specialized markup
     languages such as HTML and TEI
• It is a simplified version of SGML
   – Standard Generalized Markup Language
• SGML was invented in the early 1970s to wrest the
  control of documents from computer people who
  were taking over industries like law and accounting
XML looks like this




Notice how the element names reference units, not layout or style
Also markup for “in-line” elements
XML Premises
1.   All documents are comprised of elements.
2.   Elements contain content.
3.   Elements have no layout.
4.   Elements are hierarchically ordered.
5.   Elements are to be indicated by “markup” –
     tags that define the beginning and end of an
     element
XML Markup Rules
• Tags signify structural elements
• Three kinds of tag
  – Start and End, e.g. <p> and </p>
  – Singleton, e.g. <br />
• Start and singleton tags can have attributes
  – Simple key/value pairs
  – <div class="stanza" style="color:red;">
• Basic rules
  – All attributes must be quoted
  – All tags must nest (no overlaps!)
Documents in XML that meet
these rules are “well formed”
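One way to see these rules in action is to hand a few strings to an XML parser and see which ones it accepts. The sketch below uses Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the three sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

good = '<div class="stanza"><p>One line<br /></p></div>'
overlap = '<div><p>One line</div></p>'                  # tags overlap
unquoted = '<div class=stanza><p>One line</p></div>'    # attribute not quoted

for sample in (good, overlap, unquoted):
    print(is_well_formed(sample), sample)
```

Only the first string passes. The other two each break one of the basic rules listed above.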
XML also provides Document Types
• A Document Type Definition (DTD) defines a
  set of tags and rules for using them
  – Specifies elements, attributes, and possible
    combinations
  – E.g. in HTML, the ol and ul elements must contain li
    elements
• A DTD is just one kind of schema system used
  by XML
• Schema express data models of/for texts
  – TEI is a powerful way of describing primary source
    materials for scholars
• Documents that use a schema properly are
  called “valid”
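The difference between "well formed" and "valid" can be sketched with a toy rule checker. This is not a DTD validator. The ALLOWED_CHILDREN table below is a hypothetical stand-in for a schema, encoding only the rule that ol and ul contain li elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a schema: parent element -> allowed child elements.
ALLOWED_CHILDREN = {"ol": {"li"}, "ul": {"li"}}

def validation_errors(markup):
    """List every child element that breaks the ALLOWED_CHILDREN rules."""
    root = ET.fromstring(markup)  # raises ParseError if not well formed
    errors = []
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            continue  # no rule recorded for this element
        for child in parent:
            if child.tag not in allowed:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors

valid_doc = "<body><ul><li>one</li><li>two</li></ul></body>"
invalid_doc = "<body><ul><p>one</p><li>two</li></ul></body>"
print(validation_errors(valid_doc))
print(validation_errors(invalid_doc))
```

Both documents are well formed, but only the first follows the rule. A real schema language expresses many such rules (order, repetition, attributes) declaratively.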
Originally, DTDs defined “genres”
like business letter or mortgage form

They were later used to define more
 abstract models of textual content
XML is used everywhere
• HTML
    – E.g. Embed codes
•   TEI (Text Encoding Initiative)
•   RSS
•   Civilization IV
•   Playlists (e.g. XSPF or “spiff”)
•   Google Maps (KML)
A Look Again at HTML
• aka XHTML
    – And now becoming HTML5
•   An instance of XML (formerly SGML)
•   An interface language
•   Language of the World Wide Web
•   Defined by a DTD that prescribes a specific
    set of elements and relations
HTML Document Structure
• Head
  – Title
  – [Directives]
• Body
  – H1+
  – H2+
     • P+
     • UL
          – LI
Basic Elements with associated Tags
Element         Tags                     Attributes
Paragraph       <p> ... </p>
Numbered List   <ol>
                 <li> ... </li>
                </ol>
Bulleted List   <ul>
                 <li> ... </li>
                </ul>
Table           <table>
                 <tr>
                  <td> ... </td>
                 </tr>
                </table>
Anchor          <a> ... </a>             href, target
Image           <img/>                   src, border
Object          <object> ... </object>
The Text Encoding Initiative created
TEI to mark up scholarly documents
    Mainly primary sources such as
       books and manuscripts
TEI
• The dominant language used to encode
  scholarly text
• This room was once the location of
  UVa’s EText Center
  – World famous for text encoding
  – Now part of the library and catalog
• Scholars create their own schema to match
  what they are interested in
Examples
• The TEI Header
  – http://tbe.kantl.be/TBE/examples/TBED02v00.htm
• TEI Prose
  – http://tbe.kantl.be/TBE/examples/TBED03v00.htm
• Find others at the TEI By Example Project
  – http://tbe.kantl.be/TBE/
XML contains an implicit theory
           of text
           What is it?
OHCO
• XML (and therefore HTML and TEI) imply
  a certain theory of text
  – A text is an OHCO
• OHCO
  – Ordered Hierarchy of Content Objects
• An OHCO is a kind of tree
  – Elements follow each other in sequences
  – Elements can contain other elements
What are the advantages of this
            view?
OHCO allows for easy processing
• Every element has a precise address in the text
  – E.g. HTML/body/p[1]
• Texts can be described in the language of
  kinship
  – Ancestors, parents, siblings, children, etc.
• Texts can be restructured and manipulated by
  known patterns and algorithms
  – Traversing
  – Pruning
  – Cross-referencing
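The three advantages above (addresses, kinship, traversal) can be sketched with Python's standard xml.etree.ElementTree module. The small document and the helper name walk are invented for this example. The path is written relative to the root html element, so HTML/body/p[1] becomes body/p[1].

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><body>"
    "<h1>Title</h1>"
    "<p>First paragraph</p>"
    "<p>Second <i>emphatic</i> paragraph</p>"
    "</body></html>"
)

# Address: the first p inside body, relative to the root html element.
first_p = doc.find("body/p[1]")
print(first_p.text)

# Kinship: the children of body are siblings of one another.
body = doc.find("body")
siblings = [child.tag for child in body]
print(siblings)

# Traversal: visit every element in document order, tracking its depth.
def walk(element, depth=0):
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)

outline = list(walk(doc))
for depth, tag in outline:
    print("  " * depth + tag)
```

Pruning and cross-referencing follow the same pattern: walk the tree, then remove or link the elements that match some test.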
What are the disadvantages of
           OHCO?
Logical vs. Physical Structure
Pages and
   Paragraphs


Two common structures
that overlap
Solution 1: Split Elements
<page n="2">
...
<p id="foo">His good looks and his rank had one fair
claim on his attachment, since to them he must have owed a
wife</p>
</page>
<page n="3">
<p id="bar" prev_id="foo"> a very superior character to
anything deserved by his own.</p>
...
</page>
Solution 2: Use “Milestones”

<p>His good looks and his rank had one fair claim on
his attachment, since to them he must have owed a
wife <pb n="3" /> a very superior character to
anything deserved by his own.</p>



     One structure gets backgrounded
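The backgrounded structure is not lost. A program can recover it from the milestones. The sketch below is one possible approach, using Python's standard xml.etree.ElementTree module. The function name text_by_page is hypothetical, and the starting page number "2" is an assumption borrowed from the split-element example.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    '<p>His good looks and his rank had one fair claim on '
    'his attachment, since to them he must have owed a '
    'wife <pb n="3" /> a very superior character to '
    'anything deserved by his own.</p>'
)

def text_by_page(p, first_page):
    """Split a paragraph's text at <pb/> milestones, keyed by page number."""
    pages = {first_page: (p.text or "").strip()}
    for child in p:
        if child.tag == "pb":
            # Text that follows a milestone is stored in its .tail attribute.
            pages[child.get("n")] = (child.tail or "").strip()
    return pages

pages = text_by_page(para, first_page="2")
for n, text in pages.items():
    print(n, "|", text)
```

The paragraph stays a single element in the tree, while the page structure has to be reassembled by code like this. That is the sense in which it is backgrounded.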
Wittgenstein’s Manuscripts




      What about this?
[Charrette]
The problem of overlap suggests
the need for a richer set of tools
What tools do McCarty and
  Unsworth reference?
Tables
A database for Ovid
McCarty
• A different use of markup
  – From document description to interpretation
  – Creative “misuse”
• Reverse engineering a “grammar” of
  personification from a markup strategy
  – Thickness = description (of text)
  – Depth = explanation (of text by reference to grammar)
• Is forced to use tables in collaboration with
  markup
Thick description = Markup
 Deep description = Tables
How to reconcile these tools?
A Proposed Model
• Texts are not documents
  – Documents are media, Texts are messages
• Texts and documents are part of a system
  comprised of “levels”
  – They are effectively archaeology sites with
    stratigraphic layers
  – Erasures are like cities building on top of each other
• Each level of the system is described by an
  appropriate set of tools
  – Document structures → XML
  – Textual structures, embedded ontologies → Tables
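To suggest why tables suit structures that a single tree cannot hold, here is a small sketch that stores overlapping spans as rows keyed by character offsets. It is an illustration only, not McCarty's actual design. The table name span, its columns, and the offsets are assumptions made for this example, using Python's standard sqlite3 module.

```python
import sqlite3

text = ("His good looks and his rank had one fair claim on his attachment, "
        "since to them he must have owed a wife a very superior character "
        "to anything deserved by his own.")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE span (kind TEXT, label TEXT, start INTEGER, stop INTEGER)")

# Illustrative offsets: the page break falls inside the paragraph.
split = text.index(" a very superior")
rows = [
    ("page", "2", 0, split),
    ("page", "3", split, len(text)),
    ("paragraph", "foo", 0, len(text)),
]
conn.executemany("INSERT INTO span VALUES (?, ?, ?, ?)", rows)

# Which spans cover the character offset where "superior" begins?
offset = text.index("superior")
covering = conn.execute(
    "SELECT kind, label FROM span WHERE start <= ? AND ? < stop ORDER BY kind",
    (offset, offset),
).fetchall()
print(covering)
```

Because each row is independent, pages and paragraphs can overlap freely, and neither structure has to be backgrounded.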
Basic Levels
• Document
  – Physical objects (paper)
  – Logical objects (defined by space, style, punctuation,
    etc.)
  – Style and layout (also defined by space, color, etc.)
  – Can have superimposed versions
• Text
  –   Sequences of characters
  –   Grammatical features
  –   Figures and poetic features
  –   Etc.

UVA MDST 3073 Texts and Models-2012-09-11

  • 1. Lecture 4: Texts and Models Prof. Alvarado MDST 3703/7703 11 September 2012
  • 2. Review • Posting “Hello, World!” – Put file in the public_html directory of your UVA Home Directory – Create a post and insert a link to this file – Categorize as: 09.06: (S) HTML • If you cannot get to your home directory, try uploading to http://homedir.virginia.edu
  • 3. Some Quick Corrections • Digital text is not necessary – It’s an open question (i.e. do we have to have it?) • Nelson did not conceive of “trails,” Bush did • HTML is not the “first big idea” in the liberal arts; hypertext is (according to me) • The idea that “text shapes knowledge” is not ancient, but relatively new – Media determinism is a 20th century perspective – Although Plato notes the effects of literacy in the Phaedo • Not everything can be translated into HTML – i.e. HTML is not the richest framework for digital representation
  • 4. Your Questions and Observations • Is commercialization killing creativity? – What is the relationship between how the web is organized economically and how it shapes expression?  EFFECT OF SOCIAL ORGANIZATION • What happens if the associations that someone makes is „off ‟ and illogical to others? – Does it loosen the way logical connections can be made and argued?  EFFECT ON LOGIC
  • 5. Your Questions and Observations • Computers in general still heavily rely on a hierarchical structure – To what extent rationalization has occurred with the invention of hypertext? • Do things lose value and meaning in exchange for digital coding? – What is the effect of digitization on value? • Hypertexts and links online can be distracting – Non-linear thinking or mindless surfing?
  • 6. Your Questions and Observations • People are trying to create the same exact classroom experience online that exists in the physical classroom, which is impossible – We need to rethink and restructure the online learning experience as a new and unique learning experience • How can we keep hypertext from altering us too much? • The beauty and the risk of an open source web
  • 7. Practical Questions • How can an HTML webpage on your own computer be found by the search bar but not be on the web? – Your browser lives on your machine – The protocol name tells it where to look • I wondered if the picture from my computer would still show up if I opened the page from another computer? • It is interesting to see how one little thing out of place can ruin the entire code – Computers are stupid in that way • Why should coders learn HTML? – HTML is an interface language that can be easily generated from print statements in your code
  • 8. What is HTML? • HTML is not a programming language – Programming languages express IF … THEN logic – But it is code that obeys a syntax & gets interpreted – And it is produced and consumed by programs • HTML is a very general interface language • HTML is written in XML, which we discuss today – Technically called “XHTML” – The original version was written in SGML
  • 9. In general, don’t conflate HTML with hypertext or with digital representation in general
  • 10. HTML is a language that generates a species of hypertext which is, in turn, a species of digital representation
  • 11. A provisional taxonomy
  • 14. [Talmud] 1 = Mishna, the first major transcription of the oral law; 2 = Gemara, analytical discussions; 3 = Rashi, glossary; 4 = Tosefos, additions; 5 = Hananel, comments; 6 = Eye of Justice, legal decisions; 8 = Light of the Bible, references to Biblical quotations; 9 = Bach's Annotations; 10 = Gra's Annotations
  • 18. [OED]
  • 19. These are all examples of traditional texts They exhibit “latent hypertext”
  • 20. Landow • The concept of hypertext parallels poststructuralist views of text – Barthes, Foucault, Derrida, Kristeva, et al. • In this view, a text is not, and has never been, a bounded, closed thing – it is a network of signifiers that connect meanings across time and space …
  • 21. Digital humanists have been concerned with encoding historical texts since at least 1949
  • 22. Father Busa • Creator of the Index Thomisticus • Saw the computer as a solution to indexing the works of Aquinas in 1949 – 13,000,000 words – “in” took 4 years • Solution: – Lemmatization – Variations tagged as instances of a type
  • 23. The complete works of Aquinas will be typed onto punch cards; the machines will then work through the words and produce a systematic index of every word St. Thomas used, together with the number of times it appears, where it appears, and the six words immediately preceding and following each appearance (to give the context). This will take the machines 8,125 hours; the same job would be likely to take one man a lifetime. Time Magazine, 1956, “Religion: Sacred: Electronics”
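
The index described in the Time quotation (every word, how often it appears, where, and the six words on either side) can be sketched in a few lines. This is only an illustration of the idea, not Busa's actual procedure: the `concordance` function, the tokenizing rule, and the sample sentence are all assumptions made up for the example, and no lemmatization is attempted.

```python
import re
from collections import defaultdict


def concordance(text, window=6):
    """Map each word to a list of (position, words before, words after)."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    index = defaultdict(list)
    for i, word in enumerate(words):
        before = words[max(0, i - window):i]   # up to six preceding words
        after = words[i + 1:i + 1 + window]    # up to six following words
        index[word].append((i, before, after))
    return index


# Invented sample text, not a quotation from Aquinas.
sample = "the good is that which all things desire and the good is one"
index = concordance(sample)

for position, before, after in index["good"]:
    print(position, " ".join(before), "[good]", " ".join(after))
```

The number of entries stored under a word is its frequency, and each entry records where the word occurs along with its context.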
  • 24. So, what is text? Let's look at some material examples
  • 25. page o’ text Real world text comes packaged in documents
  • 26. A document is a material artifact How is text conveyed in a document?
  • 27.
  • 29. Visual Signifiers • Small caps • Indentation • Alignment • Italics • Space All used to signify elements of text
  • 30. Documents have three Levels: Content, Structure, Style • Content – TEXT, images, video clips, etc. • Structure – The organization of content into units (elements) and logical relationships (e.g. reading order) • Style – Screen and print layout – Fonts, colors, etc.
  • 31. Descriptive markup languages allow us to define structure of documents for computational purposes Theoretically, they do not specify layout or content
  • 32. [PDF, Procedural Markup] In contrast to procedural markup like PDF
  • 33. So, how are docs structured?
  • 35. Document Elements and Structures • Play – Act + (Scene + (Line +)) • Book – Title – Chapter + (Verse +) • Letter – Heading (Return Address, Date, Recipient Info (Name, Address)) – Content (Salutation, Paragraph +, Closing)
  • 36. These are all “trees”
  • 37. XML is a markup language
  • 38. What is XML? • Stands for eXtensible Markup Language – Actually invented after the web – A simplification of SGML, the language used to create HTML – It specifies a set of rules for creating specialized markup languages such as HTML and TEI • It is a simplified version of SGML – Standard Generalized Markup Language • SGML was invented in the early 1970s to wrest the control of documents from computer people who were taking over industries like law and accounting
  • 39.
  • 40. XML looks like this Notice how the element names reference units, not layout or style
  • 41. Also markup for “in-line” elements
  • 42. XML Premises 1. All documents are comprised of elements. 2. Elements contain content. 3. Elements have no layout. 4. Elements are hierarchically ordered. 5. Elements are to be indicated by “markup” – tags that define the beginning and end of an element
  • 43. XML Markup Rules • Tags signify structural elements • Three kinds of tag – Start and End, e.g <p> and </p> – Singleton, e.g <br /> • Start and singleton tags can have attributes – Simple key/value pairs – <div class="stanza" style="color:red;"> • Basic rules – All attributes must be quoted – All tags must nest (no overlaps!)
  • 44. Documents in XML that meet these rules are “well formed”
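
A quick way to see what “well formed” means is to hand a few strings to an XML parser and watch which ones it accepts. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the three sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Properly nested tags, quoted attribute, singleton <br />.
nested = '<div class="stanza"><p>One line<br /></p></div>'
# The <p> and <div> elements overlap instead of nesting.
overlapping = '<div class="stanza"><p>One line</div></p>'
# The attribute value is not quoted.
unquoted = '<div class=stanza><p>One line</p></div>'

for label, markup in [("nested", nested),
                      ("overlapping", overlapping),
                      ("unquoted", unquoted)]:
    print(label, is_well_formed(markup))
```

Only the first string passes; the other two break the nesting rule and the quoting rule respectively.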
  • 45. XML also provides Document Types • A Document Type Definition (DTD) defines a set of tags and rules for using them – Specifies elements, attributes, and possible combinations – E.g. in HTML, the ol and ul elements must contain li elements • A DTD is just one kind of schema system used by XML • Schema express data models of/for texts – TEI is a powerful way of describing primary source materials for scholars • Documents that use a schema properly are called “valid”
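
Validity goes a step beyond well-formedness: the document must also obey the rules of its schema. Python's standard library does not validate against DTDs, so the sketch below hand-codes just the one rule mentioned above (the children of `ol` and `ul` must be `li`) as a stand-in. The function name `list_rule_violations` and the sample documents are assumptions for the example; a real validator would read the rules from the DTD itself.

```python
import xml.etree.ElementTree as ET


def list_rule_violations(markup):
    """Return the tags of any children of <ol>/<ul> that are not <li>."""
    root = ET.fromstring(markup)
    violations = []
    for element in root.iter():          # visit every element in the tree
        if element.tag in ("ol", "ul"):
            for child in element:        # direct children only
                if child.tag != "li":
                    violations.append(child.tag)
    return violations


valid_doc = "<body><ul><li>one</li><li>two</li></ul></body>"
invalid_doc = "<body><ol><li>one</li><p>stray paragraph</p></ol></body>"

print(list_rule_violations(valid_doc))
print(list_rule_violations(invalid_doc))
```

Both sample documents are well formed; only the first also satisfies the list rule.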
  • 46. Originally, DTDs defined “genres” like business letter or mortgage form They were later used to define more abstract models of textual content
  • 47. XML is used everywhere • HTML – E.g. Embed codes • TEI (Text Encoding Initiative) • RSS • Civilization IV • Playlists (e.g. XSPF or “spiff”) • Google Maps (KML)
  • 48. A Look Again at HTML • aka XHTML – And now becoming HTML5 • An instance of XML (formerly SGML) • An interface language • Language of the World Wide Web • Defined by a DTD that prescribes a specific set of elements and relations
  • 49. HTML Document Structure • Head – Title – [Directives] • Body – H1+ – H2+ • P+ • UL – LI
  • 50. Basic Elements with associated Tags Element Tags Attributes Paragraph <p> ... </p> Numbered List <ol> <li> ... </li> </ol> Bulleted List <ul> <li> ... </li> </ul> Table <table> <tr> <td> ... </td> </tr> </table> Anchor <a> ... </a> href, target Image <img/> src, border Object <object> ... </object>
  • 51. The Text Encoding Initiative created TEI to mark up scholarly documents Mainly primary sources such as books and manuscripts
  • 52. TEI • The dominant language used to encode scholarly text • The current room was the location of UVa's EText Center – World famous for text encoding – Now part of the library and catalog • Scholars create their own schema to match what they are interested in
  • 53. Examples • The TEI Header – http://tbe.kantl.be/TBE/examples/TBED02v00.htm • TEI Prose – http://tbe.kantl.be/TBE/examples/TBED03v00.htm • Find others at the TEI By Example Project – http://tbe.kantl.be/TBE/
  • 54. XML contains an implicit theory of text What is it?
  • 55. OHCO • XML (and therefore HTML and TEI) imply a certain theory of text – A text is an OHCO • OHCO – Ordered Hierarchy of Content Objects • An OHCO is a kind of tree – Elements follow each other in sequences – Elements can contain other elements
  • 56. What are the advantages of this view?
  • 57. OHCO allows for easy processing • Every element has a precise address in the text – E.g. HTML/body/p[1] • Texts can be described in the language of kinship – Ancestors, parents, siblings, children, etc. • Texts can be restructured and manipulated by known patterns and algorithms – Traversing – Pruning – Cross-referencing
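
The processing advantages listed above can be tried directly on a small tree. The sketch below again assumes Python's standard `xml.etree.ElementTree`; the sample document and the `walk` helper are invented for illustration. Note that ElementTree paths are written relative to the root element, so the address `HTML/body/p[1]` becomes `body/p[1]` when starting from `<html>`.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><head><title>Demo</title></head>"
    "<body><h1>Heading</h1><p>First paragraph</p>"
    "<p>Second paragraph</p></body></html>"
)

# Addressing: the first <p> inside <body>.
first_p = doc.find("body/p[1]")
print(first_p.text)

# Kinship: the children of <body>, in document order.
body = doc.find("body")
children = [child.tag for child in body]
print(children)


def walk(element, depth=0):
    """Traverse the tree, yielding (depth, tag) for every element."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)


outline = list(walk(doc))
for depth, tag in outline:
    print("  " * depth + tag)

# Pruning: remove the heading from <body>.
body.remove(body.find("h1"))
print([child.tag for child in body])
```

Because every element has exactly one parent and a fixed position among its siblings, each of these operations takes only a line or two.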
  • 58. What are the disadvantages of OHCO?
  • 59. Logical vs. Physical Structure
  • 60. Pages and Paragraphs Two common structures that overlap
  • 61. Solution 1: Split Elements <page n="2"> ... <p id="foo">His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife</p> </page> <page n="3"> <p id="bar" prev_id="foo"> a very superior character to anything deserved by his own.</p> ... </page>
  • 62. Solution 2: Use “Milestones” <p>His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife <pb n="3" /> a very superior character to anything deserved by his own.</p> One structure gets backgrounded
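
With milestones, the backgrounded structure is still recoverable by a program. The sketch below reads the paragraph from Solution 2 and rebuilds the page view from the `pb` marker. The function name `split_at_page_breaks` is invented, and the starting page number "2" is an assumption taken from Solution 1; text inside other inline elements is ignored in this simplified version.

```python
import xml.etree.ElementTree as ET

paragraph = ET.fromstring(
    "<p>His good looks and his rank had one fair claim on his attachment, "
    'since to them he must have owed a wife <pb n="3" /> a very superior '
    "character to anything deserved by his own.</p>"
)


def split_at_page_breaks(p, first_page="2"):
    """Map page numbers to the paragraph text that falls on each page."""
    # Text before the first milestone belongs to the assumed starting page.
    pages = {first_page: (p.text or "").strip()}
    for child in p:
        if child.tag == "pb":
            # Text following a <pb /> is stored as that element's tail.
            pages[child.get("n")] = (child.tail or "").strip()
    return pages


pages = split_at_page_breaks(paragraph)
for number, text in pages.items():
    print(number, "|", text)
```

The paragraph remains a single element in the tree, while the page boundaries survive as empty markers that code can act on when needed.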
  • 63. Wittgenstein’s Manuscripts What about this?
  • 65. The problem of overlap suggests the need for a richer set of tools
  • 66. What tools do McCarty and Unsworth reference?
  • 69. McCarty • A different use of markup – From document description to interpretation – Creative “misuse” • Reverse engineering a “grammar” of personification from a markup strategy – Thickness = description (of text) – Depth = explanation (of text by reference to grammar) • Is forced to use tables in collaboration with markup
  • 70. Thick description = Markup Deep description = Tables
  • 71. How to reconcile these tools?
  • 72. A Proposed Model • Texts are not documents – Documents are media, Texts are messages • Texts and documents are part of a system comprised of “levels” – They are effectively archaeology sites with stratigraphic layers – Erasures are like cities building on top of each other • Each level of the system is described by an appropriate set of tools – Document structures → XML – Textual structures, embedded ontologies → Tables
  • 73. Basic Levels • Document – Physical objects (paper) – Logical objects (defined by space, style, punctuation, etc.) – Style and layout (also defined by space, color, etc.) – Can have superimposed versions • Text – Sequences of characters – Grammatical features – Figures and poetic features – Etc.

Editor's Notes

  1. Text becomes reducible to its elements. Basic feature of the medium.
  2. (theoretically)
  3. http://biblioklept.org/2012/01/31/list-of-rejections-of-wittgensteins-mistress-david-markson/