Text Models and Markup
Prof. Alvarado
MDST 3703
17 September 2013
Business
• Plan B: If Home Directory is not working for
you, please use the Hive
– Go to http://its.virginia.edu/hive/connected.html
– Install VMWare Client
– Use Notepad++
– The Home Directory link is on your Desktop (also available as the J drive)
• Tutorials
– If you feel lost about HTML let me know
Review 1: Textual Signals
• Each of the authors last week viewed the text
as a kind of signal
• A signal is a pattern that contains messages
• Messages can be grasped through parsing the
signal
• What were the messages? How were they
parsed?
text can be viewed as a long signal consisting of characters selected from a common set of characte
A model of communication.
Messages get converted into signals and back into messages
by means of a shared code.
ENCODING DECODING
SHARED CODE
Person 1 Person 2
Author       | Parsed elements       | Decoded message
Levi-Strauss | Relations and bundles | Structural oppositions
Colby        | Thesaurus words       | Thematic patterns
Ramsay       | Scenes                | Genres
Text is like this. This
is a map of DC
generated by
thousands of
individual Flickr and
Twitter events.
The picture is a kind
of signal—collective
and unconscious, yet
meaningful.
The patterns
discerned from the
signals are not
intentional, but they
are the products of
intentional activity.
http://anthonyflo.tumblr.com/post/7590868323/photographer-and-self-described-geek-of-maps
Review 2: Semantic HTML
• Also called POSH: “Plain Old Semantic HTML”
• The use of HTML to describe a text, not to
format it (CSS is used to format)
• DIV, SPAN, CLASS, and ID are general purpose
tools to provide more flexible markup
• What kinds of things can POSH be used to
describe?
Segue
Semantic markup may be used to support the
analysis of each of our authors—including
Aristotle
Aristotle: Elements of drama, Elements of plot
<div class="plot-element" id="reversal-of-fortune"> ... </div>
Levi-Strauss: Relations and Bundles in myths
<span class="relation"> ... </span>
Colby: Theme words in folktales
<span class="antagonism">fight</span>
Ramsay: Scenes in plays
<div class="scene"> ... </div>
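Once a text carries semantic classes like the ones above, a program can pull the marked elements back out. The following is a minimal sketch using Python's standard `html.parser` module. The class name `ClassCollector` and the sample sentence are illustrative assumptions, not part of the course materials, and the sketch assumes every tag is explicitly closed.

```python
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collect text found inside elements that carry a class attribute."""

    def __init__(self):
        super().__init__()
        self.stack = []   # class of each currently open element (None if it has none)
        self.found = []   # (class name, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the nearest enclosing element that has a class.
        for cls in reversed(self.stack):
            if cls:
                self.found.append((cls, text))
                break

# Hypothetical sample: a scene (Ramsay) containing a theme word (Colby).
sample = (
    '<div class="scene">'
    '<p>The brothers <span class="antagonism">fight</span> over the inheritance.</p>'
    '</div>'
)

collector = ClassCollector()
collector.feed(sample)
collector.close()

# Keep only the words tagged with the "antagonism" theme.
themes = [text for cls, text in collector.found if cls == "antagonism"]
```

Here `themes` holds the theme words and `collector.found` holds every piece of text with the class of its nearest marked container, which is the raw material for counting themes per scene.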
Let’s step back and look more
closely at “text”
Let’s look at some material examples
page o’ text
Real world text
comes packaged in
documents
How is text
conveyed in
a document?
A document is a
material artifact—
a medium with
which to convey a
signal
What is text?
Visual Signifiers
• Small caps
• Indentation
• Alignment
• Italics
• Space
All used to signify elements of text
Other examples
[Charrette]
[The Wasteland]
[Critical Edition]
[OED]
Documents have three Levels:
Structure, Content, Style
Structure
The organization of content into units (elements)
and logical relationships (e.g. reading order)
Content
TEXT, images, video clips, etc.
Style
Screen and print layout
Fonts, colors, etc.
Descriptive markup languages allow
us to define structure of documents
for computational purposes
Theoretically, they do not specify
layout or content
[PDF, Procedural Markup]
In contrast to procedural markup like PDF
So, how are documents structured?
Hierarchically …
(theoretically)
Document Elements and Structures
Play
– Act +
  • Scene +
    – Line +
Book
– Chapter +
  • Verse +
Letter
– Heading
  • Return Address
  • Date
  • Recipient Info
    – Name
    – Title
    – Address
– Content
  • Salutation
  • Paragraph +
  • Closing
These are all “trees”
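To make the "tree" idea concrete, the Letter structure can be written as nested data and walked from the top down. This is only a sketch: the `(name, children)` tuple layout and the function name `walk` are assumptions chosen for illustration, and the "+" repetition markers are left out of the names.

```python
# Each node is (name, children); leaves have an empty child list.
letter = ("Letter", [
    ("Heading", [
        ("Return Address", []),
        ("Date", []),
        ("Recipient Info", [("Name", []), ("Title", []), ("Address", [])]),
    ]),
    ("Content", [
        ("Salutation", []),
        ("Paragraph", []),
        ("Closing", []),
    ]),
])

def walk(node, depth=0):
    """Yield (depth, name) for every node, parents before children."""
    name, children = node
    yield depth, name
    for child in children:
        yield from walk(child, depth + 1)

# An indented outline, one line per node.
outline = "\n".join("  " * depth + name for depth, name in walk(letter))

# The deepest level reached (the root is level 0).
max_depth = max(depth for depth, _ in walk(letter))
```

Printing `outline` reproduces the indented list, which is the sense in which the outline and the tree are the same structure.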
XML is a markup
language
It is a more powerful
system for semantic
markup than POSH
What is XML?
• Stands for eXtensible Markup Language
– Actually invented after the web
– A simplification of SGML, the language used to create
HTML
– It specifies a set of rules for creating specialized markup
languages such as HTML and TEI
• It is a simplified version of SGML
– Standard Generalized Markup Language
• SGML was invented in the early 1970s to wrest the
control of documents from computer people who were
taking over industries like law and accounting
XML looks like this
Notice how the element names reference units, not layout or style
Also markup for “in-line” elements
XML Premises
1. All documents are comprised of elements.
2. Elements contain content.
3. Elements have no layout.
4. Elements are hierarchically ordered.
5. Elements are to be indicated by “markup” –
tags that define the beginning and end of an
element
XML Markup Rules
• Tags signify structural elements
• Three kinds of tag
– Start and End, e.g. <p> and </p>
– Singleton, e.g. <br />
• Start and singleton tags can have attributes
– Simple key/value pairs
– <div class="stanza" style="color:red;">
• Basic rules
– All attribute values must be quoted
– All tags must nest (no overlaps!)
Documents in XML that meet
these rules are “well formed”
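Well-formedness is something a parser can check mechanically. As a rough sketch, Python's standard `xml.etree.ElementTree` parser rejects any input that breaks the basic rules. The helper name `is_well_formed` and the three sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested, with a quoted attribute value and a singleton tag.
nested = '<div class="stanza"><p>A line<br /></p></div>'
# The p element is still open when div closes: the tags overlap.
overlapping = '<div class="stanza"><p>A line</div></p>'
# The attribute value is not quoted.
unquoted = '<div class=stanza><p>A line</p></div>'

results = [is_well_formed(s) for s in (nested, overlapping, unquoted)]
```

Only the first string passes. The other two each break one of the basic rules, so the parser stops with an error instead of guessing what was meant.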
XML also provides Document Types
• A Document Type Definition (DTD) defines a set of
tags and rules for using them
– Specifies elements, attributes, and possible
combinations
– E.g. in HTML, the ol and ul elements must contain li
elements
• A DTD is just one kind of schema system used by
XML
• Schemas express data models of/for texts
– TEI is a powerful way of describing primary source
materials for scholars
• Documents that use a schema properly are called
“valid”
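The difference between "well formed" and "valid" can be illustrated with the list rule mentioned above. Python's standard library does not validate against a DTD, so the sketch below hand-codes that single rule instead. It is a stand-in for real schema validation, and the function name `list_children_ok` is an assumption.

```python
import xml.etree.ElementTree as ET

def list_children_ok(root):
    """Check one hand-written rule: each ol or ul has children, and all are li."""
    for parent in root.iter():
        if parent.tag in ("ol", "ul"):
            if len(parent) == 0 or any(child.tag != "li" for child in parent):
                return False
    return True

# Both documents parse, so both are well formed.
good = ET.fromstring("<body><ul><li>one</li><li>two</li></ul></body>")
bad = ET.fromstring("<body><ul><p>one</p></ul></body>")

# Only the first one follows the rule for lists.
checks = (list_children_ok(good), list_children_ok(bad))
```

A real validator applies every rule in the schema in this way. The point of the sketch is only that a document can satisfy the parser and still break the document type's rules.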
Originally, DTDs defined “genres”
like business letter or mortgage form
They were later used to define more
abstract models of textual content
XML is used everywhere
• HTML
– E.g. Embed codes
• TEI (Text Encoding Initiative)
• RSS
• Civilization IV
• Playlists (e.g. XSPF or “spiff”)
• Google Maps (KML)
The Text Encoding Initiative created
TEI to mark up scholarly documents
Mainly primary sources such as
books and manuscripts
TEI
• Written in XML (was SGML)
• The dominant language used to encode
scholarly text
• Scholars can select from a large set of
elements or define their own elements to match
what they are interested in
Examples
• The TEI Header
– http://tbe.kantl.be/TBE/examples/TBED02v00.htm
• TEI Prose
– http://tbe.kantl.be/TBE/examples/TBED03v00.htm
• Find others at the TEI By Example Project
– http://tbe.kantl.be/TBE/
XML and TEI both contain an
implicit theory of text
What is it?
OHCO
• XML (and therefore HTML and TEI) implies a
certain theory of text
– A text is an OHCO
• OHCO
– Ordered Hierarchy of Content Objects
• An OHCO is a kind of tree
– Elements follow each other in sequences
– Elements can contain other elements
What are the advantages of this
view?
OHCO allows for easy processing
• Every element has a precise address in the text
– E.g. HTML/body/p[1]
• Texts can be described in the language of kinship
– Ancestors, parents, siblings, children, etc.
• Texts can be restructured and manipulated by
known patterns and algorithms
– Traversing
– Pruning
– Cross-referencing
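These processing advantages can be tried out directly. The sketch below uses Python's standard `xml.etree.ElementTree` on a tiny made-up document; the sample text and variable names are assumptions. Note that ElementTree paths are written relative to the root element, so the address from the slide becomes `body/p[1]`.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<html><body>"
    "<p>First paragraph.</p>"
    "<p>Second <em>emphatic</em> paragraph.</p>"
    "</body></html>"
)

# Address: the first p inside body.
first_p = doc.find("body/p[1]")

# Kinship: ElementTree keeps no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in doc.iter() for child in parent}
ancestors = []
node = doc.find(".//em")
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

# Traversing: visit every element in document order.
order = [el.tag for el in doc.iter()]

# Pruning: remove the second paragraph (and everything inside it).
body = doc.find("body")
body.remove(body.findall("p")[1])
remaining = [el.tag for el in doc.iter()]
```

Removing one element takes its descendants with it, which is why pruning is cheap on a tree: the `em` element disappears along with its parent paragraph.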
What are the disadvantages of
OHCO?
Logical vs. Physical Structure
THIS IS WHAT WE ENCOUNTERED AT THE END OF LAST WEEK’S STUDIO
Two common structures
that overlap
Pages and
Paragraphs
Solution 1: Split Elements
<page n="2">
. . .
<p id="foo">His good looks and his rank had one fair
claim on his attachment, since to them he must have owed a
wife</p>
</page>
<page n="3">
<p id="bar" prev_id="foo"> a very superior character to
anything deserved by his own.</p>
. . .
</page>
Solution 2: Use “Milestones”
<p>His good looks and his rank had one fair claim on
his attachment, since to them he must have owed a
wife <pb n="3" /> a very superior character to
anything deserved by his own.</p>
One structure gets backgrounded
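The backgrounded structure is not lost, only harder to reach. As a sketch, the page breaks can be recovered from the milestone version by splitting the paragraph's text at each `<pb />`. The function name `text_by_page` is an assumption, and the starting page number "2" is taken from the split-element version of the same passage.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring(
    '<p>His good looks and his rank had one fair claim on his attachment, '
    'since to them he must have owed a wife <pb n="3" /> a very superior '
    'character to anything deserved by his own.</p>'
)

def text_by_page(paragraph, first_page):
    """Map page number -> the paragraph text on that page, split at pb milestones."""
    pages = {first_page: (paragraph.text or "").strip()}
    for child in paragraph:
        if child.tag == "pb":
            # ElementTree stores the text that follows an element as its "tail".
            pages[child.get("n")] = (child.tail or "").strip()
    return pages

pages = text_by_page(p, first_page="2")
```

The paragraph stays a single element, and the pages have to be reconstructed by a program. With split elements it is the other way around: pages are elements and the paragraph has to be stitched back together from its `prev_id` link.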
Wittgenstein’s Manuscripts
What about this?
The problem of overlap suggests
that OHCO is not as simple as it looks
How does Renear “solve” the
problem?
Each OHCO markup schema
represents an analytical perspective,
an interpretive model
[Charrette]
So, XML, TEI, POSH – these allow us
to impose a model on a text
How does Unsworth characterize
these models?
A markup schema is a
“knowledge representation”
A KR is a model that comprises
1. A set of categories (aka Ontology)
Names and relationships between names
2. A set of inference rules (aka Logic)
A method of traversing names and relations
3. A medium for computation
A medium for mechanically producing inferences
4. A language for expressing these things
Such as a programming or markup language
What tools besides XML does
Unsworth reference as useful for
KR?
Tables
What are some differences
between trees and tables?
Tables are more rigid
Trees allow for indefinite depth
But tables are easier to manipulate
In any case, tables and trees are two
major kinds of data structure that
you will encounter …
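One way to see the trade-off is to turn a small tree into a table. The sketch below flattens a made-up play into rows with a fixed set of columns; the element names follow the Play structure shown earlier, while the sample lines and the column layout are assumptions.

```python
import xml.etree.ElementTree as ET

play = ET.fromstring(
    "<play>"
    "<act n='1'><scene n='1'><line>First line</line><line>Second line</line></scene></act>"
    "<act n='2'><scene n='1'><line>Third line</line></scene></act>"
    "</play>"
)

# One row per line, with fixed columns: (act, scene, line number within scene, text).
rows = [
    (act.get("n"), scene.get("n"), i, line.text)
    for act in play.findall("act")
    for scene in act.findall("scene")
    for i, line in enumerate(scene.findall("line"), start=1)
]

# Tables are easy to manipulate: for example, keep only the rows from Act 1.
act_one = [row for row in rows if row[0] == "1"]
```

The table needs its columns decided in advance, so a deeper tree (say, speeches inside scenes) would require a new column, while the tree version would just nest one level further.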
How to reconcile these tools?
A Proposed Model
• Texts are not documents
– Documents are media, Texts are messages
• Texts and documents are part of a system
comprised of “levels”
– They are effectively archaeological sites with
stratigraphic layers
– Erasures are like cities building on top of each other
• Each level of the system is described by an
appropriate set of tools
– Document structures → XML
– Textual structures, embedded ontologies → Tables
Basic Levels
• Document
– Physical objects (paper)
– Logical objects (defined by space, style, punctuation,
etc.)
– Style and layout (also defined by space, color, etc.)
– Can have superimposed versions
• Text
– Sequences of characters
– Grammatical features
– Figures and poetic features
– Etc.

More Related Content

What's hot

Taxonomy, ontology, folksonomies & SKOS.
Taxonomy, ontology, folksonomies & SKOS.Taxonomy, ontology, folksonomies & SKOS.
Taxonomy, ontology, folksonomies & SKOS.
Janet Leu
 
Ontology Engineering: ontology construction II
Ontology Engineering: ontology construction IIOntology Engineering: ontology construction II
Ontology Engineering: ontology construction II
Guus Schreiber
 

What's hot (13)

Taxonomy, ontology, folksonomies & SKOS.
Taxonomy, ontology, folksonomies & SKOS.Taxonomy, ontology, folksonomies & SKOS.
Taxonomy, ontology, folksonomies & SKOS.
 
Ontology Engineering: ontology construction II
Ontology Engineering: ontology construction IIOntology Engineering: ontology construction II
Ontology Engineering: ontology construction II
 
The Standardization of Semantic Web Ontology
The Standardization of Semantic Web OntologyThe Standardization of Semantic Web Ontology
The Standardization of Semantic Web Ontology
 
Semantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: IntroductionSemantic Web, Ontology, and Ontology Learning: Introduction
Semantic Web, Ontology, and Ontology Learning: Introduction
 
The Semantic Web
The Semantic WebThe Semantic Web
The Semantic Web
 
Ontology
OntologyOntology
Ontology
 
MDST 3703 F10 Seminar 8
MDST 3703 F10 Seminar 8MDST 3703 F10 Seminar 8
MDST 3703 F10 Seminar 8
 
Semantic web
Semantic webSemantic web
Semantic web
 
Computer Science Library Orientation 2018
Computer Science Library Orientation 2018Computer Science Library Orientation 2018
Computer Science Library Orientation 2018
 
Metadata
MetadataMetadata
Metadata
 
Ontology
Ontology Ontology
Ontology
 
Ontologies and semantic web
Ontologies and semantic webOntologies and semantic web
Ontologies and semantic web
 
Schema and Identity for Linked Data
Schema and Identity for Linked DataSchema and Identity for Linked Data
Schema and Identity for Linked Data
 

Viewers also liked

Mdst3703 2013-08-29-hello-world
Mdst3703 2013-08-29-hello-worldMdst3703 2013-08-29-hello-world
Mdst3703 2013-08-29-hello-world
Rafael Alvarado
 
Mdst3705 2012-01-22-code-as-language
Mdst3705 2012-01-22-code-as-languageMdst3705 2012-01-22-code-as-language
Mdst3705 2012-01-22-code-as-language
Rafael Alvarado
 
Mdst3703 ontology-overrated-2012-10-16
Mdst3703 ontology-overrated-2012-10-16Mdst3703 ontology-overrated-2012-10-16
Mdst3703 ontology-overrated-2012-10-16
Rafael Alvarado
 

Viewers also liked (8)

MDST 3703 F10 Studio 9
MDST 3703 F10 Studio 9MDST 3703 F10 Studio 9
MDST 3703 F10 Studio 9
 
Mdst 3559-02-17-php2
Mdst 3559-02-17-php2Mdst 3559-02-17-php2
Mdst 3559-02-17-php2
 
Mdst 3559-04-21-data-2
Mdst 3559-04-21-data-2Mdst 3559-04-21-data-2
Mdst 3559-04-21-data-2
 
Mdst3703 2013-08-29-hello-world
Mdst3703 2013-08-29-hello-worldMdst3703 2013-08-29-hello-world
Mdst3703 2013-08-29-hello-world
 
Mdst3705 2012-01-22-code-as-language
Mdst3705 2012-01-22-code-as-languageMdst3705 2012-01-22-code-as-language
Mdst3705 2012-01-22-code-as-language
 
Mdst3703 ontology-overrated-2012-10-16
Mdst3703 ontology-overrated-2012-10-16Mdst3703 ontology-overrated-2012-10-16
Mdst3703 ontology-overrated-2012-10-16
 
MDST 3703 F10 Seminar 4
MDST 3703 F10 Seminar 4MDST 3703 F10 Seminar 4
MDST 3703 F10 Seminar 4
 
Mdst 3559-03-22-case-1
Mdst 3559-03-22-case-1Mdst 3559-03-22-case-1
Mdst 3559-03-22-case-1
 

Similar to Mdst3703 2013-09-17-text-models

UVA MDST 3073 Texts and Models-2012-09-11
UVA MDST 3073 Texts and Models-2012-09-11UVA MDST 3073 Texts and Models-2012-09-11
UVA MDST 3073 Texts and Models-2012-09-11
Rafael Alvarado
 
UVA MDST 3703 Marking-Up a Text 2012-09-13
UVA MDST 3703 Marking-Up a Text 2012-09-13UVA MDST 3703 Marking-Up a Text 2012-09-13
UVA MDST 3703 Marking-Up a Text 2012-09-13
Rafael Alvarado
 
Mdst3705 2013-02-19-text-into-data
Mdst3705 2013-02-19-text-into-dataMdst3705 2013-02-19-text-into-data
Mdst3705 2013-02-19-text-into-data
Rafael Alvarado
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
KU Leuven
 
Mdst3705 2013-02-05-databases
Mdst3705 2013-02-05-databasesMdst3705 2013-02-05-databases
Mdst3705 2013-02-05-databases
Rafael Alvarado
 
Lecture4202011 110420175305-phpapp01
Lecture4202011 110420175305-phpapp01Lecture4202011 110420175305-phpapp01
Lecture4202011 110420175305-phpapp01
Tarek Koudsi
 
Ppt programming by alyssa marie paral
Ppt programming by alyssa marie paralPpt programming by alyssa marie paral
Ppt programming by alyssa marie paral
alyssamarieparal
 

Similar to Mdst3703 2013-09-17-text-models (20)

UVA MDST 3073 Texts and Models-2012-09-11
UVA MDST 3073 Texts and Models-2012-09-11UVA MDST 3073 Texts and Models-2012-09-11
UVA MDST 3073 Texts and Models-2012-09-11
 
UVA MDST 3703 Marking-Up a Text 2012-09-13
UVA MDST 3703 Marking-Up a Text 2012-09-13UVA MDST 3703 Marking-Up a Text 2012-09-13
UVA MDST 3703 Marking-Up a Text 2012-09-13
 
CS6010 Social Network Analysis Unit II
CS6010 Social Network Analysis   Unit IICS6010 Social Network Analysis   Unit II
CS6010 Social Network Analysis Unit II
 
XML
XMLXML
XML
 
Mdst3705 2013-02-19-text-into-data
Mdst3705 2013-02-19-text-into-dataMdst3705 2013-02-19-text-into-data
Mdst3705 2013-02-19-text-into-data
 
Text data mining1
Text data mining1Text data mining1
Text data mining1
 
Wisneski TeI workshop 2009-2010
Wisneski TeI workshop 2009-2010Wisneski TeI workshop 2009-2010
Wisneski TeI workshop 2009-2010
 
Xml Case Learns 2008
Xml Case Learns 2008Xml Case Learns 2008
Xml Case Learns 2008
 
Ontologies Fmi 042010
Ontologies Fmi 042010Ontologies Fmi 042010
Ontologies Fmi 042010
 
Mdst3705 2013-02-05-databases
Mdst3705 2013-02-05-databasesMdst3705 2013-02-05-databases
Mdst3705 2013-02-05-databases
 
Lecture4202011 110420175305-phpapp01
Lecture4202011 110420175305-phpapp01Lecture4202011 110420175305-phpapp01
Lecture4202011 110420175305-phpapp01
 
SKOS, RDFa, Microformats, Microdata
SKOS, RDFa, Microformats, MicrodataSKOS, RDFa, Microformats, Microdata
SKOS, RDFa, Microformats, Microdata
 
MDST 3703 F10 Studio 4
MDST 3703 F10 Studio 4MDST 3703 F10 Studio 4
MDST 3703 F10 Studio 4
 
Xml iet 2015
Xml iet 2015Xml iet 2015
Xml iet 2015
 
Ppt programming by alyssa marie paral
Ppt programming by alyssa marie paralPpt programming by alyssa marie paral
Ppt programming by alyssa marie paral
 
Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?Semantic technology in nutshell 2013. Semantic! are you a linguist?
Semantic technology in nutshell 2013. Semantic! are you a linguist?
 
Ontologies Presentation
Ontologies PresentationOntologies Presentation
Ontologies Presentation
 
Ontologies Presentation
Ontologies PresentationOntologies Presentation
Ontologies Presentation
 
Xml and webdata
Xml and webdataXml and webdata
Xml and webdata
 
Xml and webdata
Xml and webdataXml and webdata
Xml and webdata
 

More from Rafael Alvarado

Mdst3703 2013-10-08-thematic-research-collections
Mdst3703 2013-10-08-thematic-research-collectionsMdst3703 2013-10-08-thematic-research-collections
Mdst3703 2013-10-08-thematic-research-collections
Rafael Alvarado
 
Mdst3703 2013-09-24-hypertext
Mdst3703 2013-09-24-hypertextMdst3703 2013-09-24-hypertext
Mdst3703 2013-09-24-hypertext
Rafael Alvarado
 
Mdst3703 2013-09-10-textual-signals
Mdst3703 2013-09-10-textual-signalsMdst3703 2013-09-10-textual-signals
Mdst3703 2013-09-10-textual-signals
Rafael Alvarado
 
Mdst3703 2013-09-05-studio2
Mdst3703 2013-09-05-studio2Mdst3703 2013-09-05-studio2
Mdst3703 2013-09-05-studio2
Rafael Alvarado
 
Mdst3703 2013-09-03-plato2
Mdst3703 2013-09-03-plato2Mdst3703 2013-09-03-plato2
Mdst3703 2013-09-03-plato2
Rafael Alvarado
 
MDST 3705 2012-03-05 Databases to Visualization
MDST 3705 2012-03-05 Databases to VisualizationMDST 3705 2012-03-05 Databases to Visualization
MDST 3705 2012-03-05 Databases to Visualization
Rafael Alvarado
 
Mdst3705 2013-02-26-db-as-genre
Mdst3705 2013-02-26-db-as-genreMdst3705 2013-02-26-db-as-genre
Mdst3705 2013-02-26-db-as-genre
Rafael Alvarado
 
Mdst3705 2013-02-12-finding-data
Mdst3705 2013-02-12-finding-dataMdst3705 2013-02-12-finding-data
Mdst3705 2013-02-12-finding-data
Rafael Alvarado
 
Mdst3705 2013-01-29-praxis
Mdst3705 2013-01-29-praxisMdst3705 2013-01-29-praxis
Mdst3705 2013-01-29-praxis
Rafael Alvarado
 
Mdst3705 2013-01-31-php3
Mdst3705 2013-01-31-php3Mdst3705 2013-01-31-php3
Mdst3705 2013-01-31-php3
Rafael Alvarado
 
Mdst3705 2013-01-24-php2
Mdst3705 2013-01-24-php2Mdst3705 2013-01-24-php2
Mdst3705 2013-01-24-php2
Rafael Alvarado
 
Mdst3705 2012-01-15-introduction
Mdst3705 2012-01-15-introductionMdst3705 2012-01-15-introduction
Mdst3705 2012-01-15-introduction
Rafael Alvarado
 
Mdst3703 graph-theory-11-20-2012
Mdst3703 graph-theory-11-20-2012Mdst3703 graph-theory-11-20-2012
Mdst3703 graph-theory-11-20-2012
Rafael Alvarado
 
Mdst3703 maps-and-timelines-2012-11-13
Mdst3703 maps-and-timelines-2012-11-13Mdst3703 maps-and-timelines-2012-11-13
Mdst3703 maps-and-timelines-2012-11-13
Rafael Alvarado
 
Mdst3703 culturomics-2012-11-01
Mdst3703 culturomics-2012-11-01Mdst3703 culturomics-2012-11-01
Mdst3703 culturomics-2012-11-01
Rafael Alvarado
 
Mdst3703 visualization-2012-10-23
Mdst3703 visualization-2012-10-23Mdst3703 visualization-2012-10-23
Mdst3703 visualization-2012-10-23
Rafael Alvarado
 
Mdst3703 shiva-2012-10-18
Mdst3703 shiva-2012-10-18Mdst3703 shiva-2012-10-18
Mdst3703 shiva-2012-10-18
Rafael Alvarado
 
Mdst3703 projects-2012-10-11
Mdst3703 projects-2012-10-11Mdst3703 projects-2012-10-11
Mdst3703 projects-2012-10-11
Rafael Alvarado
 
UVA MDST 3703 JavaScript (ii) 2012-10-04
UVA MDST 3703 JavaScript (ii) 2012-10-04UVA MDST 3703 JavaScript (ii) 2012-10-04
UVA MDST 3703 JavaScript (ii) 2012-10-04
Rafael Alvarado
 

More from Rafael Alvarado (20)

Mdst3703 2013-10-08-thematic-research-collections
Mdst3703 2013-10-08-thematic-research-collectionsMdst3703 2013-10-08-thematic-research-collections
Mdst3703 2013-10-08-thematic-research-collections
 
Mdst3703 2013-09-24-hypertext
Mdst3703 2013-09-24-hypertextMdst3703 2013-09-24-hypertext
Mdst3703 2013-09-24-hypertext
 
Presentation1
Presentation1Presentation1
Presentation1
 
Mdst3703 2013-09-10-textual-signals
Mdst3703 2013-09-10-textual-signalsMdst3703 2013-09-10-textual-signals
Mdst3703 2013-09-10-textual-signals
 
Mdst3703 2013-09-05-studio2
Mdst3703 2013-09-05-studio2Mdst3703 2013-09-05-studio2
Mdst3703 2013-09-05-studio2
 
Mdst3703 2013-09-03-plato2
Mdst3703 2013-09-03-plato2Mdst3703 2013-09-03-plato2
Mdst3703 2013-09-03-plato2
 
MDST 3705 2012-03-05 Databases to Visualization
MDST 3705 2012-03-05 Databases to VisualizationMDST 3705 2012-03-05 Databases to Visualization
MDST 3705 2012-03-05 Databases to Visualization
 
Mdst3705 2013-02-26-db-as-genre
Mdst3705 2013-02-26-db-as-genreMdst3705 2013-02-26-db-as-genre
Mdst3705 2013-02-26-db-as-genre
 
Mdst3705 2013-02-12-finding-data
Mdst3705 2013-02-12-finding-dataMdst3705 2013-02-12-finding-data
Mdst3705 2013-02-12-finding-data
 
Mdst3705 2013-01-29-praxis
Mdst3705 2013-01-29-praxisMdst3705 2013-01-29-praxis
Mdst3705 2013-01-29-praxis
 
Mdst3705 2013-01-31-php3
Mdst3705 2013-01-31-php3Mdst3705 2013-01-31-php3
Mdst3705 2013-01-31-php3
 
Mdst3705 2013-01-24-php2
Mdst3705 2013-01-24-php2Mdst3705 2013-01-24-php2
Mdst3705 2013-01-24-php2
 
Mdst3705 2012-01-15-introduction
Mdst3705 2012-01-15-introductionMdst3705 2012-01-15-introduction
Mdst3705 2012-01-15-introduction
 
Mdst3703 graph-theory-11-20-2012
Mdst3703 graph-theory-11-20-2012Mdst3703 graph-theory-11-20-2012
Mdst3703 graph-theory-11-20-2012
 
Mdst3703 maps-and-timelines-2012-11-13
Mdst3703 maps-and-timelines-2012-11-13Mdst3703 maps-and-timelines-2012-11-13
Mdst3703 maps-and-timelines-2012-11-13
 
Mdst3703 culturomics-2012-11-01
Mdst3703 culturomics-2012-11-01Mdst3703 culturomics-2012-11-01
Mdst3703 culturomics-2012-11-01
 
Mdst3703 visualization-2012-10-23
Mdst3703 visualization-2012-10-23Mdst3703 visualization-2012-10-23
Mdst3703 visualization-2012-10-23
 
Mdst3703 shiva-2012-10-18
Mdst3703 shiva-2012-10-18Mdst3703 shiva-2012-10-18
Mdst3703 shiva-2012-10-18
 
Mdst3703 projects-2012-10-11
Mdst3703 projects-2012-10-11Mdst3703 projects-2012-10-11
Mdst3703 projects-2012-10-11
 
UVA MDST 3703 JavaScript (ii) 2012-10-04
UVA MDST 3703 JavaScript (ii) 2012-10-04UVA MDST 3703 JavaScript (ii) 2012-10-04
UVA MDST 3703 JavaScript (ii) 2012-10-04
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Recently uploaded (20)

Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 

Mdst3703 2013-09-17-text-models

  • 1. Text Models and Markup Prof. Alvarado MDST 3703 17 September 2013
  • 2. Business • Plan B: If Home Directory is not working for you, please use the Hive – Go to http://its.virginia.edu/hive/connected.html – Install VMWare Client – Use Notepad++ – Home Directory link your Desktop (also as J drive) • Tutorials – If you feel lost about HTML let me know
  • 3. Review 1: Textual Signals • Each of the authors last week viewed the text as a kind of signal • A signal is a pattern that contains messages • Messages can be grasped through parsing the signal • What were the messages? How were they parsed?
  • 4. text can be viewed as a long signal consisting of characters selected from a common set of characte
  • 5. A model of communication. Messages get converted into signals and back into messages by means of a shared code. ENCODING DECODING SHARED CODE Person 1 Person 2
  • 6. Author Parsed elements Decoded message Levi-Strauss Relations and bundles Structural oppositions Colby Thesaurus words Thematic patterns Ramsay Scenes Genres
  • 7. Text is like this. This is a map of DC generated by thousands of individual Flickr and Twitter events. The picture is a kind of signal—collective and unconscious, yet meaningful. The patterns discerned from the signals are not intentional, but they are the products of intentional activity. http://anthonyflo.tumblr.com/post/7590868323/photographer-and-self-described-geek-of-maps [Text is like this]
  • 8. Review 2: Semantic HTML • Also called POSH—”Plain Old Semantic HTML” • The use of HTML to describe a text, not to format it (CSS is used to format) • DIV, SPAN, CLASS, and ID are general purpose tools to provide more flexible markup • What kinds of things can POSH be used to describe?
  • 9. Segue Semantic markup may be used to support the analysis of each of our authors—including Aristotle Aristotle: Elements of drama, Elements of plot <div class=“plot-element” id=“reversal-of- fortune”> ... </div> Levi-Strauss: Relations and Bundles in myths <span class=“relation”> ... </span> Colby: Theme words in folktales <span class=“antagonism”>fight</span> Ramsay: Scenes in plays <div class=“scene”> ... </div>
  • 10. Let’s step back and look more closely at “text” Let’s look at some material examples
  • 11. page o’ text Real world text comes packaged in documents
  • 12. How is text conveyed in a document? A document is a material artifact— a medium with which to convey a signal
  • 13.
  • 15. Visual Signifiers • Small caps • Indentation • Alignment • Italics • Space All used to signify elements of text
  • 20. [OED]
  • 21. Documents have thee Levels: Structure, Content, Style Structure The organization of content into units (elements) and logical relationships (e.g. reading order) Content TEXT, images, video clips, etc. Style Screen and print layout Fonts, colors, etc.
  • 22. Descriptive markup languages allow us to define structure of documents for computational purposes Theoretically, they do not specify layout or content
  • 23. [PDF, Procedural Markup] In contrast to procedural markup like PDF
  • 24. So, how are documents structured?
  • 26. Document Elements and Structures Play – Act + • Scene + – Line + Book – Chapter + • Verse + Letter – Heading • Return Address • Date • Recipient Info – Name – Title – Address – Content • Salutation • Paragraph + • Closing
  • 27. These are all “trees”
  • 28. XML is a markup language It is a more powerful system for semantic markup than POSH
  • 29. What is XML? • Stands for eXtensible Markup Language – Actually invented after the web – A simplification of SGML, the language used to create HTML – It specifies a set of rules for creating specialized markup languages such as HTML and TEI • It is simplified version of the SGML – Standard Generalized Markup Language • SGML was invented in the early 1970s to wrest the control of documents from computer people who were taking over industries like law and accounting
  • 30.
  • 31. XML looks like this Notice how the element names reference units, not layout or style
  • 32. Also markup for “in-line” elements
  • 33. XML Premises 1. All documents are comprised of elements. 2. Elements contain content. 3. Elements have no layout. 4. Elements are hierarchically ordered. 5. Elements are to be indicated by “markup” – tags that define the beginning and end of an element
  • 34. XML Markup Rules • Tags signify structural elements • Three kinds of tag – Start and End, e.g <p> and </p> – Singleton, e.g <br /> • Start and singleton tags can have attributes – Simple key/value pairs – <div class="stanza" style="color:red;"> • Basic rules – All attributes must be quoted – All tags must nest (no overlaps!)
  • 35. Documents in XML that meet these rules are “well formed”
  • 36. XML also provides Document Types • A Document Type Definition (DTD) defines a set of tags and rules for using them – Specifies elements, attributes, and possible combinations – E.g. in HTML, the ol and ul elements must contain li elements • A DTD is just one kind of schema system used by XML • Schemas express data models of/for texts – TEI is a powerful way of describing primary source materials for scholars • Documents that use a schema properly are called “valid”
  • 37. Originally, DTDs defined “genres” like business letter or mortgage form. They were later used to define more abstract models of textual content.
  • 38. XML is used everywhere • HTML – E.g. Embed codes • TEI (Text Encoding Initiative) • RSS • Civilization IV • Playlists (e.g. XSPF or “spiff”) • Google Maps (KML)
  • 39. The Text Encoding Initiative created TEI to mark up scholarly documents, mainly primary sources such as books and manuscripts
  • 40. TEI • Written in XML (was SGML) • The dominant language used to encode scholarly text • Scholars can select from a large set of elements or define their own elements to match what they are interested in
  • 41. Examples • The TEI Header – http://tbe.kantl.be/TBE/examples/TBED02v00.htm • TEI Prose – http://tbe.kantl.be/TBE/examples/TBED03v00.htm • Find others at the TEI By Example Project – http://tbe.kantl.be/TBE/
  • 42. XML and TEI both contain an implicit theory of text. What is it?
  • 43. OHCO • XML (and therefore HTML and TEI) imply a certain theory of text – A text is an OHCO • OHCO – Ordered Hierarchy of Content Objects • An OHCO is a kind of tree – Elements follow each other in sequences – Elements can contain other elements
  • 44. What are the advantages of this view?
  • 45. OHCO allows for easy processing • Every element has a precise address in the text – E.g. HTML/body/p[1] • Texts can be described in the language of kinship – Ancestors, parents, siblings, children, etc. • Texts can be restructured and manipulated by known patterns and algorithms – Traversing – Pruning – Cross-referencing
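
The addressing and kinship vocabulary on slide 45 maps directly onto standard tree operations. The sketch below uses Python's `xml.etree.ElementTree`, which supports a limited subset of XPath; the tiny sample document is an assumption made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document; `doc` is the root <html> element.
doc = ET.fromstring(
    "<html><body>"
    "<p>First paragraph</p>"
    "<p>Second <em>emphasized</em> paragraph</p>"
    "</body></html>"
)

# Addressing: the path body/p[1], relative to the root, picks the first <p>.
first_p = doc.find("body/p[1]")
print(first_p.text)  # First paragraph

# Traversing: iter() walks the whole tree in document order.
tags = [element.tag for element in doc.iter()]
print(tags)  # ['html', 'body', 'p', 'p', 'em']

# Kinship: ElementTree stores children only, so build a child-to-parent map.
parent_of = {child: parent for parent in doc.iter() for child in parent}
em = doc.find("body/p[2]/em")
print(parent_of[em].tag)  # p
```

Because every element has exactly one parent, the parent map is unambiguous. That same property is what makes overlapping structures hard to express.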
  • 46. What are the disadvantages of OHCO?
  • 47. Logical vs. Physical Structure THIS IS WHAT WE ENCOUNTERED AT THE END OF LAST WEEK’S STUDIO
  • 48. Two common structures that overlap: Pages and Paragraphs
  • 49. Solution 1: Split Elements <page n="2"> . . . <p id="foo">His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife</p> </page> <page n="3"> <p id="bar" prev_id="foo"> a very superior character to anything deserved by his own.</p> . . . </page>
  • 50. Solution 2: Use “Milestones” <p>His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife <pb n="3" /> a very superior character to anything deserved by his own.</p> One structure gets backgrounded.
  • 52. The problem of overlap suggests that OHCO is not as simple as it looks. How does Renear “solve” the problem?
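
A milestone keeps the paragraph intact as an element, and the backgrounded page structure can still be recovered by processing. The sketch below, assuming Python's `xml.etree.ElementTree`, splits the paragraph's text at each `pb` milestone. The function `text_by_page` is a hypothetical helper, and the starting page "2" is taken from the split-element example on slide 49.

```python
import xml.etree.ElementTree as ET

# The milestone example from the slide, with straight quotes.
p = ET.fromstring(
    "<p>His good looks and his rank had one fair claim on his attachment, "
    'since to them he must have owed a wife <pb n="3" /> a very superior '
    "character to anything deserved by his own.</p>"
)


def text_by_page(paragraph, starting_page):
    """Group a paragraph's text by page, switching pages at each <pb>."""
    pages = {starting_page: paragraph.text or ""}
    current = starting_page
    for child in paragraph:
        if child.tag == "pb":
            current = child.get("n")      # the milestone starts a new page
            pages[current] = ""
        else:
            pages[current] += "".join(child.itertext())
        pages[current] += child.tail or ""  # text after the child element
    return {n: text.strip() for n, text in pages.items()}


pages = text_by_page(p, "2")
for number, text in pages.items():
    print(number, "|", text)
```

The page boundary survives only as an empty element, so recovering pages takes extra code like this, while paragraphs come for free from the tree.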
  • 53. Each OHCO markup schema represents an analytical perspective, an interpretive model
  • 55. So, XML, TEI, POSH: these allow us to impose a model on a text. How does Unsworth characterize these models?
  • 56. A markup schema is a “knowledge representation”
  • 57. A KR is a model that comprises 1. A set of categories (aka Ontology): names and relationships between names 2. A set of inference rules (aka Logic): a method of traversing names and relations 3. A medium for computation: a medium for mechanically producing inferences 4. A language for expressing these things, such as a programming or markup language
  • 58. What tools besides XML does Unsworth reference as useful for KR?
  • 60. What are some differences between trees and tables?
  • 61. Tables are more rigid. Trees allow for indefinite depth. But tables are easier to manipulate. In any case, tables and trees are two major kinds of data structure that you will encounter …
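
The contrast between trees and tables can be seen by flattening one into the other. The sketch below, assuming Python's `xml.etree.ElementTree`, turns a small play tree into rows with a fixed set of columns; the sample play and the column choice (act, scene, line number, text) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical play following the Play > Act > Scene > Line structure.
play = ET.fromstring(
    "<play>"
    '<act n="1">'
    '<scene n="1"><line>First line</line><line>Second line</line></scene>'
    '<scene n="2"><line>Third line</line></scene>'
    "</act>"
    "</play>"
)

# One row per line; the ancestors become columns of the row.
rows = [
    (act.get("n"), scene.get("n"), number, line.text)
    for act in play.findall("act")
    for scene in act.findall("scene")
    for number, line in enumerate(scene.findall("line"), start=1)
]

for row in rows:
    print(row)
# ('1', '1', 1, 'First line')
# ('1', '1', 2, 'Second line')
# ('1', '2', 1, 'Third line')
```

The table is easy to sort and filter, but its depth is fixed by its columns: a play with an extra level, such as a prologue outside any act, would not fit without redesigning the table.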
  • 62. How to reconcile these tools?
  • 63. A Proposed Model • Texts are not documents – Documents are media, Texts are messages • Texts and documents are part of a system composed of “levels” – They are effectively archaeological sites with stratigraphic layers – Erasures are like cities building on top of each other • Each level of the system is described by an appropriate set of tools – Document structures → XML – Textual structures, embedded ontologies → Tables
  • 64. Basic Levels • Document – Physical objects (paper) – Logical objects (defined by space, style, punctuation, etc.) – Style and layout (also defined by space, color, etc.) – Can have superimposed versions • Text – Sequences of characters – Grammatical features – Figures and poetic features – Etc.

Editor's Notes

  1. ----- Meeting Notes (9/17/13 12:14) -----This is where I can add notes ...
  2. Old French illuminated manuscript. What does the image mean?
  3. T. S. Eliot, The Waste Land – note use of line breaks; what do they mean?
  4. A critical edition of Jane Austen’s Persuasion
  5. A dictionary entry …
  6. (theoretically)
  7. http://biblioklept.org/2012/01/31/list-of-rejections-of-wittgensteins-mistress-david-markson/