Minoan linguistic resources: The Linear A digital Corpus
Tommaso Petrolito⊙⊕, Ruggero Petrolito⊙, Grégoire Winterstein⊖⊕,
Francesco Perono Cacciafoco⊕⊙
⊙ Filologia Letteratura e Linguistica, University of Pisa, Italy
⊖ Linguistics and Modern Language Studies,
The Hong Kong Institute of Education, Hong Kong
⊕ Linguistics and Multilingual Studies,
Nanyang Technological University, Singapore
tommasouni@gmail.com, ruggero.petrolito@gmail.com,
gregoire@ied.edu.hk, fcacciafoco@ntu.edu.sg
30 July 2015
Petrolito, Winterstein, Perono Cacciafoco Linear A Corpus 30 July 2015 1 / 16
Introduction
We’ll describe the Linear A/Minoan digital corpus and the approaches
we applied to develop it
Why a Linear A corpus is needed and why we chose XML TEI EpiDoc
for it
Available resources and the development process
The Linear A Corpus as Cultural Heritage
Linear A and Minoan
The Linear A script was used by the Minoan civilization (Crete, 2500
– 1450 BC) and remains undeciphered
Many symbols are shared by Linear A and Linear B and are
assumed to have phonetic values. The others are probably logograms:
                 Linear A/B (shared)    Linear A only
Symbols          81                     260
Assumed value    syllable               logogram
Linear B was deciphered in the 1950s and found to
write an Ancient Greek dialect, so many scholars are trying to
decipher Linear A too
Lack of digital resources
After decades, no decipherment attempt has been successful
No heavy computational approaches have been attempted
Only John G. Younger, on his website, provides a complete digital
collection
▶ However, it is stored in two simple HTML pages with no strict
structure, and the texts are given as transliterations
A new, well-organized digital corpus in a suitable format may be a
useful resource
Available resources
1,427 Linear A documents containing 7,362-7,396 signs
(about 2 A4 pages of text at 11pt)
GORILA paper collection of inscriptions and transcriptions
John G. Younger’s website
GORILA
GORILA: Louis Godart and Jean-Pierre Olivier, Recueil des
inscriptions en Linéaire A
GORILA contains
▶ a catalog of symbols and their numeric codes
▶ document indexes with information about place of origin and type of
support (these indexes were first defined by Pope & Raison)
▶ descriptions of the indexed documents, including pictures, drawings and
handmade transcriptions
The GORILA information is the standard point of reference: even
recent collections always refer to the GORILA volume and page
John G. Younger’s website
http://people.ku.edu/~jyounger/LinearA/
The website contains
▶ two HTML pages, one for the Haghia Triada documents, one for all the
other places of origin
▶ 1,077 transcriptions, with Linear B phonetics and GORILA code
numbers (75.5% of the total amount of existing documents listed in
GORILA)
▶ a conversion table: GORILA code numbers to syllables
From Younger’s syllables to Unicode
Unicode    GORILA    Syllable
U+10600    AB01      DA
U+10601    AB02      RO
U+10602    AB03      PA
The Unicode set of characters for Linear A was released in June 2014
The 1,077 documents represented on Younger’s website have been
automatically converted
▶ from the syllable transcription (coexisting alongside GORILA code
numbers for symbols not included in Linear B) to the full GORILA code
numbers transcription
▶ from GORILA code numbers to Unicode
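The two conversion steps can be sketched as follows. This is a minimal illustration, not the conversion script actually used: it covers only the three table rows shown on this slide, and it assumes that sign groups are written as hyphen-separated tokens (e.g. DA-RO-AB03). The function names are hypothetical.

```python
# Excerpt of the conversion tables: only the three rows shown on the slide.
SYLLABLE_TO_GORILA = {"DA": "AB01", "RO": "AB02", "PA": "AB03"}
GORILA_TO_CODEPOINT = {"AB01": 0x10600, "AB02": 0x10601, "AB03": 0x10602}


def to_gorila(token):
    """Step 1: map a syllable to its GORILA code.

    Tokens that are already GORILA codes are passed through unchanged.
    """
    token = token.upper()
    return SYLLABLE_TO_GORILA.get(token, token)


def to_unicode(sign_group, sep="-"):
    """Step 2: map each GORILA code of a sign group to its Unicode character.

    The hyphen separator is an assumption about the input format.
    A code missing from the table raises KeyError.
    """
    codes = [to_gorila(token) for token in sign_group.split(sep)]
    return "".join(chr(GORILA_TO_CODEPOINT[code]) for code in codes)


# A mixed transcription: two syllables plus one raw GORILA code.
converted = to_unicode("DA-RO-AB03")
print([f"U+{ord(ch):04X}" for ch in converted])
```

Raising an error on unknown codes, rather than skipping them, keeps gaps in the table visible during a bulk conversion.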
Segmentation issues
Separation is mainly indicated in two ways:
▶ by isolating sign groups with numbers or logograms, thereby implying a
separation
▶ by placing dots between sign groups, always used in long strings of
sign groups
Example: This is a Linear A line:
▶ is a number (it is assumed to be a number 5)
▶ so and are assumed to be separated sign groups
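The segmentation rule can be sketched in a few lines. This is a simplified illustration under stated assumptions: a line is given as a list of tokens, signs are GORILA codes, numbers are digit strings, and a dot is the string "."; logograms, which also act as separators, are not handled. The function name is hypothetical.

```python
def segment(tokens):
    """Split a tokenized line into sign groups.

    A number or a dot closes the current sign group. Numbers are kept
    in the output (as ints); dots are dropped.
    """
    groups, current = [], []
    for tok in tokens:
        if tok == "." or tok.isdigit():
            if current:
                groups.append(current)
                current = []
            if tok.isdigit():
                groups.append(int(tok))
        else:
            current.append(tok)
    if current:
        groups.append(current)
    return groups


# A number between signs implies two separate sign groups.
by_number = segment(["AB01", "AB02", "5", "AB03", "AB01"])
# A dot separates sign groups explicitly.
by_dot = segment(["AB01", ".", "AB02", "AB03"])
```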
Corpus data format
XML provides important advantages
▶ metadata on several levels of annotation
▶ elements and entities for unsupported glyphs or symbols
EpiDoc is a TEI DTD customized for epigraphy
▶ the TEI-using community can provide support
▶ a wide range of best-practice examples is available online
The “old” Leiden system annotation task, familiar to epigraphers, is
quite similar to the XML TEI EpiDoc annotation process
Corpus data format example
<div lang="minoan"
     n="text"
     type="edition"
     part="N"
     sample="complete"
     org="uniform">
  <head lang="eng">Edition</head>
  <cb rend="front" n="HM 1673"/>
  <ab part="N">
    <lb n="1"/>
    <w part="N"> </w>
    <space dim="horizontal"
           extent="1em"
           unit="character"/>
    <w part="N"> </w>
    <lb n="2"/>
    <w part="N"> </w>
    <g ref="#n5"/>
    <w part="N"> </w>
    <lb n="3"/>
    <w part="N"> </w>
    <g ref="#n12"/>
    <w part="N"> </w>
    <!-- excerpt ends here -->
  </ab>
</div>
Unsupported glyphs handling
Inside the encodingDesc > charDecl element, glyph elements can
be defined
g elements referring to these glyphs can then be used to represent
unsupported symbols
<glyph xml:id="n5">
  <glyphName>Number 5</glyphName>
  <mapping type="standardized">5</mapping>
</glyph>
<lb n="2"/>
<w part="N"> </w>
<g ref="#n5"/>
<w part="N"> </w>
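To show how a consumer of the corpus might follow a g reference back to its glyph declaration, here is a minimal sketch using Python's standard xml.etree.ElementTree. The wrapping TEI and charDecl elements, the placeholder word contents X and Y, and the function names are assumptions made for the example; the real files may be organized differently.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature document; X and Y stand in for Linear A sign groups.
SAMPLE = """<TEI>
  <charDecl>
    <glyph xml:id="n5">
      <glyphName>Number 5</glyphName>
      <mapping type="standardized">5</mapping>
    </glyph>
  </charDecl>
  <ab>
    <lb n="2"/>
    <w>X</w>
    <g ref="#n5"/>
    <w>Y</w>
  </ab>
</TEI>"""


def glyph_table(root):
    """Collect {glyph id: standardized mapping} from all glyph declarations."""
    table = {}
    for glyph in root.iter("glyph"):
        mapping = glyph.find("mapping[@type='standardized']")
        if mapping is not None and mapping.text:
            table[glyph.get(XML_ID)] = mapping.text.strip()
    return table


def resolve_line(root):
    """Return the text of each w and the resolved value of each g, in order."""
    table = glyph_table(root)
    out = []
    for el in root.find("ab"):
        if el.tag == "w":
            out.append((el.text or "").strip())
        elif el.tag == "g":
            # Strip the leading '#' of the reference; '?' marks an unknown glyph.
            out.append(table.get(el.get("ref", "").lstrip("#"), "?"))
    return out


root = ET.fromstring(SAMPLE)
tokens = resolve_line(root)
```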
Corpus size
GORILA: 1,427 Linear A documents
John G. Younger’s website: 1,077 Linear A transcriptions (75.5% of
the total)
Our corpus will contain up to 1,077 Linear A XML TEI EpiDoc
documents
The Unicode conversions of John G. Younger’s transcriptions have
been converted to XML automatically, but the tagging has been
only partially carried out
The main remaining work (still in progress) is manually checking the
data against the GORILA volumes
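As an illustration of what the automatic XML step could look like, the sketch below turns one already segmented line into lb, w and g elements like those of the data format example. It is not the conversion code used for the corpus: the input representation (strings for sign groups, ints for numbers) and the function name are assumptions.

```python
import xml.etree.ElementTree as ET


def line_to_tei(ab, line_no, items):
    """Append one inscription line to an <ab> element.

    Strings become <w> elements; ints become <g> references to a
    glyph declared elsewhere with the id 'n<number>'.
    """
    ET.SubElement(ab, "lb", n=str(line_no))
    for item in items:
        if isinstance(item, int):
            ET.SubElement(ab, "g", ref=f"#n{item}")
        else:
            word = ET.SubElement(ab, "w", part="N")
            word.text = item


ab = ET.Element("ab", part="N")
# Two sign groups separated by the number 5, written with Linear A code points.
line_to_tei(ab, 2, ["\U00010600\U00010601", 5, "\U00010602"])
xml_text = ET.tostring(ab, encoding="unicode")
print(xml_text)
```

Output produced this way still needs the manual check against GORILA described above, since the sketch only restructures what the transcription already contains.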
John_Younger.ttf
Before the release of Unicode 7.0, there was no way to display
characters in the range U+10600–U+1077F
The ‘traditional’ Linear A font, LA.ttf, placed its signs at the wrong
Unicode positions
We developed a new Linear A font, named after John Younger to
show our appreciation for his work: John_Younger.ttf (available at
http://openfontlibrary.org/en/font/john-younger)
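A simple range check can flag text that falls outside the Linear A block, for instance output prepared with a font that uses other positions. This is a minimal sketch based only on the range quoted above; the function names are hypothetical.

```python
# Bounds of the Linear A range quoted on this slide.
LINEAR_A_FIRST, LINEAR_A_LAST = 0x10600, 0x1077F


def is_linear_a(ch):
    """True if the character lies in the range U+10600 to U+1077F."""
    return LINEAR_A_FIRST <= ord(ch) <= LINEAR_A_LAST


def non_linear_a(text):
    """Return the non-space characters of text that fall outside the range."""
    return [ch for ch in text if not ch.isspace() and not is_linear_a(ch)]


stray = non_linear_a("\U00010600 x\U0001077F")
```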
From Linear A to Minoan culture
The Linear A corpus is an important cultural monument, storing
information about the traditions, knowledge and lifestyle of the Minoan people
Even without a full understanding of the transcriptions, some cultural
features can be inferred
▶ Economics and commerce: as some ideograms for basic commodities
are similar to their Linear B counterparts, we can compare types and
amounts of commodities
▶ Religion: there are around thirty libation formulas transcribed on
various supports
Future work and Acknowledgements
XSL style sheets to generate suitable HTML pages
A web interface to annotate and enrich the corpus information
All the data will be freely available and published at the following
URL: http://ling.ied.edu.HK/~gregoire/lineara
This work was started when the 1st, 3rd and 4th authors were visitors
at NTU, supported by the Erasmus MULTI II exchange program.
We thank John Younger for permission to use the data from his
website.