SlideShare a Scribd company logo
1 of 18
Download to read offline
Patterns in language for POS
disambiguation in a style checker

Mike Unwalla,

Source location: http://www.techscribe.co.uk/ta/patterns-in-language-tcuk-2013.pdf

.
What this presentation is about
A lookup tool is not sufficient
LanguageTool: customizable structure
Finding misused terms: structure of a grammar rule
Part of speech disambiguation:
nouns, verbs, adjectives

Some difficult problems
How good are the rules?
Demonstration
Questions: interrupt and at the end

Slide number 2
A lookup tool is not sufficient

The lookup tool has limits:
• Slow (5 minutes to check a 50-page document)
• No explanation of problem
• No linguistic intelligence.

Slide number 3
LT is open-source proofreading software
LanguageTool: www.languagetool.org.
LT is fully customizable.

LT has rules for style and for grammar.
Can embed LT into other software.

Term checker uses LT: www.simplified-english.co.uk.

Slide number 4
Disambiguation.xml: what a term is
Grammar.xml: the problem with a term
The structure of a grammar rule
<rule id="MS_POS_INPUT" name="Microsoft, noun approved: input">
<pattern>
<token>input<exception postag="IS_NOUN"/></token>
</pattern>
<message>Microsoft. Make sure that '<match no="1"/>' is a noun.</message>
<example type="incorrect"><marker>Input</marker> the data.</example>
<example type="incorrect">If you <marker>input</marker> the data...</example>
<example type="correct">The <marker>input</marker> was correct.</example>

</rule>

Slide number 6
The alternative method is not good
1)

<token>input<exception
postag="IS_NOUN"/></token>

2)

<token postag="IS_VERB">input</token>

Disambiguation is not 100% accurate.
Primary design decision: no false negatives.
Option 2 is not safe.
Alternative: www.acrolinx.com/checking.html.

Slide number 7
Typical types of conflict in MMoS
Approved

Not approved Example

verb

noun

install

noun

verb

input

adjective

verb

checked

Grammarians use language patterns to parse text.

Slide number 8
Simple POS disambiguation for nouns (1)
In 'an + X + was', X is a noun.
Use sets of simple patterns to disambiguate.

General or specific rules:
• Specific: The X was; The X is; Some X takes
• General: MODIFIER + X + VERB
Trade-off:
• Specific: not practical
• General: need fewer rules, but sometimes get
disambiguation errors.

Slide number 9
Simple POS disambiguation for nouns (2)
Microsoft style: Use 'input' as a noun.

 You must input the data.

 The computer had input that was unusual.
 The device had inputs that were unusual.
Rule: HAVE + NOUN SINGULAR NON-COUNT +
THAT|WHICH
Rule: HAVE + NOUN PLURAL + THAT|WHICH
Rules are in groups for nouns, adjectives, verbs.

Slide number 10
Example of postags in LT

Slide number 11
Simple POS disambiguation for verbs
Microsoft style: Use 'install' as a verb:
 If the install was a problem...

 You must install all the...
 You must never install all the...
If X can be used as a verb, then in 'must + X',
X is a verb.
General rule: MODAL AUXILIARY VERB + X.

If a counter-example exists, then 3 options:
• Re-write the rule for better disambiguation.
• Make X an exception to the rule.
• Do nothing.
Slide number 12
Simple POS disambiguation for adjectives
Microsoft style: Use 'checked' as an adjective:

 If you checked the box...

 If the checked command is...
In 'ARTICLE + X + NOUN', X is an adjective.

Slide number 13
A difficult problem: noun or verb?
Noun cluster: plastic bucket, fire engine, oil sample
Sub-pattern:
NOUN SINGULAR + NOUN PLURAL + END OF SENTENCE

Is the last word a noun or a verb?
•
•
•
•
•
•

Use the metal covers.
The device analyses the oil samples.
The alarm covers.
The alarm sounds.
The oil system leaks.
The electrical equipment sparks.

What pattern identifies the nouns?
Slide number 14

Noun
Noun
Noun
Ambiguous
Ambiguous
Ambiguous
A difficult problem: noun or verb?
For the previous sub-pattern, if a word can be both a
noun and a verb AND
• If the verb is transitive only, the word is a noun.
• If the verb is intransitive, the POS is ambiguous.
Transitive = has an object:
The metal covers the hole.
Intransitive = does not have an object:
She laughs.
Some verbs are both transitive and intransitive:
• Transitive: The heat melted the snow.
• Intransitive: The snow melted.
Slide number 15
Some text is always ambiguous
Microsoft style avoids the passive voice.

 The wire was disconnected by the technician.


The wire was disconnected quickly.

?

The wire was disconnected.

 Passive voice?
 Adjective disconnected describes the wire?
(Compare, "The wire was dirty.")
Real-world knowledge:
• The water was drunk. (Passive voice)
• The waiter was drunk. (Adjective)
Slide number 16
How good are the rules?
Context is important:
• Winemaker: The must is contaminated.
• Informal: Warm clothes are a must in cold
weather.
Term checker:

• Approximately 150 sentences a second
• Approximately 6% false positives for STE rules.
Demonstration: Microsoft examples.

Slide number 17
Questions
Questions?

mike@techscribe.co.uk
www.techscribe.co.uk
Slide number 18

More Related Content

What's hot

Careers in translation by patricia phillips batoma
Careers in translation by patricia phillips batomaCareers in translation by patricia phillips batoma
Careers in translation by patricia phillips batomaAnnie Abbott
 
IELTS Listening- Table Completion - Introduction - Useful Tips
IELTS Listening- Table Completion - Introduction - Useful TipsIELTS Listening- Table Completion - Introduction - Useful Tips
IELTS Listening- Table Completion - Introduction - Useful TipsIELTSBackup
 
IELTS Reading - An Overview of IELTS Reading Question Types AC - GT
IELTS Reading - An Overview of IELTS Reading Question Types AC - GTIELTS Reading - An Overview of IELTS Reading Question Types AC - GT
IELTS Reading - An Overview of IELTS Reading Question Types AC - GTIELTSBackup
 
IELTS Listening - MCQs -Intorduction - Useful Tips
IELTS Listening - MCQs -Intorduction - Useful TipsIELTS Listening - MCQs -Intorduction - Useful Tips
IELTS Listening - MCQs -Intorduction - Useful TipsIELTSBackup
 
2. python basic syntax
2. python   basic syntax2. python   basic syntax
2. python basic syntaxSoba Arjun
 
What the math geeks don't want you to know about F#
What the math geeks don't want you to know about F#What the math geeks don't want you to know about F#
What the math geeks don't want you to know about F#Kevin Hazzard
 
IELTS Listening - Form Completion - Introduction - Useful Tips
IELTS Listening - Form Completion - Introduction - Useful TipsIELTS Listening - Form Completion - Introduction - Useful Tips
IELTS Listening - Form Completion - Introduction - Useful TipsIELTSBackup
 
Powerpoint for the thesis codeswtiching
Powerpoint for the thesis codeswtichingPowerpoint for the thesis codeswtiching
Powerpoint for the thesis codeswtiching09326428332
 
Code Switching
Code SwitchingCode Switching
Code Switchingamikay27
 
Langauage model
Langauage modelLangauage model
Langauage modelc sharada
 
Deciability (automata presentation)
Deciability (automata presentation)Deciability (automata presentation)
Deciability (automata presentation)Sagar Kumar
 

What's hot (16)

Careers in translation by patricia phillips batoma
Careers in translation by patricia phillips batomaCareers in translation by patricia phillips batoma
Careers in translation by patricia phillips batoma
 
IELTS Listening- Table Completion - Introduction - Useful Tips
IELTS Listening- Table Completion - Introduction - Useful TipsIELTS Listening- Table Completion - Introduction - Useful Tips
IELTS Listening- Table Completion - Introduction - Useful Tips
 
IELTS Reading - An Overview of IELTS Reading Question Types AC - GT
IELTS Reading - An Overview of IELTS Reading Question Types AC - GTIELTS Reading - An Overview of IELTS Reading Question Types AC - GT
IELTS Reading - An Overview of IELTS Reading Question Types AC - GT
 
Parallelism 1
Parallelism 1Parallelism 1
Parallelism 1
 
Clean code-Coding Rules
Clean code-Coding RulesClean code-Coding Rules
Clean code-Coding Rules
 
IELTS Listening - MCQs -Intorduction - Useful Tips
IELTS Listening - MCQs -Intorduction - Useful TipsIELTS Listening - MCQs -Intorduction - Useful Tips
IELTS Listening - MCQs -Intorduction - Useful Tips
 
2. python basic syntax
2. python   basic syntax2. python   basic syntax
2. python basic syntax
 
What the math geeks don't want you to know about F#
What the math geeks don't want you to know about F#What the math geeks don't want you to know about F#
What the math geeks don't want you to know about F#
 
Fafl notes [2010] (sjbit)
Fafl notes [2010] (sjbit)Fafl notes [2010] (sjbit)
Fafl notes [2010] (sjbit)
 
Arabic spell checkers
Arabic spell  checkersArabic spell  checkers
Arabic spell checkers
 
IELTS Listening - Form Completion - Introduction - Useful Tips
IELTS Listening - Form Completion - Introduction - Useful TipsIELTS Listening - Form Completion - Introduction - Useful Tips
IELTS Listening - Form Completion - Introduction - Useful Tips
 
3 describing syntax
3 describing syntax3 describing syntax
3 describing syntax
 
Powerpoint for the thesis codeswtiching
Powerpoint for the thesis codeswtichingPowerpoint for the thesis codeswtiching
Powerpoint for the thesis codeswtiching
 
Code Switching
Code SwitchingCode Switching
Code Switching
 
Langauage model
Langauage modelLangauage model
Langauage model
 
Deciability (automata presentation)
Deciability (automata presentation)Deciability (automata presentation)
Deciability (automata presentation)
 

Viewers also liked

TCUK 2013 - Martin Block - Creating narrated videos for user assistance
TCUK 2013 - Martin Block - Creating narrated videos for user assistanceTCUK 2013 - Martin Block - Creating narrated videos for user assistance
TCUK 2013 - Martin Block - Creating narrated videos for user assistanceTCUK Conference
 
TCUK 2013 - Managing as a freelance technical communicator
TCUK 2013 - Managing as a freelance technical communicatorTCUK 2013 - Managing as a freelance technical communicator
TCUK 2013 - Managing as a freelance technical communicatorTCUK Conference
 
TCUK 2013 - Andrew Peck - When culture meets content
TCUK 2013 - Andrew Peck - When culture meets contentTCUK 2013 - Andrew Peck - When culture meets content
TCUK 2013 - Andrew Peck - When culture meets contentTCUK Conference
 
TCUK 2012, Alison Peck, CPD - Why Bother?
TCUK 2012, Alison Peck, CPD - Why Bother?TCUK 2012, Alison Peck, CPD - Why Bother?
TCUK 2012, Alison Peck, CPD - Why Bother?TCUK Conference
 
TCUK 2013 - Matthew Ellison - Time saving tools and techniques for capturing ...
TCUK 2013 - Matthew Ellison - Time saving tools and techniques for capturing ...TCUK 2013 - Matthew Ellison - Time saving tools and techniques for capturing ...
TCUK 2013 - Matthew Ellison - Time saving tools and techniques for capturing ...TCUK Conference
 
TCUK 2013 - Adrian Morse - The challenges of remote management
TCUK 2013 - Adrian Morse - The challenges of remote managementTCUK 2013 - Adrian Morse - The challenges of remote management
TCUK 2013 - Adrian Morse - The challenges of remote managementTCUK Conference
 
TCUK 2012, Tony Self, DITA Style Guide
TCUK 2012, Tony Self, DITA Style GuideTCUK 2012, Tony Self, DITA Style Guide
TCUK 2012, Tony Self, DITA Style GuideTCUK Conference
 
TCUK 2013 - Don't manage: lead
TCUK 2013 - Don't manage: leadTCUK 2013 - Don't manage: lead
TCUK 2013 - Don't manage: leadTCUK Conference
 
TCUK 2013 - Roll-Royce - Common Technical Data
TCUK 2013 - Roll-Royce - Common Technical DataTCUK 2013 - Roll-Royce - Common Technical Data
TCUK 2013 - Roll-Royce - Common Technical DataTCUK Conference
 
TCUK 2013 - Using technology to simplify export documentation
TCUK 2013 - Using technology to simplify export documentationTCUK 2013 - Using technology to simplify export documentation
TCUK 2013 - Using technology to simplify export documentationTCUK Conference
 

Viewers also liked (10)

TCUK 2013 - Martin Block - Creating narrated videos for user assistance
TCUK 2013 - Martin Block - Creating narrated videos for user assistanceTCUK 2013 - Martin Block - Creating narrated videos for user assistance
TCUK 2013 - Martin Block - Creating narrated videos for user assistance
 
TCUK 2013 - Managing as a freelance technical communicator
TCUK 2013 - Managing as a freelance technical communicatorTCUK 2013 - Managing as a freelance technical communicator
TCUK 2013 - Managing as a freelance technical communicator
 
TCUK 2013 - Andrew Peck - When culture meets content
TCUK 2013 - Andrew Peck - When culture meets contentTCUK 2013 - Andrew Peck - When culture meets content
TCUK 2013 - Andrew Peck - When culture meets content
 
TCUK 2012, Alison Peck, CPD - Why Bother?
TCUK 2012, Alison Peck, CPD - Why Bother?TCUK 2012, Alison Peck, CPD - Why Bother?
TCUK 2012, Alison Peck, CPD - Why Bother?
 
TCUK 2013 - Matthew Ellison - Time saving tools and techniques for capturing ...
TCUK 2013 - Matthew Ellison - Time saving tools and techniques for capturing ...TCUK 2013 - Matthew Ellison - Time saving tools and techniques for capturing ...
TCUK 2013 - Matthew Ellison - Time saving tools and techniques for capturing ...
 
TCUK 2013 - Adrian Morse - The challenges of remote management
TCUK 2013 - Adrian Morse - The challenges of remote managementTCUK 2013 - Adrian Morse - The challenges of remote management
TCUK 2013 - Adrian Morse - The challenges of remote management
 
TCUK 2012, Tony Self, DITA Style Guide
TCUK 2012, Tony Self, DITA Style GuideTCUK 2012, Tony Self, DITA Style Guide
TCUK 2012, Tony Self, DITA Style Guide
 
TCUK 2013 - Don't manage: lead
TCUK 2013 - Don't manage: leadTCUK 2013 - Don't manage: lead
TCUK 2013 - Don't manage: lead
 
TCUK 2013 - Roll-Royce - Common Technical Data
TCUK 2013 - Roll-Royce - Common Technical DataTCUK 2013 - Roll-Royce - Common Technical Data
TCUK 2013 - Roll-Royce - Common Technical Data
 
TCUK 2013 - Using technology to simplify export documentation
TCUK 2013 - Using technology to simplify export documentationTCUK 2013 - Using technology to simplify export documentation
TCUK 2013 - Using technology to simplify export documentation
 

Similar to TCUK 2013 - Mike Unwalla - Patterns in language for POS disambiguation in a style checker

Test-Driven Development
 Test-Driven Development  Test-Driven Development
Test-Driven Development Amir Assad
 
Prompt-Engineering-Lecture-Elvis learn prompt engineering
Prompt-Engineering-Lecture-Elvis learn prompt engineeringPrompt-Engineering-Lecture-Elvis learn prompt engineering
Prompt-Engineering-Lecture-Elvis learn prompt engineeringSaweraKhadium
 
Principled And Clean Coding
Principled And Clean CodingPrincipled And Clean Coding
Principled And Clean CodingMetin Ogurlu
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPMENGSAYLOEM1
 
Deep network notes.pdf
Deep network notes.pdfDeep network notes.pdf
Deep network notes.pdfRamya Nellutla
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLPSatyam Saxena
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLPAnuj Gupta
 
TDD — Are you sure you properly test code?
TDD — Are you sure you properly test code?TDD — Are you sure you properly test code?
TDD — Are you sure you properly test code?Dmitriy Nesteryuk
 
Parts of speech tagger
Parts of speech taggerParts of speech tagger
Parts of speech taggersadakpramodh
 
Cracking the coding interview u penn - sept 30 2010
Cracking the coding interview   u penn - sept 30 2010Cracking the coding interview   u penn - sept 30 2010
Cracking the coding interview u penn - sept 30 2010careercup
 
Algorithm Design and Complexity - Course 1&2
Algorithm Design and Complexity - Course 1&2Algorithm Design and Complexity - Course 1&2
Algorithm Design and Complexity - Course 1&2Traian Rebedea
 
PC Troubleshooting - Module 1
PC Troubleshooting - Module 1PC Troubleshooting - Module 1
PC Troubleshooting - Module 1spseale
 
iOS Test-Driven Development
iOS Test-Driven DevelopmentiOS Test-Driven Development
iOS Test-Driven DevelopmentPablo Villar
 
Adopting tdd in the workplace
Adopting tdd in the workplaceAdopting tdd in the workplace
Adopting tdd in the workplaceDonny Wals
 

Similar to TCUK 2013 - Mike Unwalla - Patterns in language for POS disambiguation in a style checker (20)

Init() day2
Init() day2Init() day2
Init() day2
 
Init() Lesson 2
Init() Lesson 2Init() Lesson 2
Init() Lesson 2
 
Test-Driven Development
 Test-Driven Development  Test-Driven Development
Test-Driven Development
 
Prompt-Engineering-Lecture-Elvis learn prompt engineering
Prompt-Engineering-Lecture-Elvis learn prompt engineeringPrompt-Engineering-Lecture-Elvis learn prompt engineering
Prompt-Engineering-Lecture-Elvis learn prompt engineering
 
Principled And Clean Coding
Principled And Clean CodingPrincipled And Clean Coding
Principled And Clean Coding
 
Beyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLPBeyond the Symbols: A 30-minute Overview of NLP
Beyond the Symbols: A 30-minute Overview of NLP
 
Deep network notes.pdf
Deep network notes.pdfDeep network notes.pdf
Deep network notes.pdf
 
Agile Practices
Agile PracticesAgile Practices
Agile Practices
 
Intro to TDD
Intro to TDDIntro to TDD
Intro to TDD
 
Anthiil Inside workshop on NLP
Anthiil Inside workshop on NLPAnthiil Inside workshop on NLP
Anthiil Inside workshop on NLP
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLP
 
TDD — Are you sure you properly test code?
TDD — Are you sure you properly test code?TDD — Are you sure you properly test code?
TDD — Are you sure you properly test code?
 
Parts of speech tagger
Parts of speech taggerParts of speech tagger
Parts of speech tagger
 
Cracking the coding interview u penn - sept 30 2010
Cracking the coding interview   u penn - sept 30 2010Cracking the coding interview   u penn - sept 30 2010
Cracking the coding interview u penn - sept 30 2010
 
Algorithm Design and Complexity - Course 1&2
Algorithm Design and Complexity - Course 1&2Algorithm Design and Complexity - Course 1&2
Algorithm Design and Complexity - Course 1&2
 
PC Troubleshooting - Module 1
PC Troubleshooting - Module 1PC Troubleshooting - Module 1
PC Troubleshooting - Module 1
 
NLP Bootcamp
NLP BootcampNLP Bootcamp
NLP Bootcamp
 
Lecture 24
Lecture 24Lecture 24
Lecture 24
 
iOS Test-Driven Development
iOS Test-Driven DevelopmentiOS Test-Driven Development
iOS Test-Driven Development
 
Adopting tdd in the workplace
Adopting tdd in the workplaceAdopting tdd in the workplace
Adopting tdd in the workplace
 

More from TCUK Conference

TCUK 2013 - Digital accessibility: Strategy, content and delivery
TCUK 2013 - Digital accessibility: Strategy, content and deliveryTCUK 2013 - Digital accessibility: Strategy, content and delivery
TCUK 2013 - Digital accessibility: Strategy, content and deliveryTCUK Conference
 
TCUK 2013 - Technical writing in Energy and Resources: Risks and opportunities
TCUK 2013 - Technical writing in Energy and Resources: Risks and opportunitiesTCUK 2013 - Technical writing in Energy and Resources: Risks and opportunities
TCUK 2013 - Technical writing in Energy and Resources: Risks and opportunitiesTCUK Conference
 
TCUK 2013 - Fiona Parker - Building a team to support proposal production
TCUK 2013 - Fiona Parker - Building a team to support proposal productionTCUK 2013 - Fiona Parker - Building a team to support proposal production
TCUK 2013 - Fiona Parker - Building a team to support proposal productionTCUK Conference
 
TCUK 2012, Rob Scott Norton, Beyond the Comfort Zone: Today, You Will Not Be ...
TCUK 2012, Rob Scott Norton, Beyond the Comfort Zone: Today, You Will Not Be ...TCUK 2012, Rob Scott Norton, Beyond the Comfort Zone: Today, You Will Not Be ...
TCUK 2012, Rob Scott Norton, Beyond the Comfort Zone: Today, You Will Not Be ...TCUK Conference
 
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?TCUK Conference
 
TCUK 2012, Leah Guren, Golden Rules Redux
TCUK 2012, Leah Guren, Golden Rules ReduxTCUK 2012, Leah Guren, Golden Rules Redux
TCUK 2012, Leah Guren, Golden Rules ReduxTCUK Conference
 
TCUK 2012, Ian Ampleford and Peter Jones, Why would we want to talk to customers
TCUK 2012, Ian Ampleford and Peter Jones, Why would we want to talk to customersTCUK 2012, Ian Ampleford and Peter Jones, Why would we want to talk to customers
TCUK 2012, Ian Ampleford and Peter Jones, Why would we want to talk to customersTCUK Conference
 
TCUK 2012, Bryan Lade, How to sell yourself as a Technical Author
TCUK 2012, Bryan Lade, How to sell yourself as a Technical AuthorTCUK 2012, Bryan Lade, How to sell yourself as a Technical Author
TCUK 2012, Bryan Lade, How to sell yourself as a Technical AuthorTCUK Conference
 
TCUK 2012, Leah Guren, A Fish Tale
TCUK 2012, Leah Guren, A Fish TaleTCUK 2012, Leah Guren, A Fish Tale
TCUK 2012, Leah Guren, A Fish TaleTCUK Conference
 

More from TCUK Conference (9)

TCUK 2013 - Digital accessibility: Strategy, content and delivery
TCUK 2013 - Digital accessibility: Strategy, content and deliveryTCUK 2013 - Digital accessibility: Strategy, content and delivery
TCUK 2013 - Digital accessibility: Strategy, content and delivery
 
TCUK 2013 - Technical writing in Energy and Resources: Risks and opportunities
TCUK 2013 - Technical writing in Energy and Resources: Risks and opportunitiesTCUK 2013 - Technical writing in Energy and Resources: Risks and opportunities
TCUK 2013 - Technical writing in Energy and Resources: Risks and opportunities
 
TCUK 2013 - Fiona Parker - Building a team to support proposal production
TCUK 2013 - Fiona Parker - Building a team to support proposal productionTCUK 2013 - Fiona Parker - Building a team to support proposal production
TCUK 2013 - Fiona Parker - Building a team to support proposal production
 
TCUK 2012, Rob Scott Norton, Beyond the Comfort Zone: Today, You Will Not Be ...
TCUK 2012, Rob Scott Norton, Beyond the Comfort Zone: Today, You Will Not Be ...TCUK 2012, Rob Scott Norton, Beyond the Comfort Zone: Today, You Will Not Be ...
TCUK 2012, Rob Scott Norton, Beyond the Comfort Zone: Today, You Will Not Be ...
 
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
TCUK 2012, Nolwenn Kerzreho, Metadata: Why Should Technical Communicators Care?
 
TCUK 2012, Leah Guren, Golden Rules Redux
TCUK 2012, Leah Guren, Golden Rules ReduxTCUK 2012, Leah Guren, Golden Rules Redux
TCUK 2012, Leah Guren, Golden Rules Redux
 
TCUK 2012, Ian Ampleford and Peter Jones, Why would we want to talk to customers
TCUK 2012, Ian Ampleford and Peter Jones, Why would we want to talk to customersTCUK 2012, Ian Ampleford and Peter Jones, Why would we want to talk to customers
TCUK 2012, Ian Ampleford and Peter Jones, Why would we want to talk to customers
 
TCUK 2012, Bryan Lade, How to sell yourself as a Technical Author
TCUK 2012, Bryan Lade, How to sell yourself as a Technical AuthorTCUK 2012, Bryan Lade, How to sell yourself as a Technical Author
TCUK 2012, Bryan Lade, How to sell yourself as a Technical Author
 
TCUK 2012, Leah Guren, A Fish Tale
TCUK 2012, Leah Guren, A Fish TaleTCUK 2012, Leah Guren, A Fish Tale
TCUK 2012, Leah Guren, A Fish Tale
 

Recently uploaded

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Recently uploaded (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

TCUK 2013 - Mike Unwalla - Patterns in language for POS disambiguation in a style checker

  • 1. Patterns in language for POS disambiguation in a style checker Mike Unwalla, Source location: http://www.techscribe.co.uk/ta/patterns-in-language-tcuk-2013.pdf .
  • 2. What this presentation is about A lookup tool is not sufficient LanguageTool: customizable structure Finding misused terms: structure of a grammar rule Part of speech disambiguation: nouns, verbs, adjectives Some difficult problems How good are the rules? Demonstration Questions: interrupt and at the end Slide number 2
  • 3. A lookup tool is not sufficient The lookup tool has limits: • Slow (5 minutes to check a 50-page document) • No explanation of problem • No linguistic intelligence. Slide number 3
  • 4. LT is open-source proofreading software LanguageTool: www.languagetool.org. LT is fully customizable. LT has rules for style and for grammar. Can embed LT into other software. Term checker uses LT: www.simplified-english.co.uk. Slide number 4
  • 5. Disambiguation.xml: what a term is Grammar.xml: the problem with a term
  • 6. The structure of a grammar rule <rule id="MS_POS_INPUT" name="Microsoft, noun approved: input"> <pattern> <token>input<exception postag="IS_NOUN"/></token> </pattern> <message>Microsoft. Make sure that '<match no="1"/>' is a noun.</message> <example type="incorrect"><marker>Input</marker> the data.</example> <example type="incorrect">If you <marker>input</marker> the data...</example> <example type="correct">The <marker>input</marker> was correct.</example> </rule> Slide number 6
  • 7. The alternative method is not good 1) <token>input<exception postag="IS_NOUN"/></token> 2) <token postag="IS_VERB">input</token> Disambiguation is not 100% accurate. Primary design decision: no false negatives. Option 2 is not safe. Alternative: www.acrolinx.com/checking.html. Slide number 7
  • 8. Typical types of conflict in MMoS Approved Not approved Example verb noun install noun verb input adjective verb checked Grammarians use language patterns to parse text. Slide number 8
  • 9. Simple POS disambiguation for nouns (1) In 'an + X + was', X is a noun. Use sets of simple patterns to disambiguate. General or specific rules: • Specific: The X was; The X is; Some X takes • General: MODIFIER + X + VERB Trade-off: • Specific: not practical • General: need fewer rules, but sometimes get disambiguation errors. Slide number 9
  • 10. Simple POS disambiguation for nouns (2) Microsoft style: Use 'input' as a noun.  You must input the data.  The computer had input that was unusual.  The device had inputs that were unusual. Rule: HAVE + NOUN SINGULAR NON-COUNT + THAT|WHICH Rule: HAVE + NOUN PLURAL + THAT|WHICH Rules are in groups for nouns, adjectives, verbs. Slide number 10
  • 11. Example of postags in LT Slide number 11
  • 12. Simple POS disambiguation for verbs Microsoft style: Use 'install' as a verb:  If the install was a problem...  You must install all the...  You must never install all the... If X can be used as a verb, then in 'must + X', X is a verb. General rule: MODAL AUXILIARY VERB + X. If a counter-example exists, then 3 options: • Re-write the rule for better disambiguation. • Make X an exception to the rule. • Do nothing. Slide number 12
  • 13. Simple POS disambiguation for adjectives Microsoft style: Use 'checked' as an adjective:  If you checked the box...  If the checked command is... In 'ARTICLE + X + NOUN', X is an adjective. Slide number 13
  • 14. A difficult problem: noun or verb? Noun cluster: plastic bucket, fire engine, oil sample Sub-pattern: NOUN SINGULAR + NOUN PLURAL + END OF SENTENCE Is the last word a noun or a verb? • • • • • • Use the metal covers. The device analyses the oil samples. The alarm covers. The alarm sounds. The oil system leaks. The electrical equipment sparks. What pattern identifies the nouns? Slide number 14 Noun Noun Noun Ambiguous Ambiguous Ambiguous
  • 15. A difficult problem: noun or verb? For the previous sub-pattern, if a word can be both a noun and a verb AND • If the verb is transitive only, the word is a noun. • If the verb is intransitive, the POS is ambiguous. Transitive = has an object: The metal covers the hole. Intransitive = does not have an object: She laughs. Some verbs are both transitive and intransitive: • Transitive: The heat melted the snow. • Intransitive: The snow melted. Slide number 15
  • 16. Some text is always ambiguous Microsoft style avoids the passive voice.  The wire was disconnected by the technician.  The wire was disconnected quickly. ? The wire was disconnected.  Passive voice?  Adjective disconnected describes the wire? (Compare, "The wire was dirty.") Real-world knowledge: • The water was drunk. (Passive voice) • The waiter was drunk. (Adjective) Slide number 16
  • 17. How good are the rules? Context is important: • Winemaker: The must is contaminated. • Informal: Warm clothes are a must in cold weather. Term checker: • Approximately 150 sentences a second • Approximately 6% false positives for STE rules. Demonstration: Microsoft examples. Slide number 17