John Blake
Japan Advanced Institute of Science and Technology
Personalised statistical writing analysis
Overview
• Introduction
– context, impetus
– focus, process
• Five aspects
– statistical analysis
• Personalised writing analysis
– sample extracts
• Interview survey
• Future direction
Context
• Proofreading for faculty
• Writing assistance for PhD candidates
70%  50%
science
Impetus
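21 email exchanges on various points, including:
• I would like to standardise on “minor scary incident”.
• I would like to standardise on “minor scary incident” rather than “near miss”.
• I asked the place we are submitting to. “Near accident” seems to be the usual term, so I have revised accordingly.
• I have changed it to “near-miss incident”. Prof. …. suggested following the instructions.
• I have changed every “Near miss incident” to “Near miss incidents”.
From one research article (RA)
minor scary incident / near-miss incident / ヒヤリ・ハット (hiyari-hatto, the Japanese term for a near miss)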
Focus
Enable research articles to meet generic expectations of:
• Accuracy by being factually correct
• Clarity by avoiding ambiguity
• Formality by adopting appropriate style
Rhetorical structure, logic, originality, flawed method, etc. = important, but…
Five aspects of generic integrity
1. Vocabulary fit
2. Readability
3. Word type balance
4. Style and usage
5. Lexicogrammatical errors
Summary statistics
Bhatia, V. K. (1993). Analysing genre: Language use in professional
settings. London: Longman.
Process for each research article
• Create target corpus (TC)
• Analyse RA and TC
• Identify errors in RA
• Compile ratios where possible
• Create feedback document
Five aspects
• Vocabulary fit: keyness of RA & TC
• Readability: readability statistics of RA & TC
• Word type balance: ratio of GSL, AWL and off-list words for RA & TC
• Style and usage: markedness, modality, register
• Lexicogrammar: vocabulary & grammatical errors
1. Vocabulary fit
Scott & Tribble (2006, p.56)
“keyness [is what a text] boils down to”
Hyland (2011) paper-journal fit
Hyland, K. (2011). Welcome to the Machine: Thoughts on writing for scholarly publication.
Journal of Second Language Teaching and Research, 1 (1), 58–68.
Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language
Education. Amsterdam, Philadelphia: John Benjamins.
Key words
TC: firm, knowledge, market, international, foreign, performance, research, variables, markets, countries, export, country, relationship, business, model
RA: organizational, TMSs, coordination, DOPPO, expertise, interactions, mechanisms, BLOCK, employee, leader, team, coordinate, informal, information, management
Prepared using AntConc 3.2.4w with the Brown Corpus as reference
TC = 243 RAs, c. 2.1 million words; RA = 10k words
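Keyness can be calculated with several statistics, and the slides do not state which one was used. As a minimal sketch, assuming the log-likelihood statistic (a common choice for key word lists), the keyness of each word in a study corpus against a reference corpus might be computed as follows. The function names and the toy counts are invented for illustration.

```python
from collections import Counter
from math import log


def log_likelihood(freq_study: int, size_study: int,
                   freq_ref: int, size_ref: int) -> float:
    """Log-likelihood keyness of one word (higher = more distinctive)."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_study > 0:
        score += freq_study * log(freq_study / expected_study)
    if freq_ref > 0:
        score += freq_ref * log(freq_ref / expected_ref)
    return 2 * score


def top_keywords(study: Counter, ref: Counter, n: int = 15) -> list[str]:
    """Rank words that are relatively more frequent in the study corpus."""
    size_study, size_ref = sum(study.values()), sum(ref.values())
    scored = []
    for word, freq in study.items():
        ref_freq = ref.get(word, 0)
        # Keep only words over-represented in the study corpus.
        if freq / size_study > ref_freq / size_ref:
            score = log_likelihood(freq, size_study, ref_freq, size_ref)
            scored.append((score, word))
    return [word for _, word in sorted(scored, reverse=True)[:n]]


# Toy counts, invented for illustration only.
study_counts = Counter({"firm": 40, "market": 30, "the": 500})
ref_counts = Counter({"firm": 5, "market": 10, "the": 5000, "of": 3000})
print(top_keywords(study_counts, ref_counts, n=2))
```

Running the same ranking once for the TC and once for the RA, each against the same reference corpus, would give two lists that can be compared side by side.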
[Word cloud: RA key words]
Prepared using Wordle with RA, 10k words
2. Readability
[Bar chart: Gunning fog index, Flesch-Kincaid grade level and mean sentence length, Draft vs. Target; axis 0–25]
Bogert, J. (1985). In defense of the Fog Index. Business Communication Quarterly, 48 (2), 9–12.
Gilquin, G., & Paquot, M. (2008). Too chatty: Learner academic writing and register variation. English Text Construction, 1 (1), 41–61.
McClure, G. (1987). Readability formulas: Useful or useless. IEEE Transactions on Professional Communication, 30 (1), 12–15.
Bogert (1985) & McClure (1987): factors affecting readability
Gilquin & Paquot (2008): learner academic writing is rather “chatty”
Research articles tend to have a higher reading difficulty.
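The Gunning fog index and the Flesch-Kincaid grade level both have standard published formulas based on sentence length and word complexity. A minimal sketch follows, assuming a naive regular-expression tokeniser and a rough vowel-group syllable counter. Both are simplifications introduced here, not the tooling behind the slides, and the full fog index also excludes proper nouns and some suffixed words from its "complex word" count.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate: count vowel groups, ignoring a final silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    """Return mean sentence length and two readability indices."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = [count_syllables(w) for w in words]
    # Simplification: any word of three or more syllables counts as complex.
    complex_words = sum(1 for s in syllables if s >= 3)

    mean_sentence_length = len(words) / len(sentences)
    return {
        "mean_sentence_length": mean_sentence_length,
        "gunning_fog": 0.4 * (mean_sentence_length
                              + 100 * complex_words / len(words)),
        "flesch_kincaid_grade": (0.39 * mean_sentence_length
                                 + 11.8 * sum(syllables) / len(words)
                                 - 15.59),
    }


# Hypothetical draft sentence, for illustration only.
draft = "This study answers two questions. Emphasis is placed on coordination."
print(readability(draft))
```

Calling `readability` on a draft and on a sample of the target corpus gives the paired Draft and Target values that the chart compares.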
3. Word type balance
Nation (2001, p.17)
Levels | Academic text
1st 1000 | 73.5%
2nd 1000 | 4.6%
AWL | 8.5%
Other | 13.3%

RA analysed by Web VP classic v4 (Cobb, 2013)
First 2k words: 69%
AWL: 16%
Off-list: 15%

Used in EAP courses at PolyU and CityU in Hong Kong
Cobb, T. (2013). Web Vocabprofile. www.lextutor.ca/vp/
Nation, I.S.P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
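A word type balance of this kind is the percentage of running words that fall into each frequency band. The sketch below shows the calculation under strong simplifying assumptions: the three word lists are tiny stand-ins invented for illustration, and matching is by exact word form rather than by word family. It is not a reimplementation of Web VP.

```python
import re
from collections import Counter

# Tiny stand-in word lists, invented for illustration. A real profile would
# load the full GSL and AWL lists and match word families, not just forms.
FIRST_1K = {"the", "of", "and", "this", "is", "are", "by", "in"}
SECOND_1K = {"manager", "weigh"}
AWL = {"analysis", "data", "research", "method"}


def word_type_balance(text: str) -> dict[str, float]:
    """Return the percentage of tokens falling into each word list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        raise ValueError("text contains no words")
    bands = Counter()
    for token in tokens:
        if token in FIRST_1K:
            bands["1k"] += 1
        elif token in SECOND_1K:
            bands["2k"] += 1
        elif token in AWL:
            bands["awl"] += 1
        else:
            bands["off_list"] += 1
    return {band: 100 * bands[band] / len(tokens)
            for band in ("1k", "2k", "awl", "off_list")}


print(word_type_balance("Research is the analysis of data by the Tokyo manager"))
```

Profiling the RA and the TC with the same lists yields comparable percentages for each band.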
4. Style and usage errors
Marked usage | Ratio | Suggestion
People provide first | 0:9 COCA | People first provide
Hyland (1998) – hedging
Robb (2003) – “Google as a quick ‘n’ dirty corpus tool”
Hyland, K. (1998). Hedging in scientific research articles. Amsterdam : John Benjamins
Robb, T. (2003). Google as a quick ‘n’ dirty corpus tool. TESL-EJ, 7(2).
Corpora: IS, KS, MS, BNC, COCA, WAC
5. Lexicogrammatical errors
Grammatical or vocabulary errors
No. | Incorrect form | Correct form | Comment
1 | Taking account differences | Taking account of differences | preposition
2 | this study answers to two questions | this study answers two questions | answer to s.b. / answer s.th.
3 | former employee | a former employee | employee [singular]
4 | to participate to this study | to participate in this study | collocation (participate in)
5 | emphasis is given on XX | emphasis is placed on XX | collocation (give to / place on)
6 | for being responsible | to be responsible | general vs. specific purpose
Summary statistics
Based on requests for a simple-to-understand evaluation
Caveat: subjective evaluations disguised as statistics
Personalised writing analysis
Selected statistics for subject 1
Readability | Yours | Target
Gunning fog index | 13.2 | 13.2
Mean sentence length | 15.49 | 19.37
Mean number of clauses/sentence | 1.19 | 1.54
Lexical density | 0.63 | 0.57

Word type balance | Yours % | Target %
1k words | 68.58 | 74.39
2k words | 6.69 | 5.29
AWL | 16.36 | 7.67
Off-list words | 8.36 | 12.65
Personalised writing analysis
Selected statistics for subject 4
Style and usage
No. | Sentence | Ratio | Comment or correction
1 | minor scary incidents | 1:58,700 WAC | near-miss incidents
2 | falling-accident | 0:19 COCA | slips, trips and falls OR falling objects
3 | a medical examination by interview | 1:525 WAC; 0:1 COCA | a medical consultation
4 | According to sex | 1:18 WAC | According to the gender
5 | 175 indoor workers | n/a | Use One hundred and ….
6 | Tomio,T. (1995) proposes | n/a | Omit initials in in-text citations unless …
Personalised writing analysis
Selected statistics for subject 7
Style and usage
No. | Sentence | Ratio | Comment or correction
1 | people provide first their expertise … | 0:9 COCA | people first provide their expertise …
2 | XX also engage into XX | 1:9000 COCA | XX also engage in XX
3 | The XX structure limits become | n/a | Use limits for boundaries and limitations for restrictions/inabilities
4 | future studies are able to | n/a | Use “may be” to show uncertainty
5 | employee simultaneous participation | 0:5 WAC | simultaneous participation of employees
Interview survey
Interviewer = me
Subjects = 4 faculty, 1 PhD candidate
Nationalities = 3 Japanese, 2 non-Japanese
Number = 5 participants
Interview time = 30 minutes
Location = private office on campus
Dates of interview = Jun-Jul 2013
Semi-structured interviews
e.g. “What revisions did you make to your paper since…..?”
“How can I make the feedback more useful?”
Survey results
• Explanatory notes
– too long
• Key word lists
– couldn't understand
• Three readability scores
– too complex
• Raw ratios
– too difficult
e.g. 47:211,120 (approximately 1:4500)
• Lexico-grammatical errors
• Word type balance
• Ratios for style and usage
Incremental improvements (made)
1. Create summary statistic scorecard
2. Use word tag cloud for vocabulary fit
3. Shorten explanatory notes
4. Simplify and approximate ratios
5. Show word type balance graphically with percentages
6. Select “most useful” readability measure(s) – mean sentence and word length?
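Simplifying and approximating a raw ratio can be sketched as dividing both sides by the smaller count and rounding the result. The example below assumes each ratio compares corpus hits for the marked form with hits for the suggested form, and rounds to two significant figures. Both the interpretation and the function names are assumptions made for illustration.

```python
from math import floor, log10


def round_sig(value: float, sig: int = 2) -> int:
    """Round a number of at least 1 to `sig` significant figures."""
    digits = sig - 1 - floor(log10(value))
    return int(round(value, digits))


def simplify_ratio(marked_hits: int, suggested_hits: int) -> str:
    """Reduce a raw hit ratio to an approximate 1:N form."""
    if marked_hits == 0:
        # Nothing to reduce when the marked form never occurs.
        return f"0:{suggested_hits:,}"
    if suggested_hits < marked_hits:
        # Assumption: the marked form is normally the rarer one; otherwise
        # the raw ratio is returned unchanged.
        return f"{marked_hits:,}:{suggested_hits:,}"
    return f"1:{round_sig(suggested_hits / marked_hits):,}"


print(simplify_ratio(47, 211120))
```

Two significant figures keep the ratio easy to read while preserving its order of magnitude, which is what the comparison is meant to convey.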
Future developments
• Integration of metrics into a one-stop online portal (thanks to a reviewer for the idea) where researchers can submit drafts
• Statistical comparison of draft and published
versions to evaluate success of feedback
Any questions, suggestions or
comments?
John Blake
johnb@jaist.ac.jp

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 

KĂźrzlich hochgeladen (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 

Personalised statistical writing analysis

  • 1. John Blake Japan Advanced Institute of Science and Technology Personalised statistical writing analysis
  • 2. Overview • Introduction: context, impetus; focus, process • Five aspects: statistical analysis • Personalised writing analysis: sample extracts • Interview survey • Future direction
  • 3. Context • Proofreading for faculty • Writing assistance for PhD candidates • 70% → 50% science
  • 4. Impetus: 21 email exchanges on various points from one research article (RA), including (translated from Japanese): • "I would like to standardise on 'minor scary incident'." • "I would like to standardise on 'minor scary incident' rather than 'near miss'." • "I asked the place of submission. 'Near accident' seems to be the common term. I have revised accordingly." • "I changed it to 'near-miss incident'. … It was suggested by Professor … that I follow the instructions." • "I changed every 'Near miss incident' to 'Near miss incidents'." Term at issue: minor scary incident → near-miss incident (Japanese: ヒヤリ・ハット)
  • 5. Focus: Enable research articles to meet generic expectations of: • Accuracy by being factually correct • Clarity by avoiding ambiguity • Formality by adopting appropriate style. Rhetorical structure, logic, originality, flawed method, etc. = important, but…
  • 6. Five aspects of generic integrity: 1. Vocabulary fit 2. Readability 3. Word type balance 4. Style and usage 5. Lexicogrammatical errors. Plus summary statistics. Reference: Bhatia, V. K. (1993). Analysing genre: Language use in professional settings. London: Longman.
  • 7. Process for each research article • Create target corpus (TC) • Analyse RA and TC • Identify errors in RA • Compile ratios where possible • Create feedback document
  • 8. Five aspects • Vocabulary fit: keyness of RA & TC • Readability: readability statistics of RA & TC • Word type balance: ratio of GSL, AWL and off-list words for RA & TC • Style and usage: markedness, modality, register • Lexico-grammar: vocabulary & grammatical errors
  • 9. 1. Vocabulary fit. Scott & Tribble (2006, p.56): "keyness [is what a text] boils down to". Hyland (2011): paper-journal fit.
      TC keywords: firm, knowledge, market, international, foreign, performance, research, variables, markets, countries, export, country, relationship, business, model
      RA keywords: organizational, TMSs, coordination, DOPPO, expertise, interactions, mechanisms, BLOCK, employee, leader, team, coordinate, informal, information, management
      Prepared using AntConc 3.2.4w with the Brown Corpus as reference. TC = 243 RAs, c. 2.1 million words; RA = 10k words.
      References: Hyland, K. (2011). Welcome to the Machine: Thoughts on writing for scholarly publication. Journal of Second Language Teaching and Research, 1 (1), 58–68. Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam, Philadelphia: John Benjamins.
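
As a rough illustration of how a keyness score for a single word can be computed, the sketch below uses the log-likelihood statistic, one measure commonly used for keyword lists. The slides do not say which statistic was selected in AntConc, so treating it as log-likelihood is an assumption, and the function name and the counts in the usage note are hypothetical.

```python
import math


def log_likelihood_keyness(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one word: study corpus vs reference corpus.

    freq_* are raw counts of the word; size_* are corpus sizes in tokens.
    Assumption: the slides do not name the statistic; this is one common choice.
    """
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Expected counts if the word were equally frequent in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

With made-up counts, a word occurring 50 times in a 10,000-word RA but only 10 times in a 1,000,000-word reference corpus scores far above the 3.84 threshold often used for p < 0.05, while a word with the same relative frequency in both corpora scores 0.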
  • 10. [Figure: word cloud of the RA (10k words), prepared using Wordle.] TC keywords: firm, knowledge, market, international, foreign, performance, research, variables, markets, countries, export, country, relationship, business, model
  • 11. 2. Readability. [Bar chart: Gunning fog index, Flesch-Kincaid grade level and mean sentence length, Draft vs Target; axis 0 to 25.]
      Bogert (1985) & McClure (1987): factors affecting readability. Gilquin & Paquot (2008): learner academic writing is rather "chatty". Research articles tend to have a higher reading difficulty.
      References: Bogert, J. (1985). In Defense of the Fog Index. Business Communication Quarterly, 48 (2), 9-12. Gilquin, G., & Paquot, M. (2008). Too chatty: Learner academic writing and register variation. English Text Construction, 1 (1), 41-61. McClure, G. (1987). Readability Formulas: Useful or Useless. Professional Communication, IEEE Transactions on, 30 (1), 12-15.
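
The three readability measures named on this slide can be approximated as follows. This is a minimal sketch using the standard published formulas for the Gunning fog index and the Flesch-Kincaid grade level; the syllable counter is a crude vowel-group heuristic (an assumption, since the slides do not say which tool produced the scores), so results will differ somewhat from dedicated software.

```python
import re


def count_syllables(word):
    # Rough heuristic (an assumption, not a dictionary lookup):
    # one syllable per run of consecutive vowels, with a minimum of one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def readability(text):
    """Return mean sentence length, Gunning fog index and Flesch-Kincaid grade."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    # "Complex" words are approximated here as words of three or more syllables.
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    mean_sentence_length = len(words) / len(sentences)
    fog = 0.4 * (mean_sentence_length + 100 * complex_words / len(words))
    fk_grade = 0.39 * mean_sentence_length + 11.8 * syllables / len(words) - 15.59
    return {
        "mean_sentence_length": mean_sentence_length,
        "gunning_fog": fog,
        "flesch_kincaid_grade": fk_grade,
    }
```

Running the same function over a draft and over the target corpus gives directly comparable figures, which is the Draft vs Target comparison the chart presents.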
  • 12. 3. Word type balance
      Levels in academic text (Nation, 2001, p.17): 1st 1000 = 73.5%; 2nd 1000 = 4.6%; AWL = 8.5%; Other = 13.3%
      RA analysed by Web VP classic v4 (Cobb, 2013): First 2k words = 69%; AWL = 16%; Off-list = 15%
      Used in EAP courses at PolyU and CityU in Hong Kong.
      References: Cobb, T. (2013). Web Vocabprofile. www.lextutor.ca/vp/ Nation, I.S.P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
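
The percentages above come from assigning every token to a frequency band. The sketch below shows the idea only: the tiny word sets in the usage note are hypothetical stand-ins for the real GSL and AWL lists, and real profilers such as Web VP match word families rather than exact word forms.

```python
import re
from collections import Counter


def word_type_balance(text, word_lists):
    """Percentage of tokens per band; tokens in no band count as 'off-list'.

    word_lists maps a band name to a set of lower-case words. Bands are
    checked in the order given, and each token is counted once.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        raise ValueError("text must contain at least one word")
    counts = Counter()
    for token in tokens:
        for band, words in word_lists.items():
            if token in words:
                counts[band] += 1
                break
        else:  # no band matched this token
            counts["off-list"] += 1
    return {band: 100 * counts[band] / len(tokens) for band in [*word_lists, "off-list"]}
```

For example, with the hypothetical lists `{"1k": {"the", "of", "is"}, "AWL": {"analysis"}}`, the sentence "The analysis of keyness is robust" has half of its six tokens in the 1k band, one in the AWL band and two off-list.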
  • 13. 4. Style and usage errors
      Marked usage / Ratio / Suggestion
      People provide first / 0:9 (COCA) / People first provide
      Hyland (1998): hedging. Robb (2003): "Google as a quick 'n' dirty corpus tool". Corpora: IS, KS, MS, BNC, COCA, WAC.
      References: Hyland, K. (1998). Hedging in scientific research articles. Amsterdam: John Benjamins. Robb, T. (2003). Google as a quick 'n' dirty corpus tool. TESL-EJ, 7(2).
  • 14. 5. Lexicogrammatical errors (grammatical or vocabulary errors)
      Incorrect form / Correct form / Comment
      1. Taking account differences / Taking account of differences / preposition
      2. this study answers to two questions / this study answers two questions / answer to s.b. vs. answer s.th.
      3. former employee / a former employee / employee [singular]
      4. to participate to this study / to participate in this study / collocation (participate in)
      5. emphasis is given on XX / emphasis is placed on XX / collocation (give to / place on)
      6. for being responsible / to be responsible / general vs. specific purpose
  • 15. Summary statistics. Based on requests for a simple-to-understand evaluation. ☹ Caveat: subjective evaluations disguised as statistics
  • 16. Personalised writing analysis: selected statistics for subject 1
      Readability / Yours / Target
      Gunning fog index / 13.2 / 13.2
      Mean sentence length / 15.49 / 19.37
      Mean number of clauses per sentence / 1.19 / 1.54
      Lexical density / 0.63 / 0.57
      Word type balance / Yours % / Target %
      1k words / 68.58 / 74.39
      2k words / 6.69 / 5.29
      AWL / 16.36 / 7.67
      Off-list words / 8.36 / 12.65
  • 17. Personalised writing analysis: selected statistics for subject 4 (style and usage)
      Sentence / Ratio / Comment or correction
      1. minor scary incidents / 1:58,700 (WAC) / near-miss incidents
      2. falling-accident / 0:19 (COCA) / slips, trips and falls OR falling objects
      3. a medical examination by interview / 1:525 (WAC), 0:1 (COCA) / a medical consultation
      4. According to sex / 1:18 (WAC) / According to the gender
      5. 175 indoor workers / n/a / Use One hundred and ….
      6. Tomio,T. (1995) proposes / n/a / Omit initials in in-text citations unless …
  • 18. Personalised writing analysis: selected statistics for subject 7 (style and usage)
      Sentence / Ratio / Comment or correction
      1. people provide first their expertise … / 0:9 (COCA) / people first provide their expertise …
      2. XX also engage into XX / 1:9000 (COCA) / XX also engage in XX
      3. The XX structure limits become / n/a / Use "limits" for boundaries and "limitations" for restrictions or inabilities
      4. future studies are able to / n/a / Use "may be" to show uncertainty
      5. employee simultaneous participation / 0:5 (WAC) / simultaneous participation of employees
  • 19. Interview survey. Interviewer = me; Subjects = 4 faculty, 1 PhD candidate; Nationalities = 3 Japanese, 2 non-Japanese; Number = 5 participants; Interview time = 30 minutes; Location = private office on campus; Dates of interview = Jun-Jul 2013. Semi-structured interviews, e.g. "What revisions did you make to your paper since…..?", "How can I make the feedback more useful?"
  • 20. Survey results • Explanatory notes: too long • Key word lists: couldn't understand • Three readability scores: too complex • Raw ratios: too difficult, e.g. 47:211,120; 1:4500 • Lexico-grammatical errors • Word type balance • Ratios for style and usage
  • 21. Incremental improvements (made) 1. Create summary statistic scorecard ✓ 2. Use word tag cloud for vocabulary fit ✓ 3. Shorten explanatory notes ✓ 4. Simplify and approximate ratios ✓ 5. Show word type balance graphically with percentages 6. Select "most useful" readability measure(s): mean sentence and word length?
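
One way to simplify and approximate a raw ratio is to scale it to 1:n and round n to a couple of significant figures. The sketch below is an assumed approach, not the presenter's documented method, and the function name is hypothetical. Scaled this way, the raw ratio 47:211,120 quoted in the survey results becomes roughly 1:4500.

```python
import math


def simplify_ratio(hits_marked, hits_suggested, sig_figs=2):
    """Scale a raw corpus ratio to 1:n, rounding n to sig_figs significant figures.

    Assumed approach for illustration. Ratios whose first term is 0 are
    returned unchanged, since they cannot be scaled to 1:n.
    """
    if hits_marked == 0:
        return f"0:{hits_suggested}"
    scaled = hits_suggested / hits_marked
    if scaled == 0:
        return "1:0"
    # Round to the requested number of significant figures.
    magnitude = math.floor(math.log10(abs(scaled)))
    rounded = round(scaled, sig_figs - 1 - magnitude)
    text = str(int(rounded)) if rounded == int(rounded) else f"{rounded:g}"
    return f"1:{text}"
```

For instance, `simplify_ratio(47, 211120)` gives `"1:4500"`, and `simplify_ratio(0, 9)` leaves `"0:9"` as it is.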
  • 22. Future developments • Integration of metrics into a one-stop online portal (thanks to reviewer for idea) for researchers to submit drafts • Statistical comparison of draft and published versions to evaluate success of feedback
  • 23. Any questions, suggestions or comments? John Blake johnb@jaist.ac.jp

Editor's notes

  1. Overgeneralising, discriminatory language, ambiguous pronouns, abbreviations