Towards Open Methods: Using Scientific Workflows in Linguistics
Richard Littauer
Introduction
Various tools, such as Kepler, Taverna, VisTrails, and many others, have been designed to allow scientific workflows to be created, executed, and shared among scientists and laboratories.
Scientific workflows are typically used to automate the processing, analysis, and management of scientific data. They provide a way of tracing provenance and methodologies to help foster reproducible science and the publication of executable papers.
By providing front-end visualisations and adaptations of shell scripts and manual steps, they make it easier for scientists to do their work, especially when integrating grids, parallel processing, or external databases.
Workflows in Linguistics
How does this relate to Linguistics? Many workflow systems I've been looking at would work in the field of corpus linguistics if we merely had open source databases online to mine. Most often, they provide a way of cleaning data and of processing repetitive tasks. This is directly applicable to linguistic work.
How does this relate to Open Linguistics?
Open Linguistics
Promote the idea and definition of open data, as specified at opendefinition.org, in linguistics and in relation to language data.
Act as a central point of reference and support for people interested in open linguistic data.
Provide guidance on legal issues surrounding linguistic data to the community.
Build an index of indexes of open linguistic data sources and tools and link existing resources.
Facilitate communication between existing groups.
Serve as a mediator between providers and users of technical infrastructure.
Assemble best-practice guidelines and use cases to create, use and distribute data.
Examples
This grabs the most recent XKCD comic off the web.
http://www.myexperiment.org/workflows/1370.html
This workflow retrieves relevant documents, based on a query optimized by adding a string to the original query that will rank the search output according to the most recent years.
http://www.myexperiment.org/workflows/117.html
Hypothetical Example
Chinese character from a text
[zhi1], [zi2], [zhi2], [shi2], [ci1]
Dictionary Database
Geographical data from researcher
Character - Proper dialect reading - definition
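The hypothetical example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings are taken from the slide, but the character placeholder, the region names, the definition text, and the `lookup` function are all invented here, and the sketch assumes that the listed readings are candidates returned by the dictionary database and that the researcher's geographical data selects one of them.

```python
from dataclasses import dataclass

# Everything below is placeholder data for illustration, not real dictionary content.
CHAR = "<character>"  # stands in for the Chinese character taken from a text

# Hypothetical dictionary database: character -> candidate readings and a definition.
DICTIONARY = {
    CHAR: {
        "readings": ["zhi1", "zi2", "zhi2", "shi2", "ci1"],
        "definition": "<definition from dictionary>",
    }
}

# Hypothetical geographical data from the researcher: region -> reading used there.
REGION_READINGS = {
    "region_a": {CHAR: "zhi1"},
    "region_b": {CHAR: "shi2"},
}


@dataclass
class Entry:
    character: str
    reading: str
    definition: str


def lookup(character, region):
    """Combine the dictionary record with the regional reading for one character."""
    record = DICTIONARY[character]
    reading = REGION_READINGS[region][character]
    # Guard against regional data that disagrees with the dictionary.
    if reading not in record["readings"]:
        raise ValueError(f"{reading!r} is not a known reading of {character!r}")
    return Entry(character, reading, record["definition"])


entry = lookup(CHAR, "region_b")
print(entry.character, "-", entry.reading, "-", entry.definition)
```

In a workflow system each of these steps (dictionary lookup, regional selection, output formatting) would be a separate, shareable component rather than one function.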
Use in Linguistics
Hypothetically, it should be possible to use current workflow systems to access and download data.
My hope is to see how feasible this is.
Other use: Shims: data conversion workflows. As seen in the LexInfo slides, there are varying definitions for parts of speech (from 5 to 181 different types). Workflows could be used to standardise these after accessing the database…
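A shim of this kind could be as small as a lookup table. The sketch below is an assumption-laden illustration, not a mapping from LexInfo or any other resource: the fine-grained tags are written in the style of the Penn Treebank tagset, and the coarse labels, the `COARSE_TAGS` table, and the `standardise` function are invented for the example.

```python
# Hypothetical mapping from fine-grained part-of-speech tags to a coarse inventory.
COARSE_TAGS = {
    "NN": "noun", "NNS": "noun", "NNP": "noun",
    "VB": "verb", "VBD": "verb", "VBZ": "verb",
    "JJ": "adjective",
    "RB": "adverb",
}


def standardise(tagged_tokens, mapping=COARSE_TAGS, unknown="other"):
    """Rewrite each (token, tag) pair using the coarse tag; unmapped tags become `unknown`."""
    return [(token, mapping.get(tag, unknown)) for token, tag in tagged_tokens]


sample = [("dogs", "NNS"), ("barked", "VBD"), ("loudly", "RB"), ("!", ".")]
print(standardise(sample))
```

Keeping the mapping as plain data means it can be shared and reviewed separately from the workflow that applies it.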
How does this help Open Methods? By keeping track of workflows and workflow systems before they start being popular, we can make sure that users upload and share their workflows to a single repository (like myExperiment). This could then be used by other linguists, along with data supplements, to produce replications, and to check methodology.
Also, most workflow systems are now focusing more on providing provenance solutions. This would make linguistics research more sharable, understandable and repeatable.
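To make the idea of provenance concrete, here is a minimal sketch of a runner that records, for every step, which function ran and a fingerprint of its input and output. It does not reflect how Kepler, Taverna, or VisTrails implement provenance; the `run_workflow` and `fingerprint` helpers and the two cleaning steps are hypothetical names chosen for the example.

```python
import hashlib
import json


def fingerprint(value):
    """Return a short, stable hash of a JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_workflow(data, steps):
    """Apply each step in order and record what went into and came out of it."""
    provenance = []
    for step in steps:
        result = step(data)
        provenance.append({
            "step": step.__name__,
            "input": fingerprint(data),
            "output": fingerprint(result),
        })
        data = result
    return data, provenance


def lowercase(tokens):
    return [token.lower() for token in tokens]


def drop_punctuation(tokens):
    return [token for token in tokens if token.isalnum()]


result, log = run_workflow(["The", "cat", ",", "sat", "."], [lowercase, drop_punctuation])
print(result)
for record in log:
    print(record)
```

Because each record's output fingerprint matches the next record's input fingerprint, someone replicating the study can check that they followed the same chain of steps on the same data.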
Work currently going on in this area:

Gen AI in Business - Global Trends Report 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Towards Open Methods: Using Scientific Workflows in Linguistics

  • 1. Towards Open Methods: Using Scientific Workflows in Linguistics. Richard Littauer
  • 2. Introduction: Various tools, such as Kepler, Taverna, VisTrails, and many others, have been designed to allow scientific workflows to be created, executed, and shared among scientists and laboratories.
  • 4. Scientific workflows are typically used to automate the processing, analysis, and management of scientific data. They provide a way of tracing provenance and methodologies, which helps foster reproducible science and the publication of executable papers.
  • 5. By providing front-end visualisations and adaptations of shell scripts and manual steps, these tools make it easier for scientists to do their work, especially when integrating grids, parallel processing, or external databases.
  • 8. Workflows in Linguistics: How does this relate to Linguistics? Many of the workflow systems I've been looking at would work in the field of corpus linguistics if we merely had open-source databases online to mine. Most often, they provide a way of cleaning data and of automating repetitive tasks. This is directly applicable to linguistic work.
  • 9. How does this relate to Open Linguistics?
  • 10. Open Linguistics: Promote the idea and definition, as specified at opendefinition.org, of open data in linguistics and in relation to language data. Act as a central point of reference and support for people interested in open linguistic data. Provide guidance on legal issues surrounding linguistic data to the community. Build an index of indexes of open linguistic data sources and tools, and link existing resources. Facilitate communication between existing groups. Serve as a mediator between providers and users of technical infrastructure. Assemble best-practice guidelines and use cases for creating, using, and distributing data.
  • 17. This workflow grabs the most recent XKCD comic off the web.
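The workflow on the slide is not reproduced in this transcript, so the following is only a rough Python stand-in for the same task, not the workflow itself. It assumes the public XKCD JSON endpoint at `https://xkcd.com/info.0.json`; the function names `fetch_json` and `latest_comic` are invented for illustration.

```python
import json
from urllib.request import urlopen

# Assumption: XKCD's JSON endpoint describing the most recent comic.
XKCD_LATEST_URL = "https://xkcd.com/info.0.json"


def fetch_json(url):
    """Download a URL and decode its body as JSON."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def latest_comic(fetch=fetch_json):
    """Return number, title, and image URL of the most recent comic.

    `fetch` is injectable so the step can be exercised without network access.
    """
    data = fetch(XKCD_LATEST_URL)
    return {"num": data["num"], "title": data["title"], "img": data["img"]}
```

Calling `latest_comic()` with no arguments performs the real download; passing a stub as `fetch` lets the step be checked offline, which is roughly what a workflow system does when it swaps one service for another.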
  • 21. This workflow retrieves relevant documents using an optimised query: a string is appended to the original query so that the search output is ranked by the most recent years.
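The transcript does not say which search service the workflow uses or what string it appends, so the sketch below only illustrates the general idea of query augmentation. The boolean `AND`/`OR` syntax, the function name `add_recency_terms`, and the `span` parameter are all assumptions.

```python
def add_recency_terms(query, latest_year, span=3):
    """Append a disjunction of recent years to a search query.

    Assumes a search engine that understands parentheses, AND, and OR;
    the real workflow may use a different syntax entirely.
    """
    years = [str(latest_year - offset) for offset in range(span)]
    return f"({query}) AND ({' OR '.join(years)})"
```

For example, `add_recency_terms("language evolution", 2011)` yields `(language evolution) AND (2011 OR 2010 OR 2009)`.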
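  • 28. Hypothetical Example: A Chinese character from a text is looked up in a dictionary database, which returns several candidate readings ([zhi1], [zi2], [zhi2], [shi2], [ci1]). Combined with geographical data from the researcher, the workflow outputs the character, its proper dialect reading, and its definition.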
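A minimal sketch of how this hypothetical example might look as a single lookup step. Everything in the table is an invented placeholder: the character key `"X"`, the region names, and the definitions are not real dictionary data, and the pairing of readings with regions is arbitrary. Only the reading labels are taken from the slide.

```python
# Placeholder dictionary database: all entries are invented for illustration.
DICTIONARY = {
    "X": [
        {"reading": "zhi1", "region": "region_a", "definition": "placeholder definition A"},
        {"reading": "zi2", "region": "region_b", "definition": "placeholder definition B"},
        {"reading": "shi2", "region": "region_c", "definition": "placeholder definition C"},
    ]
}


def resolve_reading(character, region, dictionary=DICTIONARY):
    """Pick the dialect reading of `character` that matches the researcher's region.

    Returns (character, reading, definition), or None when no entry matches.
    """
    for entry in dictionary.get(character, []):
        if entry["region"] == region:
            return (character, entry["reading"], entry["definition"])
    return None
```

In a workflow system, the dictionary lookup and the geographical filter would more likely be separate components joined by a data link; they are merged here only to keep the sketch short.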
  • 31. Hypothetically, it should be possible to use current workflow systems to access and download data.
  • 34. My hope is to see how feasible this is.
  • 37. Use in Linguistics: Another use is shims, that is, data conversion workflows. As seen in the LexInfo slides, definitions of the parts of speech vary widely (from 5 to 181 different types). Workflows could be used to standardise these after accessing the database…
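A shim of this kind can be sketched as a plain mapping from a fine-grained tagset to a coarse one. The example below assumes Penn Treebank style tags as the source and a small set of coarse labels as the target; both choices, and the function name `standardise_tags`, are illustrative assumptions rather than anything specified in the slides.

```python
# Assumed mapping from a few Penn Treebank style tags to coarse categories.
COARSE_TAGS = {
    "NN": "noun", "NNS": "noun", "NNP": "noun",
    "VB": "verb", "VBD": "verb", "VBP": "verb", "VBZ": "verb",
    "JJ": "adjective",
    "RB": "adverb",
}


def standardise_tags(tagged_tokens, mapping=COARSE_TAGS, unknown="other"):
    """Convert (token, fine_tag) pairs to (token, coarse_tag) pairs.

    Tags missing from the mapping fall back to `unknown` instead of failing,
    so one unexpected tag does not stop the whole conversion.
    """
    return [(token, mapping.get(tag, unknown)) for token, tag in tagged_tokens]
```

Keeping the mapping as data rather than code means the same shim could be reused for another tagset by swapping the table.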
  • 40. How does this help Open Methods? By keeping track of workflows and workflow systems before they become popular, we can make sure that users upload and share their workflows to a single repository (like myExperiment). These could then be used by other linguists, along with data supplements, to produce replications and to check methodology.
  • 42. Also, most workflow systems are now focusing more on providing provenance solutions. This would make linguistics research more sharable, understandable, and repeatable.
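To make the idea of provenance concrete, here is a minimal sketch of a pipeline runner that records what went into and came out of each step. It is not how Kepler, Taverna, or any other system implements provenance; the log format and the names `digest` and `run_with_provenance` are assumptions, and the data is assumed to be JSON-serialisable.

```python
import hashlib
import json


def digest(value):
    """Return a SHA-256 fingerprint of any JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_with_provenance(steps, data):
    """Run (name, function) steps in order and log input/output fingerprints."""
    log = []
    for name, func in steps:
        input_hash = digest(data)
        data = func(data)
        log.append({"step": name, "input_sha256": input_hash, "output_sha256": digest(data)})
    return data, log
```

Because each step's output fingerprint equals the next step's input fingerprint, another researcher rerunning the same steps on the same data can check where, if anywhere, their results diverge.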
  • 44. Current work on this: Steiner, Lydia, Peter F. Stadler, and Michael Cysouw. 2011. A Pipeline for Computational Historical Linguistics. Language Dynamics and Change, pp. 89-127.
  • 50. More Information: Places to look for more information: http://notebooks.dataone.org/workflows ; https://kepler-project.org/ ; http://www.taverna.org.uk/ ; http://www.myexperiment.org ; http://www.mendeley.com/groups/1235381/workflows-in-linguistics/ . Thank you. Questions?