The document discusses how scientific workflows can be used in linguistics research to automate processing, analysis, and management of linguistic data. Workflows make research more reproducible by documenting methods. They could allow accessing and downloading open linguistic databases. Hypothetical examples show workflows linking text characters to dictionary definitions. Workflows may help standardize part-of-speech tags. Tracking workflows early can help share methods and ensure reproducibility.
2. Introduction: Various tools, such as Kepler, Taverna, VisTrails, and many others, have been designed so that scientific workflows can be created, executed, and shared among scientists and laboratories.
4. Scientific workflows are typically used to automate the processing, analysis, and management of scientific data. They provide a way of tracing provenance and methodologies, which helps foster reproducible science and the publication of executable papers.
5. By providing front-end visualisations and adaptations of shell scripts and manual steps, workflow systems make it easier for scientists to do their work, especially when integrating grids, parallel processing, or external databases.
8. Workflows in Linguistics: How does this relate to Linguistics? Many of the workflow systems I've been looking at would work in the field of corpus linguistics if only we had open source databases online to mine. Most often, they provide a way of cleaning data and of processing repetitive tasks. This is directly applicable to linguistic work.
9. How does this relate to Open Linguistics?
10. Open Linguistics: Promote the idea and definition of open data, as specified at opendefinition.org, in linguistics and in relation to language data. Act as a central point of reference and support for people interested in open linguistic data. Provide guidance on legal issues surrounding linguistic data to the community. Build an index of indexes of open linguistic data sources and tools, and link existing resources. Facilitate communication between existing groups. Serve as a mediator between providers and users of technical infrastructure. Assemble best-practice guidelines and use cases to create, use, and distribute data.
21. This workflow retrieves relevant documents using an optimized query: a string is added to the original query so that the search output is ranked with the most recent years first.
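A minimal sketch of the query step described above, assuming the search engine ranks documents by matched terms, so that appending recent years to the query favours recent documents. The slide does not give the actual string that is added; the function name `boost_recent_years`, its parameters, and the space-separated query format are hypothetical.

```python
from datetime import date
from typing import Optional


def boost_recent_years(query: str, n_years: int = 3,
                       today: Optional[date] = None) -> str:
    """Append the n most recent years to a query (hypothetical query format)."""
    # Use the supplied date if given, so the result is reproducible.
    current_year = (today or date.today()).year
    years = [str(current_year - offset) for offset in range(n_years)]
    return f"{query} {' '.join(years)}"
```

Passing `today` explicitly keeps the optimized query stable when the workflow is re-run later, which matters for reproducibility.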
28. Hypothetical Example: A Chinese character taken from a text is looked up in a dictionary database, which returns several candidate readings ([zhi1], [zi2], [zhi2], [shi2], [ci1]). Geographical data supplied by the researcher is then used to select among them. Output: character, proper dialect reading, definition.
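The hypothetical example can be sketched as a small lookup, assuming the dictionary database stores a region alongside each reading. Only the five pronunciations come from the slide; the `Reading` type, the placeholder character key `"CHAR"`, the region labels, and the definitions are invented for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Reading(NamedTuple):
    pronunciation: str
    region: str       # placeholder label, not real dialect data
    definition: str   # placeholder text, not a real gloss


# Hypothetical dictionary database: one character, five candidate readings.
DICTIONARY: Dict[str, List[Reading]] = {
    "CHAR": [
        Reading("zhi1", "region_1", "definition 1"),
        Reading("zi2", "region_2", "definition 2"),
        Reading("zhi2", "region_3", "definition 3"),
        Reading("shi2", "region_4", "definition 4"),
        Reading("ci1", "region_5", "definition 5"),
    ]
}


def resolve_reading(character: str, region: str,
                    dictionary: Dict[str, List[Reading]] = DICTIONARY
                    ) -> Optional[Reading]:
    """Pick the reading whose region matches the researcher's geographical data."""
    for reading in dictionary.get(character, []):
        if reading.region == region:
            return reading
    return None  # no reading recorded for this character and region
```

In a workflow system, the dictionary lookup and the geographical filter would more likely be two separate, reusable steps; they are combined here only to keep the sketch short.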
37. Use in Linguistics: Other use: shims, i.e. data conversion workflows. As seen in the LexInfo slides, definitions of parts of speech vary widely (from 5 to 181 different types). Workflows could be used to standardise these after accessing the database…
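A shim of this kind might look like the sketch below, which maps tags from differing tagsets onto one small shared set. The mapping table is an assumption made for illustration (a few Penn Treebank style tags and some lowercase abbreviations mapped to coarse labels); it is not taken from LexInfo or from any workflow system.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from two source tagsets to a small shared tagset.
TAG_MAP: Dict[str, str] = {
    "NN": "NOUN", "NNS": "NOUN", "VB": "VERB", "VBD": "VERB", "JJ": "ADJ",
    "n": "NOUN", "v": "VERB", "adj": "ADJ",
}


def standardise_tags(tagged_tokens: List[Tuple[str, str]],
                     tag_map: Dict[str, str] = TAG_MAP,
                     unknown: str = "X") -> List[Tuple[str, str]]:
    """Replace each source tag with its shared tag; unmapped tags become `unknown`."""
    return [(token, tag_map.get(tag, unknown)) for token, tag in tagged_tokens]
```

Marking unmapped tags explicitly, rather than dropping them, leaves a visible trace of where the conversion lost information.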
40. How does this help Open Methods? By keeping track of workflows and workflow systems before they become popular, we can make sure that users upload and share their workflows in a single repository (like myExperiment). These workflows could then be used by other linguists, along with data supplements, to produce replications and to check methodology.
42. Also, most workflow systems are now focusing more on providing provenance solutions. This would make linguistics research more sharable, understandable, and repeatable.
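To illustrate what a provenance record could contain, here is a minimal sketch that runs a list of processing steps and logs a digest of the data before and after each one. It does not reproduce the provenance model of Kepler, Taverna, or any other system; the function names and the log format are hypothetical, and the data is assumed to be JSON-serialisable.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


def _digest(value: Any) -> str:
    """SHA-256 of the JSON form of a value (assumes JSON-serialisable data)."""
    encoded = json.dumps(value, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_with_provenance(steps: List[Tuple[str, Callable[[Any], Any]]],
                        data: Any) -> Tuple[Any, List[Dict[str, str]]]:
    """Run each (name, function) step in order, logging input and output digests."""
    log: List[Dict[str, str]] = []
    for name, func in steps:
        before = _digest(data)
        data = func(data)
        log.append({"step": name,
                    "input_sha256": before,
                    "output_sha256": _digest(data)})
    return data, log
```

Someone replicating the analysis can compare their own digests with the shared log to see at which step, if any, their results diverge.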
44. Current work on this: Steiner, Lydia, Peter F. Stadler, and Michael Cysouw. 2011. A Pipeline for Computational Historical Linguistics. Language Dynamics and Change, pp. 89-127.
50. More Information: Places to look for more information:
http://notebooks.dataone.org/workflows
https://kepler-project.org/
http://www.taverna.org.uk/
http://www.myexperiment.org
http://www.mendeley.com/groups/1235381/workflows-in-linguistics/
Thank you. Questions?