Corpora, Blogs and Linguistic Variation: Arguments for Using Structured Web Data in Corpus Development
Cornelius Puschmann, University of Düsseldorf
University of Paderborn, 8 November 2007
Contents of this presentation
What counts as evidence in linguistics?
Four central questions a researcher must answer
Different schools of thought (CogSci, SocioLing, Functionalism) ...all have different questions and assumptions!
What role does corpus data play?
System, use and the individual
A totalizing view of language production investigation
but... whose system? whose use?
[Diagram labels: system, social function, cognitive mechanism, genetic disposition, use, cultural transmission]
Margin vs. center: what is shared vs. what varies
If we're interested in variation, corpus analysis is the way to go
[Diagram labels: shared, recurring & patterned language; individual & varying language]
A (slightly) different way of looking at language
Using the Web for corpus investigations
The ultimate corpus
Web as Corpus (WaC)
Broader issues with WaC
Web for Corpus
Constructing a corpus using web data
Things to consider ...not a whole lot of language data!
Things to consider ... better, but we need to take register into account
A blog is...
A few facts
An example
A research example
Expression of futurity in English: will vs. be going to
Distribution of will vs. be going to in three blogs
[Chart legend: light blue = will, dark blue = going to]
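A comparison like the one on this slide could be approximated with a few lines of Python. This is only a minimal sketch, not the method used for the talk: the function names, the blog labels and the sample texts are hypothetical, and the surface patterns are crude. They count the noun "will" and motion uses such as "going to the store" as well, and they only recognise straight apostrophes, so real work would need tagging or manual checking.

```python
import re
from collections import Counter

# Crude surface patterns (assumption: plain-text posts, straight apostrophes).
WILL = re.compile(r"\b(?:will|won't|'ll)\b", re.IGNORECASE)
GOING_TO = re.compile(r"\b(?:going\s+to|gonna)\b", re.IGNORECASE)


def futurity_counts(text):
    """Count surface matches of the two future constructions in one text."""
    return Counter({
        "will": len(WILL.findall(text)),
        "going to": len(GOING_TO.findall(text)),
    })


def going_to_share(text):
    """Share of 'going to' among all matches, or None if there are none."""
    counts = futurity_counts(text)
    total = counts["will"] + counts["going to"]
    return counts["going to"] / total if total else None


def share_by_blog(blogs):
    """Map each blog name to its 'going to' share; blogs is {name: text}."""
    return {name: going_to_share(text) for name, text in blogs.items()}


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = {
        "blog_a": "I will post more soon. We won't be late.",
        "blog_b": "I'm going to try this. She's gonna love it. I'll see.",
    }
    for name, share in share_by_blog(sample).items():
        print(name, share)
```

Reporting a share rather than raw counts keeps blogs of different sizes comparable, which matters when the blogs in a sample differ widely in the amount of text they contain.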
Personal pronoun frequency in the same blogs
Inanimate subjects with be going to
Observations
Thanks for listening!

The Core Functions of the Bangko Sentral ng PilipinasThe Core Functions of the Bangko Sentral ng Pilipinas
The Core Functions of the Bangko Sentral ng Pilipinas
 
government_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdfgovernment_intervention_in_business_ownership[1].pdf
government_intervention_in_business_ownership[1].pdf
 
SBP-Market-Operations and market managment
SBP-Market-Operations and market managmentSBP-Market-Operations and market managment
SBP-Market-Operations and market managment
 
fca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdffca-bsps-decision-letter-redacted (1).pdf
fca-bsps-decision-letter-redacted (1).pdf
 
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
(办理原版一样)QUT毕业证昆士兰科技大学毕业证学位证留信学历认证成绩单补办
 
Managing Finances in a Small Business (yes).pdf
Managing Finances  in a Small Business (yes).pdfManaging Finances  in a Small Business (yes).pdf
Managing Finances in a Small Business (yes).pdf
 
Monthly Economic Monitoring of Ukraine No 231, April 2024
Monthly Economic Monitoring of Ukraine No 231, April 2024Monthly Economic Monitoring of Ukraine No 231, April 2024
Monthly Economic Monitoring of Ukraine No 231, April 2024
 
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
(办理学位证)加拿大萨省大学毕业证成绩单原版一比一
 
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
原版1:1复刻温哥华岛大学毕业证Vancouver毕业证留信学历认证
 
House of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview documentHouse of Commons ; CDC schemes overview document
House of Commons ; CDC schemes overview document
 
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
Call Girls Near Golden Tulip Essential Hotel, New Delhi 9873777170
 
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
NO1 WorldWide online istikhara for love marriage vashikaran specialist love p...
 
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
NO1 WorldWide Genuine vashikaran specialist Vashikaran baba near Lahore Vashi...
 
Tenets of Physiocracy History of Economic
Tenets of Physiocracy History of EconomicTenets of Physiocracy History of Economic
Tenets of Physiocracy History of Economic
 
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
NO1 WorldWide Love marriage specialist baba ji Amil Baba Kala ilam powerful v...
 
Financial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and DisadvantagesFinancial Leverage Definition, Advantages, and Disadvantages
Financial Leverage Definition, Advantages, and Disadvantages
 
Stock Market Brief Deck for "this does not happen often".pdf
Stock Market Brief Deck for "this does not happen often".pdfStock Market Brief Deck for "this does not happen often".pdf
Stock Market Brief Deck for "this does not happen often".pdf
 

Corpora, Blogs and Linguistic Variation (Paderborn)

  • 1. Corpora, Blogs and Linguistic Variation: Arguments for Using Structured Web Data in Corpus Development Cornelius Puschmann University of Düsseldorf [email_address] University of Paderborn 8 November 2007
  • 2. Contents of this presentation
  • 3. What counts as evidence in linguistics?
  • 4. Four central questions a researcher must answer
  • 5. Different schools of thought (CogSci, SocioLing, Functionalism) ...all have different questions and assumptions!
  • 6. What role does corpus data play?
  • 7. System, use and the individual
  • 8. A totalizing view of language production investigation but... whose system? whose use? system social function cognitive mechanism genetic disposition use cultural transmission
  • 9. Margin vs. center: what is shared vs. what varies. If we're interested in variation, corpus analysis is the way to go. Diagram labels: shared, recurring & patterned language; individual & varying language
  • 10. A (slightly) different way of looking at language
  • 11. Using the Web for corpus investigations
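  • 12. The ultimate corpus
  • 13. Web as Corpus (WaC)
  • 14. Broader issues with WaC
  • 15. Web for Corpus
  • 16. Constructing a corpus using web data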
  • 17. Things to consider ...not a whole lot of language data!
  • 18. Things to consider ... better, but we need to take register into account
  • 19.
  • 20. A blog is...
  • 21.
  • 24.
  • 25. Distribution of "will" vs. "be going to" in three blogs (chart legend: light blue = will, dark blue = going to)
  • 26.
  • 27.
  • 28.
  • 30. Corpora, Blogs and Linguistic Variation: Arguments for Using Structured Web Data in Corpus Development Cornelius Puschmann University of Düsseldorf [email_address] University of Paderborn 8 November 2007