ESS Digital Sociology Conference presentation.
I provide an overview of methodological opportunities, challenges, and solutions to consider for sociologists who are thinking about delving into the world of online ethnography.
Ethnography in the virtual world: Methodological opportunities and challenges
1. Ethnography in the Virtual World:
Methodological Opportunities and Challenges
Gina Marie Longo, ABD
University of Wisconsin-Madison
2. Mapping out the discussion
• The research
• The methodological opportunities of Virtual Ethnography
• The methodological challenges of Virtual Ethnography
• Some proposed solutions to methodological challenges
7. The Research Project
• Examines how U.S. citizens' experiences with the spousal reunification process affect their enacted citizenship and how they are informed by normative conceptions of family, gender, race, and class that treat some relationships as morally suspect.
• It also investigates how U.S. citizens negotiate deservingness and marital genuineness when facing the state's moral evaluation, and how these negotiations vary across gender, race, class, and region.
10. Immigration Pathways
The Data Source
• A large English-language self-help forum
• Created in the early 2000s
• Over 2.2 million conversation threads
• Multiple sub-forums
• Over 100,000 members
Collection and Analysis
• Ethnographic immersion
• Web scraping
• Quantitative content analysis of themes and members
• Qualitative analyses of conversation threads
21. Methodological Opportunities
1. Innovative access
   • Accessing hidden or scattered communities
2. Conversation candidness
   • Participants openly discuss a variety of topics that vary across and within subject matter
   • There is less self-censoring and more frankness in utterances
   • Removes the researcher's influence on participants' interactions
3. Spatial and temporal comparison
   • Archived conversations demonstrate how topics and perspectives shift over time
   • The sub-forum divide allows me to analyze how topics and perspectives shift across sub-groups of people
4. Endless research project opportunities
   • Seemingly endless possibilities for future projects
   • This is particularly useful for carving out a long-term research agenda
33. Methodological Challenges (and Suggestions)
1. Data management (collecting, cleaning, and analyzing)
Suggestions: The Python programming language has many excellent packages for collecting, cleaning, and analyzing data. It also handles large amounts of data well.
2. Programming and sampling logistics
Suggestions: For programming, I practice strategic learning using Google, Codecademy, YouTube, etc. For sampling, I use a mix of qualitative and quantitative content analysis in conjunction with Python's sampling module.
3. Copyright and permissions issues
Suggestions: First, know your site (and read the Privacy and Terms of Service statements very carefully). Next, know the law and exactly what you are asking for. Finally, learn the fair use provisions of U.S. copyright law and where to get a bit of inexpensive legal advice.
4. IRB issues and ethical considerations
Suggestions: Consider your site's access, and what kind of interaction you plan on having with posters. The more open the site and the less interaction you have with subjects, the more likely the IRB is to treat the data as text rather than as human subjects research. If the IRB finds the study exempt, whether to use pseudonyms for the site and its users is a personal decision.
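To make the sampling suggestion on this slide concrete, here is a minimal sketch of drawing a reproducible random sample of threads from each sub-forum. It assumes that the "sampling module" mentioned above is Python's standard-library `random` module. The record layout (`id`, `subforum`), the sub-forum names, and the function name `stratified_sample` are hypothetical and are not taken from the project itself.

```python
import random
from collections import defaultdict


def stratified_sample(threads, per_forum, seed=2014):
    """Draw up to `per_forum` threads at random from each sub-forum."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    by_forum = defaultdict(list)
    for thread in threads:
        by_forum[thread["subforum"]].append(thread)

    sample = []
    for forum in sorted(by_forum):
        group = by_forum[forum]
        # Random.sample draws without replacement; cap k at the group size
        sample.extend(rng.sample(group, min(per_forum, len(group))))
    return sample


# Hypothetical thread records; a real corpus would come from the scraped data.
threads = [
    {"id": i, "subforum": forum}
    for i, forum in enumerate(["region-a"] * 5 + ["region-b"] * 3 + ["visa-x"] * 1)
]

sample = stratified_sample(threads, per_forum=2)
```

Sampling within each sub-forum, rather than from the pooled threads, is one way to keep small sub-forums from being crowded out by large ones. Whether that suits a given research question is a design decision, not something this sketch settles.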
Hello everyone.
Today, I am going to talk about some of the methodological opportunities and challenges that I have encountered while conducting an online ethnography.
I embarked on this project by accident, really. It began in 2010, when I married a Moroccan man who was living in Morocco.
As a U.S. citizen, I petitioned U.S. immigration to allow my husband to join me here in the United States.
It quickly became clear that it was not an easy process.
The bureaucracy was complex, and demonstrating that we had what Homeland Security calls a "valid and subsisting" marriage was not that simple.
So, I turned to the internet for some direction, and happened upon a number of different websites where U.S. citizens exchanged advice about this very issue.
As I soon discovered, the process was extremely difficult for some and very easy for others based on a number of intersectional factors.
From this experience, I began an online ethnography of one such large advice-exchanging site, so I could better understand what I was seeing. However, in order to do this, I needed to learn the methodological tools and programming skills as I went along.
This presentation discusses some of the methodological experiences and insights that I have encountered along the way, as a sociologist who has recently begun working in the digital world.
So, let's start with a road map of what I am going to focus on today.
First, I will talk a bit about the actual research project, including the data source and some of the methodological tools I have used for data collection and analysis.
Then, I will discuss the methodological opportunities of Virtual Ethnography.
Then I will move on to the challenges that I experienced...
And offer some solutions for overcoming those challenges.
So, with that said, let's talk a bit more about the project...
First, this project examines how U.S. citizens' experiences with spousal reunification affect their enacted citizenship, and how those processes are shaped by normative conceptions of family and intersections of race, gender, and class that treat some relationships as morally suspect and others as morally upright.
By morally suspect, I mean whether or not U.S. Immigration suspects that a union is potentially a sham for immigration papers rather than genuinely based on love.
It also examines how U.S. citizens attempt to show that their spouses deserve a visa and that their marriages are "real" as the state investigates them, and how these negotiations are shaped by intersectional identities such as race, gender, class, and the immigrant spouse's region of origin.
Now, the data... I am using a website that I call Immigration Pathways. First I will tell you a bit about the data source, and then a bit about what I have been doing with it.
Immigration Pathways is a large English-language self-help forum where people come to share information about the spousal reunification process (and other matters related to immigration, such as adjustment of status, the citizenship process, and adjusting a family member to life in America).
The website was put together in the early 2000s, and over the years it has added several different sub-forums, wikis, and self-reported statistical information. The web administrator and most of the moderators have been there since the beginning.
As of 2014, there were 2.2 million conversation threads. This does not include response posts to those threads. A thread can have anywhere from zero to over 800 responses.
There are multiple sub-forums within the site. They include country-specific, visa-specific, and topic-specific sub-forums, and moderators are very careful to ensure that each forum adheres to its specific issue.
The site has over 100,000 members, who are mainly U.S. citizens and foreign nationals married to U.S. citizens.
Members range from those who created an account and posted once to those who have posted thousands and thousands of times.
I have approached data collection and analyses in a number of different ways.
First, I engaged in an online ethnographic immersion in order to get a sense of conversation topics, conversation organization, key players, and community structure.
Next, thread posts were scraped and collected using Beautiful Soup in Python, so I could arrange and analyze the data offline without having to rely on the site's clunky search feature.
I have conducted quantitative content analysis of members, key words, and topics.
Finally, I have been conducting qualitative analyses of conversation threads, to glean the latent content of posts.
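As a rough illustration of the quantitative content analysis step described above, the sketch below counts how often a few key words appear across a set of posts. It assumes the post text has already been scraped and extracted into plain strings. The key word list, the example posts, and the function name `keyword_counts` are invented for illustration and are not the project's actual codebook or data.

```python
import re
from collections import Counter

# Hypothetical key words; a real codebook would come out of the ethnographic immersion.
KEYWORDS = {"interview", "evidence", "denied"}


def keyword_counts(posts, keywords):
    """Count occurrences of each key word across all posts (case-insensitive)."""
    counts = Counter()
    for post in posts:
        # Lowercase the text and split it into simple word tokens
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(token for token in tokens if token in keywords)
    return counts


# Invented example posts standing in for scraped thread text.
posts = [
    "Our interview is next month. What evidence should we bring?",
    "We were denied after the interview.",
    "Bring as much evidence as you can: photos, chat logs, tickets.",
]

counts = keyword_counts(posts, KEYWORDS)
```

Counts like these only capture manifest content. They can point to threads worth reading closely, but the latent content still has to come from the qualitative analysis.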
Now let's shift our focus to some of the methodological opportunities that this kind of work can have.
The first benefit is the innovative access to new subjects.
Web forums and online ethnography can tap into communities that are hidden or scattered in the non-virtual world.
For example, my work focuses on marriage migration couples, and in particular, U.S. citizens who are petitioning immigration to get a foreign spouse over the border.
These subjects are scattered around the country, and have no other interlocking ties aside from the forum.
Online ethnography allows one to investigate a potentially sizable group of people who we would not otherwise find in such large numbers offline.
So, while it is not representative of the total population, we can get a firm understanding of some of the struggles and issues within the community by coming to this large collection of people.
Next, this medium fosters a conversational candidness that benefits the researcher.
First, participants on a forum that has been around for so long and is so large discuss a host of topics ranging from the mundane to the highly-charged.
In addition to the wide variation in topics discussed, such topics are brought up repeatedly. So, one can analyze how topics have changed across different members, forums, and periods.
There is, I discovered, less self-censoring and more frankness within conversations. Screen anonymity can make people bold in their assertions. So, it is an opportunity to hear what people may really think about a particular subject.
Next, observing people virtually allows the researcher and their influence to get out of the way more. Anyone who has taken a methods course knows of the Hawthorne effect, in which the researcher can influence subjects merely by being present.
Virtual presence is a bit different.
Members, particularly in a big data forum, tend to be less cautious and do not manage expectations as rigorously as when the researcher is physically present.
Next, there are rich opportunities for spatial and temporal comparison.
Since topics are brought up repeatedly, how they are discussed can shift over time. Conversation archives provide the researcher with actual time/date stamps, so these shifts can be methodically analyzed.
Moreover, researchers can also use the time/date data to research political and historical events of the time that shaped the conversations.
Also, when large sites are divided by sub-forum, or members are grouped into particular clusters, one can analyze the shift in topic across different spaces in addition to across time.
For example, within the site that I am using, there are region-specific forums that host conversations about spousal reunification challenges specific to filing for a spouse from a particular country.
Thus, I can make comparisons of such challenges between two or more regions rather than limiting myself to one specific place or two.
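To make the temporal and cross-forum comparison concrete, here is a minimal sketch that counts keyword mentions by sub-forum and year. It assumes each collected post carries a `forum` label and a `posted` timestamp in `YYYY-MM-DD HH:MM` form. The records, field names, and the function `keyword_counts` are invented for illustration and are not drawn from the project data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records; a real archive would supply its own fields and formats.
posts = [
    {"forum": "Region A", "posted": "2008-03-14 09:30", "text": "Worried about the interview."},
    {"forum": "Region A", "posted": "2008-11-02 18:05", "text": "Our interview went fine."},
    {"forum": "Region A", "posted": "2012-06-21 12:00", "text": "Fees went up again."},
    {"forum": "Region B", "posted": "2012-07-01 08:15", "text": "Interview scheduled at last."},
]


def keyword_counts(posts, keyword):
    """Count posts containing `keyword`, grouped by (forum, year)."""
    counts = Counter()
    needle = keyword.lower()
    for post in posts:
        if needle in post["text"].lower():
            year = datetime.strptime(post["posted"], "%Y-%m-%d %H:%M").year
            counts[(post["forum"], year)] += 1
    return counts


interview_counts = keyword_counts(posts, "interview")
for (forum, year), n in sorted(interview_counts.items()):
    print(forum, year, n)
```

Because the grouping key is a (forum, year) pair, the same tally can be read across time within one forum or across forums within one year.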
Finally, online ethnographies of "big data" forums can provide endless research opportunities.
With so much data, numerous theoretical and empirical questions can be asked of the same data.
This is particularly useful for creating long-term research agendas, which are important at any stage of our careers, whether we are PhDs on the market, trying to get tenure, or looking for paper ideas.
So, I have discovered that digital sociology is not only a very rewarding intellectual endeavor, but a savvy and practical one as well.
Now let's change our focus to some of the methodological challenges and suggestions to meet those challenges.
The first challenge is data management.
When I began this project, I discovered that collecting the data, cleaning it, storing it, and analyzing it were going to be huge challenges.
Unlike other kinds of data that I have worked with in the past, these data seemed to contain an extraordinary amount of information of various kinds, which made them difficult for me to handle.
Where does one begin?
I suggest using the Python programming language to get started.
There are several great things about Python that can be useful.
Python has excellent packages of tools (called modules) for collecting data, for searching, editing, organizing, and exporting (in order to clean data), and for analyzing data whether you prefer statistical or qualitative analysis.
Not to mention, it is free.
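As one small, hedged example of what cleaning and exporting can look like with nothing but Python's standard library, the sketch below normalizes whitespace, drops duplicate posts, and writes the result as CSV text. The sample records and the helper names `clean` and `to_csv` are made up for illustration.

```python
import csv
import io

# Hypothetical raw records, as they might come out of a scrape.
raw_posts = [
    {"user": "user_a", "text": "  We filed   in March.\n"},
    {"user": "user_a", "text": "We filed in March."},  # duplicate once cleaned
    {"user": "user_b", "text": "Interview next week."},
]


def clean(posts):
    """Collapse whitespace and drop empty or duplicate (user, text) pairs."""
    seen = set()
    cleaned = []
    for post in posts:
        text = " ".join(post["text"].split())
        key = (post["user"], text)
        if text and key not in seen:
            seen.add(key)
            cleaned.append({"user": post["user"], "text": text})
    return cleaned


def to_csv(posts):
    """Return the posts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["user", "text"])
    writer.writeheader()
    writer.writerows(posts)
    return buffer.getvalue()


cleaned_posts = clean(raw_posts)
csv_text = to_csv(cleaned_posts)
print(csv_text)
```

The CSV text can then be saved to a file and opened in a spreadsheet, a statistics package, or qualitative analysis software.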
The second challenge I encountered was programming and sampling logistics.
Remember, I told you earlier that I had to learn computer programming to conduct this research.
So, how did I get the programming skills I needed?
And how did I sample among such vast amounts of data?
For programming, I became a strategic learner. Once I learned the program basics, I searched for the exact set of computing skills or code I needed to do a particular task.
I learned the basics of the Python program through the Python website and other people teaching Python on the web. YouTube is a great place to find tutorials.
Next, I decided what I needed to do in chronological steps and cobbled together code to address those needs. This takes a bit of troubleshooting because I had to include every logical step.
For example, if I wanted to perform a word frequency search, I first needed to know how to import a file into Python and open it.
All of this can be done through searches on Google, Codecademy, YouTube, and of course, by asking someone who knows the language.
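The word frequency example just mentioned can be sketched in a few steps: open the file, read it, split it into words, and count them. This is only one possible way to do it. The file name `posts.txt`, its contents, and the function `word_frequencies` are placeholders; the demo writes its own small file so that it runs anywhere.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def word_frequencies(path):
    """Open a text file and count how often each word appears."""
    text = Path(path).read_text(encoding="utf-8").lower()
    words = re.findall(r"[a-z']+", text)  # letters and apostrophes only
    return Counter(words)


# Demo with a throwaway file standing in for a file of collected posts.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "posts.txt"
    demo_file.write_text(
        "Visa interview tomorrow. The visa was approved!", encoding="utf-8"
    )
    freqs = word_frequencies(demo_file)

print(freqs.most_common(3))
```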
With sampling, I used a number of different techniques. During ethnographic immersion, I used some of my old standbys such as memo-writing and field notes for drawing out themes and keywords to sample on.
Then I extended this to include quantitative word counts and an examination of which words were frequently collocated. From these, I was able to set up a series of purposive sample criteria.
Finally, Python's sample module helped me customize different samples to draw from the thread population.
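The sampling steps above might look something like the following sketch: count adjacent word pairs as a crude measure of collocation, use the most frequent pair as a purposive criterion, and then draw a random sample from the matching threads. I am assuming here that the sampling tool in question is `random.sample` from Python's standard `random` module. The threads, the helper names, and the fixed seed are all illustrative.

```python
import random
import re
from collections import Counter

# Hypothetical thread population.
threads = [
    {"id": 1, "text": "The visa interview went well."},
    {"id": 2, "text": "Waiting on our visa interview date."},
    {"id": 3, "text": "Any tips for packing?"},
    {"id": 4, "text": "Visa interview tomorrow, so nervous."},
    {"id": 5, "text": "Filing fees increased again."},
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(texts):
    """Count adjacent word pairs across all texts."""
    counts = Counter()
    for text in texts:
        words = tokenize(text)
        counts.update(zip(words, words[1:]))
    return counts


def purposive_sample(threads, phrase, k, seed=0):
    """Randomly draw up to k threads whose normalized text contains `phrase`."""
    matching = [t for t in threads if phrase in " ".join(tokenize(t["text"]))]
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(matching, min(k, len(matching)))


collocations = bigram_counts(t["text"] for t in threads)
top_pair, top_count = collocations.most_common(1)[0]
sample = purposive_sample(threads, " ".join(top_pair), k=2)
print(top_pair, top_count, [t["id"] for t in sample])
```

Fixing the seed is a design choice that makes the sample reproducible, which helps when documenting sampling decisions in field notes or a methods section.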
Perhaps one of the biggest challenges I had was navigating copyright and permission issues.
One thing to remember is if something is online, it automatically enjoys copyright protection. This is the de facto situation. It is true even if the website does not say it is copyrighted or use the copyright symbol (©).
Scraping may be against the Terms of Use of some websites, and administrators may or may not be very responsive or helpful with granting permissions.
The first thing I suggest is to know your site and the fair use doctrine in U.S. copyright law.
Also, read the Privacy and Terms of Service statements very carefully. Who exactly owns what on this site? Does the administrator own the utterances or just the non-user posted content?
This is what happened in my case. I contacted the web administrator, who was initially very helpful, to ask if I could use the threads for the project. After a few weeks, the administrator went radio silent.
I wasn't sure what to do, which is why I suggest seeking legal advice.
Sometimes you must ask a lawyer, and the best bet is to do this via written correspondence, so you have a record.
I recognize that legal advice can be costly, so I suggest being innovative.
For example, the UW law school had some professors who study intellectual property rights and practice in this area of the law, and who generously advised me on a pro bono basis.
This inevitably leads to concerns about ethics and IRB issues. Where does IRB come in? If one is IRB exempt, then what other ethical considerations should the researcher think about?
The first thing to do prior to applying to IRB is to consider the site's access.
Depending on whether the site is closed access, like Facebook, or open, like a publicly accessible forum, the IRB is going to be more or less stringent in its expectations.
Also, know what kind of interaction you are going to have.
If you are going to work with a publicly accessible site AND not interact with the posters, then you can ask IRB for an exemption. This is because it is no longer considered human subjects research. The IRB may consider it text data.
What if you have a publicly accessible site, you don't plan on interacting with subjects, and you are IRB exempt? Do you still have an obligation to create pseudonyms for the site and the users?
Some have said no, because all the utterances are open to the public and the users already have pseudonyms, since they use user names rather than legal ones.
Others say yes: despite all of this, we are still talking about people connected to these texts. Unlike newspaper articles or other text data, the content of such threads is more closely linked to the users who wrote it. So, some researchers say that they have an ethical duty to protect the people who uttered the text. Thus, they recommend pseudonyms.
I fall in the latter camp and use pseudonyms. This, however, is a personal, albeit important, choice.
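For those who make the same choice, a minimal sketch of applying pseudonyms consistently across a set of posts follows. It assigns each user name a neutral label in order of first appearance and also replaces user names mentioned inside other people's posts. The user names, the `Member_001` label style, and the function `pseudonymize` are hypothetical.

```python
import re

# Hypothetical posts with invented user names.
posts = [
    {"user": "sunny_k1", "text": "Thanks for the advice!"},
    {"user": "traveler22", "text": "Agree with sunny_k1 on this."},
    {"user": "sunny_k1", "text": "Good luck."},
]


def pseudonymize(posts):
    """Replace each user name with a stable label, in bylines and in text."""
    mapping = {}
    for post in posts:
        if post["user"] not in mapping:
            mapping[post["user"]] = f"Member_{len(mapping) + 1:03d}"
    if not mapping:
        return [], {}

    # Match longer names first so one name cannot clip another.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    replaced = [
        {
            "user": mapping[post["user"]],
            "text": pattern.sub(lambda m: mapping[m.group(0)], post["text"]),
        }
        for post in posts
    ]
    return replaced, mapping


safe_posts, key = pseudonymize(posts)
for post in safe_posts:
    print(post["user"], "|", post["text"])
```

The returned `key` links real user names to labels, so it should be stored separately from the analysis files, or discarded if re-identification is never needed.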