1. (Jerry) Sponsor's background
2. (Jerry) Sponsor's business operations, information management, user profiles, etc.
3. (Jerry) Articulate the goals the sponsor requested
4. (Jerry) Articulate the Capstone course objectives based on your project
5. (List, Chris) What are the research topics? (e.g. list all potential (Jerry) informatics issues: HCI research, (Jerry) security issues, (Jerry) platform strategies, etc.)
6. (Everyone speaks) What are the problems observed in your project, and how significant are they?
7. (Everyone speaks) How many hours have you all spent on collecting data (e.g. interviewing sponsors, email communications, field observations)?
8. (Everyone speaks) What are the interesting preliminary findings?
9. (Everyone speaks) Reflect on how the curriculum helped you conduct this capstone project, what lessons you have learned so far, and what other skills you gained.
10. () Moving ahead, what's the plan? What further results are expected (e.g. prototype design, generated report)?
Mid-term Presentation
Team 2
Lingyu Hu, Zhaode Ouyang, Ziyan Yan, Zixun Zhou
Feb 22, 2022
Our Sponsor
Mikhail Oet, PhD
Professor in the Commerce and Economic Development program
Northeastern University
Our Sponsors
Department of State
Missions:
● To engage the American people in the work of the State Department
● To broaden the Department's research base in response to a proliferation of complex global challenges
Mission:
● To get the right information to the right people at the right time
Course Objectives
To Learn:
(1) Systematic way to research (qualitatively and quantitatively)
(2) Data collection, cleaning, & analysis
(3) Communication
(4) Teamwork
Sponsor's Goals
1. Collection and organization of online data related to mis-/disinformation campaigns (English, Russian, Mandarin)
2. Analysis of online data related to mis-/disinformation campaigns
3. Identification of latent attributes of mis-/disinformation campaigns
4. Visualization of the latent attributes of mis-/disinformation campaigns
5. Detection of latent attributes of mis-/disinformation campaigns
Our Research Topics
Stage One: Data Collection
1. Platform Research
● Data Scraping Availability
● Existing Dataset Availability
2. Data Repositories Research
● GitHub
● Kaggle
3. Data Scraping Tools Research
● Octoparse
● Bright Data
● Web Scraping
4. U.S., China, Russia Research
End: Integrate data and collect useful data
Stage Two: Data Analysis
1. Data Project Research
● BirdWatch
● Twitter Transparency
● Chinese COVID-19 Fake News Dataset
2. Data Cleaning Research
● SBS-ready format
● All the datasets (CED, INF, ALY)
3. Data Sentiment Analysis
● Azure Machine Learning
4. Data Analysis Tools Research
● SBS
End: Data visualization, building predictive models
Week 1-Week 3 Research
1. Platform Research: U.S., China, Russia
a. 17 different social platforms and news outlets
2. Data Repositories Research: GitHub, Kaggle
a. Projects and datasets
Research Result
————————————————————————
3. Data Scraping Research
a. What programming skills do crawlers need?
b. Exploration of anti-crawling mechanism.
Week 4-Week 5 Research
1. Russian Focus Research
a. Russian Platform Research
b. Russian Politics and History Research
2. Data Repositories & Project Research
a. Birdwatch
b. Twitter Transparency project
c. GitHub Data
————————————————————————
3. Technical Exploration
a. Machine Learning
b. Natural Language Processing
c. Dashboard (Power BI)
Week 6 Research
1. Data Cleaning Research
a. DIPLAB 3 assignment
b. SBS-ready format
——————————————————
2. Data Sentiment Analysis
a. Tool: Azure Machine Learning
b. English text dataset (results)
c. Chinese text dataset (unusual results)
Jerry’s Finding
Issues for Data Scraping
1. Intellectual property rights
2. Anti-crawling mechanisms
Solutions:
1. Research each website before scraping its data
2. Improve the algorithm of the data scraping script, and combine it with data scraping tools such as Bright Data
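One way to "research each website" before scraping is to read its robots.txt rules. The sketch below is only an illustration, assuming Python's standard urllib.robotparser module; the robots.txt content, the URLs, and the user-agent name are hypothetical, and a real script would download the file from the target site. Passing robots.txt does not settle intellectual property or terms-of-service questions, which still need a manual review.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_scrape(parser, url, user_agent="capstone-research-bot"):
    """Return True if the rules allow `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(may_scrape(parser, "https://example.com/news/article-1"))
print(may_scrape(parser, "https://example.com/private/data"))
print(parser.crawl_delay("capstone-research-bot"))  # seconds to wait between requests
```

Honoring the crawl delay between requests is also a simple way to reduce the chance of triggering anti-crawling defenses.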
Russian Research
- Russian citizens might be punished if they post fake news
Data Cleaning
- Python is powerful, e.g. it can handle a 30 GB CSV file
- Excel's maximum row count per worksheet is 1,048,576
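The contrast between Python and Excel above can be made concrete. The following is a minimal sketch, not the team's actual cleaning script: it uses the standard csv module to stream a file one row at a time and split it into parts that each fit in an Excel worksheet. The function name split_csv, the file names, and the assumption that the first row is a header are all illustrative.

```python
import csv
import tempfile
from pathlib import Path

EXCEL_MAX_ROWS = 1_048_576  # Excel worksheet row limit, header row included


def split_csv(source, out_dir, rows_per_part=EXCEL_MAX_ROWS - 1):
    """Stream `source` row by row into parts of at most `rows_per_part` data rows.

    Only one row is held in memory at a time, so the input can be much larger
    than available RAM. Assumes the first row of `source` is a header.
    """
    source, out_dir = Path(source), Path(out_dir)
    parts = []
    out_file = None
    writer = None
    rows_in_part = 0
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        try:
            for row in reader:
                # Start a new part for the first row and whenever the current part is full.
                if writer is None or rows_in_part == rows_per_part:
                    if out_file is not None:
                        out_file.close()
                    part_path = out_dir / f"{source.stem}_part{len(parts) + 1}.csv"
                    out_file = part_path.open("w", newline="", encoding="utf-8")
                    writer = csv.writer(out_file)
                    writer.writerow(header)  # repeat the header in every part
                    parts.append(part_path)
                    rows_in_part = 0
                writer.writerow(row)
                rows_in_part += 1
        finally:
            if out_file is not None:
                out_file.close()
    return parts


# Tiny demonstration: 5 data rows split into parts of 2 rows each.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "posts.csv"
    with sample.open("w", newline="", encoding="utf-8") as f:
        w = csv.writer(f)
        w.writerow(["id", "text"])
        w.writerows([[i, f"post {i}"] for i in range(5)])
    part_names = [p.name for p in split_csv(sample, tmp, rows_per_part=2)]
    print(part_names)
```

With the default rows_per_part, each part holds one header row plus 1,048,575 data rows, which is exactly the worksheet limit.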
Problems Related to the Project
Problems:
1. The complex division of work and the sponsor relationship
2. The unclear final deliverable and goal
3. Lack of relevant skills
Solutions:
1. Ask for more communication and meetings
2.
a. Create a prototype to show our sponsor what we think the deliverable looks like
b. Accept that an unclear goal is normal in a project
c. Treat learning and experience as our goal
3.
a. Ask the professor and sponsor for learning resources
b. Self-study
Average Time Spent (hours / week)
Reading provided materials 4
Qualitative Research 3
Web Scraping - Python 8
Data Cleaning - Python 10
Data Analysis - SBS 8
Email & Communication & Meetings 6
Report 1
Plans for Next Stage
● Use SBS to do data analysis
1. Get SBS working
2. Generate findings
● Web Scraping
1. Learn the Python BeautifulSoup library
2. Scrape websites that have useful data for analysis or model training
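The scraping step in this plan amounts to fetching a page and pulling specific elements out of its HTML. The sketch below shows that idea with Python's built-in html.parser module, used here as a stand-in so the example runs without installing anything; it is not BeautifulSoup, which offers a higher-level interface for the same task. The sample HTML, the class name HeadlineParser, and the choice of headline links inside h2 elements are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real script would download it from a news site.
SAMPLE_HTML = """
<html><body>
  <h2 class="headline"><a href="/news/1">First headline</a></h2>
  <h2 class="headline"><a href="/news/2">Second headline</a></h2>
  <p>Unrelated paragraph with a <a href="/about">link</a>.</p>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collects (text, href) pairs for links nested inside <h2> elements."""

    def __init__(self):
        super().__init__()
        self.in_h2 = False
        self.current_href = None
        self.text_parts = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            # Start collecting text for a link that sits inside a headline.
            self.current_href = dict(attrs).get("href")
            self.text_parts = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            text = "".join(self.text_parts).strip()
            self.headlines.append((text, self.current_href))
            self.current_href = None
        elif tag == "h2":
            self.in_h2 = False


def extract_headlines(html):
    """Return a list of (headline text, link target) tuples found in `html`."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


headlines = extract_headlines(SAMPLE_HTML)
print(headlines)
```

The extracted pairs could then be written to CSV as input for analysis or model training.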
Zixun’s Reflection
Problems
1. How to get started?
2. What should I do with the dataset I found?
Interesting Preliminary Findings
1. The data repository has a lot of datasets.
2. The relationship between text analysis and information authenticity.
3. Training on the data is what I will ultimately try to do.
Time Cost
Interviewing Sponsors: Every Wednesday, 1pm-4pm
Team Meeting and Discussion: Every Friday, 1pm-2pm
Reading Materials and Learning: 4-6 hours every week
Summarizing Research Findings and Preparing Presentations: 1-2 hours every week
Zixun’s Reflection
Expected Results
1. Available integrated datasets
2. Understand the relationship between the results of text sentiment analysis and the authenticity of information.
3. Train on the data and build predictive models
What have I learned?
1. The ability to explore and summarize.
2. Using a programming language to scrape web data
Lingyu’s reflection
Problems:
● Garbled data records
● Data crawling
Hours spent:
● Reading materials: 3 hr
● Exploring tools and datasets: 3 hr
● Team discussion and preparing the weekly report: 3-4 hr
Lingyu’s reflection
Interesting findings:
● People from different cultures have different opinions on the same information
● Mis-/disinformation campaigns are possibly operated by bots
What I have learned:
1. Teamwork and communication
2. Dealing with garbled datasets
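The slides do not say what caused the garbled records. A common cause with multilingual datasets is a text-encoding mismatch, and the sketch below assumes that case only. It shows two hedged techniques using built-in Python string methods: trying several candidate encodings when reading raw bytes, and reversing text that was decoded with the wrong encoding. The function names, the candidate encodings, and the sample phrase are illustrative.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gb18030")):
    """Try each candidate encoding in order; return (text, encoding used)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def repair_mojibake(garbled, wrong="latin-1", right="utf-8"):
    """Undo text that was decoded as `wrong` although the bytes were `right`."""
    return garbled.encode(wrong).decode(right)


original = "虚假信息"  # sample phrase meaning "false information"

# Case 1: bytes saved in a Chinese legacy-compatible encoding, read without knowing it.
gbk_bytes = original.encode("gb18030")
text, used = decode_with_fallback(gbk_bytes)
print(text, used)

# Case 2: UTF-8 bytes that were wrongly decoded as Latin-1 somewhere upstream.
garbled = original.encode("utf-8").decode("latin-1")
repaired = repair_mojibake(garbled)
print(repaired == original)
```

The second technique only works when the wrong decoding step lost no bytes, which holds for Latin-1 but not for every encoding pair.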
Plan for next stage:
● Working on analysis and data visualization by using tools
● Finding potential similarities in datasets of dis/misinformation
Ziyan’s Reflection
Problems:
1. Searching for data without knowing where to start.
2. Understanding my work and writing a good, brief report.
Time Spent:
1. Data Collection - 3 hrs per week
2. Tool Exploration - 1.5 hrs per week
3. Project Meetings - 3 hrs per week
4. Learning from Materials - 2 hrs per week
Ziyan’s Reflection
Interesting Findings:
1. Many things have no absolute right or wrong; different positions lead to different answers to the same question.
2. Once its characteristics are identified, disinformation is easy to find.
What I Learned:
1. Exploring solutions to problems in unfamiliar areas.
2. Using Tableau for data analysis.
3. Do's and don'ts when presenting to a sponsor.
Next Step
● Use Tableau to analyze the current datasets (waiting for approval)
● Explore sentiment analysis
● Use SBS to do data analysis
● Learn web scraping
Live Data
● Social Media
● News Websites
A Dashboard
● Power BI
● A website similar to Hamilton 2.0
Data Collection & Cleaning
● Python BeautifulSoup
● Python data cleaning
● Bright Data
Model Training
● Machine Learning
Data Analysis
● SBS
● Sentiment Analysis
● Tableau
Datasets
● e.g. Twitter Transparency
● e.g. Weibo Datasets
Fact-Checking Websites
● Human Verified
○ e.g. PolitiFact
● Automatically Verified
○ e.g. Duke Reporters' Lab
Trained Model
Reports
Result (real/fake)
Graphical Reports
Legend: Finished / Unfinished

Sponsor Background and Project Goals for Misinformation Analysis

  • 1. 1. (Jerry) sponsor’s background 2. (Jerry) sponsor’s business operations, information management, users profiles etc. 3. (Jerry) Articulate the goals the sponsor requested 4. (Jerry) Articulate the Capstone course objectives based on your project 5. (列表,Chris) What research topics (e.g. list all potential (Jerry) informatics issues: HCI research, (Jerry) security issues, (Jerry) platform strategies, etc.) 6. (每人说)What are the problems observed in your project, and how significant they are? 7. (每人说) How many hours have you all spent on collecting data (e.g. interviewing sponsors, email communications, field observations.). 8. (每人说) What are interesting preliminary findings? 9. (每人说) Reflect how the curriculum learning help you conduct this capstone project, what lessons have you learned so far, and other skills you gained. 10. () Moving ahead, what’s the plan? What are the more expected results (e.g., prototype design, generate report)
  • 2. Mid-term Presentation Team 2 Lingyu Hu, Zhaode Ouyang, Ziyan Yan, Zixun Zhou Feb 22, 2022
  • 3. Mikhail Oet, PhD Professor in Commerce and Economic Development program Northeastern University Our Sponsor
  • 4. Our Sponsors Department of State Missions: ● To engage the American people in the work of the State Department ● To broaden the Department’s research base in response to a proliferation of complex global challenges Mission: ● To get the right information to the right people at the right time
  • 5. Course Objectives (1) Systematic way to research (qualitatively and quantitatively) (2) Data collection, cleaning, & analysis (3) Communication (4) Team Work To Learn: Sponsor’s Goals 1. Collection and organization of online data related to mis-/disinformation campaigns (English, Russian, Mandarin) 2. Analysis of online data related to mis-/disinformation campaigns 3. Identification of latent attributes of mis-/disinformation campaigns 4. Visualization of the latent attributes of mis-/disinformation campaigns 5. Detection of latent attributes of mis-/disinformation campaigns
  • 6. Our Research Topics Stage One: Data Collection Stage Two: Data Analysis 1 Platform Research ● Data Scraping Availability ● Existing Dataset Availability Data Project Research ● BirdWatch ● Twitter Transparency ● Chinese COVID-19 Fake News Dataset 2 Data Repositories Research ● GitHub ● Kaggle Data Cleaning Research ● SBS-ready format ● All the datasets (CED, INF, ALY) 3 Data Scraping Tools Research ● Octoparse ● BrightData ● Web Scraping Data Sentiment Analysis ● Azure Machine Learning 4 U.S., China, Russian Research Data Analysis Tools Research ● SBS End Integrate data and Collect useful data Data visualization Building predictive models
  • 7. Week 1-Week 3 Research 1. Platform Research——US, CHINA, RUSSIAN a. 17 different Social Platforms and News Outlets 2. Data Repositories Research——GitHub, Kaggle a. Project and Datasets Research Result ———————————————————————— 3. Data Scraping Research a. What programming skills do crawlers need? b. Exploration of anti-crawling mechanism.
  • 8. Week 4-Week 5 Research 1. Russian Focus Research a. Russian Platform Research b. Russian Politics and History Research 2. Data Repositories & Project Research a. Birdwatch b. Twitter Transparent Project c. GitHub Data ———————————————————————— 3. Technical Exploration a. Machine Learning b. Natural Language Processing c. Dashboard (Power BI)
  • 9. Week 6 Research 1. Data Cleaning Research a. DIPLAB 3 assignment b. SBS-ready format 2. Data Sentiment Analysis a. Azure Machine Learning (tool) b. English text dataset (results) c. Chinese text dataset (unusual results)
  • 10. Jerry’s Findings. Issues for Data Scraping: 1. Intellectual property rights 2. Anti-crawling. Solutions: 1. Research each website before doing data scraping 2. Improve the algorithm of the data scraping script, and combine it with data scraping tools such as Bright Data. Russian Research: Russian citizens might be punished if they post fake news. Data Cleaning: Python is powerful, e.g. it can handle a 30 GB CSV file, whereas Excel's maximum is 1,048,576 rows
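The data cleaning note on this slide (Python can work through a CSV file far larger than Excel's 1,048,576-row limit) can be illustrated with a minimal sketch. It is not the team's actual script: the function name `iter_csv_chunks` and the tiny in-memory sample are hypothetical, and it uses only Python's standard `csv` module. The idea is to stream rows rather than load the whole file, yielding chunks small enough to open in Excel if needed.

```python
import csv
import io
from itertools import islice

EXCEL_MAX_ROWS = 1_048_576  # Excel's per-sheet row limit quoted on the slide


def iter_csv_chunks(lines, rows_per_chunk=EXCEL_MAX_ROWS - 1):
    """Yield (header, rows) pairs without loading the whole CSV into memory.

    `lines` is any iterable of text lines, e.g. an open file object.
    The default chunk size leaves one row free for the header so that
    each chunk would fit in a single Excel sheet.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    while True:
        chunk = list(islice(reader, rows_per_chunk))
        if not chunk:
            break
        yield header, chunk


# Hypothetical five-row sample standing in for a very large file.
sample = io.StringIO("id,text\n1,a\n2,b\n3,c\n4,d\n5,e\n")
sizes = [len(rows) for _, rows in iter_csv_chunks(sample, rows_per_chunk=2)]
print(sizes)  # [2, 2, 1]
```

For a real file, the same function could be called on `open(path, newline="", encoding="utf-8")`, where the path and encoding are assumptions about the dataset.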
  • 11. Problems Related to the Project. Problems: 1. The complex work deviation and sponsor relationship 2. The unclear final deliverable & goal 3. Lack of relevant skills. Solutions: 1. Ask for more communication & meetings 2. a. Create a prototype to show our sponsor what we think the deliverable looks like b. Understand that an unclear goal is normal in a project c. Study & experience are our goal 3. a. Ask Professor & Sponsor for learning resources b. Self-study
  • 12. Average Time Spent (hours / week): Reading provided materials: 4; Qualitative Research: 3; Web Scraping (Python): 8; Data Cleaning (Python): 10; Data Analysis (SBS): 8; Email & Communication & Meetings: 6; Report: 1. Plans for Next Stage ● Use SBS to do data analysis 1. Get SBS working 2. Generate findings ● Web Scraping 1. Learn the Python BeautifulSoup library 2. Scrape websites that have useful data for analysis or model training
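The web scraping plan on this slide can be sketched in a few lines. The team plans to use BeautifulSoup; to keep the example dependency-free, this sketch swaps in Python's standard `html.parser` instead, so it shows the general idea rather than the BeautifulSoup API. The class name `HeadlineParser`, the choice of `<h2>` as the target tag, and the sample HTML are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text inside every <h2> element (assumed headline tag)."""

    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.headlines.append("")  # start a new headline

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.headlines[-1] += data  # includes text of nested tags


def extract_headlines(html):
    """Return the stripped, non-empty <h2> texts found in an HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines if h.strip()]


# Hypothetical page content; a real script would download the HTML first.
SAMPLE_HTML = """
<html><body>
  <h2>First <em>claim</em></h2>
  <p>Body text.</p>
  <h2> Second claim </h2>
</body></html>
"""
print(extract_headlines(SAMPLE_HTML))  # ['First claim', 'Second claim']
```

As the earlier findings slide notes, each site's terms and anti-crawling measures should be checked before any real page is fetched.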
  • 13. Zixun’s Reflection. Problems: 1. How to get started? 2. What should I do with the dataset I found? Interesting Preliminary Findings: 1. The data repository has a lot of datasets. 2. The relationship between text analysis and information authenticity. 3. The training data will be what I finally try to do. Time Cost: Interviewing sponsors: every Wednesday 1pm-4pm; Team meeting and discussion: every Friday 1pm-2pm; Reading materials and learning: 4-6 hours every week; Summarizing research findings and preparing presentations: 1-2 hours every week
  • 14. Zixun’s Reflection Expected Results 1. Available integrated datasets 2. Understand the relationship between the results of text sentiment analysis and the authenticity of information. 3. Train data and build predictive models What have I learned? 1. The ability to explore and summarize. 2. Programming language for web scraping data
  • 15. Lingyu’s Reflection. Problems: ● Garbled data records ● Data crawling. Hours spent: ● Reading materials: 3 hr ● Exploring tools and datasets: 3 hr ● Team discussion and preparing the weekly report: 3-4 hr
  • 16. Lingyu’s Reflection. Interesting findings: ● People from different cultures have different opinions on the same information ● Mis-/disinformation is possibly operated by bots. What I have learned: 1. Teamwork and communication 2. Dealing with garbled datasets. Plan for next stage: ● Working on analysis and data visualization using tools ● Finding potential similarities in datasets of mis-/disinformation
  • 17. Ziyan’s Reflection. Problems: 1. Searching for data with no way to get started. 2. Understanding my work and making a good, brief report. Time Spent: 1. Data collection: 3 hrs per week 2. Tool exploration: 1.5 hrs per week 3. Project meetings: 3 hrs per week 4. Learning from materials: 2 hrs per week
  • 18. Ziyan’s Reflection. Interesting Findings: 1. There is no absolute right or wrong in many things, and different positions will lead to different answers to questions. 2. After finding the characteristics, it is easy to find the disinformation. What I Learned: 1. Learned to explore solutions to problems in unknown areas. 2. Learned Tableau for data analysis. 3. Do's and don'ts when presenting to a sponsor.
  • 19. Next Step ● Use Tableau to analyze the current datasets (waiting for approval) ● Explore sentiment analysis ● Use SBS to do data analysis ● Learn web scraping
  • 20. Live Data: ● Social Media ● News Websites. A Dashboard: ● Power BI ● A website similar to Hamilton 2.0. Data Collection & Cleaning: ● Python BeautifulSoup ● Python data cleaning ● Bright Data. Model Training: ● Machine Learning. Data Analysis: ● SBS ● Sentiment Analysis ● Tableau. Datasets: ● e.g. Twitter Transparency ● e.g. Weibo datasets. Fact-Checking Websites: ● Human Verified ○ e.g. PolitiFact ● Automatically Verified ○ e.g. Duke Reporters’ Lab. Data Collection & Cleaning: ● Python BeautifulSoup ● Python data cleaning ● Bright Data. Trained Model. Trained Model. Data Analysis: ● SBS ● Sentiment Analysis ● Tableau. Reports. Result (real/fake). Graphical Reports. Legend: Finished, Unfinished