MSCA ITN/ETN No. 860721
Health Misinformation Detection in Web Content
A Structural-, Content-based, and Context-aware Approach based on Web2Vec
Rishabh Upadhyay
Gabriella Pasi
Marco Viviani
University of Milano-Bicocca
Department of Informatics, Systems, and Communication
Information and Knowledge Representation, Retrieval and
Reasoning (IKR3) LAB
Outline
Scenario
Literature Review: Interaction-based Approaches
Literature Review: Algorithm-based Approaches
The Proposed Model: Cred2Vec
Experimental Results
Conclusions and Further Developments
Scenario: The Web, Information, and Misinformation
▪ Web → most popular source of “general-purpose” information and, in recent years, of health-related information [1, 2]
▪ Web 2.0 → anyone can write almost “anything” without trusted external control [3]
▪ Misinformation → (Deliberately) false or incorrect information
▪ Consequences for society → Undermine people’s ability to make informed decisions
and lead to harmful consequences
Scenario: Health Misinformation
Literature Review: Interaction-based Approaches
▪ Various indicators of “genuineness” → Source, Content, Design and Personal
▪ Source Indicators:
– Domain type (.org, .gov, .edu) [4,5]
– Owner Identity (Parent organisation, Educational Institutions) [5,6]
– References [7]
▪ Content Indicators:
– Content Types (Factual and Personal Information) [9]
– Writing and Language (Grammar, Simple Terms, Conciseness) [10]
– Author and Currency (Updating) [11, 12]
Literature Review: Interaction-based Approaches
▪ Design Indicators:
– Interface Design (Font, Graphics) [13]
– Interaction Design (Links, Logins) [14]
– Navigation Design and Ads [13]
▪ Personal Indicators:
– Known Websites (previously used websites) [15]
– Health Literacy [8, 14]
– Demographic Factors [16]
– Religious Beliefs [16]
Literature Review: Algorithm-based Approaches
▪ Types of approaches: Machine Learning, Knowledge-based, and Semantic-based
▪ Machine Learning (most of the approaches):
– Information: Textual, Source, and Linguistic
– Features: Bag of words, N-grams, LIWC, Affective features, Readability
– Algorithms: SVM, Naive Bayes, Random Forest, K-Nearest Neighbour, Logistic Regression, etc.
▪ Knowledge- and semantic-based methods (main approaches):
– DETERRENT detects healthcare misinformation by leveraging a medical knowledge graph [17]
– MedCIRCLE (Collaboration for Internet Rating, Certification, Labelling and Evaluation of Health Information) is a collaboration of medical communities that assesses health information by making standardized, machine-readable statements about a given health Website [18]
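As a rough illustration of the bag-of-words and n-gram features listed for the machine learning approaches, the following Python sketch counts word n-grams in a text. The tokenizer (lowercasing and keeping alphanumeric runs) and the function name `ngram_counts` are assumptions made for this example and are not taken from the surveyed works.

```python
import re
from collections import Counter


def ngram_counts(text, n=1):
    """Count word n-grams in `text` (n=1 gives a plain bag of words)."""
    # Hypothetical tokenizer: lowercase, keep alphanumeric runs only.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Slide a window of n tokens over the token list.
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams)


if __name__ == "__main__":
    sample = "Miracle cure! This miracle cure works."
    print(ngram_counts(sample, n=1))
    print(ngram_counts(sample, n=2))
```

The resulting counts could then be fed to any of the classifiers named above; which weighting and classifier work best is an empirical question.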
The Proposed Model: Cred2Vec
▪ We considered Web2Vec, a recently proposed model for phishing detection in Web pages [20]
– A related but not overlapping research issue
▪ We modified and applied it in the context of health misinformation:
– Same deep learning architecture
– Different features (i.e., Links and Domain-Specific Information)
– Different and context-aware learning strategy
▪ Three phases:
– Web parsing
– Data representation
– Feature extraction
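The slide text does not detail how the Web parsing phase is implemented. As a minimal sketch of what such a step might look like, assuming it separates a page's visible text from its hyperlinks, the Python example below uses only the standard library. The names `PageParser` and `parse_page` are hypothetical and not part of the presented model.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PageParser(HTMLParser):
    """Collect visible text and hyperlink targets from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html):
    """Return (text, links, link_domains) for one page (assumed outputs)."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    domains = [urlparse(link).netloc for link in parser.links]
    return " ".join(parser.text_parts), parser.links, domains
```

Text and links extracted this way would correspond to the content and link features mentioned above; the actual representation used in the later phases is not specified in the slide text.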
Phase 1: Web parsing
Phase 2: Data representation
Phase 3: Feature extraction
[Figure: feature extraction architecture with Conv, Pooling, BiLSTM, and Attention Layer blocks]
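The figure names the building blocks but the slide text gives no equations. To illustrate only the last block, here is a minimal sketch of attention pooling over a sequence of hidden vectors, assuming dot-product scoring against a learned context vector. The scoring function, the `query` parameter, and the function names are assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Weight each hidden state by softmax(h . query) and sum them.

    hidden_states: list of equal-length vectors (e.g. BiLSTM outputs).
    query: hypothetical learned context vector of the same length.
    """
    # One relevance score per time step (dot product with the query).
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the hidden states, one dimension at a time.
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights
```

In a trained network the query vector would be learned jointly with the other layers; here it is passed in by hand.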
Experimental Setup: Datasets
▪ Microsoft Credibility Dataset
– 1,000 Web pages in different domains such as Health, Finance and Politics
– Human-based evaluation
– Credibility → 1 to 5, where 1 stands for “very non-credible” and 5 for “very credible”
– 104 credible and 26 non-credible Web pages
▪ Medical Web Reliability Corpus
– The dataset consists of 360 Web pages, 180 reliable and 180 unreliable
– HON accredited Web pages
– Unreliable Web pages → disease name + "miracle cure"
– 170 reliable and 176 unreliable Web pages
▪ CLEF eHealth 2020 Task-2 Dataset
– 12,456 Web pages
– Four-point scale, from 0 to 3
– 0 non-credible, and 1-2 credible
– 5,509 credible and 6,736 non-credible Web pages
Experimental Setup: Baselines
▪ Textual-based features:
– Naive Bayes (NB) and Logistic Regression (LR) with CountVec and TF-IDF features were proposed in prior work [21]
– After experimenting on all three datasets, we selected NB_CountVec and LR_Tf-iDF
▪ Multi-feature-based:
– SVM with handcrafted features has been proposed and used by a few researchers [22]
– Links, contact-us, commercial keywords, page rank and bag-of-words
– We considered it as one of the baselines
▪ Domain-specific features:
– BioBERT, pretrained on medical data, was used to extract embeddings.
– BERT has produced good results for misinformation and fake news detection [22]
– We considered SVM with BioBERT
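To make the textual baseline more concrete, the sketch below shows a multinomial Naive Bayes classifier over raw word counts, which is the general idea behind an NB-with-count-features baseline. It is a simplified, standard-library illustration under stated assumptions (pre-tokenized input, Laplace smoothing with `alpha=1.0`, toy labels); it is not the configuration used in [21] or in the reported experiments.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label, n_docs in class_docs.items():
        # Laplace smoothing: add alpha to every word count in the vocabulary.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(labels)),
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / total) for w in vocab
            },
        }
    return model


def predict_nb(model, tokens):
    """Return the label with the highest posterior log-probability."""
    def score(label):
        params = model[label]
        # Words never seen in training are ignored.
        return params["log_prior"] + sum(
            params["log_likelihood"][w] for w in tokens if w in params["log_likelihood"]
        )
    return max(model, key=score)


if __name__ == "__main__":
    # Toy, made-up training data for illustration only.
    docs = [["miracle", "cure", "guaranteed"], ["clinical", "trial", "results"],
            ["miracle", "pill"], ["peer", "reviewed", "trial"]]
    labels = ["unreliable", "reliable", "unreliable", "reliable"]
    model = train_nb(docs, labels)
    print(predict_nb(model, ["miracle", "cure"]))
```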
Results
▪ We considered Accuracy, F1, and AUC as our evaluation metrics
▪ D1 - Microsoft Credibility Dataset
▪ D2 - Medical Web Reliability Corpus
▪ D3 - CLEF eHealth 2020 Task-2 Dataset
▪ Statistical significance → 95% confidence intervals
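For reference, the three metrics named on this slide can be computed as in the following sketch. The slide does not say how the 95% confidence intervals were obtained, so the normal-approximation interval for accuracy shown here is an assumption, and the function names are hypothetical.

```python
import math


def accuracy_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return acc, f1


def auc(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_ci95(acc, n):
    """Normal-approximation 95% interval for accuracy on n test items (assumed method)."""
    half = 1.96 * math.sqrt(acc * (1 - acc) / n)
    return max(0.0, acc - half), min(1.0, acc + half)
```

With small test sets such as D1, the normal approximation can be loose; a bootstrap interval would be an alternative choice.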
Conclusions and Further Developments
▪ First step for the investigation at a more general level
▪ Investigation of presence of links, structural and content features in Web pages
▪ Additional features (e.g., semantic-based) from Web pages
▪ Integration of misinformation detection in the Information Retrieval model¹
¹ https://theconversation.com/its-not-just-a-social-media-problem-how-search-engines-spread-misinformation-152155
References
1. Beaudoin, C. E., & Hong, T. (2011). Health information seeking, diet and physical activity: an empirical
assessment by medium and critical demographics. International journal of medical informatics, 80(8), 586-
595.
2. Rass, S. (2021). Judging the quality of (fake) news on the internet. Mind & Society, 20(1), 129-133.
3. Volkman, J. E., Luger, T. M., Harvey, K. L., Hogan, T. P., Shimada, S. L., Amante, & Houston, T. K. (2014). The
National Cancer Institute’s Health Information National Trends Survey [HINTS]: a national cross-sectional
analysis of talking to your doctor and other healthcare providers for health information. BMC Family
Practice, 15(1), 1-8.
4. Alsem, M. W., Ausems, F., Verhoef, M., Jongmans, M. J., Meily-Visser, J. M. A., & Ketelaar, M. (2017).
Information seeking by parents of children with physical disabilities: An exploratory qualitative study.
Research in developmental disabilities, 60, 125-134.
5. McPherson, A. C., Gofine, M. L., & Stinson, J. (2014). Seeing is believing? A mixed-methods study exploring
the quality and perceived trustworthiness of online information about chronic conditions aimed at children
and young people. Health communication, 29(5), 473-482.
6. Peddie, K. A., & Kelly-Campbell, R. J. (2017). How people with hearing impairment in New Zealand use the
Internet to obtain information about their hearing health. Computers in human behavior, 73, 141-151.
7. Sun, Y., Zhang, Y., Gwizdka, J., & Trace, C. B. (2019). Consumer evaluation of the quality of online health
information: systematic literature review of relevant criteria and indicators. Journal of medical Internet
research, 21(5), e12522.
8. Paglialonga, A., Nielsen, A. C., Ingo, E., Barr, C., & Laplante-Lévesque, A. (2018). eHealth and the hearing aid
adult patient journey: A state-of-the-art review. Biomedical engineering online, 17(1), 1-26.
9. Diviani, N., Van den Putte, B., Meppelink, C. S., & van Weert, J. C. (2016). Exploring the role of health
literacy in the evaluation of online health information: insights from a mixed-methods study. Patient
education and counseling, 99(6), 1017-1025.
10. Kerr, C., Murray, E., Stevenson, F., Gore, C., & Nazareth, I. (2006). Internet interventions for long-term
conditions: patient and caregiver quality criteria. Journal of medical Internet research, 8(3), e13.
11. Marton, C. (2010). How women with mental health conditions evaluate the quality of information on
mental health websites: a qualitative approach. Journal of Hospital Librarianship, 10(3), 235-250.
12. Champlin, S., Mackert, M., Glowacki, E. M., & Donovan, E. E. (2017). Toward a better understanding of
patient health literacy: A focus on the skills patients need to find health information. Qualitative Health
Research, 27(8), 1160-1176.
13. Briones, R. (2015). Harnessing the web: how e-Health and e-Health literacy impact young adults’
perceptions of online health information. Medicine 2.0, 4(2).
14. Chang, Y. S., Zhang, Y., & Gwizdka, J. (2021). The effects of information source and eHealth literacy on
consumer health information credibility evaluation behavior. Computers in Human Behavior, 115, 106629.
15. Feufel, M. A., & Stahl, S. F. (2012). What do web-use skill differences imply for online health information
searches?. Journal of medical Internet research, 14(3), e87.
16. Hoffman-Goetz, L., & Friedman, D. B. (2007). A qualitative study of Canadian Aboriginal women’s beliefs
about “credible” cancer information on the internet. Journal of Cancer Education, 22(2), 124-128.
17. Cui, L., Seo, H., Tabar, M., Ma, F., Wang, S., & Lee, D. (2020, August). Deterrent: Knowledge guided
graph attention network for detecting healthcare misinformation. In Proceedings of the 26th ACM
SIGKDD international conference on knowledge discovery & data mining (pp. 492-502).
18. Mayer, M. A., Darmoni, S. J., Fiene, M., Köhler, C., Roth-Berghofer, T. R., & Eysenbach, G. (2003).
MedCIRCLE: collaboration for Internet rating, certification, labelling and evaluation of health
information on the World-Wide-Web. In The New Navigators: from Professionals to Patients (pp. 667-
672). IOS Press.
19. Malhotra, P., Burstein, F., Fisher, J., McKemmish, S., Anderson, J., & Manaszewicz, R. (2003). Breast
cancer knowledge online portal: An intelligent decision support system perspective.
20. Feng, J., Zou, L., Ye, O., & Han, J. (2020). Web2Vec: Phishing Webpage Detection Method Based on
Multidimensional Features Driven by Deep Learning. IEEE Access, 8, 221214-221224.
21. Fernández-Pichel, M., Losada, D. E., Pichel, J. C., & Elsweiler, D. (2021, March). Reliability prediction for
health-related content: A replicability study. In European Conference on Information Retrieval (pp.
47-61). Springer, Cham.
22. Meppelink, C. S., Hendriks, H., Trilling, D., van Weert, J. C., Shao, A., & Smit, E. S. (2021). Reliable or
not? An automated classification of webpages about early childhood vaccination using supervised
machine learning. Patient Education and Counseling, 104(6), 1460-1466.

GoodIT2021.pptx

  • 1. MSCA ITN/ETN No. 860721 Health Misinformation Detection in Web Content A Structural-, Content-based, and Context-aware Approach based on Web2Vec Rishabh Upadhyay Gabriella Pasi Marco Viviani University of Milano-Bicocca Department of Informatics, Systems, and Communication Information and Knowledge Representation, Retrieval and Reasoning (IKR3) LAB
  • 2. MSCA ITN/ETN No. 860721 Outline Scenario Literature Review: Interaction-based Approaches Literature review: Algorithm-based Approaches The Proposed Model: Cred2Vec Experimental Results Conclusions and Further Developments
  • 3. MSCA ITN/ETN No. 860721 Scenario: The Web, Information, and Misinformation ▪ Web → most popular information source to obtain “general-purpose” information and, in the last years, health-related information [1, 2] ▪ Web 2.0 → anyone can write almost “anything” without trusted external control [3] ▪ Misinformation → (Deliberately) false or incorrect information ▪ Consequences for society → Undermine people’s ability to make informed decisions and lead to harmful consequences
  • 4. MSCA ITN/ETN No. 860721 Scenario: Health Misinformation
  • 5. MSCA ITN/ETN No. 860721 Literature Review: Interaction-based Approaches ▪ Various indicators of “genuineness” → Source, Content, Design and Personal ▪ Source Indicators: – Domain type (.org, .gov, .edu) [4,5] – Owner Identify (Parent organisation, Educational Institutions) [5,6] – References [7] ▪ Content Indicators: – Content Types (Factual and Personal Information) [9] – Writing and Language (Grammar, Simple Terms, Conciseness) [10] – Author and Currency (Updating) [11, 12]
  • 6. MSCA ITN/ETN No. 860721 ▪ Design Indicators: – Interface Design (Font, Graphics) [13] – Interaction Design (Links, Logins) [14] – Navigation Design and Ads [13] ▪ Personal Indicators: – Known Websites (previously used websites) [15] – Health Literacy [8, 14] – Demographic Factors [16] – Religious Beliefs [16] Literature Review: Interaction-based Approaches
  • 7. MSCA ITN/ETN No. 860721 ▪ Types of approaches: Machine Learning, Knowledge-based and Semantic-based ▪ Machine Learning (most of the approaches): – Information: Textual, Source and Linguistic. – Features: Bag of words, N-grams, L.I.W.C., Affective features, Readability. – Algorithm: SVM, Naive Bayes, Random Forest, K-Nearest Neighbour, Logistic Regression, etc. ▪ Knowledge- and semantic-based methods (main approaches): – DETERRENT model focused on healthcare misinformation detection by leveraging the medical knowledge graph [18] – MedCIRCLE (Collaboration for Internet Rating, Certification, Labelling and Evaluation of Health Information) is a collaboration of medical communities to assess health information, by making a standardized machine-readable statement about the particular health Website [19] Literature Review: Algorithm-based Approaches
  • 8. The Proposed Model: Cred2Vec
    ▪ We considered Web2Vec, a recently proposed model for phishing detection in Web pages [20]
      – A related but not overlapping research issue
    ▪ We modified and applied it in the context of health misinformation:
      – Same deep learning architecture
      – Different features (i.e., Links and Domain-Specific Information)
      – Different and context-aware learning strategy
    ▪ Three phases:
      – Web parsing
      – Data representation
      – Feature extraction
  • 9. Phase 1: Web parsing
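The slide for this phase carries only a figure, so the exact parsing procedure is not recoverable from the transcript. As a hedged illustration of what a Web parsing step could look like, the sketch below uses the standard library's `html.parser` to pull out three views of a page: visible text (content), the sequence of opening tags (structure), and link targets. The class name `PageParser`, the helper `parse_page`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects visible text, the opening-tag sequence, and link targets."""

    SKIP = {"script", "style"}  # content of these tags is not visible text

    def __init__(self):
        super().__init__()
        self.text_parts, self.tags, self.links = [], [], []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag in self.SKIP:
            self._skip_depth += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def parse_page(html):
    """Return the content, structure, and link views of one HTML document."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(parser.text_parts),
        "tags": parser.tags,
        "links": parser.links,
    }


# Invented sample page, used only to exercise the parser.
sample = """<html><head><style>p {color: red}</style></head>
<body><h1>Miracle cure</h1>
<p>Buy now at <a href="https://example.com/shop">our shop</a>.</p></body></html>"""
page = parse_page(sample)
```

Keeping the three views separate makes it straightforward to feed each one to its own representation step later on.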
  • 10. Phase 2: Data representation
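The transcript does not describe how the parsed data is represented. A common choice for embedding-based networks, assumed here purely for illustration, is to map each token (a word, a tag name, or a link component) to an integer index and to pad or truncate every sequence to a fixed length. The reserved indices, function names, and toy tag sequences below are hypothetical.

```python
from collections import Counter

PAD, UNK = 0, 1  # reserved indices for padding and unknown tokens (assumption)


def build_vocab(sequences, max_size=10000):
    """Assign an integer index to each token, most frequent tokens first."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tok, _ in counts.most_common(max_size - len(vocab)):
        vocab[tok] = len(vocab)
    return vocab


def encode(seq, vocab, length):
    """Map tokens to indices, truncate to `length`, and right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in seq][:length]
    return ids + [PAD] * (length - len(ids))


# Invented tag sequences standing in for the structural view of two pages.
tag_sequences = [["html", "body", "p", "a"], ["html", "body", "p", "p", "p"]]
vocab = build_vocab(tag_sequences)
encoded = [encode(seq, vocab, length=6) for seq in tag_sequences]
unseen = encode(["html", "table"], vocab, length=4)  # "table" is not in the vocabulary
```

Fixed-length index sequences of this kind are what an embedding layer in front of a convolutional or recurrent network usually expects.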
  • 11. Phase 3: Feature extraction (figure labels: Conv, Pooling, BiLSTM, Attention Layer)
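The figure on this slide names four building blocks. The sketch below illustrates three of them (a 1-D convolution with ReLU, max pooling, and dot-product attention pooling) in plain Python so the arithmetic is visible; the BiLSTM step is omitted, and in a real model the kernel and the attention query would be learned rather than fixed. All names and numbers are assumptions for illustration and do not reproduce the Cred2Vec or Web2Vec implementation.

```python
import math


def conv1d(seq, kernel):
    """Valid 1-D convolution (cross-correlation) of a scalar sequence, then ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(seq[i + j] * kernel[j] for j in range(k)))
        for i in range(len(seq) - k + 1)
    ]


def max_pool(seq, size):
    """Non-overlapping max pooling; a trailing partial window is kept."""
    return [max(seq[i:i + size]) for i in range(0, len(seq), size)]


def attention_pool(states, query):
    """Softmax over dot-product scores, then a weighted sum of the state vectors."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in states]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    context = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return weights, context


# Toy inputs: a scalar sequence for the convolution and two 2-D "hidden states"
# standing in for recurrent-layer outputs.
feature_map = conv1d([0, 1, 0, 2, 0, 3], kernel=[1, 1])
pooled = max_pool(feature_map, size=2)
weights, context = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
```

With a zero query every state receives the same attention weight, so the context vector is simply the mean of the states; a learned query would shift weight toward the more informative positions.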
  • 12. Experimental Setup: Datasets
    ▪ Microsoft Credibility Dataset
      – 1,000 Web pages in different domains such as Health, Finance and Politics
      – Human-based evaluation
      – Credibility → 1 to 5, where 1 stands for “very non-credible” and 5 for “very credible”
      – 104 credible and 26 non-credible Web pages
    ▪ Medical Web Reliability Corpus
      – 360 Web pages, 180 reliable and 180 unreliable
      – HON-accredited Web pages
      – Unreliable Web pages → disease name + "miracle cure"
      – 170 reliable and 176 unreliable Web pages
    ▪ CLEF eHealth 2020 Task-2 Dataset
      – 12,456 Web pages
      – Four-point scale, from 0 to 3
      – 0 non-credible, and 1-2 credible
      – 5,509 credible and 6,736 non-credible Web pages
  • 13. Experimental Setup: Baselines
    ▪ Textual-based features:
      – Naive Bayes (NB) and Logistic Regression (LR) with CountVec and TF-IDF were proposed in [21]
      – After experimentation with all three datasets
      – We considered NB_CountVec and LR_TF-IDF
    ▪ Multi-feature-based:
      – SVM with handcrafted features was proposed and used by a few researchers [22]
      – Links, contact-us, commercial keywords, page rank and bag-of-words
      – We considered it as one of the baselines
    ▪ Domain-specific features:
      – BioBERT, pretrained on medical data, was used to extract embeddings
      – BERT has produced good results for misinformation and fake news detection [22]
      – We considered SVM with BioBERT
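As an illustration of the simplest baseline family above (Naive Bayes over count features), here is a minimal standard-library sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the implementation used in [21] or in the experiments; the class name `CountNaiveBayes`, whitespace tokenization, and the four toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class CountNaiveBayes:
    """Multinomial Naive Bayes over raw token counts with Laplace smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)          # documents per class
        self.token_counts = defaultdict(Counter)   # token counts per class
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc.lower().split())
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in doc.lower().split():
                if tok in self.vocab:              # tokens unseen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy documents, used only to exercise the classifier.
docs = [
    "miracle cure guaranteed",
    "clinical trial results published",
    "secret miracle remedy",
    "peer reviewed clinical study",
]
labels = ["unreliable", "reliable", "unreliable", "reliable"]
model = CountNaiveBayes().fit(docs, labels)
prediction_a = model.predict("miracle remedy")
prediction_b = model.predict("clinical study")
```

In practice a library implementation with a proper vectorizer would replace this sketch; it is shown only to make the count-based scoring explicit.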
  • 14. Results
    ▪ We considered Accuracy, F1 and AUC as our evaluation metrics
    ▪ D1 - Microsoft Credibility Dataset
    ▪ D2 - Medical Web Reliability Dataset
    ▪ D3 - CLEF e-Health Task-2, 2020
    ▪ Statistical significance → Confidence Intervals with 95% confidence
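The three metrics named on this slide can be computed as in the sketch below. The slide does not say how the 95% confidence intervals were obtained, so the normal-approximation (Wald) interval shown here is an assumption, and the labels and scores are invented toy values rather than results from the experiments.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def wald_interval(p, n, z=1.96):
    """Normal-approximation interval for a proportion; z=1.96 gives about 95%."""
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


# Invented toy labels and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [int(s >= 0.5) for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
area = auc(y_true, scores)
low, high = wald_interval(acc, len(y_true))
```

With so few samples the interval is very wide, which is one reason intervals are worth reporting alongside point estimates on small datasets.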
  • 15. Conclusions and Further Developments
    ▪ A first step toward investigation at a more general level
    ▪ Investigation of the presence of links, structural and content features in Web pages
    ▪ Additional features (e.g., semantic-based) from Web pages
    ▪ Integration of misinformation detection in the Information Retrieval model¹
    ¹ https://theconversation.com/its-not-just-a-social-media-problem-how-search-engines-spread-misinformation-152155
  • 16. References
    1. Beaudoin, C. E., & Hong, T. (2011). Health information seeking, diet and physical activity: an empirical assessment by medium and critical demographics. International journal of medical informatics, 80(8), 586-595.
    2. Rass, S. (2021). Judging the quality of (fake) news on the internet. Mind & Society, 20(1), 129-133.
    3. Volkman, J. E., Luger, T. M., Harvey, K. L., Hogan, T. P., Shimada, S. L., Amante, & Houston, T. K. (2014). The National Cancer Institute’s Health Information National Trends Survey [HINTS]: a national cross-sectional analysis of talking to your doctor and other healthcare providers for health information. BMC Family Practice, 15(1), 1-8.
    4. Alsem, M. W., Ausems, F., Verhoef, M., Jongmans, M. J., Meily-Visser, J. M. A., & Ketelaar, M. (2017). Information seeking by parents of children with physical disabilities: An exploratory qualitative study. Research in developmental disabilities, 60, 125-134.
    5. McPherson, A. C., Gofine, M. L., & Stinson, J. (2014). Seeing is believing? A mixed-methods study exploring the quality and perceived trustworthiness of online information about chronic conditions aimed at children and young people. Health communication, 29(5), 473-482.
    6. Peddie, K. A., & Kelly-Campbell, R. J. (2017). How people with hearing impairment in New Zealand use the Internet to obtain information about their hearing health. Computers in human behavior, 73, 141-151.
    7. Sun, Y., Zhang, Y., Gwizdka, J., & Trace, C. B. (2019). Consumer evaluation of the quality of online health information: systematic literature review of relevant criteria and indicators. Journal of medical Internet research, 21(5), e12522.
    8. Paglialonga, A., Nielsen, A. C., Ingo, E., Barr, C., & Laplante-Lévesque, A. (2018). eHealth and the hearing aid adult patient journey: A state-of-the-art review. Biomedical engineering online, 17(1), 1-26.
  • 17. References
    9. Diviani, N., Van den Putte, B., Meppelink, C. S., & van Weert, J. C. (2016). Exploring the role of health literacy in the evaluation of online health information: insights from a mixed-methods study. Patient education and counseling, 99(6), 1017-1025.
    10. Kerr, C., Murray, E., Stevenson, F., Gore, C., & Nazareth, I. (2006). Internet interventions for long-term conditions: patient and caregiver quality criteria. Journal of medical Internet research, 8(3), e13.
    11. Marton, C. (2010). How women with mental health conditions evaluate the quality of information on mental health websites: a qualitative approach. Journal of Hospital Librarianship, 10(3), 235-250.
    12. Champlin, S., Mackert, M., Glowacki, E. M., & Donovan, E. E. (2017). Toward a better understanding of patient health literacy: A focus on the skills patients need to find health information. Qualitative Health Research, 27(8), 1160-1176.
    13. Briones, R. (2015). Harnessing the web: how e-Health and e-Health literacy impact young adults’ perceptions of online health information. Medicine 2.0, 4(2).
    14. Chang, Y. S., Zhang, Y., & Gwizdka, J. (2021). The effects of information source and eHealth literacy on consumer health information credibility evaluation behavior. Computers in Human Behavior, 115, 106629.
    15. Feufel, M. A., & Stahl, S. F. (2012). What do web-use skill differences imply for online health information searches? Journal of medical Internet research, 14(3), e87.
    16. Hoffman-Goetz, L., & Friedman, D. B. (2007). A qualitative study of Canadian Aboriginal women’s beliefs about “credible” cancer information on the internet. Journal of Cancer Education, 22(2), 124-128.
  • 18. References
    17. Cui, L., Seo, H., Tabar, M., Ma, F., Wang, S., & Lee, D. (2020, August). Deterrent: Knowledge guided graph attention network for detecting healthcare misinformation. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 492-502).
    18. Mayer, M. A., Darmoni, S. J., Fiene, M., Köhler, C., Roth-Berghofer, T. R., & Eysenbach, G. (2003). MedCIRCLE: collaboration for Internet rating, certification, labelling and evaluation of health information on the World-Wide-Web. In The New Navigators: from Professionals to Patients (pp. 667-672). IOS Press.
    19. Malhotra, P., Burstein, F., Fisher, J., McKemmish, S., Anderson, J., & Manaszewicz, R. (2003). Breast cancer knowledge online portal: An intelligent decision support system perspective.
    20. Feng, J., Zou, L., Ye, O., & Han, J. (2020). Web2Vec: Phishing Webpage Detection Method Based on Multidimensional Features Driven by Deep Learning. IEEE Access, 8, 221214-221224.
    21. Fernández-Pichel, M., Losada, D. E., Pichel, J. C., & Elsweiler, D. (2021, March). Reliability prediction for health-related content: A replicability study. In European Conference on Information Retrieval (pp. 47-61). Springer, Cham.
    22. Meppelink, C. S., Hendriks, H., Trilling, D., van Weert, J. C., Shao, A., & Smit, E. S. (2021). Reliable or not? An automated classification of webpages about early childhood vaccination using supervised machine learning. Patient Education and Counseling, 104(6), 1460-1466.