Tools of Research: Reliability and Validity
Louzel M Linejan, Presenter
Sequence of Discussion:
Reliability
● refers to how consistently data are collected (Lee, 2004).
● the degree to which a test consistently measures whatever it measures; it indicates the consistency of the scores produced (Raagas, 2009).
● the extent to which results are consistent over time and are an accurate representation of the total population under study (Joppe, 2000).
● is concerned with the replicability and consistency of the methods, conditions and results (Wiersma and Jurs, 2005).
● is expressed numerically, usually as a coefficient ranging from 0.0 to 1.0. If a test is perfectly reliable, the reliability coefficient is 1.0, meaning the respondent's score perfectly reflects their true status with respect to the variable being measured.
● no test is perfectly reliable; scores are invariably affected by measurement errors resulting from a variety of causes.
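To make the 0.0 to 1.0 scale concrete, here is a minimal sketch that assumes the classical test theory model (the slides do not name a model): each observed score is a true score plus random error, so reliability is the share of observed-score variance that is true-score variance, σ²_T / (σ²_T + σ²_E). The function name and the variance values are hypothetical.

```python
def reliability_from_variances(true_var: float, error_var: float) -> float:
    """Reliability as true-score variance divided by observed-score variance.

    Assumes the classical test theory model X = T + E with error uncorrelated
    with true scores; true_var and error_var are hypothetical inputs.
    """
    if true_var < 0 or error_var < 0:
        raise ValueError("variances must be non-negative")
    observed_var = true_var + error_var  # variance of observed scores under the model
    if observed_var == 0:
        raise ValueError("observed variance is zero; reliability is undefined")
    return true_var / observed_var


# No measurement error: the coefficient reaches its ceiling of 1.0.
print(reliability_from_variances(25.0, 0.0))   # 1.0
# Error makes up a quarter of the observed variance.
print(reliability_from_variances(75.0, 25.0))  # 0.75
```

Any nonzero error variance pushes the coefficient below 1.0, which is the formal version of the point that no test is perfectly reliable.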
Methods of Estimating Reliability
1. Stability (also called Test-Retest Reliability)
- the degree to which results/scores on the same test are consistent over time. The more similar the scores on the test over time, the more stable or consistent the scores are.
- indicates the score variation that occurs from one testing session to another.
- provides evidence that scores obtained on a test at one time (test) are the same or close to the same when the test is readministered at some other time (retest).
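Test-retest reliability is commonly reported as the Pearson correlation between the two sets of scores; the slide does not name the statistic, so treat that choice as an assumption. The sketch below uses made-up scores for six examinees and a hypothetical `pearson_r` helper. The same correlation would be computed between scores on two equivalent forms.

```python
from math import sqrt
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a score list has zero variance; r is undefined")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of the same six examinees on two testing sessions.
test = [12, 15, 9, 20, 17, 11]
retest = [13, 14, 10, 19, 18, 10]
print(round(pearson_r(test, retest), 3))  # 0.964
```

A value close to 1.0 means examinees kept roughly the same rank order and spacing across the two sessions.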
2. Equivalence (or Equivalent Forms)
- two tests that are identical except for the actual items included.
- the two forms measure the same variable and have the same number of items, the same structure, the same difficulty level, and the same directions for administration, scoring and interpretation.
- if there is equivalence, the two tests can be used interchangeably. The correlation between scores on the two forms yields an estimate of their reliability.
3. Internal Consistency Reliability (Methods of Internal Analysis)
- a commonly used form of reliability that deals with one test at a time. It is obtained through the Split-Half, Kuder-Richardson and Cronbach Coefficient Alpha methods. Each provides information about the consistency among the items in a single test.
- applicable to instruments that have more than one item, as it refers to how homogeneous the items of a test are, or how well they measure a single construct.
a. Split-Half Reliability
- a common approach is to split a test into two reasonably equivalent halves. These independent subsets are then used as the source of the two independent scores needed to estimate reliability.
- the simplest statistical technique: the questionnaire items are randomly split into 2 groups, and a score for each participant is then calculated on each half of the scale.
This procedure requires only 1 administration of the test. Test items are divided into 2 halves, and the items of the 2 halves are then scored independently. The problem with this method is that there are several ways in which a set of data can be split into two, so the results might stem from the way in which the data were split.
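A minimal split-half sketch, assuming an odd-even split: a common deterministic alternative to the random split described above, which keeps the result from changing between runs but is still only one of the many possible splits. It correlates the two half-test scores and then applies the Spearman-Brown correction, r_full = 2·r_half / (1 + r_half), because each half is only half as long as the real test. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(items: list[list[int]]) -> tuple[float, float]:
    """Odd-even split-half reliability for a persons-by-items score matrix.

    Returns (half-test correlation, Spearman-Brown full-length estimate).
    """
    odd = [sum(row[0::2]) for row in items]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in items]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    r_full = 2 * r_half / (1 + r_half)        # Spearman-Brown for a test twice as long
    return r_half, r_full


# Hypothetical right/wrong (1/0) responses: 5 examinees x 6 items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0],
]
r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.3f}, Spearman-Brown estimate = {r_full:.3f}")
```

Swapping in a different split (for example first half versus second half) would generally give a different coefficient, which is exactly the weakness the slide points out.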
b. Kuder-Richardson
Kuder and Richardson developed two of the most widely accepted methods for estimating reliability: K-R20 and K-R21. These estimate internal consistency reliability by determining how all items in a test relate to all other test items and to the whole test. They are useful for true-false and multiple-choice items, i.e., items scored 1 (correct) or 0 (incorrect).
- K-R20: most advisable if the proportion of correct responses to each item (the item difficulty) varies a lot; it provides the mean of all possible split-half coefficients.
- K-R21: most advisable if the items do not vary much in difficulty, i.e., the proportions of correct responses to the items are more or less similar; it may be substituted for K-R20 if it can be assumed that item difficulty levels are similar.
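The two Kuder-Richardson formulas can be sketched as follows, for k items scored 0/1, item difficulty p (proportion correct), q = 1 - p, and total-score mean M and variance σ²: K-R20 = k/(k-1) · (1 - Σpq / σ²) and K-R21 = k/(k-1) · (1 - M(k - M) / (kσ²)). This sketch assumes population variances (dividing by n); some textbooks divide by n - 1, which shifts the values slightly. The function names and the response matrix are made up.

```python
from statistics import mean, pvariance


def kr20(items: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for a persons-by-items matrix of 0/1 scores."""
    k = len(items[0])
    var_total = pvariance([sum(row) for row in items])  # variance of total scores
    pq_sum = 0.0
    for j in range(k):
        p = mean([row[j] for row in items])  # item difficulty: proportion correct
        pq_sum += p * (1 - p)                # p * q for item j
    return (k / (k - 1)) * (1 - pq_sum / var_total)


def kr21(items: list[list[int]]) -> float:
    """Kuder-Richardson formula 21: needs only k and the mean/variance of totals."""
    k = len(items[0])
    totals = [sum(row) for row in items]
    m = mean(totals)
    var_total = pvariance(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))


# Hypothetical right/wrong (1/0) responses: 5 examinees x 6 items.
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 0, 0],
]
print(f"KR-20 = {kr20(responses):.3f}, KR-21 = {kr21(responses):.3f}")  # KR-20 = 0.809, KR-21 = 0.686
```

Because K-R21 replaces the individual pq terms with a single average difficulty, it never exceeds K-R20 and comes out lower whenever item difficulties differ, as they do here; that is why it is reserved for items of similar difficulty.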
c. Cronbach Coefficient Alpha
- used when item scores are other than 0 and 1 (for items scored 0/1 it gives the same result as K-R20). It is advisable for essay items, problem solving and 5-point scale items.
- based on 2 or more parts of the test; requires only one administration of the test.
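Coefficient alpha generalizes K-R20 by replacing pq with each item's variance: α = k/(k-1) · (1 - Σs_i² / S²), where s_i² is the variance of item i and S² the variance of the examinees' total scores. The sketch below assumes population variances and uses invented 5-point-scale responses; the function name is hypothetical. With items scored 0/1, each item variance equals pq, so alpha reproduces K-R20.

```python
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's coefficient alpha for a persons-by-items score matrix."""
    k = len(items[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]  # s_i^2
    total_var = pvariance([sum(row) for row in items])                    # S^2
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of 5 people to 4 items on a 5-point scale (1-5).
likert = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 2, 1],
]
print(f"alpha = {cronbach_alpha(likert):.3f}")  # alpha = 0.965
```

Alpha is high here because respondents who score high on one item tend to score high on the others, so the total-score variance is much larger than the sum of the item variances.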
Validity
● the degree to which a test measures what it is supposed to measure and, consequently, permits appropriate interpretations of test scores (Raagas, 2009).
● determines whether the research truly measures what it was intended to measure, or how truthful the research results are (Joppe, 2000).
● refers to the ability of survey questions to accurately measure what they claim to measure (Lee, 2004).
● answers the question: Are we measuring what we want to measure? (Muijs, 2004)
● validity is expressed in three degrees:
  - highly valid
  - moderately valid
  - generally invalid
● the validation process begins with an understanding of the interpretation to be made from the tests or instruments.
Forms of Validity
● CONTENT VALIDITY
● CONSTRUCT VALIDITY
● CRITERION-RELATED VALIDITY
Content Validity
Content validity requires item validity and sampling validity.
- Item validity: concerned with whether the test items are relevant to the intended content area.
- Sampling validity: concerned with how well the test sample represents the total content area.
Criterion-related Validity
Construct Validity
Factors affecting Validity
Reliability and Validity
Suppose the reported reliability coefficient for a test was 0.24, which is definitely not good. Would this tell us something about the validity of the test? Yes, it would. It would show that the validity is not high, because if it were, the reliability would be higher.
What if a test is so hard that no respondent could answer even a single item? The scores would still be consistent, but not valid. If a test measures what it is supposed to measure, it is reliable; but a reliable test can consistently measure the wrong thing and be invalid.
Reliability is necessary but not sufficient for establishing validity: a valid test is always reliable, but a reliable test is not always valid.
What if the reported reliability was 0.92, which is definitely high? Would this tell us anything about validity? "Not really." It would only indicate that the test validity might also be high because the reliability is high, but not necessarily; the test could be consistently measuring the wrong thing.
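The 0.24 and 0.92 examples can be made quantitative with a classical test theory result that the slides do not state, so treat it as an added assumption: a test's correlation with any criterion cannot exceed the square root of the product of the two reliabilities, and so is at most √r_xx even against a perfectly reliable criterion. The helper name `max_validity` is hypothetical.

```python
from math import sqrt


def max_validity(r_test: float, r_criterion: float = 1.0) -> float:
    """Upper bound on a validity coefficient under classical test theory.

    The test-criterion correlation cannot exceed sqrt(r_test * r_criterion);
    with a perfectly reliable criterion this is sqrt(r_test).
    """
    if not (0.0 <= r_test <= 1.0 and 0.0 <= r_criterion <= 1.0):
        raise ValueError("reliability coefficients must lie in [0, 1]")
    return sqrt(r_test * r_criterion)


for r in (0.24, 0.92):
    print(f"reliability {r:.2f} -> validity at most {max_validity(r):.2f}")
# reliability 0.24 -> validity at most 0.49
# reliability 0.92 -> validity at most 0.96
```

A reliability of 0.24 caps validity near 0.49, which is why low reliability does say something about validity. A reliability of 0.92 only raises the ceiling to about 0.96; it does not show that the test reaches it, which matches the "not really" answer.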
Thank You!
What can we do to make our instruments more reliable?
- ensure that the questions we ask are clear and unambiguous. Clear, unambiguous questions are likely to be more reliable, and the same goes for items on a rating scale for observers.
- measure the variable with more than one item; this is another way to make an instrument more reliable.
- ensure that the dependent variable is measured as precisely as possible.
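The advice to measure with more than one item can be quantified with the general Spearman-Brown prophecy formula, r_n = n·r / (1 + (n - 1)·r), which predicts the reliability of a test made n times as long. It assumes the added items are comparable (parallel) to the existing ones; the function name and the 10-item, 0.60 starting point below are hypothetical.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test is made n times as long with comparable items."""
    if n <= 0:
        raise ValueError("length factor n must be positive")
    return n * r / (1 + (n - 1) * r)


# Hypothetical: a 10-item scale with reliability 0.60, lengthened to 20 and 30 items.
for n in (1, 2, 3):
    print(f"{10 * n} items -> predicted reliability {spearman_brown(0.60, n):.2f}")
# 10 items -> predicted reliability 0.60
# 20 items -> predicted reliability 0.75
# 30 items -> predicted reliability 0.82
```

Each added block of items helps less than the one before, so lengthening a test is most worthwhile when the starting reliability is low.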
a. Split-Half Reliability
When we need to predict the reliability of a test twice as long as the given test, as in the split-half method, we apply the Spearman-Brown formula:
$$ r_{xx} = \frac{2\,r_{hh}}{1 + r_{hh}} $$
where r_hh is the correlation between the two halves and r_xx is the estimated reliability of the full-length test.
b. Kuder-Richardson
- K-R20: most advisable if the p values (the proportions of correct responses to each item) vary a lot.
- K-R21: most advisable if the items do not vary much in difficulty, i.e., the p values are more or less similar.
c. Cronbach Coefficient Alpha
Used when item scores are other than 0 and 1. It is advisable for essay items, problem solving and 5-point scale items.
$$ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{S^{2}}\right) $$
where k = number of items, s_i = standard deviation of a single test item, and S = standard deviation of the total scores of the examinees.

Louzel Report - Reliability & validity

  • 1. Tools of Research: Reliability and Validity Louzel M Linejan Presenter
  • 2.
  • 4. Reliability ● r efers to how consistently data are collected (Lee 2004) ● the degree to which a test consistently measures whatever it measures and indicates the consistency of the scores produced (Raagas,2009). ● the extent to which results are consistent over time and an accurate representation of the total population under study (Joppe 2000). ● concerns with the replicability and consistency of the methods, conditions and results (Wiersa and Jurs, 2005).
  • 5. Reliability ● expressed numerically, usually as a coefficient ranging from 0.0 to 1.0; meaning the score of the respondent perfectly reflected their true status with respect to the variable being measured. If a test is perfectly reliable, the reliability coefficient is 1.0 ; meaning the score of the respondent perfectly reflected their true status with respect to the variable being measured. ● no test is perfectly reliable and the scores are invariably affected by errors of measurements resulting from a variety of causes.
  • 6. Methods of Estimating Reliability 1. Stability ( also called Test – Retest Reliability)   - the degree to which results/scores on the same test are consistent over time. The more similar the scores on the test over time, the more stable or consistent are the scores. - indicates score variation that occurs from one testing session to another. - provides evidence that scores obtained on a test at one time (test), are the same or close to the same when the test is readministered some other time (retest).
  • 7.
  • 8.
  • 9. 2. Equivalence (or Equivalent Forms) - Two tests that are identical, except for the actual items included. - The two forms measure the same variable, have the same number of items, the same structure, the same difficulty level, and the same direction for administration, scoring and interpretation. - If there is equivalence, the two tests can be used interchangeably. The correlation between scores on the two forms will yield an estimate of their reliability. Methods of Estimating Reliability
  • 10.
  • 11. 3. Internal Consistency Reliability (Methods of Internal Analysis)   - commonly used form of reliability which deals with one test a time. This is obtained through Split-Half , Kuder-Richardson and Cronbach Coefficient Alpha . Each provides information about the consistency among the items in a single test.   - applicable to instruments that have more than one item as it refers to how homogenous the items of a test are; or how well the measure of a single construct Methods of Estimating Reliability
  • 12. Methods of Estimating Reliability 3. Internal Consistency Reliability a. Split-Half Reliability - A common approach is to split a test into two reasonably equivalent halves. These independent halves are then used as the source of the two independent scores needed to estimate reliability. - the simplest statistical technique: the questionnaire items are randomly split into 2 groups, and a score for each participant is then calculated on each half of the scale.
  • 13. Methods of Estimating Reliability a. Split-Half Reliability This procedure requires only 1 administration of the test. The test items are divided into 2 halves, and the 2 halves are then scored independently. The problem with this method is that there are several ways in which a set of data can be split into two, so the results might stem from the way in which the data were split.
  • 14. Methods of Estimating Reliability 3. Internal Consistency Reliability b. Kuder-Richardson Kuder and Richardson developed two of the most widely accepted methods for estimating reliability: K-R20 and K-R21. These estimate internal consistency reliability by determining how all items in a test relate to all other test items and to the whole test. They are useful for items scored right or wrong, such as true-false and multiple-choice items.
  • 15. Methods of Estimating Reliability 3. Internal Consistency Reliability b. Kuder-Richardson K-R20 = most advisable if the proportion of correct responses to a particular item varies a lot across items; provides the mean of all possible split-half coefficients. K-R21 = most advisable if the items do not vary much in difficulty, i.e., the proportions of correct responses to the items are more or less similar; may be substituted for K-R20 if it can be assumed that item difficulty levels are similar.
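A Python sketch of both Kuder-Richardson formulas, with k items, p the proportion answering an item correctly, q = 1 - p, M the mean total score and σ² the variance of total scores: K-R20 = k/(k-1) · (1 - Σpq/σ²) and K-R21 = k/(k-1) · (1 - M(k - M)/(kσ²)). This sketch assumes population variances (dividing by N); some textbooks divide by N - 1, which shifts the values slightly. The data are hypothetical.

```python
from statistics import mean, pvariance


def kr20(responses: list[list[int]]) -> float:
    """K-R20 for 0/1-scored items; responses holds one list of item scores per examinee."""
    n, k = len(responses), len(responses[0])
    total_var = pvariance([sum(person) for person in responses])
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n  # proportion answering item i correctly
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / total_var)


def kr21(responses: list[list[int]]) -> float:
    """K-R21: uses only the number of items, the mean and the variance of total scores."""
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    m, total_var = mean(totals), pvariance(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * total_var))


# Hypothetical right/wrong answers; item difficulties differ, so K-R21 comes out lower.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(f"K-R20 = {kr20(responses):.3f}, K-R21 = {kr21(responses):.3f}")
```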
  • 16. Methods of Estimating Reliability 3. Internal Consistency Reliability c. Cronbach Coefficient Alpha used when the item scores are other than 0 and 1 (for 0/1 items it gives the same value as K-R20). This is advisable for essay items, problem-solving items and five-point scale items. It is based on 2 or more parts of the test and requires only one administration of the test.
  • 18. Validity ● degree to which a test measures what it is supposed to measure and, consequently, permits appropriate interpretations of test scores (Raagas, 2009). ● Validity determines whether the research truly measures that which it was intended to measure, or how truthful the research results are (Joppe 2000). ● refers to the ability of the survey questions to accurately measure what they claim to measure (Lee 2004) ● answers the question: Are we measuring what we want to measure? (Muijs, 2004)
  • 19. Validity ● Validity is a matter of degree, described in 3 terms: - highly valid - moderately valid - generally invalid ● The validation process begins with an understanding of the interpretations to be made from the tests or instruments.
  • 20. Forms of Validity ● CONTENT VALIDITY ● CONSTRUCT VALIDITY ● CRITERION-RELATED VALIDITY
  • 22. Content Validity Requires Item Validity and Sampling Validity. Item Validity - concerned with whether the test items are relevant to the intended content area. Sampling Validity - concerned with how well the test sample represents the total content area.
  • 29. Reliability and Validity Suppose the reported reliability coefficient for a test was 0.24, which is definitely not good. Would this tell us something about the validity of the test? Yes, it would. It would show that the validity is not high, because if it were, the reliability would be higher. What if a test is so hard that no respondent could answer even a single item? The scores would still be consistent, but not valid. If a test measures what it is supposed to measure, it is reliable, but a reliable test can consistently measure the wrong thing and be invalid.
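The weighing-scale analogy from the notes can be sketched in Python: a scale that always reads six pounds too heavy is perfectly consistent (reliable) yet wrong (not valid), while an erratic scale is neither. The true weight, the readings and the one-pound tolerance are all hypothetical; spread across repeated weighings stands in for unreliability and the average offset from the true weight for systematic error.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 150.0  # hypothetical true weight, in pounds
TOLERANCE = 1.0      # hypothetical acceptable error, in pounds

# Hypothetical repeated weighings of the same person on three scales.
scales = {
    "accurate": [150.0, 150.0, 150.0, 150.0],
    "biased": [156.0, 156.0, 156.0, 156.0],   # always six pounds too heavy
    "erratic": [144.0, 158.0, 149.0, 153.0],  # readings jump around
}


def classify(readings: list[float]) -> tuple[bool, bool]:
    """Return (reliable, valid) for one scale's repeated readings."""
    spread = pstdev(readings)              # random, inconsistent error
    bias = mean(readings) - TRUE_WEIGHT    # systematic error
    reliable = spread <= TOLERANCE
    valid = reliable and abs(bias) <= TOLERANCE  # validity requires reliability
    return reliable, valid


for name, readings in scales.items():
    reliable, valid = classify(readings)
    print(f"{name:8s} reliable={reliable} valid={valid}")
```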
  • 30. Reliability and Validity Reliability is necessary but not sufficient for establishing validity. A valid test is always reliable, but a reliable test is not always valid. What if the reported reliability was 0.92, which is definitely high? Would this tell us anything about validity? “Not really.” It would only indicate that the test validity might also be high, because the reliability is high, but not necessarily; the test could be consistently measuring the wrong thing.
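Under classical test theory (an assumption; the slides do not name a measurement model), the reasoning on these two slides has a simple quantitative form: a test's validity coefficient r_xy (its correlation with a criterion) cannot exceed the square root of the product of the test's reliability r_xx and the criterion's reliability r_yy. Treating the criterion as perfectly reliable gives the loosest bound:

```latex
r_{xy} \le \sqrt{r_{xx}\, r_{yy}} \le \sqrt{r_{xx}},
\qquad
r_{xx} = 0.24 \;\Rightarrow\; r_{xy} \le \sqrt{0.24} \approx 0.49,
\qquad
r_{xx} = 0.92 \;\Rightarrow\; r_{xy} \le \sqrt{0.92} \approx 0.96.
```

The low coefficient therefore caps validity, while the high one only permits high validity without guaranteeing it.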
  • 32. What can we do to make our instruments more reliable? - ensure that the questions we ask are clear and unambiguous. Clear, unambiguous questions are likely to be more reliable, and the same goes for items on a rating scale for observers. - measure the construct with more than one item; this is another way to make an instrument more reliable. - ensure that the dependent variable is measured as precisely as possible.
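The gain from adding items can be estimated with the Spearman-Brown prophecy formula, n·r / (1 + (n - 1)·r), where r is the current reliability and n the factor by which the test is lengthened. The formula assumes the added items are comparable (parallel) to the existing ones, which real items may not be; the starting reliability of 0.50 below is hypothetical. The sketch shows diminishing returns, one reason not to keep adding items indefinitely.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test is lengthened by a factor n with comparable items."""
    return n * r / (1 + (n - 1) * r)


# Hypothetical scale whose current reliability is 0.50.
for n in (1, 2, 3, 4):
    print(f"{n}x as many items -> predicted reliability {spearman_brown(0.50, n):.2f}")
```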
  • 33. Methods of Estimating Reliability a. Split-Half Reliability When we need to predict the reliability of a test twice as long as a given test, as in the split-half method, we use the Spearman-Brown formula $r_{\text{whole}} = \frac{2r}{1 + r}$, where $r$ is the correlation between the two halves. The problem with this method is that there are several ways in which a set of data can be split into two, so the results might stem from the way in which the data were split.
  • 34. Methods of Estimating Reliability b. Kuder-Richardson K-R20 = most advisable if the p values (proportions of correct responses to each item) vary a lot. K-R21 = most advisable if the items do not vary much in difficulty, i.e., the p values are more or less similar.
  • 35. Methods of Estimating Reliability c. Cronbach Coefficient Alpha used when the item scores are other than 0 and 1. This is advisable for essay items, problem-solving items and five-point scale items. $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum s_i^2}{S^2}\right)$, where $k$ = number of items, $s_i$ = standard deviation of a single test item and $S$ = standard deviation of the total score of each examinee.

Editor's Notes

  1. Raagas - reliability provides the most revealing statistical index of quality that is ordinarily available to judge a measuring instrument. Lee - reliability can be jeopardized if the wording of a survey is confusing or if the survey interviewer misinterprets a question. Wiersa and Jurs - reliability refers to the consistency of the research and the extent to which studies can be replicated. Findings are reliable when various researchers using the same approach would find the same result.
  2. High reliability indicates minimum error variance. Reliability is a necessary characteristic for validity; that is, a study cannot be valid if it lacks reliability. If a study is unreliable, we can hardly interpret the results with confidence.
  3. There are different types of reliability, each of which deals with a different kind of test of consistency, and each is determined in a different manner. Stability (after the 2nd bullet): test-retest reliability is important for tests used as predictors, such as aptitude tests and affective and questionnaire instruments, since these measures rely heavily on the assumption that the scores will be stable over time. Such tests would not be useful if they produced very different scores at different times. The same test on a particular topic is given twice, which provides two scores for each individual tested. The correlation between the two sets of scores yields the test-retest reliability coefficient.
  4. Generally, though not universally, a period of six (6) weeks between the test and the retest is used to determine a test’s reliability.
  5. If the same group takes both tests, the average score as well as the degree of score variability should be essentially the same on both tests.
  6. The major problem with this method of estimating reliability is the difficulty of constructing two forms that are essentially equivalent.
  7. (1st tab) One standard method of splitting a test has been to score the odd-numbered items and the even-numbered items separately. Then the correlation between scores on the odd- and even-numbered items is calculated. Of course, splitting a test in this way implies that the scores on which the reliability is based are from half-length tests. To obtain an estimate of the reliability of the full-length test, it is necessary to correct, or step up, the half-test correlation to the full-length correlation. Split half: if the scale is very reliable, we’d expect a person’s score to be the same on one half of the scale as on the other, so the two halves should correlate perfectly. The correlation between the two halves is the statistic computed in the split-half method, large correlations being a sign of reliability. Split-half reliability works as follows: say we have an attitude-to-teaching measure that consists of 10 items. First we randomly split the test into 2 (for example, the even and the odd items); then we calculate respondents’ scores on each ‘half test’. We can then see whether the two scores are related to one another. If they are both measuring the same thing, we would expect them to be strongly related, with a correlation coefficient of over 0.8. We would expect this measure to be over 0.7 before we can say that our test is consistent.
  8. (Last tab) If the test is reliable, the scores on the 2 halves have a high positive association: an individual scoring high on one half would tend to score high on the other half, and vice versa.
  10. - require only one administration of a test. - Applicable to binary data, for example, responses to items on achievement tests for which the response to an item is either correct or incorrect.
  11. Multipoint data, e.g., an attitude scale item with 5 response options. For this reason, Cronbach’s coefficient alpha (CCA) is commonly used to estimate the consistency of attitude scales.
  12. - Essentially, reliability and validity establish the credibility of the research. Reliability focuses on replicability and validity focuses on the accuracy of the findings. Attaining reliability does not assure the validity of research: for example, observers could agree on the conclusions and yet the conclusions could be in error. If conclusions cannot be drawn with confidence, there are deficiencies in the research procedures and the study lacks validity.
  13. Raagas - validity is important in all forms of research and all types of tests and measures. Lee - an item designed to measure customer awareness of a given service should measure awareness and not another related concept. The validation process begins with an understanding of the interpretation to be made from the tests or instruments. In other words, does the research instrument allow you to hit "the bull’s eye" of your research object? Researchers generally determine validity by asking a series of questions, and will often look for the answers in the research of others.
  14. The validity of your results is not guaranteed by following some prescribed procedures. As Brinberg and McGrath (1985) put it, “Validity is not a commodity that can be purchased with techniques.” Instead, it depends on the relationship of your conclusions to the real world, and there are no methods that can assure you that you have adequately grasped those aspects of the ___ that you are studying. Validity is a goal rather than a product; it is never something that can be proven or taken for granted. Validity is also relative: it has to be assessed in relationship to the purposes and circumstances of the research, rather than being a context-independent property of methods or conclusions. Finally, validity threats are made implausible by evidence, not methods; methods are only a way of getting evidence that can help you rule out these threats. In the simplest terms, findings are valid when the researcher can draw meaningful inferences from instruments that measure what they intend to measure, so validity basically means “measuring what you think you’re measuring.”
  15. Validity has 3 distinct aspects, all of which are important. - Content validity is also invoked when the argument is made that the measurement so self-evidently reflects or represents the various aspects of the phenomenon being researched. This faith is supported by little more than common sense. - Content validity refers to the degree to which adequate data are collected as they relate to the construct being measured. For example, a survey measuring the image of the library in the community would need to include questions that cover the opinions, attitudes and knowledge relevant to the study. Content validity is usually not measured quantitatively but is derived from the researchers’ knowledge of the community and the relevant theoretical literature. The items in your questionnaire must relate to the construct being measured. For example, a questionnaire measuring the effectiveness of a teacher is useless if it contains items relating to ____. This validity is achieved when items are first selected: don’t include items that are blatantly similar to other items, and ensure that the questions cover the full range of the construct. In short, content validity is the extent to which the content of the test items reflects whatever is under study.
  16. Content validity is determined by expert judgement of item validity and sampling validity, not by statistical means.
  17. Criterion-related validity measures the ability of a survey instrument or question to predict or estimate. While not often employed in library-related surveys, criterion-related validity is vital in areas such as pre-election political surveys. Criterion validity is closely related to theory. When you are developing a measure, you usually expect it, in theory at least, to be related to other measures or to predict certain outcomes. For example, if we develop a new mathematics test, we would expect the scores pupils achieve on that test not to be totally unrelated to those they get on a state-mandated mathematics test. Two things are needed to establish criterion validity: a good knowledge of the theory relating to the concept, so that we can decide which variables we can expect it to predict and be related to; and a measure of the relationship between our measure and those factors. - This is whether the questionnaire is measuring what it claims to measure. In an ideal world, you could assess this by relating scores on each item to real-world observations (comparing scores on sociability items with the number of times a person actually goes out to socialize). This is often impractical, so there are other techniques, such as: 1. use the questionnaire in a variety of situations and see how predictive it is; 2. see how well it correlates with other known measures of your construct (sociable people might be expected to score highly on extroversion scales); 3. use statistical techniques such as the Item Validity Index.
  18. CONCURRENT VALIDITY The question here is whether scores on your instrument agree with scores on other factors you would expect to be related to it. For example, if you were to measure attitudes to school, you would, from theory, expect some relationship with school achievement. Likewise, when designing a measure of pupil learning in geography, you would expect there to be a relationship with scores on previously existing measures of learning in that subject. PREDICTIVE VALIDITY Refers to whether or not the instrument you are using predicts the outcomes you would theoretically expect it to. For example, when we select students for our university courses, we use their scores on specific tests to determine whether or not they are likely to complete the course successfully and are therefore suitable candidates. Any test we use for this purpose should therefore predict academic success. Likewise, whenever we develop a screening test for the selection of employees, we expect this test to predict how well the prospective employee will do the job.
  19. The third type of validity, construct validity, is used to “measure or infer the presence of abstract characteristics for which no empirical evidence seems possible.” Construct validity can be one of the more difficult psychometric aspects to measure. It draws upon the skill and knowledge of the researcher when forming questions for the survey instrument. It is a slightly more complex issue, relating to the internal structure of an instrument and the concept it is measuring. 1 way…: the theory itself may be incorrect, making this approach hazardous. This is one reason why little construct validation is attempted in some research. A more significant reason is the lack of well-established measures that can be used in a variety of circumstances. Instead, some researchers tend to develop measures for each specific problem or survey and rely on content validity. The term construct refers to the theoretical construct or trait being measured, not to the technical construction of the test items. A construct is a postulated attribute or structure that explains some phenomenon, such as an individual’s behavior. Because constructs are abstract and are not considered to be real objects or events, they are sometimes called hypothetical constructs.
  20. How might you be wrong? What are the plausible alternative explanations and validity threats to the potential conclusions of your study, and how will you deal with them? How do the data that you have, or that you could collect, support or challenge your ideas about what’s going on? Why should we believe the results? How will we know that the conclusions are valid? Group threats – if our experimental and control groups were different to start with, we might merely be measuring these differences; for example, using government employees for one group and businessmen for the other, or professionals for one group and elementary students for the other. Regression to the mean – if participants produce extreme scores on a pre-test (either very high or very low), by chance they are likely to score closer to the mean on a subsequent test, regardless of anything the experimenter does to them. This is called regression to the mean, and it is a particular problem for any real-world study that investigates the effects of some policy or measure introduced in response to a perceived problem. Time threats – with the passage of time, events may occur that produce changes in our participants’ behavior; we have to be careful to design our study so that these changes are not mistakenly regarded as consequences of our experimental manipulations.
  21. History – events in the participants’ lives that are entirely unrelated to our manipulations of the independent variable may have fortuitously given rise to changes similar to those we were expecting. Maturation – participants, especially young ones, may change simply as a consequence of development. These changes may be confused with changes due to manipulations of the independent variable the experimenter is interested in. - Pre-test and post-test – any observed change in the dependent variable might be due to a reaction to the pre-test. The pre-test might cause fatigue, provide practice, or even alert the participants to the purpose of the study, which may then affect their performance on the post-test. Reactivity and Experimenter Effects – measuring a person’s behavior may affect that behavior, for a variety of reasons: people’s reaction to having their behavior measured may cause them to change it.
  22. Characteristics of subjects/respondents of the study: Intelligence – those with higher intelligence learn the material faster than those with low IQ. Experience/preparation for the topic being introduced. Socio-economic status. Researcher’s personal characteristics: in the case of surveys, interviews conducted by the researcher may also be unduly affected by their capacity to establish rapport with respondents. It is therefore important to exert care in determining who will be involved in the research activity. Investigators in experiments as well as surveys can be screened for their capacity to steer the process without affecting respondents. This is why training is very important: not everyone is cut out to conduct good interviews.
  23. - Addressing issues of reliability and validity in your survey will help make the survey process a useful and successful endeavor. Anyone can distribute a survey, but it takes some forethought and planning to derive results that will be useful for marketing purposes. - Reliability is a necessary but not sufficient condition for a valid survey. That is, a test or measuring instrument could be reliable but not valid; in that case, it would be consistently measuring something for which it was not intended. However, a test must be reliable to be valid. If it is not consistent in what it measures, it cannot be measuring that for which it is intended. If it doesn’t measure something consistently, it’s not going to be useful or valid. - A measure is reliable to the extent that it is free from unsystematic sources of error. - If the scale measures your weight correctly… then it is both reliable and valid. If it consistently overweighs you by six pounds, then the scale is reliable but not valid. If the scale measures erratically from time to time, then it is not reliable and therefore cannot be valid. Reliability and validity can be enhanced by planning the survey research process. We could have caught the confusion around the concept of service if we had adequately pretested the survey. A pretest is a trial version of the survey administered to a similar population. Typically, as part of the pretest, respondents are interviewed about each question. The definitions of reliability and validity in quantitative research thus reveal two strands: firstly, with regard to reliability, whether the result is replicable; secondly, with regard to validity, whether the means of measurement are accurate and whether they are actually measuring what they are intended to measure.
  24. - Essentially, reliability and validity establish the credibility of the research. Reliability focuses on replicability and validity focuses on the accuracy of the findings. Attaining reliability does not assure the validity of research: for example, observers could agree on the conclusions and yet the conclusions could be in error. If conclusions cannot be drawn with confidence, there are deficiencies in the research procedures and the study lacks validity. Reliability and validity are conceptualized as trustworthiness, rigor and quality in the qualitative paradigm. Through this association, the way to achieve validity and reliability in research is shaped by the qualitative researchers’ perspective, which aims to eliminate bias and increase the researcher’s truthfulness about a proposition concerning some social phenomenon (Denzin, 1978), using triangulation. Triangulation is then defined as “a validity procedure where researchers search for convergence among multiple and different sources of information to form themes or categories in a study” (Creswell & Miller, 2000, p. 126). Therefore, reliability, validity and triangulation, if they are to be relevant research concepts, particularly from a qualitative point of view, have to be redefined as we have seen in order to reflect the multiple ways of establishing truth.
  25. Another way to make an instrument more reliable is to measure the construct with more than one item. When we use more than one item, individual errors that respondents can make when answering a single item (misreading or misinterpreting a question) cancel each other out. That is why we construct scales. In general, more items means higher reliability. We don’t necessarily want to take this to extremes, though. Respondents can get bored if you keep asking them what seem like similar questions, and may answer in an increasingly haphazard way. This will increase the risk of measurement error rather than reduce it. Also, we want to keep survey instruments short, and if we use scales with a lot of items, we won’t be able to ask about many different things. Measure a construct that is very clearly and even narrowly defined. This may in some cases conflict with validity (are we measuring our concepts too narrowly?). Obviously, we want to try to create measurements that are both reliable and valid.
  26. One standard method of splitting a test has been to score the odd-numbered items and the even-numbered items separately. Then the correlation between scores on the odd- and even-numbered items is calculated. Of course, splitting a test in this way implies that the scores on which the reliability is based are from half-length tests. To obtain an estimate of the reliability of the full-length test, it is necessary to correct, or step up, the half-test correlation to the full-length correlation. This is the reliability index of the whole test, where r is the reliability coefficient computed using the odd- and even-numbered items. Since longer tests tend to be more reliable, and since split-half reliability represents the reliability of a test only half as long as the actual test, a correction formula needs to be applied to determine the reliability r2 of the whole test. Thus, we use the formula above.
  28. To overcome this problem, Cronbach suggested splitting the data in half in every conceivable way and computing a correlation coefficient for each split. The average of these values is known as Cronbach’s alpha, which is the most common measure of scale reliability.
  29. When we measure internal consistency or test-retest reliability, we may find that our test is not in fact reliable enough. Then we need to see whether we can pinpoint any particular item as being “at fault”. When looking at internal consistency, we can look at how strongly each individual item is correlated with the scale score. Any items that are weakly related to the test as a whole lower our reliability and should be removed from our instrument. When looking at test-retest reliability, we can identify items that respondents score very differently on at our two test times; these are causing lower reliability. The problem with the split-half method is that there are several ways in which a set of data can be split into 2, so the results might stem from the way in which the data were split. Split half – the simplest statistical technique; it randomly splits the questionnaire items into 2 groups. A score for each participant is then calculated based on each half of the scale. If the scale is very reliable, we’d expect a person’s score to be the same on one half of the scale as on the other, so the two halves should correlate perfectly.