


Chapter 7 Collecting Research Data with Tests and Self-Report Measures

Page history last edited by PBworks 14 years, 11 months ago


Chapter 7 Complete

p. 188-9: #2 (Explain what it means for a test to yield valid interpretations from test scores, and describe five approaches to determining how valid such interpretations are)

From text (pp. 191-192):

For a test to yield valid interpretations from test scores, there must be a high degree of evidence and theory supporting those interpretations: test scores themselves are neither valid nor invalid; it is our interpretations of them that are valid or invalid.

Five approaches to determining the validity of such interpretations are:

1. Evidence from test content: specific content represented by the test and how well that content is sampled by test items.

2. Evidence from response processes: whether the processes that test takers actually engage in when responding to the test are consistent with the particular construct the test is intended to measure.

3. Evidence from internal structure: analysis of the relationship of test items to each other.

4. Evidence from relationship to other variables: how the sample's scores on the test relate to its scores on measures of other variables, such as how well the test predicts the sample's scores on a criterion measure.

5. Evidence from consequences of testing: test scores and constructs have value-laden consequences when used to make decisions about individuals.

#3 (Explain what it means for a test to yield reliable scores, and describe four approaches to determining test score reliability)

(Text, p. 195-198)

For a test to yield reliable scores, there should be minimal measurement error in the scores. Reliability refers to the extent to which other researchers would obtain similar results if they used the same procedures as the first researcher on a case. Four approaches to determining test-score reliability:

1. Alternate-form reliability: a different form of the test is developed to measure the same construct, and individuals' scores on the two forms are compared.

2. Test-retest reliability: looks at the occasion (time) of testing, comparing an individual's scores on the same measure on different occasions.

3. Internal consistency: looks at the test items, analyzing how consistently a sample of individuals score across the items of a single test administration.

4. Inter-tester reliability: looks at the tester; scores assigned to the same individuals by different testers (administrators or scorers) are compared to see how closely they agree.


From handout in class (March 14):

1. Split-half reliability, Kuder-Richardson 20 & 21, and Coefficient alpha - measures of internal consistency, calculated from a single administration of the test (split-half by correlating scores on the two halves of the test; KR-20, KR-21, and alpha by applying formulas to item scores)

2. Test-retest - measure of stability, calculated by correlating scores of the same test given at two different times

3. Alternate forms (equivalent forms) - measure of equivalence, calculated by correlating scores on different forms

4. Parallel tests - measure of equivalence and stability, calculated by correlating scores on different forms administered at two different times
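The formula-based internal-consistency coefficients from the handout can be computed from one administration's item scores. Below is a Python sketch of the standard formulas for coefficient alpha, KR-20 (items scored 0/1), and KR-21 (which additionally assumes all items are equally difficult). The response matrix and function names are hypothetical; population variances are used throughout, an assumption that makes alpha and KR-20 coincide for 0/1 items.

```python
from statistics import mean, pvariance

def coefficient_alpha(scores):
    """Cronbach's coefficient alpha; scores has one row per examinee, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def kr20(scores):
    """Kuder-Richardson formula 20 for items scored 0/1."""
    k = len(scores[0])
    pq = 0.0
    for i in range(k):
        p = mean(row[i] for row in scores)  # proportion answering item i correctly
        pq += p * (1 - p)
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - pq / total_var)

def kr21(scores):
    """Kuder-Richardson formula 21: shortcut assuming equal item difficulty."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m, total_var = mean(totals), pvariance(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * total_var))

# Hypothetical 0/1 responses: six examinees by four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(f"alpha = {coefficient_alpha(responses):.2f}")
print(f"KR-20 = {kr20(responses):.2f}")
print(f"KR-21 = {kr21(responses):.2f}")
```

Because KR-21 ignores differences in item difficulty, it typically comes out at or below KR-20 for the same data.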

1. Coefficient of equivalence - Two parallel forms of the test are administered and scores on one form are correlated with those on the other form

2. Coefficient alpha - Individual test items are analyzed by using standard formulas

3. Coefficient of internal consistency - the test is administered once, and scores on one half of the test are correlated with scores on the other half

4. Coefficient of stability - the test is administered and then readministered after a time delay, and scores from the first administration are correlated with those from the second



#4 (Explain what information about test score reliability is provided by generalizability theory and the standard error of measurement) (pp. 198-200)


1. Generalizability theory: attempts to isolate the different sources of measurement error (e.g., test items, occasions, scorers). Analysis of variance (ANOVA) is used to analyze the data and assess the effect of each source of measurement error.

2. Standard error of measurement: an estimate of the probable range in which an individual's true score falls, given that the observed score is the combined result of the true score and the amount of measurement error.
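Both ideas can be sketched in Python under stated assumptions. The SEM uses the standard formula SEM = SD * sqrt(1 - r), where r is the test's reliability coefficient, and the band around an observed score assumes normally distributed errors. The G-study assumes the simplest one-facet design (every person scored on every item) and estimates variance components from ANOVA mean squares, setting negative estimates to zero by convention. The ratings and all function names are hypothetical.

```python
import math
from statistics import mean

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

def true_score_band(observed, sem, z=1.96):
    """Range likely to contain the true score (about 95% for z = 1.96, assuming normal errors)."""
    return observed - z * sem, observed + z * sem

def g_study_p_by_i(scores):
    """Variance components for a persons x items crossed design.

    scores: one row per person, one column per item.
    Returns (var_person, var_item, var_residual, relative G coefficient).
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    person_means = [mean(row) for row in scores]
    item_means = [mean(row[i] for row in scores) for i in range(n_i)]

    # ANOVA sums of squares for persons, items, and the residual
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve expected mean squares for variance components; clip negatives to zero.
    var_p = max((ms_p - ms_res) / n_i, 0.0)
    var_i = max((ms_i - ms_res) / n_p, 0.0)
    var_res = ms_res
    g_coef = var_p / (var_p + var_res / n_i)
    return var_p, var_i, var_res, g_coef

sem = standard_error_of_measurement(sd=10, reliability=0.91)
low, high = true_score_band(observed=100, sem=sem)
print(f"SEM = {sem:.2f}; true score likely between {low:.1f} and {high:.1f}")

# Hypothetical ratings: five persons scored on three items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 2],
    [1, 2, 2],
]
var_p, var_i, var_res, g = g_study_p_by_i(ratings)
print(f"person {var_p:.3f}, item {var_i:.3f}, residual {var_res:.3f}, G = {g:.2f}")
```

For this one-facet design the relative G coefficient reduces to (MS_p - MS_res) / MS_p, the same value coefficient alpha gives, which is one way to sanity-check the sketch. Larger designs (adding occasions or raters as facets) need more variance components.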


#11 (Describe procedures and criteria that have been proposed for determining the validity and reliability of performance assessment)

Eight criteria for judging the validity of performance assessments: (p. 211)

1. Consequences: are the consequences of using performance assessment reasonable? (e.g., the amount of time taken from instruction).

2. Fairness: did all students have equal opportunity to perform well? Did judges use the same criteria?

3. Generalizability: will performance on one task carry over to other tasks?

4. Cognitive complexity: when performance assessment is meant to measure higher-order thinking skills, does it actually do so?

5. Content quality: are performance assessment tasks authentic (representative of real-life tasks and quality indicators)?

6. Meaningfulness: the extent to which other groups view performance assessment as authentic.

7. Content coverage: does performance assessment adequately represent the type and amount of material covered?

8. Cost and efficiency: is performance assessment cost-effective to administer?


Reliability of performance assessments: (p. 212)

Reliability is as important as validity: it is the extent to which the performance assessment is free of measurement error. A hermeneutic approach, i.e., examining how the parts are related to the whole, is generally used. When different interpretations of a performance arise, all interpretations are used in making judgments rather than eliminating those that differ.


#14 (Describe how to use the test manual, the test itself, and contact with the test developer to determine if a test is appropriate for your research purposes) (pp. 215-216)

The test manual tells:

  • theoretical constructs on which test is based
  • recommended uses of test
  • evidence of reliability and validity
  • availability of norms
  • availability of short and alternate forms of the test
  • procedures for administering, scoring, and interpreting the test


The test:

Should be examined for:

  • face and content validity
  • for whom test is appropriate (e.g., grade level)

The researcher should consider taking the test to see if potential problems are evident.


The test developer:

Should be consulted to determine whether the developer has additional or more recent information about the test.



#15 (Describe the seven steps involved in developing a test for use in research)

Major steps in developing a test: (from "Pearls" and p. 217):

1. Defining the constructs to be measured:

Give thought to the specific construct that the test will measure and whether there is a theoretical basis for the construct

2. Defining the target population:

Give consideration to the target population, defining it in detail during test construction

3. Reviewing related tests:

Examine similar tests to generate ideas for test format and validity

4. Developing a prototype:

Make a preliminary version of the test

5. Evaluate the prototype:

Obtain a critical review of the test prototype from experts

6. Revising the test:

After the field test, revise test to the degree necessary and field-test the revised version.

7. Collecting the data on test validity and reliability:

Obtain evidence of the reliability of test scores and the validity of inferences that could be made from the scores.


