CHAPTER VIII
ANALYSIS OF ASSESSMENT TOOL
VALIDITY AND RELIABILITY TEST
A. The validity of the test
1. Definition: the validity of a test is the "accuracy of measurement" possessed by a test item, as an integral part of the test as a whole, in measuring what that item is supposed to measure. A test that has validity is called valid.
2. Kinds of test validity
a. The validity of prediction (Predictive validity)
Predictive validity is the accuracy (fidelity) of a test in terms of its ability to predict the test takers' later achievement. For example, a physics achievement test can be said to have high predictive validity if the results achieved by the students on the test actually predict their success or failure in later physics lessons.
To assess it, correlate the scores achieved by the students on the test with the grades they achieve later. If the correlation coefficient obtained is sufficiently high, the predictive validity of the test is high.
b. Comparative validity
The comparative validity of a test is seen from how well its results correlate with the skills the test takers actually possess at the present moment. Comparative validity concerns the present, whereas predictive validity concerns the future.
To assess comparative validity, correlate the results achieved on the test with the results achieved on a similar test that is already known to have high validity (e.g. a standardized test). The size of the coefficient obtained indicates the level of validity of the test being assessed.
c. Content validity
The content validity of a test is its fidelity in terms of the content of the test. An achievement test is valid if the test material is truly representative of the learning material that was taught. For example, if we want to give a physics test to fourth-grade students, the items must be taken from the fourth-grade lessons. If items taken from third-grade material are inserted, the test is no longer valid.
To assess content validity, compare the test material with a rational analysis of the material that should be used in preparing the test. If they match, the test is valid in terms of content validity.
d. Construct validity (structure validity)
The construct validity of a test is its fidelity in terms of how the test is constructed. For example, if you want to give a physics test, you must write clear and concise questions that really measure ability in physics.
To assess construct validity, compare the construction of the test with the requirements for preparing a good test. If these have been fulfilled, the test is valid in terms of construct validity.
Predictive validity and comparative validity are called empirical validity, whereas content validity and construct validity are called logical validity.
3. Calculating test validity
The formula for calculating the validity of a test is the product-moment correlation:
\[ r_{xy} = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\{N\sum x^2 - (\sum x)^2\}\{N\sum y^2 - (\sum y)^2\}}} \]
r_xy = coefficient of correlation between the test being validated and the standard test
x = scores of the students on the test being validated
y = scores of the students on the standardized test
N = total number of students
The validity index is determined from the correlation obtained, under the following conditions:
1. 0.80 ≤ r_xy ≤ 1.00: very high correlation.
2. 0.60 ≤ r_xy < 0.80: high correlation.
3. 0.40 ≤ r_xy < 0.60: moderate correlation.
4. 0.20 ≤ r_xy < 0.40: low correlation.
5. 0.00 ≤ r_xy < 0.20: very low correlation.
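The calculation and the classification above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function names `pearson_r` and `validity_level` and the five-student scores are illustrative assumptions, and r = 1.00 is assumed to count as "very high".

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation r_xy between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, with at least 2 scores")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one score list has no variation")
    return numerator / denominator


def validity_level(r):
    """Map r_xy onto the five levels of the list (assumption: 1.00 is very high)."""
    if r < 0.0 or r > 1.0:
        return "outside the listed ranges"
    if r >= 0.80:
        return "very high"
    if r >= 0.60:
        return "high"
    if r >= 0.40:
        return "moderate"
    if r >= 0.20:
        return "low"
    return "very low"


# Hypothetical scores of five students (not taken from the chapter).
test_scores = [6, 7, 8, 5, 9]        # x: test being validated
standard_scores = [5, 7, 9, 4, 8]    # y: standardized test
r = pearson_r(test_scores, standard_scores)
print(round(r, 3), validity_level(r))
```

Negative coefficients are reported as outside the listed ranges, because the list only covers values from 0.00 upward.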
B. Reliability Tests
1. Definition: the reliability of a test is its dependability. A test is dependable if it gives the same results when the measurement is repeated. Reliability is also called steadiness or stability.
2. Calculating test reliability
Several methods can be used to calculate the reliability of a test, such as:
a. Test-retest reliability method
In this method the test is administered two or more times and the results are then correlated. The steps for measuring reliability are:
1) Devise the test whose reliability is to be measured.
2) Try out the test that has been composed (phase I).
3) Calculate the scores from phase I.
4) Try out the rearranged test again (phase II).
5) Calculate the scores from the retest.
6) Calculate the reliability of the test with the Pearson product-moment correlation formula.
\[ r_{xy} = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{\{N\sum x^2 - (\sum x)^2\}\{N\sum y^2 - (\sum y)^2\}}} \]
r_xy = coefficient of correlation
x = students' scores in test phase I
y = students' scores in test phase II
N = total number of students
Once the correlation coefficient has been obtained, the value of r is compared with the value of r in the table at the 95% confidence level.
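As a sketch of the test-retest procedure, the following Python computes the coefficient for two try-outs and compares it with a table value. The scores and the names `product_moment` and `retest_reliability` are hypothetical. The table value must be looked up for the actual number of students; 0.632 is the commonly tabulated two-tailed 5% value for 10 students (df = 8) and is passed in as a parameter rather than computed.

```python
from math import sqrt


def product_moment(x, y):
    """Pearson product-moment correlation of two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("both phases need the same students, at least 2")
    sum_x, sum_y = sum(x), sum(y)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum_x * sum_y
    spread_x = n * sum(a * a for a in x) - sum_x ** 2
    spread_y = n * sum(b * b for b in y) - sum_y ** 2
    return numerator / sqrt(spread_x * spread_y)


def retest_reliability(phase_1, phase_2, r_table):
    """Return (r, reliable), where reliable means r reaches the table value."""
    r = product_moment(phase_1, phase_2)
    return r, r >= r_table


# Hypothetical scores of the same 10 students in phase I and phase II.
phase_1 = [70, 65, 80, 90, 55, 60, 75, 85, 50, 95]
phase_2 = [72, 60, 78, 88, 58, 65, 70, 90, 55, 92]

r, reliable = retest_reliability(phase_1, phase_2, r_table=0.632)
print(round(r, 2), reliable)
```

The same function applies to the equivalent-forms method if the two lists hold the scores of the two equivalent tests.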
b. Equivalent-forms reliability method
This method measures the reliability of a test by developing two tests that are similar (equivalent). The steps taken are:
1) Develop two equivalent tests.
2) Try out the two tests (at the same sitting or one after the other).
3) Score the results of the two tests that have been tried out.
4) Find the reliability coefficient by correlating the scores of the two tests with the product-moment correlation formula.
If the results obtained show a positive correlation, the test is reliable, and vice versa.
ANALYSIS OF TEST ITEMS
A. The purpose of analysis of test items
According to Thorndike and Hagen (1977), the analysis of test items aims at two things: 1) the answers to the items provide diagnostic information for examining learning successes and failures, and for guiding students toward better learning; 2) the answers to the separate items, and the revision (review) of items based on those answers, are the basis for preparing better tests for the following year.
So the specific objective of test item analysis is to find out which items are good and which are bad, and why an item is said to be good or not good.
B. The Steps of analysis of test items
The steps for analyzing test items according to Stanley (1978) are:
1. Arrange the answer sheets that have been scored. Put the highest score at the top, then sort down to the lowest score at the bottom. The total number of answer sheets is called N.
2. When the total number of students is ≤ 50, divide the students into two groups based on the distribution of their scores. The group with the high scores is called the upper group and the group with the low scores is called the lower group. When the total number of students is > 50, take 27% of all students for the upper group and 27% for the lower group (Nu and NL).
3. Count the students in the upper group who answered each question correctly (Ru) and the students in the lower group who answered each question correctly (RL).
After obtaining the values for each group, calculate the level of difficulty and the distinguishing power.
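The grouping in steps 1 and 2 can be sketched in Python as below. The function name `split_groups` is hypothetical, and two details are assumptions the text does not settle: with an odd N of 50 or fewer the middle sheet is left out of both groups, and 27% of N is rounded to the nearest whole student.

```python
def split_groups(scores):
    """Return (upper, lower) score lists following steps 1 and 2."""
    ranked = sorted(scores, reverse=True)    # step 1: highest score on top
    n = len(ranked)
    if n <= 50:
        size = n // 2                        # assumption: drop the middle sheet if N is odd
    else:
        size = round(0.27 * n)               # assumption: round 27% to a whole student
    upper = ranked[:size]                    # Nu students with the highest scores
    lower = ranked[n - size:]                # NL students with the lowest scores
    return upper, lower


# Hypothetical total scores of 10 students.
upper, lower = split_groups([55, 80, 62, 90, 47, 71, 66, 85, 58, 74])
print(upper)
print(lower)
```

Counting Ru and RL for an item then means counting, within each group, the students whose answer to that item matches the key.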
C. The Difficulty Level
Good questions are questions that are neither too easy nor too difficult. Questions that are too easy do not stimulate students to increase their effort to solve them. Conversely, questions that are too hard make students give up and lose the spirit to try again, because the questions are beyond their reach.
The index of difficulty of each test item is determined with the formula:
\[ P = \frac{R_u + R_L}{N_u + N_L} \]
P = index of the level of difficulty
Ru = number of students in the upper group who answered correctly
RL = number of students in the lower group who answered correctly
Nu = NL = number of students in the upper (or lower) group
The criteria for determining the level of difficulty of each test item are:
0.00 - 0.30: difficult item
0.31 - 0.70: moderate item
0.71 - 1.00: easy item
The P values recommended when writing test items are 0.30 - 0.70, but it should be remembered that a question in this range does not necessarily have high distinguishing power.
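A short Python sketch of the difficulty index and its criteria follows. The function names are hypothetical, and values that fall between two listed ranges (for example 0.305) are assigned to the higher range as an assumption. The sample call uses Ru = 15, RL = 6 and Nu = NL = 30, the figures of the chapter's calculation example.

```python
def difficulty_index(ru, rl, nu, nl):
    """P = (Ru + RL) / (Nu + NL): share of both groups answering correctly."""
    if nu + nl <= 0:
        raise ValueError("the groups must contain at least one student")
    return (ru + rl) / (nu + nl)


def difficulty_level(p):
    """Classify P with the criteria listed above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie between 0.00 and 1.00")
    if p <= 0.30:
        return "difficult"
    if p <= 0.70:
        return "moderate"
    return "easy"


p = difficulty_index(ru=15, rl=6, nu=30, nl=30)
print(p, difficulty_level(p))
```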
D. Distinguishing power
The distinguishing power of a test question is its ability to distinguish intelligent students (upper group) from less intelligent students (lower group).
The distinguishing power of each test item can be calculated with the following formula:
\[ DP = \frac{R_u - R_L}{N_u} = \frac{R_u - R_L}{N_L} \]
Description:
DP = distinguishing power
Ru = number of students in the upper group who answered correctly
RL = number of students in the lower group who answered correctly
Nu = NL = number of students in the upper (or lower) group
The criteria for determining the distinguishing power of each test item are:
DP : 0.00 - 0.20: poor
DP : 0.20 - 0.40: satisfactory
DP : 0.40 - 0.70: good
DP : 0.70 - 1.00: excellent
A good question is a question that has a distinguishing power from 0.40 to 0.70. When the distinguishing power obtained is negative, the question should be discarded, because the item is answered correctly by more students in the lower group than in the upper group.
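The formula and criteria for distinguishing power can be sketched in Python as follows. The function names are hypothetical. Because the listed ranges share their boundary values (0.20, 0.40, 0.70), this sketch assumes that a boundary value belongs to the higher category; the text itself does not say which one applies.

```python
def distinguishing_power(ru, rl, group_size):
    """DP = (Ru - RL) / Nu, where Nu = NL = group_size."""
    if group_size <= 0:
        raise ValueError("group size must be positive")
    return (ru - rl) / group_size


def distinguishing_level(dp):
    """Classify DP; a negative DP means the item should be discarded."""
    if dp < 0:
        return "discard"
    if dp < 0.20:
        return "poor"
    if dp < 0.40:          # assumption: 0.20 itself counts as satisfactory
        return "satisfactory"
    if dp < 0.70:          # assumption: 0.40 itself counts as good
        return "good"
    return "excellent"     # assumption: 0.70 itself counts as excellent


# Hypothetical item: 24 of 30 upper-group and 9 of 30 lower-group students correct.
dp = distinguishing_power(ru=24, rl=9, group_size=30)
print(dp, distinguishing_level(dp))
```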
E. Analysis of distractors
Distractor analysis aims to see whether the distractors work properly. A distractor that is not selected at all by the test takers is a bad one, because it is too obviously misleading. A distractor can be said to function properly if it is selected by at least 5% of the test takers. Omitted answers (Omit) should not come from more than 10% of the test takers.
EXAMPLE OF CALCULATION
Multiple choice      a    b    c    d    O    Total   Description
Upper group (Ru)     5    7   15    3    0     30     Key: c
Lower group (RL)     8    8    6    5    3     30     O = omit
Total               13   15   21    9    3     60
From this pattern of responses the following can be calculated:
1) \[ P = \frac{R_u + R_L}{N_u + N_L} = \frac{15 + 6}{30 + 30} = \frac{21}{60} = 0.35 \]
2) \[ DP = \frac{R_u - R_L}{N_u} = \frac{15 - 6}{30} = \frac{9}{30} = 0.3 \]
3) Distractors: all distractors are functioning properly because each has been chosen by more than 5% of the test takers. Omit is also good because it comes from only 5% of the test takers.
Conclusion: the question is included among the good questions because its level of difficulty, distinguishing power, and distractor analysis are all acceptable.
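The distractor check in point 3) can be reproduced with a small Python sketch. The function name `analyze_distractors` and its parameters are hypothetical. The 5% and 10% limits come from the section on distractor analysis, and the counts are those of the example table (key c, O = omit).

```python
def analyze_distractors(upper, lower, key, omit_label="O",
                        min_share=0.05, max_omit=0.10):
    """Return ({distractor: (share, functioning)}, omit_share, omit_ok).

    upper and lower map each option label to the number of test takers
    in that group who chose it.
    """
    total = sum(upper.values()) + sum(lower.values())
    report = {}
    for option in upper:
        if option in (key, omit_label):
            continue                          # only the wrong options are distractors
        share = (upper[option] + lower.get(option, 0)) / total
        report[option] = (share, share >= min_share)
    omit_share = (upper.get(omit_label, 0) + lower.get(omit_label, 0)) / total
    return report, omit_share, omit_share <= max_omit


upper_group = {"a": 5, "b": 7, "c": 15, "d": 3, "O": 0}
lower_group = {"a": 8, "b": 8, "c": 6, "d": 5, "O": 3}

report, omit_share, omit_ok = analyze_distractors(upper_group, lower_group, key="c")
for option, (share, functioning) in report.items():
    print(option, f"{share:.1%}", "functioning" if functioning else "needs revision")
print("omit", f"{omit_share:.1%}", "acceptable" if omit_ok else "too high")
```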
F. Sensitivity of question
The sensitivity of a question is used to identify items that can measure the effect of learning, as shown by observable indicators of learning behavior. The sensitivity of an item is calculated with the formula:
\[ S = \frac{R_B - R_A}{T} = P_B - P_A \]
Description:
S = sensitivity of the question
RA = number of students who answered correctly on the initial test
RB = number of students who answered correctly on the final test
T = total number of students who took the test
PA = proportion of correct answers on the initial test
PB = proportion of correct answers on the final test
The sensitivity index of a question lies between 0.00 and 1.00, and larger positive values indicate that the question is more sensitive to the effects of learning.
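A minimal Python sketch of the sensitivity formula is given below. The function name `sensitivity` and the numbers (40 students, 12 correct on the initial test, 30 correct on the final test) are hypothetical and serve only to show that both forms of the formula agree.

```python
def sensitivity(rb, ra, t):
    """S = (RB - RA) / T for one item, given the same T students on both tests."""
    if t <= 0:
        raise ValueError("T must be positive")
    return (rb - ra) / t


# Hypothetical item: 12 of 40 correct before teaching, 30 of 40 after.
ra, rb, t = 12, 30, 40
s = sensitivity(rb, ra, t)
pa, pb = ra / t, rb / t           # proportions correct on each test
print(round(s, 2), round(pb - pa, 2))
```

A value that is zero or negative would mean that no more students answered the item correctly after teaching than before.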
EXERCISE
There are three questions with the following answer patterns:
No   Group    Answer options                Key
              A    B    C    D    O
2    Top      0   10    5    5    0         B
     Under    1   12    3    4    0
5    Top      2   20    0    3    5         C
     Under    6    3    5    6   10
7    Top     15   10    5    0    0         A
     Under    6   16    7    0    0
How would you rate these test questions?
EXERCISE ANSWER KEY
Question Number 2
\[ P = \frac{10 + 12}{40} = \frac{22}{40} = 0.55 \]
\[ DP = \frac{10 - 12}{20} = \frac{-2}{20} = -0.1 \]
Recommendation: the difficulty level of the question is good, but the distinguishing power is very bad; option A must be fixed because it does not attract the test takers.
Question Number 5
\[ P = \frac{0 + 5}{60} = \frac{5}{60} = 0.083 \]
\[ DP = \frac{0 - 5}{30} = -0.17 \]
\[ \text{Omit} = \frac{15}{60} \times 100\% = 25\% \]
Recommendation: both the difficulty level and the distinguishing power of the question are bad, and omit is very high; this question should be discarded.
Question Number 7
\[ P = \frac{15 + 6}{60} = \frac{21}{60} = \frac{7}{20} = 0.35 \]
\[ DP = \frac{15 - 6}{30} = \frac{9}{30} = 0.3 \]
Recommendation: this question can be used because its difficulty level and distinguishing power are acceptable and omit is 0%; only the option that no test taker chose (D in the table) needs to be fixed, because it does not attract the test takers.