Evaluation of multiple-choice questions by item analysis, from an online internal assessment of 6th semester medical students in a rural medical college, West Bengal
Sharmistha Bhattacherjee, Abhijit Mukherjee, Kallol Bhandari, Arup Jyoti Rout
Department of Community Medicine, North Bengal Medical College, Darjeeling, West Bengal, India
Correspondence Address:
Dr. Arup Jyoti Rout
Department of Community Medicine, North Bengal Medical College, Darjeeling - 734 012, West Bengal
India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/ijcm.ijcm_1156_21
Background: Properly constructed single best-answer multiple-choice questions (MCQs or items) assess the higher-order cognitive processing of Bloom's taxonomy and accurately discriminate between high and low achievers. However, guidelines for writing good test items are rarely followed, leading to the generation and application of faulty MCQs. Materials and Methods: During the lockdown period in 2020, an internal assessment was conducted online using Google Forms. There were 60 'single response type' MCQs, each consisting of a single stem and four options: one correct answer and three distractors. Each item was analyzed for difficulty index (Dif I), discrimination index (DI), and distractor efficiency (DE). Results: The mean achieved mark was 42.92 (standard deviation [SD] 5.07). The mean Dif I, DI, and DE were 47.95% (SD 16.39), 0.12 (SD 0.10), and 18.42% (SD 15.35), respectively. Of the items, 46.67% were easy and 21.66% showed acceptable discrimination. A very weak negative correlation was found between Dif I and DI. Of the 180 distractors, 51.66% were nonfunctional. Conclusion: Item analysis and storage of MCQs with their indices provide an opportunity for the examiner to select MCQs of appropriate difficulty level as per the need of the assessment and to decide their placement in the question paper.
Keywords: Bloom's taxonomy, difficulty index, discrimination index, distractor efficiency, item analysis, multiple-choice questions
Assessment of students by multiple-choice questions (MCQs or items) is a well-accepted method for its (1) objectivity, (2) comparability, and (3) minimized assessor bias.[1]
In India, single best-answer MCQs have been commonly used for medical entrance and university examinations.[2] They are a popular assessment tool because such tests can be administered to a large number of students, are easily scored, help control cheating, and enable teachers to cover a wider range of the syllabus. These questions have been shown to be twice as reliable as short-answer questions in evaluating students' knowledge.[3] Properly constructed MCQs assess the higher-order cognitive processing of Bloom's taxonomy (interpretation, synthesis, and application of knowledge) instead of merely testing recall of isolated facts and are thus able to accurately discriminate between high and low achievers.[4],[5]
One best-response type MCQs consist of a stem, one correct or best response (the key), and a few wrong choices (distractors).[6] The main challenge in preparing MCQs is to construct good test items, which requires sound knowledge of the subject, understanding of the objectives of assessment, and skill in item writing.[7],[8] Although many guidelines exist for writing good test items, they are rarely followed, leading to the generation and application of faulty MCQs.[9]
Item analysis is a process that examines student responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole. It is especially valuable in improving items that will be used again in later tests.[10]
Owing to the countrywide lockdown during the COVID-19 pandemic in 2020, the conventional internal assessment (offline answering of long- and short-answer questions) could not be conducted in a rural medical college in West Bengal. Hence, the Department of Community Medicine decided to conduct the test online using MCQs, followed by item analysis.
Materials and Methods
Ninety-eight MBBS students of the 6th semester appeared for an internal assessment on August 14, 2020, conducted online using Google Forms. There were 60 "single response type" MCQs carrying 1 mark each, with no negative marking for wrong answers. The time allotted was 80 min. The MCQs were constructed by all teachers in the department. Each MCQ had a single stem, one correct answer (key), and three incorrect alternatives (distractors). Each item was analyzed for difficulty index (Dif I), discrimination index (DI), and distractor efficiency (DE). The data so obtained were entered in MS Excel 2019 and analyzed. Scores of the 98 students were arranged in descending order and divided into three groups. The first group, consisting of the top third of students with higher marks, was labeled high achievers; the second group, consisting of the bottom third with lower marks, was labeled low achievers. The middle third was discarded.
Calculations were made using the following formulae:[11],[12]
Dif I = (h + l)/n × 100
DI = 2(h − l)/n
where:
h = number of students in the high achievers' group (33 students) who answered the item correctly
l = number of students in the low achievers' group (33 students) who answered the item correctly
n = total number of students in both groups, including nonresponders (66 students)
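For illustration, the two formulas translate directly to code from per-item counts. The following Python sketch uses hypothetical counts for a single item and is not part of the original analysis (which was done in MS Excel):

def difficulty_index(h: int, l: int, n: int) -> float:
    # Dif I = (h + l) / n * 100, expressed as a percentage
    return (h + l) / n * 100

def discrimination_index(h: int, l: int, n: int) -> float:
    # DI = 2 * (h - l) / n
    return 2 * (h - l) / n

# Hypothetical example: for one item, 25 of 33 high achievers (h) and
# 18 of 33 low achievers (l) answered correctly; n = 66 students in both groups.
h, l, n = 25, 18, 66
print(round(difficulty_index(h, l, n), 2))      # 65.15 -> acceptable difficulty
print(round(discrimination_index(h, l, n), 2))  # 0.21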
Interpretation
Difficulty index (Dif I)
The difficulty index describes the percentage of students who answered the item correctly and ranges between 0 and 100%. The higher the Dif I value, the easier the item; the lower the Dif I value, the more difficult the item. Items with Dif I >70% are considered easy, those with Dif I <30% difficult, and values in between acceptable.
Discrimination index
DI is the ability of an item to distinguish between high and low achievers. It can range from −1 to +1, with values ≥0.4 generally regarded as excellent; the higher the DI, the better the item discriminates between high and low achievers.[13] A negative DI indicates a defective item or wrong key, with students of lower ability answering more correctly than those of higher ability.
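As an illustrative sketch only, the Dif I and DI categories described above can be encoded as a simple classifier in Python; the 0.2 cut-off separating 'poor' from 'acceptable' discrimination is an assumed convention, not stated in this article:

def classify_difficulty(dif_i: float) -> str:
    # Categories from the text: >70% easy, <30% difficult, otherwise acceptable
    if dif_i > 70:
        return "easy"
    if dif_i < 30:
        return "difficult"
    return "acceptable"

def classify_discrimination(di: float) -> str:
    # Negative DI flags a defective item or wrong key; >=0.4 is taken as excellent.
    # The 0.2 cut-off separating "poor" from "acceptable" is an assumed convention.
    if di < 0:
        return "defective (possible wrong key)"
    if di >= 0.4:
        return "excellent"
    if di >= 0.2:
        return "acceptable"
    return "poor"

print(classify_difficulty(65.15), classify_discrimination(0.21))  # acceptable acceptable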
Distractor efficiency
Students who have not mastered the subject should choose the distractors more often, whereas well-prepared students should discard them while choosing the correct option. Any distractor selected by <5% of the students is considered a nonfunctional distractor (NFD).[14] Items containing no NFDs have 100% DE, while items with all three distractors nonfunctional have a DE of 0%.
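The following is a sketch, assuming per-distractor selection counts are available for each item, of how NFDs and DE can be derived from the <5% rule and the NFD-to-DE mapping described above (Python; the example counts are hypothetical):

def distractor_efficiency(distractor_counts, n_students):
    # A distractor chosen by fewer than 5% of students is nonfunctional (NFD).
    # DE is the share of distractors that remain functional:
    # 0 NFDs -> 100%, 1 -> 66.7%, 2 -> 33.3%, 3 -> 0%.
    nfd = sum(1 for c in distractor_counts if c / n_students < 0.05)
    de = (len(distractor_counts) - nfd) / len(distractor_counts) * 100
    return nfd, de

# Hypothetical item: 98 examinees; the three distractors were chosen by 20, 3, and 11 students.
print(distractor_efficiency([20, 3, 11], 98))  # (1, 66.7): one NFD, DE ~66.7%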
Results
Sixty MCQs with their 240 options (60 correct options and 180 distractors) were analyzed. The mean achieved mark was 42.92 (standard deviation [SD] 5.07). The mean Dif I, DI, and DE were 47.95% (SD 16.39), 0.12 (SD 0.10), and 18.42% (SD 15.35), respectively [Table 1]. Items categorized as difficult amounted to 15%, whereas 46.67% of the items were easy [Table 2]. Items with poor DI constituted 70%, and 21.66% showed acceptable discrimination; negative discrimination was shown by 6.67% of the items [Table 3]. A very weak negative correlation was found between Dif I and DI [Figure 1]. Of the 180 distractors, 51.66% were nonfunctional. One NFD and two NFDs were found in 35% of items each; 16.67% of items had all three distractors as NFDs, whereas only 13.33% of items had no NFD [Table 4].
Table 1: Distribution of items according to mean ± standard deviation of outcome variables (n=60)
Table 2: Distribution of items according to their difficulty index (n=60)
Table 3: Distribution of items according to their discrimination index (n=60)
Figure 1: Distribution of items according to correlation between difficulty index and discrimination index
Table 4: Distribution of items according to their distractor efficiency (n=60)
Discussion
One-best multiple-choice questions
One-best MCQs allow a large portion of the curriculum to be assessed in a short period of time with relatively little effort from the student, although constructing high-quality one-best MCQs takes considerably more effort and time from the examiner than setting descriptive questions. The one-best MCQ is an efficient tool for identifying strengths and weaknesses in students, as well as for guiding teachers on their educational protocols.[15]
Difficulty index
Dif I, also called the ease index, describes the percentage of students who correctly answered the item; it measures how difficult or easy the questions were. Too-difficult items (Dif I ≤30%) will lead to deflated scores, while easy items (Dif I >70%) will result in inflated scores and a decline in motivation.[16]
Two studies reported mean Dif I values of 39.4 ± 21.4 and 52.53 ± 20.59, respectively.[1],[17] The mean Dif I of the present study lay between these two findings. The reason most items turned out easy could be that most questions were drawn from the 'must know' part of the syllabus, so the proportion of students marking the correct option was high among both high and low achievers.
Too-easy items should either be placed at the start of the test as "warm-up" questions or removed altogether; similarly, too-difficult items should be reviewed for possible confusing language, areas of controversy, or even an incorrect key.[18]
Discrimination index
The difficulty and discrimination indices are often reciprocally related: questions with a high Dif I (easier questions) tend to be poor discriminators, whereas questions with a low Dif I (harder questions) tend to be good discriminators.[19] In the present study, most items showed poor discrimination. Since most items were easy and were presumably answered correctly by students in both groups, they could not discriminate well.
With a negative DI, students of lower ability answer the question correctly more often than those of higher ability. Reasons for a negative DI can be a wrong key, ambiguous framing of the question, or generally poor preparation of students.[20] The present study was also not free from wrong keys, but the proportion remained below 7%. Another reason may be that a student of lower ability selects the correct response by guessing, while a good student, suspicious of an apparently easy question, takes a harder path to solve it and ends up being less successful.
Distractor efficiency
DE reflects the relationship between the total test score and the distractors chosen by the students.
More nonfunctional distractors (NFDs) in an item increase Dif I (making the item easy) and reduce DE; conversely, an item with more functioning distractors has a lower Dif I (making the item difficult) and a higher DE. The present study showed that more than half of the distractors were NFDs (reduced DE) and most of the test items were easy to answer (increased Dif I). A possible explanation may be the teachers' difficulty in constructing plausible distractors. Near-similar results were reported by Namdeo and Sahoo, with 53.4% NFDs.[21] In contrast, Gajjar et al. reported only 11.4% NFDs, while Hingorjo et al. reported a mean DE of 81.4%, much higher than the present mean DE.[1],[18]
Conclusion
MCQs cover a wide area of the subject in a short period of time and are a preferred method of objective assessment, and well-selected MCQs can soundly judge students' knowledge. Item analysis is a simple procedure for evaluating the validity and reliability of MCQs. Item analysis and storage of MCQs with their indices provide an opportunity for the examiner to select MCQs of appropriate difficulty level as per the need of the assessment and to decide their placement in the question paper.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References