Item Writing Guidelines

Guidelines for Developing the MCQs and the MCQ Examination


The current plan for the computer-based National Competency-Based MCQ examinations for each health profession is to present a 200-item examination, consisting of 150 scored items plus 50 pilot items.  Three and one half hours will be allotted to the examination.  The points below present guidelines for preparing the MCQs for this examination.  The first points relate to ‘what to test’, followed by points that relate to ‘how to test’.

  1. What to Test:  Items should be written that test the competencies listed in the statements of Competency Standards for each health profession.  In this context, a competency is something that a graduating student should be able to do in relation to providing safe and effective patient care; a competency requires the application of knowledge, not simply the recall of knowledge.  Competencies relate to gathering information from patients effectively (e.g., through history taking, physical examination, and ordering tests), to interpreting these data correctly as a basis for making good clinical decisions about patients’ problems and their management, to conducting procedures related to patient assessment and treatment, and so on.  The MCQ examination and the OSCE will test graduating students’ competence (ability) to perform in these ways, as these are the skills they must possess in their future clinical practice.
    • What to Test:  Relative to item 1, all MCQs should present scenarios, most often describing a patient, and should test competencies related to the assessment and management of that patient.  This is also true of questions that test skills related to the scientific basis of medicine.  Examples of how to develop scenario-based questions testing scientific concepts can be found on the USMLE Step 1 website, http://www.starttest.com/nbme/launchtest.aspx?cmd=launchtest&programid=30&res=1280x800&color=24, as the Step 1 examination is a test of the sciences underlying medicine; all questions on that examination present scenarios.  The National Competency-Based examinations should adopt a specific format for patient descriptions, to be followed in all questions (e.g., patient name, sex, and age; place of patient encounter; presenting complaint; patient data [history, physical examination findings, laboratory results, diagnoses, treatment, …]).  The data presented in the stem of a question will be determined by what the question is testing.  If a question is testing data gathering skills, the stem may be brief.  If it is testing data interpretation (e.g., diagnostic) skills or management skills, the stem is typically longer, as it needs to provide more information.
  2. What to Test:  It is important that each question tests important content relative to the Competency Standards.  This issue is linked directly to the content validity of the examinations.  Question writers should always consider this issue when authoring a question, and, perhaps even more importantly, other members of the profession should emphasize this issue in the question review process: ‘Does this question test an important concept at the level of a graduating student?’
  3. How to Test:  All MCQs should use the ‘one best response’ format.  This format is referred to as an ‘A-Type MCQ’.  In this format, the question stem is followed by 4 or 5 (or more) options, and the student is asked to select the ‘best’ response to the question.  The options are therefore not simply true or false; rather, one is better than the others: it can be the ‘most likely’ (e.g., Which artery is most likely occluded?) or the ‘most effective’ (e.g., Which drug [or which investigation] would you recommend for this patient?).  Examples of questions in this format are presented in Appendix B.
  4. How to Test:  There are many specific guidelines related to question development which are discussed in the resources cited in the previous section.  These guidelines, stated as questions, include:
    1. Is the question clear and unambiguous?
      • Are the options long, complicated, or double?
      • Are numerical data stated inconsistently?
      • Are vague terms used in the options (e.g., rarely, usually)?
      • Are the options in a non-logical order?
      • Is “None of the above” used as an option?
      • Is the question stem tricky or unnecessarily complicated?
      • Does the question use a negative stem (e.g., “Which of the following is not …?” or “All of the following are true, except …?”)?
    2. Is the question free of technical flaws?
      • Does the question present grammatical cues (do one or more distracters not follow grammatically from the stem)?
      • Does the question present logical cues (is there, for example, a subset of the options that is collectively exhaustive)?
      • Does the question use absolute terms such as “always” or “never” in some options (in the health professions, such options are seldom correct)?
      • Is the correct answer longer, more specific, or more complete than the other options?
      • Is a word or phrase in the stem also included in the correct answer?
      • Convergence strategy: does the correct answer include the most elements in common with the other options?
    3. Does the question pass the test of the ‘Cover the options rule’?
      • A quality of a good MCQ is that students should be able to answer the question without reading the list of options it presents.  The test of this quality is to cover (place your hand over) the option list and ask yourself whether you can still answer the question.

 

In summary, the MCQ writers and MCQ review panels should always ask the following five questions when preparing their questions: 

  1. Does the question test a competency listed in the Competency Standards, does it test the application of knowledge and not the recall of knowledge, and does it use a vignette?
  2. Does the question test an important concept?
  3. Is the question clear and unambiguous?
  4. Is the question free of technical flaws?
  5. Does the question pass the ‘Cover the options rule’?

 

If an MCQ ‘passes the test’ posed by these five questions, it is likely a good question.   

The guidelines above relate to the preparation of individual MCQs.  The points below relate to the preparation of the National Competency-Based MCQ examination as a whole.

 

  1. The items chosen for an MCQ examination should meet the blueprint specifications for the examination.
  2. The items chosen for the MCQ examination should meet the guidelines discussed above for developing MCQs.
  3. The items should be pretested (piloted) with graduating students to determine their measurement characteristics.  Pilot testing should use a minimum of 30 students (ideally up to 100 students), and should occur under real testing conditions.  If MCQs to be used on the National Competency-Based MCQ examination are pilot tested as a formative or practice examination that does not count towards students’ grades, the students will perform differently (i.e., will not take it seriously).  This will affect the statistics used to describe the items’ performance and measurement qualities.  The statistics from pilot testing the items should include the following (a computational sketch follows this list):
    1. The difficulty index (the percentage of students answering the question correctly).
    2. The discrimination index (the relationship between performance on the item and performance on the test as a whole, often calculated as a point-biserial correlation coefficient).
    3. The distribution of responses across the options presented in the question.
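
The sketch below is a minimal illustration (in Python, with illustrative input names) of how these three statistics might be computed from pilot-test data.  It assumes the pilot responses are available as one list per student of the option chosen for each item, together with the answer key; the function name and data layout are assumptions for illustration, not part of any specified item-banking or analysis software.

```python
from collections import Counter
from statistics import mean, pstdev

def item_statistics(responses, key, item):
    """Difficulty, point-biserial discrimination, and option distribution for one item.

    Assumed inputs: responses[s][i] is the option chosen by student s on item i,
    and key[i] is the correct option for item i (illustrative layout only).
    """
    chosen = [student[item] for student in responses]        # option picked by each student
    correct = [1 if c == key[item] else 0 for c in chosen]   # 1 = correct, 0 = incorrect

    # Difficulty index: percentage of students answering the item correctly.
    p = mean(correct)
    difficulty = 100.0 * p

    # Each student's total score on the test as a whole (number of items correct).
    totals = [sum(1 for i, k in enumerate(key) if student[i] == k)
              for student in responses]
    sd = pstdev(totals)

    # Discrimination index: point-biserial correlation between the 0/1 item score
    # and the total test score.
    if 0 < p < 1 and sd > 0:
        mean_correct = mean(t for t, c in zip(totals, correct) if c)
        mean_incorrect = mean(t for t, c in zip(totals, correct) if not c)
        discrimination = (mean_correct - mean_incorrect) / sd * (p * (1 - p)) ** 0.5
    else:
        discrimination = float("nan")  # undefined when all or no students answer correctly

    # Distribution of responses: how many students chose each option.
    distribution = Counter(chosen)

    return difficulty, discrimination, distribution
```

For example, with responses from 30 pilot students, item_statistics(responses, key, 0) would return the first item’s difficulty (e.g., 73.3), its point-biserial discrimination, and a count of how many students chose each option.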

The test development teams in each discipline should use these statistics to review and refine the items.  Questions found to be very easy or very difficult should be reviewed with respect to their importance, and with respect to the effectiveness of each of the incorrect options as distracters.  If a question is judged to test an important competency, it may be retained even though it is easy or difficult.  If an incorrect option is found to be selected by few if any students, a more appealing replacement should be sought.  Questions found to have negative discrimination indices should be reviewed to determine whether they are misleading the better students (that is, to determine why students who perform better on the test as a whole answer the question incorrectly more often than weaker students).

  4. The ‘rule of thumb’ for the number of questions on an MCQ examination and the time permitted to take the examination is one minute per question.  If students taking the National Competency-Based MCQ examinations are allocated 3.5 hours, this suggests that an examination of up to 200 items is appropriate.  This could permit using 150 items that will count and 50 items that are being piloted.  Since the examination will be presented by computer, the time students take to answer each question can be monitored, permitting an evaluation of the suitability of the number of questions on the examination.  Because all questions will present vignettes, which require more reading time, it may be determined that 200 questions are too many and that, for example, 180 questions would be a more appropriate number.
  5. MCQs for the National Competency-Based Examinations should be stored in computer item banks.  The item banking system to be used is being developed by NACEHealthPro.  A plan for a system of item banking MCQs for the National Competency-Based Examinations is presented in the “Profile of National Competence Examination for Health Profession” document prepared by NACEHealthPro.  Over time, this plan may be refined to meet the needs of each of the four health professions.
  6. All MCQs contained in the item banking system should be classified in terms of several parameters (a sketch of such an item record appears after this list), including:
    • The name, discipline (e.g., biochemistry), and school of the item writer
    • Item content classifications, indexing the item to parameters used in the Competency Standards
    • A listing of the examinations on which the item has been used
    • The answer key
    • The item statistics: difficulty index, discrimination index, distribution of responses to the options, and the item’s ‘passing index’
  7. As time passes, it is expected that more and more students will pass the National Competency-Based Examinations, as a consequence of schools improving the quality of their programs relative to the competencies listed in the National Competency Standards documents.  It may be believed that a time will come when National Competency-Based Examinations are no longer needed.  This is in fact not the case.  It will always be the case that some students will fail to acquire the necessary competencies to pass the examinations – it is essential that these students be identified, provided with remedial education, and re-tested.  It will also always be true that schools will need ongoing feedback on their students’ performance on the examinations, as a basis for improving their programs.  National Competency-Based Examinations should remain a continuous and important feature of the education and assessment of students entering the health professions in Indonesia.
  8. Orientation materials should be prepared for students on the new MCQ computer-based test (CBT).  Such orientation materials can readily be presented online.  They can discuss the role of the examination in testing the Competency-Based Standards and the rules for taking the examination, give examples of how competency statements in the Standards are tested by specific items, and provide students with practice tests on which they can obtain their scores.  The Medical Council of Canada’s website and the United States Medical Licensing Examination website provide examples of such orientation materials.
  9. To limit the usefulness of students’ attempts to cheat by viewing other students’ computer screens, a computer-based examination can present the questions in a random order to each student.  In addition, several equivalent forms of the MCQ examination can be developed.  Typically, for a 150-item examination, 100 questions would be different in each form and 50 questions the same, the latter being used to equate the difficulty of the forms (see the sketch below).
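
As a minimal sketch of this last point, the Python fragment below assembles several equivalent forms that share a common block of anchor items (used to equate difficulty) and shuffles the question order separately for each student.  The function names, pool size, and seeding scheme are illustrative assumptions, not a description of the actual CBT delivery software.

```python
import random

def build_forms(pool, n_forms, anchors_per_form=50, unique_per_form=100, seed=0):
    """Assemble equivalent forms: every form shares the same anchor items
    (used to equate difficulty) plus its own block of unique items."""
    rng = random.Random(seed)
    items = list(pool)
    rng.shuffle(items)

    anchors = items[:anchors_per_form]            # common to all forms
    remaining = items[anchors_per_form:]
    if len(remaining) < n_forms * unique_per_form:
        raise ValueError("item pool too small for the requested number of forms")

    forms = []
    for f in range(n_forms):
        unique = remaining[f * unique_per_form:(f + 1) * unique_per_form]
        forms.append(anchors + unique)            # e.g., 50 + 100 = 150 items per form
    return forms

def present_to_student(form, student_id):
    """Shuffle the question order for each student to limit copying from neighbours."""
    rng = random.Random(student_id)               # reproducible per-student order
    order = list(form)
    rng.shuffle(order)
    return order
```

With a pool of 400 banked items and n_forms=3, for example, each form would contain the same 50 anchor items plus 100 items unique to that form, and each student would see their form’s 150 questions in a different order.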
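
Relatedly, the classification parameters listed above for the item bank can be captured as a structured record.  The field names below are an illustrative guess at how such a record might be laid out; they are not the actual NACEHealthPro item-banking schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class BankedItem:
    """One MCQ in the item bank, with the classification parameters listed above."""
    author_name: str
    author_discipline: str           # e.g., "biochemistry"
    author_school: str
    competency_codes: List[str]      # indexed to the Competency Standards
    examinations_used: List[str]     # examinations on which the item has appeared
    answer_key: str                  # correct option, e.g., "C"
    difficulty_index: float          # percentage of students answering correctly
    discrimination_index: float      # point-biserial correlation with total score
    option_distribution: Dict[str, int] = field(default_factory=dict)
    passing_index: float = 0.0       # Angoff estimate for a minimally competent student
```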

 

Setting Standards for the National Competency-Based MCQ Examinations

One of the major challenges in developing National Competency-Based Examinations is to define the passing score on the examination.   To define this score, it is necessary to determine the minimally acceptable level of performance on the examination.  The process of defining this minimally acceptable level of performance is called standard setting.  There are three basic approaches to defining the standard for an examination:

  • Arbitrary – e.g., ‘The passing score is 50%.’  This is a common way of setting standards.
  • Normative – e.g., ‘The bottom 10% of the students will fail.’  Or, ‘Students whose scores are 2 standard deviations below the mean will fail.’  In this approach the standard is set relative to the performance of a group of students.
  • Criterion-referenced – a group of experts defines the minimally acceptable level of performance relative to the content or skills being tested.

There is a clear trend in health professions education to move to ‘criterion-referenced’ systems of setting standards, and this is especially true for national competency-based examinations such as those that will be developed in this project.  Criterion-referenced systems of standard setting entail eliciting the judgments of practitioners and teachers in each health profession as to what constitutes minimally acceptable levels of performance.  This process is accomplished by their analysis of the specific items on an examination.

There are several criterion-referenced standard setting techniques for MCQs; the Angoff technique, the Ebel technique, and the Nedelsky technique are the three most common, and among them the Angoff technique is the most widely used.  The steps in this technique are illustrated below with an example from medicine.

  1. The first step in setting standards by the Angoff Technique is to select a group of experts (typically 5-8 teachers and practitioners) who have worked with graduating students as teachers and supervisors, and who understand the level of competence expected of graduating students.  It is also important that these individuals are experts in the content of the questions being reviewed.  This means that different groups can (and should) be used for different sets of questions, which has the advantage of involving larger numbers of people in the standard setting process. In the context of a National Examination, it is also important that these experts are selected across regions and schools.
  2. The standard setting group reviews each MCQ separately.  For example,  

A 63-year-old man presents to his general practice doctor with a 6-month history of fatigue.  Blood work reveals a normal white blood cell and platelet count, a hemoglobin of 95 g/L (N 145-175), an MCV of 71 (N 82-100), and a ferritin of 4 (N 20-300).  Which ONE of the following is the MOST LIKELY cause?

  1. An iron-poor diet
  2. Diminished secretion of gastric acid
  3. Bleeding from the gastrointestinal tract*
  4. Impaired absorption in the duodenum and jejunum
  5. Increased shedding of mucosal cells lining the small intestine
  3. With respect to graduating medical students, the group of experts is asked, “What is the chance that a ‘minimally competent’ student will answer this question correctly?”  To answer this question, the teachers must think about minimally competent students they have known and how they perform.  They then review the options presented in the question and ask themselves what the chances are that a minimally competent student could differentiate between the correct option and the incorrect options.  Based on this analysis, each teacher then writes down his or her estimate of the chance (a percentage) that a minimally competent student would get the item correct.  If there were 5 teachers in the group, the numbers might be something like 40, 60, 50, 70, and 65.
  4. The members of the group then review and discuss the distribution of percentages they have assigned; they explain and defend their judgments, and ask others to do the same.
  5. If the MCQ has been previously used on examinations, the group is often shown data describing how students performed on the item – the item’s difficulty index and distribution of responses across the options.  This is an optional but recommended step.
  6. Each group member is then asked to once again record his or her estimate of the chance (percentage) that a minimally competent student will answer the question correctly.  Some group members may write down the same percentage as they did earlier, but others may have been influenced by the discussion or by the item performance data and change their estimates.  The new set of numbers is typically closer together and might be something like 55, 55, 60, 65, and 65.
  7. These estimates can then be averaged to define the ‘passing index’ for the item, or the median value can be selected.  By either technique, for this question the passing index would be 60%.
  8. The passing indices are then averaged across all items on the examination to define the passing/cutting score (a percentage score) for the examination, as illustrated in the sketch below.
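
As a minimal arithmetic sketch of these last two steps (in Python, with made-up judge estimates), the passing index for each item is the mean of the judges’ final estimates, and the examination’s passing (cut) score is the mean of those passing indices across all scored items:

```python
from statistics import mean

# Judges' final estimates (percent chance that a minimally competent student
# answers correctly) for each item; the numbers below are illustrative only.
estimates_by_item = {
    "item_01": [55, 55, 60, 65, 65],   # example from the text: passing index = 60
    "item_02": [70, 75, 70, 80, 75],
    "item_03": [45, 50, 50, 55, 50],
}

# Passing index for each item: the average of the judges' estimates.
passing_index = {item: mean(vals) for item, vals in estimates_by_item.items()}

# Passing (cut) score for the examination: the average passing index across items.
cut_score = mean(passing_index.values())

print(passing_index)   # {'item_01': 60, 'item_02': 74, 'item_03': 50}
print(cut_score)       # 61.33... (percentage score required to pass)
```

Run over all 150 scored items, the same calculation yields the percentage score required to pass that particular form of the examination.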

 

The process described above is conducted before an examination is administered.  However, it is always wise to review the passing indices for items following the administration of the examination.  There may be a major discrepancy between the passing indices of some items and students’ actual performance on those items.  Such discrepancies can be reviewed by groups of experts with a view to refining their estimates of borderline student performance for specific items.  And of course, passing scores defined by the Angoff Technique will change from one test to another, as they will be a function of the difficulty of the specific set of items on the test.