We present an investigation of the accuracy and consistency of pediatric ECG interpretations among pediatric cardiology fellows, trainees, and faculty enrolled in a large multicenter educational registry. The database provided by the pECGreview repository is the largest source of pediatric trainee ECG interpretations currently available and allows comparison of multiple interpretations of the same ECG. For example, the 59 to 92 interpretations obtained for each of the 19 ECGs in this cross-sectional study represent a larger sample than many previous studies of pediatric ECG interpretation, which evaluated on average only 10 ECGs each. Longitudinal studies using the pECGreview database have the potential to provide interpretations of up to 156 ECGs during the course of a pediatric cardiology fellowship or other training program.
The primary finding of the current study was that the Accuracy methodology provides the first validated, quantitative means to assess the quality of ECG interpretations by individuals interpreting a series of standardized ECG images.
This study demonstrates fair-to-good to excellent interrater agreement, validating the adequacy of our methods. Although there was some variation in mean Accuracy scoring between evaluators, even single rater reliability was good, and the observed variation had little effect on the ranking of individual participants with as few as five pECG interpretations. The primary difference among the evaluators was attributable to the one electrophysiologist, who tended to score the interpretations “harder” by 8%. However, since this difference was consistent across participants, it affected the absolute scores but had little effect on the participant ranking. The finding that single rater reliability was good even between evaluators with widely different levels of training (ranging from medical student to experienced pediatric electrophysiologist) provides critical validation for the use of a single evaluator in future large scale longitudinal studies of pECGreview responses. The identified variations in evaluator scores do emphasize the need for standardized criteria and continuous quality improvement in pECG interpretation training.
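The point that a consistent per-evaluator offset shifts absolute scores without disturbing the participant ranking can be illustrated with a short sketch. The scores below are hypothetical values for illustration only, not data from the study:

```python
# Illustration: a consistent "harder" evaluator rescales absolute Accuracy
# scores, but any monotone shift leaves the participant ranking unchanged.

def rank_order(scores):
    """Return participant indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical mean Accuracy scores for five participants (0-100 scale)
baseline = [72.0, 85.5, 64.0, 91.0, 78.5]

# The same interpretations scored by an evaluator who grades ~8% harder
harder = [s * 0.92 for s in baseline]

print(rank_order(baseline))  # [3, 1, 4, 0, 2]
print(rank_order(harder))    # [3, 1, 4, 0, 2] -- identical ranking
```

The absolute scores differ by the 8% offset, yet the ordering of participants is preserved, which is why the evaluator effect described above influenced scores but not rankings.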
Previous studies have used a range of methodologies to examine the accuracy of pECG interpretations across a variety of clinical settings and levels of training but have consistently demonstrated an alarmingly low accuracy rate. One study, using a two-point scale for 17 ECGs, found that accuracy was better for senior pediatric residents (64.1%) than for residents, who scored 55.0% on average [11]. Two similar studies used a three- or four-point scale and 10 ECGs each to evaluate various populations. Among pediatric attendings, the mean “knowledge score” ranged from 47.7 to 69.7% [2]. For pediatric residents and attendings, Kayar et al. found that the “rate of correctly defining the distinction between normal and abnormal was 93.7%” but the “rate of detecting pathologies in ECGs accurately and correct identification of specific diagnosis was 56.7%” [12]. The accuracy rate in that study did not improve following an intervention [12].
Other approaches have also been reported. A study of six attending-level pediatricians and two nurse practitioners demonstrated that accuracy rates for ECG findings improved markedly after educational programs, suggesting the potential of continuing education [13]. However, the participants in that study evaluated the same series of 11 ECGs for both the pre- and post-test assessments. Another study evaluated interpretation skills by having participants match 10 ECGs to a list of 10 diagnoses [1]. This multiple-choice approach is limited in that it poorly reflects the clinical situation, in which the number of potential interpretations is large and must be generated by the interpreter.
We also found limited interpretation skills in the responses derived from the pECGreview database. The overall accuracy for this cohort, primarily fellows in pediatric cardiology training programs, was limited, although somewhat better than the results described above, with 66% of interpretations scored as generally correct. In 22% of interpretations there was under- or overdiagnosis of a minor ECG finding, and in 12% under- or overdiagnosis of a major ECG finding. However, it is important to stress that direct comparisons of ECG interpretation accuracy between these studies are limited by the wide range of participant training, ECG diagnoses, and scoring methods.
Comparison of response accuracy across the 19 ECGs used in this study, as depicted in Fig. 5, is notable primarily for the marked differences in both the mean accuracy and the standard deviation of the accuracy responses. By observation, it is clear that some pECG tracings were easier to interpret correctly than others. However, the variation in response accuracy does not appear to depend simply on the ease of interpretation.
The current study also provides the first reported ranking of the interpretation skills of individual participants and may be a helpful tool for participants by providing objective feedback on their ECG interpretation skills. This approach also provides a framework for future descriptive and interventional studies to improve pediatric ECG interpretation skills for health care providers at all levels.
Limitations

Our study’s strengths lie in the use of a multicenter registry, with a large number of participants and pECGs with multiple interpretations and a panel of evaluators for each ECG interpretation. One limitation is that participants in pECGreview are not required to submit interpretations, so the cohort of participants varies for each ECG. This could introduce bias, as participants who lack confidence in their interpretation of a particular ECG may respond only to ECGs for which they feel they know the correct interpretation, which would lead to an overestimation of the observed accuracy. In addition, the Accuracy score assessment is a subjective scoring system that resulted in demonstrable systematic bias (see Fig. 4); however, our analysis demonstrates that this approach to Accuracy scoring is sufficiently robust to yield reliable data. This cross-sectional study was not designed or powered to address the effect of the length of time in a cardiology fellowship training program, or of the extent of participation in pECGreview, on the Accuracy score. Another limitation of the ECG-of-the-week format is the testing of ECG interpretation outside a true clinical context. Some ECGs are presented without the aid of a computer-generated interpretation, a feature often available in clinical settings that is known to enhance diagnostic accuracy [14]. However, previous research has shown that computer-assisted interpretations of pediatric ECGs can disagree with pediatric cardiologists, particularly in rhythm diagnosis and, among other findings, in the recognition of right bundle branch block, right ventricular hypertrophy, and QT analysis [14].
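For readers interested in how single rater reliability can be quantified, one common statistic is the two-way random-effects intraclass correlation for absolute agreement, ICC(2,1). The sketch below is a minimal pure-Python illustration of that statistic with hypothetical data; it is not necessarily the reliability measure computed in this study:

```python
# Minimal sketch of ICC(2,1): two-way random effects, absolute agreement,
# single rater. `data` is a list of n subjects, each a list of k rater scores.
# (Illustrative only; the study's own reliability computation may differ.)

def icc2_1(data):
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Two raters in perfect agreement on three subjects
print(icc2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0

# A constant +1 offset by one rater lowers absolute-agreement ICC (2/3 here),
# mirroring how a consistently "harder" evaluator reduces agreement on
# absolute scores even though rankings are untouched.
print(icc2_1([[1, 2], [2, 3], [3, 4]]))
```

The second example shows why a systematic evaluator offset degrades absolute-agreement reliability without affecting consistency of ranking, consistent with the evaluator effect discussed above.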