Intra- and inter-observer variability of point-of-care ultrasound measurements to evaluate hemodynamic parameters in healthy volunteers

This study showed good inter-observer reproducibility and good intra-observer repeatability for CO, SV and IVC diameter measurements.

Ultrasound image acquisition and interpretation are operator dependent [20], yet only a few studies describe intra- and inter-observer variability for hemodynamic ultrasound parameters. Most are based on reinterpretation of the same ultrasound images by a different observer rather than on repeated acquisitions by the same operator, and they generally focus on only one or a few ultrasound parameters [12, 14, 15]. In contrast, our study also assesses the repeatability and reproducibility of POCUS measurements. Moreover, to determine inter-observer reproducibility, we used the intraclass correlation coefficient, which captures both the degree of correlation and the level of agreement among the measurements made by the different observers. Furthermore, we analyzed the feasibility of measuring both cardiac function and hemodynamic status parameters in a single examination, whereas previous feasibility studies mainly focused on a single parameter [12, 13]. To our knowledge, this is therefore the first study in which intra- and inter-observer variability for these hemodynamic ultrasound parameters was assessed simultaneously.
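To make the role of the intraclass correlation coefficient concrete, the classical ANOVA-based ICC(2,1) (two-way random effects, absolute agreement, single rater) can be sketched as below. This is the standard Shrout–Fleiss formulation, not necessarily the exact ICC variant or software used in the study, and the example data are the well-known Shrout and Fleiss (1979) illustration, not our measurements.

```python
from statistics import mean

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of subjects, each a list of k scores (one per
    observer). A sketch of the classical ANOVA-based formula; not the
    study's actual analysis software.
    """
    n, k = len(ratings), len(ratings[0])          # subjects, observers
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # observers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                       # mean squares
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Shrout & Fleiss (1979) example: 6 subjects rated by 4 observers.
data = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
        [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(data), 2))  # 0.29
```

Because the formula penalizes systematic offsets between observers (the `msc` term), the ICC reflects absolute agreement, not merely correlation.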

Of all parameters regarding cardiac function, stroke volume and cardiac output showed the best test characteristics. This seems unexpected, because measuring these parameters presents specific challenges: any inaccuracy in the LVOT diameter is squared in the continuity equation [21], and inconsistent placement of the pulsed Doppler sample volume in the LVOT is a recurrent source of error [22]. Previous studies suggested that cardiac output can only be reliably measured by more experienced observers [23]. In our study, the sonographers already had experience performing qualitative cardiac ultrasound, which may explain why a short introductory training was sufficient. In addition, cardiac output was determined automatically by the GE Venue R1 machine using an automated VTI tracing system. This is in line with other feasibility studies showing that image quality increases with the level of training [13]. Hence, even though technical difficulties are described in the literature, we obtained reliable CO and SV measurements after a short training program. As reported in the literature, automatic LVOT-VTI tools correlate closely with manual measurements [24]; the advantage of the automatic method is that measurements can be obtained in a much shorter time than with standard manual tracing [25]. We expect that we would have obtained the same results by measuring the LVOT diameter manually, as we did, and also tracing the LVOT-VTI manually. The automatic tool helped us trace the waveform and perform the calculations (using the continuity equation), but it offered no advantage in angle alignment. Its benefits were saving time, making the process more workable in the emergency environment, and reducing the possibility of calculation errors.
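The error amplification from squaring the LVOT diameter can be made concrete with a small sketch of the standard Doppler calculation of SV and CO. The numeric values below are purely illustrative, not data from this study:

```python
import math

def stroke_volume_ml(lvot_d_cm: float, vti_cm: float) -> float:
    """Stroke volume from LVOT diameter and velocity-time integral.

    SV = LVOT cross-sectional area * VTI, with the area computed from
    the diameter assuming a circular outflow tract:
    SV = pi * (D/2)^2 * VTI.  Because D is squared, any measurement
    error in D is amplified in SV.
    """
    area_cm2 = math.pi * (lvot_d_cm / 2) ** 2
    return area_cm2 * vti_cm  # 1 cm^3 = 1 mL

def cardiac_output_l_min(sv_ml: float, heart_rate_bpm: float) -> float:
    """Cardiac output (L/min) from stroke volume and heart rate."""
    return sv_ml * heart_rate_bpm / 1000.0

# Illustrative (hypothetical) values:
sv = stroke_volume_ml(lvot_d_cm=2.0, vti_cm=22.0)   # ~69.1 mL
co = cardiac_output_l_min(sv, heart_rate_bpm=70)    # ~4.84 L/min

# A 10% overestimate of the LVOT diameter inflates SV by ~21%
# (1.1 squared = 1.21), while the same 10% error in VTI inflates
# SV by only 10%:
sv_biased = stroke_volume_ml(lvot_d_cm=2.2, vti_cm=22.0)
print(round(sv_biased / sv, 2))  # 1.21
```

This is why consistent LVOT diameter measurement matters more than any other single step in the CO calculation.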

We found moderate reproducibility and a high exclusion rate for the MAPSE and TAPSE ultrasound images, which we did not expect, as these parameters are measured in a single ultrasound window and therefore allow a brief examination. Our results suggest that obtaining MAPSE and TAPSE images may be more challenging for specialists in acute internal medicine than for more extensively trained cardiac examiners such as cardiologists. The difference in reliability between our results and the literature might be due to differences in the study populations: previous studies analyzed larger samples and more diverse populations, including patients with both physiological and pathological values. It can be argued that this wider spread of values across patients led to higher correlation and agreement of the data and therefore to better apparent reproducibility [26, 27].

We found a high exclusion rate for the CBF measurements due to inadequate tracing by the automatic VTI function. The intra-observer variability of the CBF was only moderate, in line with other studies demonstrating a wide range of reproducibility [28]. Possible explanations for this difference in reproducibility are differences in the location of the measurement [29] and the physiological change in carotid artery diameter between systole and diastole [30]. Because the VTI curve and the automatic tracing were frequently misaligned, fewer images might have been excluded if the VTI had been traced manually.

Identifying the IVC diameter and collapsibility is part of the essential ultrasound examination in the ED [31, 32] to assess hemodynamic status and guide fluid resuscitation. Previous studies have shown that IVC-CI and IVC-D ultrasound measurements can easily be performed with minimal training [33]. Although both intra-observer variability and inter-observer reproducibility for the IVC-D were good in our study, the intra-observer variability and inter-observer reliability of the IVC-CI were only moderate, which corresponds with the findings of previous studies [34]. We think that the reliability of the IVC measurements may have been influenced by the automatic tracing of the vessel in M-mode, which also partially explains the high number of excluded images. We suppose that IVC-CI measurements showed more variation than IVC-D measurements because of slight variations in breathing. In addition, because the IVC-CI is calculated from two diameter measurements, its automated tracing accumulates the measurement error of both readings, whereas the IVC-D is affected only by natural, unwanted variation. We do not think there was a difference between automated and manual tracing of the IVC diameter: the automatic tool traced the vessel, and manual tracing would give the same result. Its advantage was that it eliminated variation due to human error when measuring the IVC.
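The point that the IVC-CI accumulates error from two readings can be illustrated with the standard collapsibility-index formula; the diameters below are hypothetical examples, not study data:

```python
def ivc_ci_percent(d_max_cm: float, d_min_cm: float) -> float:
    """Caval (collapsibility) index: relative inspiratory decrease of
    the IVC diameter, in percent.

    IVC-CI = (Dmax - Dmin) / Dmax * 100
    """
    return (d_max_cm - d_min_cm) / d_max_cm * 100.0

# Hypothetical baseline reading:
print(ivc_ci_percent(2.0, 1.2))  # 40.0

# Each diameter carries its own measurement error, so the index is
# sensitive to a small error in *either* reading, while a single
# diameter (IVC-D) is affected by only one such error:
print(round(ivc_ci_percent(2.1, 1.2), 1))  # 42.9 (0.1 cm error in Dmax)
print(round(ivc_ci_percent(2.0, 1.3), 1))  # 35.0 (0.1 cm error in Dmin)
```

A 0.1 cm error in either diameter shifts the derived index by several percentage points, consistent with the IVC-CI showing more variation than the IVC-D.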

There are limitations to our study. First, it has a relatively small sample size, and we might have found different results in a larger population. Moreover, due to scheduling difficulties it was not always possible for all three observers to examine each participant, so only some subjects received measurements from all three observers. Second, this is a feasibility study conducted in healthy volunteers, and practicability may differ in real patients in acute care settings: measurements in healthy subjects may be easier to obtain because they can maintain the requested posture and allow optimal preparation, and their values fall within the physiological range, with low variance. Third, measurement bias due to observer bias [35] may have occurred if individual observers tried to match measurements to their previous examinations, given that they were aware that their measurements were being studied [36]. Fourth, we used the automated measuring tools with which the ultrasound machine in our emergency department is equipped, but these functions are not readily available in every emergency department. In addition, using manual measurements instead of automated tools would lengthen the acquisition process and therefore make measurement in an emergency department setting more challenging. Fifth, the three sonographers had several years of experience in qualitative POCUS and could perform quantitative measurements after a brief instruction and a two-hour training session; less experienced sonographers performing the same POCUS measurements might have obtained different results. Sixth, the ICC was calculated only when reliable measurements could be obtained. The ICC depends on N: with the same mean and standard deviation, a lower N tends to yield a higher ICC. By selecting only participants in whom both underlying parameters were considered acceptable (e.g., for CO and SV), we had a lower N. These numbers are therefore based on a slightly different data set, which affects the ICC through the relatively lower numbers and can contribute to an artificially higher ICC.
