Thirty individuals with clinician-confirmed moderate to severe CD (n = 20 adults 18–80 years of age and 10 adolescents 15–17 years of age) participated in the interviews. Participants were recruited between January 2020 and September 2020 from clinical sites in Chicago, Illinois; New Orleans, Louisiana; St. Louis, Missouri; and Los Angeles, California. Participants’ ages ranged from 15.1 to 75.4 years (mean = 36.6 years [standard deviation (SD) = 19.2]). Nineteen participants (n = 19/30, 63.33%) had moderate CD, and 11 participants (n = 11/30, 36.67%) had severe CD, as determined by their clinicians during interview screening. Additional demographic and health information is provided in Table 4.
Table 4 Participant- and clinician-reported demographic and health information
Cognitive Debriefing Results
Each instruction, item, and response option was interpreted as intended by at least 76.67% of participants. All participants (n = 30/30, 100.00%) interpreted the CSS instructions as intended, and most participants (n = 28/30, 93.33%) interpreted the CSS recall period (“the past week”) as intended. The two participants judged to have misinterpreted the recall period answered the CSS items with reference to the previous 5–8 days rather than exactly 1 week (7 days). Each concept (i.e., the sign, symptom, or impact that each item is designed to measure) and each response option was interpreted as intended by at least 25 of the 30 participants (≥ 83.33%). Table 5 summarizes participants’ interpretation of, and experience with, each CSS item concept.
Table 5 Cognitive debriefing summary table: items
All participants (n = 30/30, 100.00%) reported that the CSS was easy to complete, and most participants (n = 28/29, 96.55%) reported that the CSS was relevant to their overall experience of CD. Each item concept had been experienced by at least 79.17% of participants, either within the 7-day recall period or before it. Participants most frequently identified items 10 (diarrhea), 11 (blood in your stool), and 13 (vomit or throw up) as the most important questions to ask individuals with CD.
When asked whether any additional symptoms or impacts should be included in the instrument, 10 participants suggested 14 additional concepts: surgical history, comorbid conditions, feeling cold, fever, mouth ulcers, anal fissures, bone pain, foot swelling, losing one’s voice, gynecological complications, headaches, watery eyes, mental state/mental health, and cramping. However, no concept was suggested by more than one participant, and several suggestions were not considered appropriate for assessment in a patient-reported symptom questionnaire. These findings suggest that the CSS is not missing any particularly common or salient CD symptoms.
Psychometric Evaluation Using Phase 3 Clinical Trial Data
A total of 850 patients from the ADVANCE study were included in the psychometric and score interpretation analyses. This sample size is considered sufficient for these analyses [23]. Participants’ ages ranged from 16 to 79 years (mean = 37.5 years; SD = 13.3), and 45.88% of the sample was female.
Score Distributions
Completion quality in the psychometric analysis population was high at all timepoints, with the number of participants with missing data ranging from 14 to 37. In general, respondents used the full range of the response scale for the CSS items, and item and total scores trended toward improvement over time. Items 11 (blood in your stool), 12 (constipated), and 13 (vomit or throw up) showed floor effects at baseline (i.e., > 40% of participants reported that they “never” experienced the symptom during the 7-day recall period). Figure 1 shows all score distributions at baseline and week 12.
Fig. 1 Score distributions of CSS items and total score at baseline and week 12 for the ADVANCE study
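The floor-effect criterion described above (more than 40% of participants choosing “never” at baseline) can be checked item by item. The following is a minimal Python sketch, not the study’s analysis code; it assumes responses are coded as integers with 0 = “never” and missing responses stored as None, and the function name `floor_effects` is hypothetical.

```python
# Minimal sketch: flag floor effects per item.
# Assumption (hypothetical coding): 0 = "never", None = missing response.
NEVER = 0
FLOOR_THRESHOLD = 0.40  # criterion from the text: > 40% answering "never"


def floor_effects(responses_by_item, never_code=NEVER, threshold=FLOOR_THRESHOLD):
    """Return {item: (proportion answering "never", flagged)} using non-missing responses."""
    result = {}
    for item, responses in responses_by_item.items():
        observed = [r for r in responses if r is not None]
        if not observed:
            continue  # no usable responses for this item
        proportion = sum(1 for r in observed if r == never_code) / len(observed)
        result[item] = (proportion, proportion > threshold)
    return result


if __name__ == "__main__":
    baseline = {
        "item 12 (constipated)": [0, 0, 0, 1, 2, 0, None],  # hypothetical data
        "item 3 (abdominal pain)": [0, 1, 2, 3, 4, 2],
    }
    for item, (prop, flagged) in floor_effects(baseline).items():
        print(f"{item}: {prop:.1%} never, floor effect = {flagged}")
```

Missing responses are excluded from the denominator in this sketch; whether the study handled them the same way is not stated.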
Item–Total and Inter-item Correlations
Item–total correlations between each item and the CSS total score ranged from 0.26 to 0.79. All met the threshold for an acceptable item–total correlation (≥ 0.30 [24]) except the correlation between item 12 (constipated) and the total score (r = 0.26).
Inter-item correlations were weak to moderate (r = 0.07–0.57), as can be expected for a questionnaire that assesses multiple symptoms. The strongest correlation was between item 3 (abdominal pain) and item 4 (felt tired or lacking energy) (r = 0.57). Inter-item correlations are presented in Table 6.
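Table 6 reports Spearman item–total and inter-item correlations. The sketch below shows one way to compute them in plain Python, assuming complete data and a total score equal to the unweighted sum of the item scores (the CSS scoring rule is not described in this section). Whether the study correlated each item with the full total or with a corrected total that excludes that item is also not stated, so both are offered through a hypothetical `corrected` flag; the 0.30 cut-off is the threshold cited in the text.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    ordered = sorted(values)
    first, count = {}, {}
    for position, v in enumerate(ordered, start=1):
        first.setdefault(v, position)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)  # raises ZeroDivisionError if either input is constant


def item_total_correlations(items, corrected=False):
    """items: {name: scores}, same respondents in the same order, no missing data."""
    totals = [sum(row) for row in zip(*items.values())]
    result = {}
    for name, scores in items.items():
        # corrected=True removes the item itself from the total before correlating
        total = [t - s for t, s in zip(totals, scores)] if corrected else totals
        result[name] = spearman(scores, total)
    return result


def inter_item_correlations(items):
    """Spearman correlation for every pair of items."""
    return {(a, b): spearman(items[a], items[b]) for a, b in combinations(items, 2)}


if __name__ == "__main__":
    items = {"a": [0, 1, 2, 3, 4], "b": [0, 1, 2, 3, 4], "c": [2, 0, 1, 0, 2]}  # hypothetical
    low = [name for name, r in item_total_correlations(items).items() if r < 0.30]
    print("items below 0.30:", low)
```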
Table 6 Item–total and inter-item Spearman correlations for the CSS at ADVANCE week 12 (N = 819)
Reliability
Cronbach’s α for the CSS total score ranged from 0.76 to 0.87 across timepoints from baseline to week 12, supporting acceptable internal consistency of the questionnaire’s items. Removing any individual item did not substantially improve internal consistency.
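Cronbach’s α for a scale of k items is α = k/(k − 1) × (1 − Σσᵢ² / σ²_total), where σᵢ² is the variance of item i and σ²_total is the variance of the summed score. The sketch below computes α and the “α if item deleted” values used to judge whether removing an item would improve internal consistency. It assumes complete cases and an unweighted summed total; how the study handled missing item responses is not described here.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of per-item score lists (same respondents, complete cases).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(row) for row in zip(*items)]  # summed score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]
```

Sample variances (n − 1 denominator) are used for both the items and the total; because the same denominator appears in the numerator and denominator of the ratio, this choice does not change α.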
ICCs for test–retest reliability did not consistently exceed the acceptable threshold. Among patients classified as stable on the PGIS between baseline and week 4, the ICC for the CSS total score was 0.48 (95% confidence interval [CI] 0.08–0.69). Between week 4 and week 12, the ICC was 0.70 (95% CI 0.61–0.76) among PGIS-stable patients and 0.58 (95% CI 0.43–0.68) among PGIC-stable patients.
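This section does not specify which ICC form was used. As an assumption, the sketch below computes the two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1) in Shrout and Fleiss’s notation) from a two-way ANOVA decomposition, using CSS total scores at two timepoints for patients classified as stable on the anchor. Confidence intervals are not computed, and the function name is hypothetical.

```python
def icc_2_1(scores):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    scores: one tuple per stable patient, e.g. (css_baseline, css_week4).
    ICC = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]                       # per patient
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]  # per timepoint
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-patient mean square
    msc = ss_cols / (k - 1)               # between-timepoint mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Under absolute agreement, a systematic shift between timepoints lowers the ICC even when every patient keeps the same rank order, which is one reason absolute-agreement and consistency ICCs can differ for the same data.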
Validity
All concurrent measures correlated with the CSS in the hypothesized directions, and the strength of association between the CSS and most concurrent measures was as anticipated. As presented in Table 7, the CSS total score correlated more strongly with patient-reported symptom-related measures, such as the PGIS, the IBDQ bowel symptom and systemic symptom domains, and the IBDQ total score, than with more distal measures (i.e., EQ-5D-5L, SF-36v2®, WPAI:CD) and outcomes that include clinician-reported items (i.e., CDAI).
Table 7 Spearman correlation coefficients between CSS total score and concurrent assessments at baseline, week 4, and week 12 for the ADVANCE study
The CSS also demonstrated known-groups validity: CSS scores differentiated between clinically distinct groups of patients. CSS total scores differed by 10.09 points between CDAI-defined remission and non-remission groups and by 11.95 points between IBDQ-defined remission and non-remission groups, and they decreased monotonically across PGIS groups; all of these differences were statistically significant (p < 0.001). Figure 2 presents the known-groups results.
Fig. 2 Known-groups comparisons for CSS total score at week 12 for the ADVANCE study. CDAI Crohn’s Disease Activity Index, CI confidence interval, CSS Crohn’s Symptom Severity Scale, IBDQ Inflammatory Bowel Disease Questionnaire, n sample size, PGIS Patient Global Impression of Severity
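The known-groups comparison above reduces to comparing CSS total scores across externally defined groups. A minimal sketch, assuming patients are already labeled by group (for example, CDAI remission status, or PGIS category ordered from most to least severe); the group labels and helper names are hypothetical, and the significance tests reported in the study are not reproduced here.

```python
from statistics import mean


def group_means(scores, labels):
    """Mean CSS total score for each group label."""
    grouped = {}
    for score, label in zip(scores, labels):
        grouped.setdefault(label, []).append(score)
    return {label: mean(values) for label, values in grouped.items()}


def decreases_monotonically(means, ordered_labels):
    """True if group means strictly decrease along the given label order."""
    values = [means[label] for label in ordered_labels]
    return all(a > b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    css = [30, 34, 12, 10, 50, 48]  # hypothetical week 12 CSS totals
    remission = ["no", "no", "yes", "yes", "no", "no"]
    m = group_means(css, remission)
    print("non-remission minus remission:", m["no"] - m["yes"])
```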
Sensitivity to Change
Correlations between the CSS change score and change scores on most concurrent measures were moderate to strong (r = 0.35–0.70), and they were strongest for more conceptually similar measures (e.g., the SF-36v2® Physical Component Summary). As anticipated, the EQ-5D-5L self-care and mobility domains, which are conceptually more distant from the constructs measured by the CSS than the other concurrent measures, correlated weakly with the CSS change score (r = 0.20 and r = 0.25, respectively).
In addition, change scores for most CSS items correlated moderately (r ≥ 0.50 for most items) with change in the CSS total score from baseline to week 12; the exception was item 12 (constipated; r = 0.21). This weak correlation was likely due to limited endorsement of item 12 by trial participants (at baseline, 69.06% of participants reported that they “never” experienced constipation during the recall period). These results suggest that changes in individual CSS items contribute proportionally to the overall change score; in other words, there is no major concern that certain items alone drive the changes observed in the CSS total score over time.
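The item-level sensitivity-to-change check correlates each item’s change score with the change in the total score. The sketch below assumes that lower scores indicate less severe symptoms (so a negative change means improvement, consistent with the negative MWPC values reported for the CSS), that the total is the unweighted sum of items, and that Spearman correlation is used; the study’s correlation method for change scores is not stated in this section, and all helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    ordered = sorted(values)
    first, count = {}, {}
    for position, v in enumerate(ordered, start=1):
        first.setdefault(v, position)
        count[v] = count.get(v, 0) + 1
    return [first[v] + (count[v] - 1) / 2 for v in values]


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)  # undefined if an item never changes for anyone


def change_scores(baseline, follow_up):
    """Follow-up minus baseline; negative values indicate improvement."""
    return [f - b for b, f in zip(baseline, follow_up)]


def item_change_vs_total_change(items_baseline, items_follow_up):
    """items_*: {item: per-patient scores}. Spearman r of each item's change with total change."""
    deltas = {name: change_scores(items_baseline[name], items_follow_up[name])
              for name in items_baseline}
    total_delta = [sum(row) for row in zip(*deltas.values())]
    return {name: spearman(d, total_delta) for name, d in deltas.items()}
```

An item that most patients rate “never” at both timepoints has a change score of 0 for nearly everyone, which leaves little variation to correlate; that is the mechanism the text proposes for item 12.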
Score Interpretation
Anchor-based methods and supportive analyses yielded MWPC estimates for the CSS total score between −6 and −11 points (Tables 8, 9, 10). Participants who achieved the CSS MWPC (i.e., an improvement of at least 6 points) had statistically significantly (p < 0.001) higher rates of CDAI clinical remission/response and endoscopic remission/response, as well as greater improvements in patient-reported quality-of-life measures, from baseline to week 12 (Table 11). This finding supports the clinical relevance of the estimated MWPC threshold.
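An anchor-based MWPC estimate is, in its simplest form, the mean CSS change among patients whose anchor indicates a small but meaningful improvement, and the responder analysis then compares outcomes between patients above and below the resulting threshold. This is a minimal sketch under stated assumptions: PGIS change is coded as follow-up minus baseline category (so −1 is a one-category improvement), the anchor group is patients with exactly that change, and the outcome is a 0/1 indicator such as CDAI remission. The study’s exact anchor categories (Tables 8 and 9) and statistical tests are not reproduced, and the function names are hypothetical.

```python
from statistics import mean

RESPONDER_THRESHOLD = -6  # from the text: improvement of at least 6 points (change <= -6)


def anchor_based_mwpc(css_change, pgis_change, target=-1):
    """Mean CSS change among patients whose PGIS changed by exactly `target` categories."""
    selected = [c for c, a in zip(css_change, pgis_change) if a == target]
    if not selected:
        raise ValueError("no patients in the anchor group")
    return mean(selected)


def responder_comparison(css_change, outcome, threshold=RESPONDER_THRESHOLD):
    """Rate of a binary outcome (1 = e.g. CDAI remission) in responders vs non-responders."""
    responders = [o for c, o in zip(css_change, outcome) if c <= threshold]
    others = [o for c, o in zip(css_change, outcome) if c > threshold]

    def rate(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return rate(responders), rate(others)
```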
Table 8 Anchor-based estimates of CSS score MWPC by PGIS-stratified anchor categories from baseline to week 12 for the ADVANCE study
Table 9 Anchor-based estimates of CSS score MWPC by PGIC-stratified anchor categories from baseline to week 12 for the ADVANCE study
Table 10 Percentile change in CSS total scores from baseline to week 12 by PGIS response groups per empirical cumulative distribution function curve for ADVANCE
Table 11 Responder evaluation of CSS total scores using an estimated meaningful within-person change of ≤ −6 points (N = 819)
Figure 3 presents eCDF plots of CSS change scores from baseline to week 12 in the ADVANCE study, grouped by PGIS change category over the same period. Patients reporting any degree of worsening were combined into a single group for this analysis. Overall, the CSS change score distributions were distinct across anchor groups, although the “no change” and “worsened” groups overlapped somewhat at the upper end of the change score distribution. The change score distributions in Fig. 3 and the percentile changes in Table 10 are consistent with the anchor-based mean-change estimates.
Fig. 3 Empirical cumulative distribution function for change in CSS total score by PGIS change response groups from baseline to week 12 for the ADVANCE study (N = 819). CSS Crohn’s Symptom Severity Scale, PGIS Patient Global Impression of Severity, PT points
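Fig. 3 and Table 10 summarize each PGIS change group’s CSS change scores through the empirical cumulative distribution function (eCDF), F(x) = proportion of change scores ≤ x, and its percentiles. A minimal sketch; the percentile definition used in the study is not stated, so this version assumes linear interpolation between order statistics (the same rule as NumPy’s default percentile method). The group names in the usage block are hypothetical.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F with F(x) = proportion of values <= x."""
    data = sorted(values)
    n = len(data)

    def F(x):
        return bisect_right(data, x) / n

    return F


def percentile(values, p):
    """p-th percentile (0-100), linear interpolation between order statistics."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    data = sorted(values)
    position = (len(data) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (position - lower)


if __name__ == "__main__":
    # Hypothetical CSS change scores grouped by PGIS change category
    groups = {"improved 1": [-12, -8, -6, -3], "no change": [-4, -1, 0, 2]}
    for name, changes in groups.items():
        print(name, [percentile(changes, p) for p in (25, 50, 75)])
```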
Figure 4 presents kernel-smoothed PDF curves of the CSS change score distributions by PGIS change category between baseline and week 12. The PDF curves show that the improvement groups align with the negative change scores shown in the eCDF plots (Fig. 3) and the anchor-based analyses.
Fig. 4 Probability density function for change in CSS by PGIS change response groups from baseline to week 12 for the ADVANCE study (N = 819). CSS Crohn’s Symptom Severity Scale, PGIS Patient Global Impression of Severity, PT points
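The kernel-smoothed PDF curves in Fig. 4 are kernel density estimates of the change scores within each PGIS group. The kernel and bandwidth used in the study are not stated; this sketch assumes a Gaussian kernel with a normal-reference rule-of-thumb bandwidth, h = 1.06 · s · n^(−1/5), where s is the sample standard deviation. Function names and the example data are hypothetical.

```python
from math import exp, pi, sqrt
from statistics import stdev


def rule_of_thumb_bandwidth(values):
    """Normal-reference bandwidth h = 1.06 * s * n**(-1/5); needs at least 2 values."""
    return 1.06 * stdev(values) * len(values) ** (-1 / 5)


def gaussian_kde(values, bandwidth=None):
    """Return f(x) = 1/(n*h) * sum K((x - v) / h), with K the standard normal density."""
    h = bandwidth if bandwidth is not None else rule_of_thumb_bandwidth(values)
    n = len(values)
    scale = 1 / (n * h * sqrt(2 * pi))

    def density(x):
        return scale * sum(exp(-0.5 * ((x - v) / h) ** 2) for v in values)

    return density


if __name__ == "__main__":
    improved = [-14, -11, -9, -8, -6, -4]  # hypothetical CSS change scores for one PGIS group
    f = gaussian_kde(improved)
    grid = [x / 2 for x in range(-40, 11)]  # -20 to 5 in half-point steps
    print("density peaks near", max(grid, key=f))
```

The bandwidth controls how smooth the curves look; a smaller h shows more local bumps, and a larger h can blur together groups whose change scores differ only slightly.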
Figure 5 shows the ROC curve for participants with a 1-point improvement on the PGIS between baseline and week 12. Thresholds identified by Youden’s J were more sensitive to the degree of change, ranging from 7 to 11 points on the CSS, whereas sum-of-squares thresholds were less variable, ranging from 7 to 9 points.
Fig. 5 Receiver operating characteristic curve for CSS total score by PGIS 1-point improvement from baseline to week 12 for the ADVANCE study. AUC area under the curve, CI confidence interval, CSS Crohn’s Symptom Severity Scale, PGIS Patient Global Impression of Severity, SE sensitivity, SP specificity, ROC receiver operating characteristic. [1] Youden’s J. [2] Point on the ROC curve that minimizes the sum of squares
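The ROC analysis treats the anchor (here, at least a 1-point PGIS improvement) as the reference classification and scans candidate CSS change cut-offs. For each cut-off t, a patient is classified as improved when the CSS change is ≤ t, and sensitivity (SE) and specificity (SP) follow from the anchor. Youden’s J selects the cut-off that maximizes SE + SP − 1; the sum-of-squares criterion selects the point that minimizes (1 − SE)² + (1 − SP)², the squared distance to the top-left corner of the ROC plot. This is a minimal sketch; the exact anchor definition and tie handling in the study are assumptions, and the function names are hypothetical. Because improvement is a negative change, cut-offs come out as negative numbers (a cut-off of −8 corresponds to an 8-point improvement).

```python
def roc_points(css_change, improved):
    """(cut-off, SE, SP) for each observed change value; improved: anchor-based booleans."""
    pos = sum(improved)
    neg = len(improved) - pos  # both groups must be non-empty
    points = []
    for t in sorted(set(css_change)):
        tp = sum(1 for c, y in zip(css_change, improved) if y and c <= t)
        tn = sum(1 for c, y in zip(css_change, improved) if not y and c > t)
        points.append((t, tp / pos, tn / neg))
    return points


def youden_threshold(points):
    """Cut-off maximizing J = SE + SP - 1."""
    return max(points, key=lambda p: p[1] + p[2] - 1)[0]


def sum_of_squares_threshold(points):
    """Cut-off minimizing (1 - SE)^2 + (1 - SP)^2."""
    return min(points, key=lambda p: (1 - p[1]) ** 2 + (1 - p[2]) ** 2)[0]


def auc(css_change, improved):
    """Mann-Whitney form: P(improved patient's change < other patient's change), ties = 0.5."""
    a = [c for c, y in zip(css_change, improved) if y]
    b = [c for c, y in zip(css_change, improved) if not y]
    wins = sum(1.0 if x < z else 0.5 if x == z else 0.0 for x in a for z in b)
    return wins / (len(a) * len(b))
```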