Clinical practice guidelines and consensus for the screening of breast cancer: A systematic appraisal of their quality and reporting

4.1 Main findings

As with BC treatment guidelines, our review showed wide variation in quality and reporting among BC screening guidance documents. More than three quarters of these guidelines could not be endorsed as currently presented, so their quality and reporting were even worse than in a complementary review by our team of the quality and reporting of BC treatment CPGs and CSs (Maes-Carballo, Mignini, et al., 2020). With regard to the methods of evidence analysis, guidance documents that used systematic reviews had better quality and reporting. CSs had worse quality and reporting, less editorial independence and higher risks of bias than CPGs. Guidance documents that reported using a quality tool such as AGREE II or RIGHT during their development had better quality and reporting.

Screening guidance documents had lower quality than treatment CPGs and CSs in all domains except applicability, which nevertheless remained poor. Rigour of development and editorial independence scored very low. The health questions, the target population, and the clarity and identification of the different recommendations were well described in more than three quarters of the guidance documents. However, the external review and the updating procedure were specified in less than a third. Reporting results were more similar between treatment and screening CPGs and CSs. Recommendations; review and quality assurance; and funding, declaration and management of interests improved slightly, whereas basic information, evidence and other information scored worse.

More than three quarters of the guidance documents were clearly identified and stated their aim in the title; the primary population and subgroups were well specified, and recommendations were clear and separated by subgroup where needed. On the other hand, more than three quarters did not provide abbreviations and acronyms; the selection criteria and roles of the contributors were rarely reported; the development decisions were not usually described; and the external review, the quality assurance and the funding sources were not adequately described. Finally, limitations and external validity of the recommendations were not presented appropriately.

4.2 Strengths and limitations

This systematic review, conducted without language restrictions, gave a broad view of the BC screening guidance literature, covering a large number of CPGs and CSs. Because English and Spanish are the most widely spoken languages (Amano et al., 2016), most of the societies (China Anti-Cancer Association, 2019; Huang et al., 2019; National Health Commission of the People's Republic of China, 2019; Uematsu et al., 2018; Ditsch et al., 2020; (AGO) AGO, 2019a; Migowski, Stein, et al., 2018; Migowski, Silva, et al., 2018; Migowski, Dias, et al., 2018; Urban et al., 2017; (AGO) AGO, 2019c) presented guideline versions in English and Spanish. One strength of this review is that the authors were fluent in both. Two well-developed assessment tools, the AGREE II instrument (Brouwers et al., 2016) and the RIGHT statement (Chen et al., 2017), were used to assess quality and reporting. To our knowledge, no other appraisal of BC screening guidelines has applied both AGREE II and RIGHT. AGREE II measures the quality of guidelines, whereas RIGHT assesses their reporting. However, some of their items overlap (the general and specific aims, the target population and end-users of the guidance, the use of systematic reviews to generate recommendations, the evidence for and feasibility of these recommendations, and editorial independence); see Appendix S2. As previously reported (Maes-Carballo, Mignini, et al., 2020), this review demonstrated a correlation between the quality and the reporting of CPGs and CSs. Like any other tool, AGREE II has inherent limitations. It does not include a statement of patients' values and preferences, and it does not measure the strength of the recommendations, both of which are recognised as important components of guideline quality.

The subjective nature of data extraction for the quality and reporting domains and items is a possible weakness of our review, as it may introduce bias. To reduce this problem, we chose two experienced BC specialist clinicians, who studied the appraisal tool manuals and agreed on a common understanding of the grading procedure before the duplicate analysis was undertaken. An independent arbitrator was assigned to resolve disagreements between reviewers on individual items, although this role was minimal because reviewer agreement was excellent (ICC > 90%).
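Agreement between two reviewers of this kind is commonly summarised with an intraclass correlation coefficient. The sketch below is illustrative only. The text does not state which ICC form was used, so the code assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function name `icc_2_1` and the ratings are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per rated item; each row holds one
    score per rater. This form is an assumption, not the review's stated method.
    """
    n = len(ratings)        # number of rated items
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: items (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical 1-7 item ratings from two reviewers.
identical = [[1, 1], [4, 4], [7, 7]]   # reviewers agree on every item
offset = [[1, 2], [4, 5], [7, 8]]      # second reviewer is always one point higher
```

Because this form measures absolute agreement, a constant offset between reviewers lowers the coefficient even when the two reviewers rank the items identically.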

The scoring tool manuals lack clear rules on domain and item weighting (Alonso-Coello et al., 2010), so the overall assessments calculated in our review may have limitations. The RIGHT statement (Chen et al., 2017) advises against computing an average score for each guideline, because it is not clear that the items can be weighted equally and a summary score could reduce the quality of the analysis. However, we find such scores useful for comparing guidelines, because they show in a simplified way the areas in which CPGs and CSs perform well and those in which they do not. They also show whether quality and reporting are correlated in each guideline. Neither the AGREE II (Brouwers et al., 2016) nor the RIGHT (Chen et al., 2017) manual provides thresholds for classifying quality and reporting as high, moderate or poor. We therefore used previously published cut-offs (Hoffmann-Esser et al., 2018; Maes-Carballo, Mignini, et al., 2020; Oh et al., 2014) to allow an easier and more powerful analysis. We recommend caution in interpretation, as global scores may vary among recommended guidelines because the domains do not contribute equally to overall quality and reporting.
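The weighting concern can be illustrated with the scaled domain score defined in the AGREE II manual: the obtained score minus the minimum possible score, divided by the maximum possible minus the minimum possible, as a percentage. The sketch below uses made-up ratings for two domains and two hypothetical ways of combining them. It does not reproduce the aggregation or the cut-offs used in this review.

```python
def scaled_domain_score(item_scores_by_appraiser):
    """AGREE II scaled domain score as a percentage.

    `item_scores_by_appraiser` holds one list per appraiser, each with the
    1-7 ratings that appraiser gave to the items of a single domain.
    """
    n_appraisers = len(item_scores_by_appraiser)
    n_items = len(item_scores_by_appraiser[0])
    obtained = sum(sum(scores) for scores in item_scores_by_appraiser)
    minimum = 1 * n_items * n_appraisers   # every item rated 1
    maximum = 7 * n_items * n_appraisers   # every item rated 7
    return 100 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings from two appraisers for one guideline.
rigour = [[2, 3, 2, 1, 2, 3, 2, 1], [2, 2, 3, 1, 2, 3, 2, 2]]  # 8 items
independence = [[6, 7], [7, 6]]                                 # 2 items

rigour_score = scaled_domain_score(rigour)
independence_score = scaled_domain_score(independence)

# Two illustrative (assumed) ways to combine the domains into one number.
unweighted_mean = (rigour_score + independence_score) / 2
item_weighted_mean = (8 * rigour_score + 2 * independence_score) / 10
```

With these made-up ratings, the two-item domain pulls the unweighted mean well above the item-weighted mean, which is why a single global score depends on the weighting convention chosen.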

The CPGs and CSs included were published from 2017 onwards, so some guidelines from distinguished organisations may have been excluded. A recent systematic review found that guidelines should be updated in less than 3 years, which supports our choice of search time threshold (Vernooij et al., 2014). Even though we included only CPGs and CSs that met all the inclusion criteria, the documents in our review were diverse. This is an important observation, and this type of heterogeneity may be inevitable because guidelines differ in their development, structure, context, objectives and so forth (Pentheroudakis et al., 2008). Given the strengths of our review, the deficient quality and reporting of the guidance documents, the lack of systematic reviews for evidence synthesis and the almost complete absence of quality and reporting tools during their writing are powerful observations.

4.3 Implications

Quality and reporting in BC screening guidelines have not previously been analysed systematically. As stated earlier, documents were classified as CPGs or CSs on the basis of their titles, subtitles and methods as reported by the authors. CPGs are ideally based on a systematic review of current evidence ((IOM) IoM, 2011), although this practice is not universal. A CS is typically developed by an independent panel of experts, generally multidisciplinary, convened to review the evidence-based literature on a specific procedure, but with a less strict development methodology (Jacobs et al., 2014). CSs are generally intended for controversial areas of breast management, where the evidence is still incomplete, and their recommendations are based on expert opinion. They are therefore more likely to have less editorial independence, to endorse a specific product, and to have lower quality and higher risks of bias (Jacobs et al., 2014). Not using a systematic review to collate evidence in a CS is a serious methodological deficiency that predisposes it to bias.

This review found considerable scope for improvement even for CPGs and CSs with high overall scores, as all had deficient areas. In addition, our team has been working on a complementary study (Maes-Carballo, Mignini, et al., 2020) analysing quality and reporting in BC treatment guidelines, so the two studies, with more than 100 guideline documents analysed between them, have been compared in the present article. Analysing these two aspects of BC care provided a broad view of the particularities of each process and allowed their weaknesses to be contrasted. Comparing screening with treatment guidelines, quality was clearly lower in all domains except domain 5 (applicability), which improved although its score was still poor. Domains 3 (rigour of development), 5 (applicability) and 6 (editorial independence) scored very low. The main goals should therefore be to improve all these domains and, especially, to establish an external review by experts (item 13) and to provide a clear and efficient procedure for updating the guideline (item 14; Appendix S6). Regarding reporting, results were more comparable between treatment and screening CPGs and CSs. Whereas domains 4 (recommendations), 5 (review and quality assurance) and 6 (funding, declaration and management of interests) improved slightly, domains 1 (basic information), 3 (evidence) and 7 (other information) scored worse. New efforts should be directed at these weak areas, particularly describing how all the contributors were selected and their roles (item 9a), specifying the process of formulating a recommendation (item 15) and whether costs and resources were considered (item 14b), explaining whether there was an external review (item 16) and a quality assessment (item 17), and describing the funding sources (items 18a and 18b) and the limitations of the process (item 22; Appendix S10).
Only five CPGs and no CSs stated that they had followed the AGREE II instrument (Brouwers et al., 2010; Brouwers et al., 2016) during their development, and the RIGHT statement (Chen et al., 2017) was never used. The cut-off points that define acceptable scores and the weighting of items and domains are still under discussion. As highlighted above, this question should be addressed in future research. More studies are also needed to measure the quality of the recommendations. One way to address this would be to examine how similar the cited articles supporting the recommendations are, and to compare the direction (for or against) and strength (strong or weak) of recommendations between guidelines of higher and lower quality and between guidelines and CSs. Nowadays, when the pursuit of quality patient care is a must, it is neither permissible nor justifiable that some guidance documents do not meet even the basic quality and reporting criteria. These deficiencies reduce the quality of the care that healthcare providers deliver.
