Rapid reviews methods series: Guidance on team considerations, study selection, data extraction and risk of bias assessment

WHAT IS ALREADY KNOWN ON THIS TOPIC

Compared with full systematic reviews, rapid reviews (RRs) often omit dual processes or use other methodological shortcuts. While this helps accelerate the review process, unreflective use of shortcuts might introduce bias and/or inaccuracies to RRs.

WHAT THIS STUDY ADDS

This paper presents considerations and recommendations for team composition, study selection, data extraction and risk of bias assessment in a RR.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

The considerations and recommendations in this paper should help review teams conduct RRs quickly while minimising potential errors and bias, so that decision-makers in research, clinical practice and health policy can make evidence-based decisions in a resource-efficient manner.

Introduction

This paper is part of a series from the Cochrane Rapid Review Methods Group providing methodological guidance for rapid reviews (RRs).1–3 It aims to address considerations around the team composition and the acceleration of study selection, data extraction and risk of bias (RoB) assessment in RRs and to provide templates for practical use.

According to a recent scoping review, study selection, data extraction and RoB assessment are often the most resource-intensive steps during the production of a systematic review (SR).4 These are error-prone steps that require subjective judgement. In full SRs, it is considered best practice that two people independently screen potentially relevant studies, extract data and assess RoB of included studies.5 6

In a RR, methodological shortcuts can be employed to accelerate the timeline. According to a scoping review, 23% of RRs had one person extract data, 6% applied single-reviewer screening, 7% had one person assess RoB and another 7% omitted RoB assessment entirely.7 These methodological shortcuts might lead to a gain in resource efficiency,4 especially if the search yields many records for screening or if many studies are included in the review. However, if not implemented properly, accelerated methods can even increase the workload; for example, a single screener who is overinclusive at the title/abstract level creates extra work at the full-text screening stage. Teams must aim to minimise potential bias in the accelerated approaches taken.

In the following sections, we first address general considerations that are not unique to RRs but are nevertheless important when performing one. Study selection is the first step of the review process, and team size, experience and organisation play an important role in its efficiency. We then provide recommendations on piloting, followed by considerations specific to study selection, data extraction and RoB assessment. Table 1 gives an overview of the recommendations, which are discussed in more detail below.

Table 1

Overview of recommendations for RR conduct

General considerations

Team characteristics and organisation

RRs may seem ‘easier’ to conduct than SRs because they are perceived to have less methodological rigour. However, in our experience, the review team must include sufficient SR methodological experience to properly plan, conduct, analyse and report a RR and, most importantly, be aware of potential limitations due to the methodological shortcuts.

During study selection, novices to evidence synthesis tend to make more incorrect decisions about the inclusion and exclusion of records than more experienced reviewers who have already worked on several evidence syntheses.8 9 Although data abstractors’ experience may matter less than initially thought and adjudication reduces errors, skilled extractors remain key to minimising error rates in RRs.10 11 Data extraction and RoB assessment require training and experience. It is, therefore, important that team expertise and organisation are considered carefully.

In our experience, RR teams should not be too large (ideally 3–5 people), as larger teams can increase inefficiencies. However, a large team can be beneficial during the study selection phase if the number of records is large and the time to complete the review is limited. Conversely, for tasks such as data extraction and RoB assessment, limiting the number of reviewers may increase homogeneity and efficiency. It can be more beneficial if not all team members participate in all review steps.

The review team may also be organised to work on different stages of the review in parallel rather than working as a team on each stage. For example, while one part of the team screens titles/abstracts, other members can screen potentially relevant full texts or start with the data extraction and RoB assessment of the included studies.12 It helps to perform the data extraction and RoB assessment simultaneously, using the same people, so studies only need to be evaluated once. We also recommend using collaborative platforms (eg, Microsoft Teams, Google Drive) and/or SR software (eg, Abstrackr, Covidence, DistillerSR, Rayyan) to share documents and manage the review (eg, protocol, screening forms, reports, meeting notes). Videoconference tools can facilitate conflict resolution and regular team meetings.

Piloting

Piloting exercises in RRs allow team members involved in a certain task (eg, study selection) to test the tools and processes of that task on a small proportion of records. This helps ensure that all team members have a common understanding of the task and perform it consistently and correctly. Piloting is especially important in RRs when certain tasks rely on a single person’s judgement. If, for example, a researcher extracts data inconsistently (eg, sometimes the number of people analysed, sometimes the number of people randomised), this increases the workload for the person verifying the data extraction and could lead to distorted results; piloting helps avoid such problems.

For study selection, we recommend creating a standardised screening form that clearly explains the eligibility criteria10 11 (example screening form: see online supplemental appendix 1). The entire screening team should pilot the form using the same records to test whether all team members share a common understanding of the inclusion and exclusion criteria. For full-text screening, as with title/abstract screening, we recommend a pilot exercise in which the entire screening team assesses the same full-text articles.10 11 The number of records used in the pilot may depend on several factors, including the total search yield, the complexity of the topic and the experience of the screening team.

For data extraction, we recommend creating and pilot testing a data extraction template. This form should limit data fields to the essential data items discussed with the knowledge users and defined in the protocol.10 11 A list of data items usually extracted into a data extraction form is available in online supplemental appendix 2. The template can be created as a spreadsheet or web-based form or set up in SR management software. All people involved in data extraction should pilot the template using the same studies and then compare their results; this can help increase data extraction accuracy. Pilot testing RoB assessment tools for content is not usually necessary, since published and validated tools exist, but assessing a few studies as a team to discuss discrepancies in judgements can nevertheless be useful.
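As a purely illustrative sketch (the field names below are hypothetical, not the template in online supplemental appendix 2; the real items should be those agreed with knowledge users in the protocol), a minimal spreadsheet-style extraction template could be generated as follows:

```python
import csv

# Hypothetical, minimal set of extraction fields; a real RR template should be
# limited to the items agreed with knowledge users and defined in the protocol.
FIELDS = [
    "study_id", "first_author", "year", "country", "study_design",
    "population", "intervention", "comparator",
    "primary_outcome", "n_randomised", "n_analysed",
    "effect_estimate", "source_page", "extractor", "verified_by", "notes",
]

with open("extraction_template.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    # One fabricated row showing how a pilot entry might look; fields not
    # listed here are left blank by DictWriter.
    writer.writerow({
        "study_id": "Example2021",
        "study_design": "RCT",
        "n_randomised": 120,
        "n_analysed": 114,
    })
```

Keeping the template to a single, agreed set of columns also makes it easier for a second person to verify entries quickly during the pilot.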

Study selection

Critical appraisal tools for SRs list dual-reviewer screening of titles/abstracts and full-texts as a quality criterion.5 6 Dual-reviewer screening means that two reviewers independently assess all records for eligibility, first based on titles and abstracts, then on the full-texts for records included at the title/abstract level. Further, any conflicting judgements about the inclusion or exclusion of papers should be resolved by discussion or consulting a third person.13 14 For RRs, teams can follow this dual approach if the volume of evidence to be reviewed and resources permit. Otherwise, we recommend the following accelerated approaches to study selection.

Reducing the number of human judgements involved

We recommend dual assessment on a proportion of records, for example, 20%.10 The proportion of records might vary depending on the complexity of the topic and/or the number of records yielded by the search. After this dual-screening phase, reviewers must discuss and resolve conflicting decisions and assess how well they agreed. We recommend continuing with single-reviewer screening (ie, each record is screened by one person) of the remaining titles/abstracts only when reviewer agreement is high (at least 80% agreement)15 16 during the dual-assessment phase.10 The team should feel confident that everyone performing single-reviewer screening is able to make correct judgements. In cases where reviewer agreement is low, the review team should proceed with dual-reviewer screening until better agreement has been achieved.
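As an illustration only (our sketch, not part of the guidance itself), the following shows how a team might compute percent agreement on the dual-screened subset, with Cohen’s kappa as an optional chance-corrected companion measure; the decision lists are hypothetical.

```python
# Illustrative sketch with hypothetical data: percent agreement (and Cohen's
# kappa) between two reviewers on a dual-screened set of title/abstract
# decisions, where 1 = include and 0 = exclude.

def percent_agreement(reviewer_a, reviewer_b):
    """Proportion of records on which both reviewers made the same decision."""
    matches = sum(a == b for a, b in zip(reviewer_a, reviewer_b))
    return matches / len(reviewer_a)

def cohens_kappa(reviewer_a, reviewer_b):
    """Chance-corrected agreement between two reviewers (binary decisions)."""
    n = len(reviewer_a)
    observed = percent_agreement(reviewer_a, reviewer_b)
    p_a = sum(reviewer_a) / n  # proportion of 'include' votes by reviewer A
    p_b = sum(reviewer_b) / n  # proportion of 'include' votes by reviewer B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (observed - expected) / (1 - expected)

# Hypothetical decisions for 20 dual-screened records
a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
b = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0]

print(f"Percent agreement: {percent_agreement(a, b):.0%}")  # 90%, above the 80% threshold
print(f"Cohen's kappa: {cohens_kappa(a, b):.2f}")
```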

Although single-reviewer screening of all titles/abstracts may be a practical solution for certain RRs, we do not recommend it for RRs in general. This approach has been shown to miss 13% of relevant studies, and its performance depends mainly on the reviewer’s experience.17–19 One study also showed that the accuracy of single-reviewer screening was lower in a complex review including multiple study designs than in a pharmacological review including solely randomised controlled trials.18

Single-reviewer screening could, however, be a valid approach for excluding records with multiple exclusion reasons or a clear, objective exclusion reason (eg, wrong age group); records not fulfilling these criteria could then be screened dually.20 Another option for title/abstract screening is to perform single screening and have a second person check all excluded records. However, in our experience, this does not save much time and is often difficult to implement in SR software.

We recommend the same approach for full-text screening. After screening about 20% of the full-texts dually and achieving good agreement between reviewers, the team can proceed with single-reviewer screening of the full texts.10 If time and resources permit, a second person could verify the excluded full texts. Review teams will identify incorrectly included studies during the data extraction phase.

Supportive software

A wide range of software tools exists to support study selection (see www.systematicreviewtools.com). According to Harrison et al, these tools vary significantly in cost, scope and intended user audience.21 However, most use similar principles: all identified records can be uploaded to a web platform, distributed between screeners and screened simultaneously, and decisions are documented automatically. Most tools provide a platform where both title/abstract and full-text screening can be conducted. More details on supportive software can be found in another paper of this series.22 Several applications (eg, Abstrackr,23 DistillerSR,24 EPPI-Reviewer,25 PICO Portal,26 Rayyan,27 RobotAnalyst28 and SWIFT-Active Screener29) have incorporated artificial intelligence (eg, active machine learning) to aid study selection. While fully automating study selection is not optimal, semiautomation (eg, one human screener plus machine learning) is promising, at least in reviews of intervention studies,30–33 and could be implemented in RRs. Machine learning can also rank abstracts by relevance, which is useful for prioritising records during screening so that those with a high likelihood of inclusion are displayed first. Some guidance exists on the point at which prioritised records no longer need to be screened, or need to be screened by only one reviewer, as some software displays the predicted inclusion rate during the screening process. Empirical evidence has shown that a predicted inclusion rate of 95% corresponds to finding around 98%–100% of relevant references during title/abstract screening.32 34
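To illustrate the idea behind a predicted inclusion rate (a simplified sketch of the general principle, not the algorithm of any specific tool), the estimate can be framed as the number of includes found so far divided by that number plus the expected number of includes remaining among unscreened records, derived from the model’s predicted probabilities:

```python
# Simplified illustration (not a specific tool's method): estimating how many
# of the relevant records have already been found during prioritised screening.

def estimated_recall(includes_found, unscreened_probabilities):
    """Estimated share of all relevant records identified so far.

    The expected number of relevant records still unscreened is approximated
    by summing the model's predicted inclusion probabilities.
    """
    expected_remaining = sum(unscreened_probabilities)
    return includes_found / (includes_found + expected_remaining)

# Hypothetical example: 120 includes found; 3,000 unscreened records with an
# average predicted inclusion probability of 0.002 (ie, ~6 expected includes).
remaining = [0.002] * 3000
print(f"Estimated recall: {estimated_recall(120, remaining):.1%}")  # ~95.2%
```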

Crowdsourcing

Crowdsourcing is outsourcing tasks to a large community of people, usually via the internet. This can take a variety of formats,35 36 for example, microtasking, which involves breaking a task down into small units to create simple, discrete classification or categorisation tasks. This crowdsourcing mode is particularly well suited to tasks that involve processing large amounts of information or data, such as an SR’s title/abstract screening stages.

Using a crowd in the review production process is challenging from a technical point of view. Tools to support this contribution model are emerging, but they are currently in their infancy. Cochrane, for example, has been using crowdsourcing for study selection via its citizen science platform Cochrane Crowd (https://crowd.cochrane.org), which has performed well in study selection: across four RRs, title/abstract screening was completed within 48–53 hours and achieved a sensitivity of 94%–100% (compared with the gold standard of dual-reviewer screening).36 Currently, only Cochrane authors can access Cochrane Crowd, via Cochrane’s Screen4Me service. Alternatives to Cochrane Crowd exist, such as Amazon Mechanical Turk (AMT) (https://www.mturk.com), a microtasking platform where task proposers can create microtasks and provide micropayment (piece-rate payment, eg, £0.05 per record assessed) as a reward for task responders. However, reported experience of using AMT for this task is limited.

Data extraction

Critical appraisal tools for SRs require teams to strive to avoid random errors in data extraction, ideally through dual-reviewer data extraction.5 6 Cochrane methods guidance for SR conduct requires data extraction to be done independently by two investigators (mandatory for outcome data, highly desirable for study characteristics data), seeking unpublished resources to complete the data extraction and using a piloted data extraction sheet.13 In RRs, the following accelerated approaches to data extraction can be considered.

Reducing the number of human judgements involved

One accelerated method is to have one person extract the data and a second person verify the data for accuracy and completeness.10 11 Dual, independent data extraction has been reported to take longer per study than extraction with verification and does not yield significantly different results.37 As extraction errors are frequent, single data extraction without verification by a second person, especially of outcome data, is discouraged.38 To reduce the time spent on data verification, it is helpful for the initial extractor to highlight the extracted data in the electronic versions of the included papers. Extraction should also be limited to the most important data fields needed to address the RR question, as determined in the review protocol.

Where available, reviewers can also extract data directly from existing SRs rather than from their included studies.10 11 In a case study on medical treatment for premature ejaculation, this approach did not alter the conclusions of the RR.39 However, it requires high-quality SRs with good reporting. Teams could also use data repositories to download data from completed SRs or upload data for future reviewers (eg, the SR data repository from AHRQ40 or Mendeley Data).41 Such repositories could help increase the reuse of data; however, uploading data is time-consuming. Further details on how to handle multiple SRs, poor-quality SRs and when or whether to update an existing SR are addressed in another paper of this series.42

Supporting software

A wide range of software tools to support data extraction is available (see www.systematicreviewtools.com). The most helpful tools support all steps of the review process (eg, Covidence, DistillerSR), as information and details may be shared across the review processes. To the best of our knowledge, tools that automatically extract reliable data do not exist, but some tools can save time by assisting reviewers in the extraction process (eg, the ExaCT tool automatically detects and highlights data items).43 In RRs that include studies in multiple languages, translation software such as DeepL or Google Translate can also be helpful.44

RoB assessment

Current guidance for SR conduct requires RoB assessments to be done by two people independently,13 using published assessment tools, such as the Cochrane Risk of Bias Tool 2.0 for randomised controlled trials.45 Showing support for judgements and incorporating these judgements into the synthesis are also required.13 This approach is encouraged in the RR processes if the timeline and number of included studies permit; if not, the following accelerated approaches for RoB assessment can be considered.

Reducing the number of human judgements involved

Table 2 gives an overview of study design-specific RoB tools recommended by Cochrane; the list is not exhaustive, as non-Cochrane RRs may use other validated tools. One accelerated approach is to use tools that are less complex and, therefore, faster to complete (eg, Cochrane Risk of Bias Tool 1.0 vs 2.0) and to limit the assessment (for outcome-specific questions) to only the most important outcomes, as determined in the review protocol.10 11 Another approach is for one reviewer to perform the RoB assessment and a second reviewer to verify the judgements.10 11 Complete omission of RoB assessment is discouraged, as this information informs the interpretation of the evidence and the review’s implications.

Table 2

Risk of bias (RoB) assessment tools recommended by Cochrane

Supporting software

A wide range of software tools and complex spreadsheets exists to support RoB assessment (see www.systematicreviewtools.com). Machine learning tools are also available, such as RobotReviewer (www.robotreviewer.net), which automatically assesses RoB and extracts supporting information from randomised controlled trials (for some of the RoB questions in the Cochrane Risk of Bias Tool V.1.0). Such software can assist during RoB assessment but cannot yet replace humans.46 47

Conclusion

Streamlining study selection, data extraction and RoB assessment can save time and resources during the RR process. However, shortcuts may come with increased risk (eg, missing one or more relevant studies, increasing data extraction errors). Therefore, piloting the steps of the review process with the team members who will perform them is essential in RRs. Every review team should include sufficient SR methodological experience to conduct the RR properly and be aware of the potential limitations of methodological shortcuts. Novices to evidence synthesis should have a direct line of communication with experienced team members to resolve issues early in the process. Review teams should also consider that it is not necessary to employ methodological shortcuts at every stage of a RR and that the accelerated methods chosen may differ from RR to RR. For example, in a RR that identifies only a small number of records, dual-reviewer screening with conflict resolution might save more time than single-reviewer screening by an overinclusive screener. Review teams should not be discouraged by an increased workload when using supportive software for the first time.48 After the initial learning curve, the use of software increases efficiency.

Data availability statement

Data sharing not applicable as no datasets generated and/or analysed for this study.

Ethics statements

Patient consent for publication

Acknowledgments

We would like to thank Sandra Hummel for administrative support.
