Automated pipelines for rapid evaluation during cryoEM data acquisition

In recent years, cryo-electron microscopy (cryoEM), and single particle analysis in particular, has grown in popularity among researchers seeking high-resolution structures of biomolecules. Recent hardware advances have made atomic-resolution reconstructions possible [1,2]. Consequently, fast, robust, and automated data processing protocols are needed to keep pace with the growing number of scientists who are new to the field.

Automated data collection has made obtaining large datasets easier, faster, and cheaper for structural biologists. The throughput of a 24-h session at the microscope is on the order of 10,000 micrographs [3]. However, quality control is necessary so that a session can be concluded based on results rather than a set time, maximizing the use of the microscope. For cryoEM reconstructions to reach high resolution, good-quality data is imperative. During data collection, researchers typically monitor the quality of the incoming data through metrics such as frame drift during motion correction [4], CTF estimation [5], and ice thickness [6]. Based on these metrics, the data collection parameters can be adjusted to optimize data quality. Although these metrics provide a reasonable overview of data quality, there are limitations imposed by the sample itself. Molecules may adopt a preferred orientation [7,8] in the thin layer of ice, be unstable at the air–water interface, or be biochemically heterogeneous [9, 10, 11]. All of these factors affect the attainable resolution of the map but are difficult to assess without processing beyond the simple metrics described above. As a result, even when these metrics are satisfactory, a dataset can fail to produce a map at the desired resolution [12], necessitating additional data collection or adjustments to sample preparation. Conversely, a reconstructed map of the desired resolution can sometimes be achieved well before the end of the scheduled session, and resources are used inefficiently if the session is not terminated. Knowing when to pause or end data collection is therefore critical: multiple unoptimized collection sessions, or collection of excessive data, wastes resources and reduces the productivity of high-demand, high-end instrumentation. It is thus usually more efficient to collect on high-end instruments using EM grids that have been previously screened on lower-end instruments.
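To make the kind of threshold-based monitoring described above concrete, the sketch below flags incoming micrographs whose metrics fall outside user-chosen acceptance windows. It is a minimal, hypothetical illustration: the column names, file layout, and threshold values are assumptions chosen for clarity, not the output format or recommended cutoffs of any particular package.

import csv

# Illustrative acceptance thresholds (assumed values, not recommendations)
MAX_TOTAL_DRIFT_A = 30.0     # total accumulated frame drift, in angstroms
MAX_CTF_FIT_RES_A = 5.0      # CTF fit resolution cutoff, in angstroms
MAX_ICE_THICKNESS_NM = 50.0  # estimated ice thickness, in nanometres

def flag_micrographs(metrics_csv):
    """Yield (micrograph, reasons) for entries that fail any threshold.

    Assumes a CSV with columns: micrograph, total_drift_A,
    ctf_fit_res_A, ice_thickness_nm (a hypothetical layout).
    """
    with open(metrics_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            reasons = []
            if float(row["total_drift_A"]) > MAX_TOTAL_DRIFT_A:
                reasons.append("excessive drift")
            if float(row["ctf_fit_res_A"]) > MAX_CTF_FIT_RES_A:
                reasons.append("poor CTF fit")
            if float(row["ice_thickness_nm"]) > MAX_ICE_THICKNESS_NM:
                reasons.append("thick ice")
            if reasons:
                yield row["micrograph"], reasons

if __name__ == "__main__":
    for name, reasons in flag_micrographs("session_metrics.csv"):
        print(f"{name}: {', '.join(reasons)}")

In practice, a rising fraction of flagged micrographs during a session is the cue to adjust collection parameters, move to a different grid region, or end the session early.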

Recently, software packages capable of automated on-the-fly data processing have been developed [13, 14, 15, 16, 17, 18∗∗, 19, 20]. Such software not only performs motion correction and CTF estimation but also produces 2D class averages, ab-initio models, and 3D reconstructions while data is being collected, requiring minimal human input. The feedback these packages provide allows researchers to make informed decisions during data collection, with a direct, positive impact on collection sessions and on instrumentation efficiency. Improving data collection and instrumentation efficiency not only raises data quality and accelerates the rate at which samples can be collected, but also improves the utilization of other limited resources crucial for data processing, including data storage, computational power, and the human time spent on data processing; expanding any of these can be cost-prohibitive for many labs.
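At their core, on-the-fly pipelines share a common pattern: watch the acquisition directory and hand each newly written movie to a chain of processing steps. The sketch below shows that loop schematically; the directory path, polling interval, file pattern, and processing stub are placeholders for illustration and do not correspond to the implementation of any package discussed here.

import time
from pathlib import Path

WATCH_DIR = Path("/data/session_001/movies")  # assumed acquisition directory
POLL_SECONDS = 10                             # assumed polling interval

def process_movie(movie: Path) -> None:
    # A real pipeline would dispatch the movie to motion correction and
    # CTF estimation here, and periodically trigger particle picking,
    # 2D classification, and ab-initio/3D reconstruction jobs.
    print(f"queueing {movie.name} for motion correction and CTF estimation")

def watch(watch_dir: Path) -> None:
    """Poll the acquisition directory and process each new movie once."""
    seen: set[Path] = set()
    while True:
        for movie in sorted(watch_dir.glob("*.tiff")):
            if movie not in seen:
                seen.add(movie)
                process_movie(movie)
        time.sleep(POLL_SECONDS)

if __name__ == "__main__":
    watch(WATCH_DIR)

Where the packages differ is in what happens inside this loop: which steps run automatically, on what hardware, and how much of the downstream processing (picking, classification, reconstruction) is triggered without user intervention.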

A wide range of software packages is currently available for rapid, on-the-fly cryoEM structure determination. Among them are Warp [20], cryoSPARC Live [13], and RELION Live [18,19], as well as TranSPHIRE [14], CryoFlare [17], and Scipion [16]. Each package employs a different strategy for structure determination, resulting in differences in performance, required computational resources, and level of automation. Here, we begin by describing the minimal metrics typically used to assess data quality during data collection, then delineate what constitutes an ideal on-the-fly data processing system and how researchers benefit from this capability. Lastly, we present an overview of our experience using four of these live processing software packages. It is important to emphasize that this work should not be construed as an endorsement of any specific software; rather, it serves to highlight the distinct differences among these packages.
