Single-particle cryo-electron microscopy (cryo-EM) is a leading technique for determining the three-dimensional structure of macromolecules at near-atomic resolutions (Frank, 1996, Cheng et al., 2017). In this method, a solution containing the macromolecule of interest is rapidly frozen on a carbon film within a thin layer of vitreous ice, approximately preserving the macromolecule’s native state. The frozen-hydrated sample is then imaged with an electron microscope, acquiring numerous two-dimensional projections of the macromolecule. To achieve a reconstruction at the required resolution, many projections—sometimes numbering in the millions—must be collected.
One key challenge in cryo-EM imaging is the low signal-to-noise ratio (SNR) of the captured images. Because the high-energy electron beam damages the sample, cryo-EM typically employs a low electron dose. While this approach preserves high-frequency content effectively, it reduces the number of electrons interacting with the specimen, resulting in a low SNR (Brilot et al., 2012, Grant and Grigorieff, 2015).
A second challenge is beam-induced motion (BIM). Traditionally, a single experimental image, known as a micrograph, was recorded from each hole in the sample grid (see Fig. 1(a)). However, during imaging, both the ice and the specimen are displaced. The carbon support film deforms, shrinking the hole where the specimen is frozen, while radiation-induced chemical changes simultaneously increase pressure in the ice layer. This induces a drum-like motion, including translations of up to a few nanometers and rotations of a few degrees, affecting the ice and embedded macromolecules (Brilot et al., 2012, Campbell et al., 2012). Consequently, BIM blurs micrographs, severely limiting achievable resolution.
To mitigate this, direct detector technology was developed to capture multiple frames at very short exposures. Each exposure is short enough that BIM within a frame is negligible, so motion can be treated as occurring only between frames. However, recording multiple exposures per hole requires a much lower electron dose per exposure than that used for a single micrograph. As a result, each frame exhibits an extremely low SNR. To address this, it is standard practice to correct motion between frames and then average the motion-corrected frames to enhance SNR (Grant and Grigorieff, 2015, Zheng et al., 2017) (see Fig. 1(b)).
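As a rough illustration of the align-then-average step described above, the following Python sketch estimates one whole-frame translation per frame by FFT cross-correlation against the first frame and averages the shifted frames. It is a simplification under stated assumptions (rigid integer-pixel translation, periodic image boundaries, first frame as reference), and the function names `estimate_shift` and `align_and_average` are hypothetical; it is not the algorithm of any cited motion-correction package.

```python
import numpy as np

def estimate_shift(reference, frame):
    """Return the integer (dy, dx) that, applied to `frame` with np.roll,
    best aligns it to `reference` (assumption: pure circular translation)."""
    # Circular cross-correlation computed in the Fourier domain.
    xcorr = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(frame))).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Peak indices above half the image size correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, xcorr.shape))

def align_and_average(frames):
    """Align every frame to the first one and return the mean image."""
    reference = frames[0]
    aligned = [np.roll(f, estimate_shift(reference, f), axis=(0, 1)) for f in frames]
    return np.mean(aligned, axis=0)
```

Because the model is a single translation per frame, rotations and local deformations are left uncorrected and are blurred by the final average.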
Unfortunately, reducing the raw signal from movies containing multiple frames into a single averaged image imposes several limitations on the reconstruction process. First, the initial frames of the sequence experience the least radiation damage, thus preserving high-resolution structural details most effectively (Brilot et al., 2012). Averaging these initial frames with the radiation-damaged projections of subsequent frames results in a loss of high-frequency information. Second, existing motion correction methods implicitly assume that differences between consecutive frames are limited to translations. In reality, other deformations also occur. Consequently, averaging across these frames leads to additional information loss, potentially compromising the reconstruction quality. While techniques like Bayesian polishing (Zivanov et al., 2019) refine per-particle motion post-picking, they rely on a reconstruction from motion-corrected micrographs, potentially missing any data that was lost during motion correction.
In this paper, we present an approach that directly recovers particle positions from every frame in a cryo-EM movie, challenging the traditional dependence on motion correction and Bayesian polishing. Our motivation is twofold: to introduce a novel particle-picking method, inspired by Structure-from-Motion (SfM) techniques in computer vision, and to enable future reconstruction methods to leverage raw frame data, potentially achieving higher resolution. Notably, our approach serves as an independent first step in the cryo-EM computational pipeline, requiring no prior motion correction, CTF estimation, or initial reconstruction.
Because the raw frames have an extremely low SNR (Fig. 5(a)), particle picking at the frame level is a significant challenge. Moreover, complex particle movements, such as translations, rotations, and deformations, further complicate consistent identification in the absence of prior motion correction. These obstacles call for a robust and adaptive picking strategy that differs fundamentally from conventional approaches. Our approach relies on two key observations about cryo-EM image sequences: first, the displacement of individual macromolecules between frames is spatially bounded, and second, the pixel size of tomographic projections remains approximately constant (Brilot et al., 2012). Building on these observations, we use the approximate temporal consistency of particle position and shape to recover particle locations. Specifically, we generate multiple weak hypotheses for each macromolecule’s position independently in each frame and reach consensus through locally adaptive denoising and a voting scheme.
Our method draws inspiration from SfM. In SfM, feature points are extracted from each frame of a movie, often using techniques like SIFT (Lowe, 2004), and matched across frames based on descriptor similarity and temporal motion consistency. A 3D model is then constructed using these locations from all frames. Similarly, we identify tomographic projections at the frame level and use temporal consistency to pinpoint their positions across frames. As shown in the experimental section, this emphasis on temporal consistency reduces the percentage of outlier projection images and successfully recovers ground-truth particle coordinates.
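To make the idea of consensus from weak per-frame hypotheses concrete, the following Python sketch accumulates votes from candidate coordinates detected independently in each frame and keeps only locations supported by several frames. It is a toy illustration under our own assumptions, not the method evaluated in this paper: the candidate detector and the locally adaptive denoising are omitted, and the function `consensus_picks` and its parameters `max_disp` and `min_frames` are hypothetical.

```python
import numpy as np

def consensus_picks(candidates_per_frame, shape, max_disp=3, min_frames=3):
    """candidates_per_frame: one list of integer (row, col) candidates per frame.
    Returns coordinates supported by at least `min_frames` frames."""
    votes = np.zeros(shape, dtype=int)
    for candidates in candidates_per_frame:
        frame_mask = np.zeros(shape, dtype=bool)
        for r, c in candidates:
            # Bounded displacement: a candidate supports a small window around it.
            r0, r1 = max(r - max_disp, 0), min(r + max_disp + 1, shape[0])
            c0, c1 = max(c - max_disp, 0), min(c + max_disp + 1, shape[1])
            frame_mask[r0:r1, c0:c1] = True
        votes += frame_mask  # each frame contributes at most one vote per pixel

    picks = []
    work = votes.copy()
    while work.max() >= min_frames:
        # np.argmax returns the first maximal pixel, so the pick is only
        # accurate to within roughly max_disp pixels.
        r, c = np.unravel_index(np.argmax(work), shape)
        picks.append((int(r), int(c)))
        # Suppress the neighbourhood so the same particle is not picked twice.
        r0, r1 = max(r - 2 * max_disp, 0), min(r + 2 * max_disp + 1, shape[0])
        c0, c1 = max(c - 2 * max_disp, 0), min(c + 2 * max_disp + 1, shape[1])
        work[r0:r1, c0:c1] = 0
    return picks
```

In this sketch, a spurious detection that appears at unrelated positions in different frames never collects enough votes, whereas a particle that drifts by a pixel or two between frames does.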
In summary, this paper introduces a frame-based particle picker that operates independently of motion correction, CTF estimation, and initial reconstruction, and that achieves a lower outlier rate. It thereby provides a crucial component for direct reconstruction from frame data and addresses a key challenge on the way to a meaningful departure from the conventional pipeline. While particle picking is well studied (Heimowitz et al., 2018, Zivanov et al., 2018, Redmon et al., 2016, Eldar et al., 2020), to our knowledge, full reconstruction directly from every frame is an unexplored paradigm, and our method provides the first vital step toward this goal.