STmut: a framework for visualizing somatic alterations in spatial transcriptomics data of cancer

Assembling exome and spatial transcriptomic sequencing data

The manuscript covers two index cases, for which we performed point mutation, copy number, and allelic imbalance analyses, as well as an extension cohort of nine tumors, for which we only performed copy number analysis. The source of tumors from each cohort is described below.

Index cases

Whole-exome DNA sequencing data and spatial transcriptomics data were generated by Ji and colleagues and made publicly available as previously described [18]. Briefly, after the isolation of genomic DNA, it was prepared for sequencing, and libraries were enriched with exome baits (Agilent SureSelect Human All Exon V6). Separate tumor sections were placed on 10X Visium arrays (slide serial number: V19T26-101), hybridized, and prepared for sequencing according to the manufacturer’s protocols. There were two replicates (sequential sections of tissue) from each tumor biopsy, which were processed for spatial transcriptomics. The data from each replicate was processed in parallel and integrated as described below. The DNA sequencing data is available at NCBI Gene Expression Omnibus (GEO) (accession number GSE144237). The spatial transcriptomics data is available from the NCBI GEO database (accession number GSE144239).

Extension cohort

The extension cohort consisted of nine tumors. Three tumors were sequenced from the dermatopathology archive at the University of California San Francisco (UCSF). Two of these UCSF tumors were cutaneous squamous cell carcinomas, each adjacent to an actinic keratosis. An actinic keratosis is a benign neoplasm from which cutaneous squamous cell carcinomas can originate. The final UCSF tumor was a melanoma adjacent to a nevus. A nevus, also known as a common mole, is a benign neoplasm from which melanomas can arise. We separately microdissected, sequenced, and called somatic alterations from the benign portions and malignant portions of these tumors. Our sequencing and somatic mutation calling workflow is detailed elsewhere [35]. In addition to bulk-cell DNA sequencing, we also performed spatial transcriptomic analyses, using the FFPE-Visium platform, on separate sections of the three UCSF tumors. The cutaneous squamous cell carcinomas were prepared with the CytAssist platform (10X Genomics), and the melanoma was prepared using the manufacturer’s instructions. The remaining six tumors in the extension cohort were treatment-naïve melanoma lymph node metastases from Pozniak et al.’s [25] study. Spatial transcriptomic analysis of these tumors was performed with the fresh-frozen Visium platform, as described [25]. There was no matching DNA sequencing data from the Pozniak tumors.

Calling somatic alterations from DNA sequencing dataIndex cases

We previously performed a meta-analysis of exome sequencing studies covering cutaneous squamous cell carcinoma where we called somatic point mutations, copy number alterations, and allelic imbalances from these two tumors, among others [19]. A candidate list of somatic point mutations was generated with MuTect (v4.1.2.0, default parameters except for “–minimum-allele-fraction 0.04”) by comparing the tumor sequencing alignments to patient-matched reference alignments. This list was filtered to generate a final list of somatic mutations, as described (https://github.com/darwinchangz/ShainMutectFilter). The point mutation calls are available as part of this manuscript in Additional file 1. Copy number was inferred with CNVkit (v0.9.6, default parameters) [36], and a candidate list of germline polymorphisms was generated with FreeBayes (v1.3.1–19, “–min-repeat-entropy 1 –experimental-gls –min-alternate-fraction 0.05 –pooled-discrete –pooled-continuous –genotype-qualities –report-genotype-likelihood-max –allele-balance-priors-off”) by identifying variants when comparing the normal sequencing alignments to the reference genome. A final list of germline, heterozygous SNPs was inferred by identifying those variants that overlapped with known 1000 genome SNPs and which had variant allele frequencies between 40 and 60%. The raw copy number calls (cnr and cns files produced by CNVkit) and a list of germline, heterozygous SNPs (patient4_hg38_SNPs.txt and patient6_hg38_SNPs.txt files produced with our filtering) are available in the GitHub repository associated with this manuscript (https://github.com/limin321/stmut/tree/master/ResourceFiles/FigureS1SourceData). The somatic mutant allele frequencies and allelic imbalance measurements were used to infer tumor cellularity in these tumors as previously described [19]. The bioinformatic estimates of tumor cellularity were consistent with the histopathology of these tumors.

Extension cohort

The extension cohort consisted of nine tumors, as described above. Three of the tumors in this cohort came from our institution and had bulk-cell DNA sequencing data to accompany the spatial transcriptomic data. In each case, we separately microdissected the malignant tissue, benign precursor tissue, and uninvolved tissue. The uninvolved tissue was used as a source of patient-matched “normal.” We called somatic point mutations and somatic copy number alterations from these tumors, as previously described [35]. The cutaneous squamous cell carcinomas shared point mutations with the actinic keratoses adjacent to them, and the melanoma shared point mutations with the nevus adjacent to it. These observations suggest the neoplasms were phylogenetically related, but since point mutation analyses were not possible on the spatial transcriptomic data (because it was prepared with the FFPE-Visium platform), the point mutations were not further analyzed. We also inferred the copy number from each tissue using CNVkit [36] (v0.9.9, default parameters). Copy number alterations were observed in the malignant tissues but not in their precursors or in the surrounding normal tissue. The copy number alterations from the bulk-cell DNA sequencing of each region are shown in the top heatmaps of Fig. 3A, C, and E. Copy number inference from spatial transcriptomic data is described below.

Aligning spatial transcriptomics sequencing data to the transcriptome

Fastq files were aligned to the hg38 genome using the Space Ranger pipeline (spaceranger-1.3.0, default parameters) by 10X Genomics, as previously described [18]. This pipeline produces a single bam file with sequencing reads aggregated from all spots. Next, we split this bam file into individual bam files for each spot using the subset-bam script by 10X Genomics (https://github.com/10XGenomics/subset-bam). This script outputs hundreds to thousands of individual bam files, depending on the number of spots, each with sequencing reads matching the barcode tag for individual spots.

Visualizing somatic point mutation reads in spatial transcriptomics data

At this point, somatic point mutations had been identified from DNA sequencing data, and the sequencing alignments from the spatial transcriptomics data had been split into individual bam files based on the spatial barcode tag in each read, resulting in hundreds of bam files per spatial transcriptomics run (one bam file per tissue-covered spot). We next used the mpileup function from samtools (v0.1.19, with parameters “-f GRCh38_genome.fa spot_bam -r chr:Start–End”) to count mutant and reference reads over the somatic mutation sites (defined from the DNA sequencing data) in each of the bam files corresponding to an individual spot. Our script loops through each somatic mutation site from each bam file and is available on GitHub (https://github.com/limin321/stmut) along with an instructional video walking through them on YouTube (https://www.youtube.com/watch?v=pvs_b1ALygA). After counting individual mutant sites from each spot’s bam file, we summarized the mutant allele and reference allele counts within each spot.

Spots were combined into the following groups, as indicated in the legend of Fig. 1: spots with two or more mutant reads, spots with one mutant read, and spots with no mutant reads. Spots with only one mutant read were considered likely to be tumor spots because the probability of a false positive is equivalent to the error rate during the sequencing process, which is low. Nevertheless, these spots were manually inspected to eliminate obvious artifacts. We removed a total of three spots (all from patient 6 replicate 2) that had issues. These mutant reads were in the incorrect orientation and/or had numerous mismatches throughout the read length. Including them would not have affected the conclusions of this manuscript.

Spots with zero mutant reads were further subdivided, as indicated in Fig. 1, based on the number of reference reads, ranging from one reference read to five or more reference reads. Since most somatic point mutations are heterozygous, tumor cells can produce reference reads when the wild-type allele is sampled during sequencing. Therefore, a small number of reference reads does not indicate that the spot in question had no tumor cells; however, the probability that there are no tumor cells underlying a spot increases as the number of reference reads increases in the absence of mutant reads.

Once spots were grouped, we imported their barcodes into the Loupe browser (10X Genomics) and selected customized color schemes to visualize the spots from each group, as shown in the legend of Fig. 1. Two images were exported—a “spots only” image and an “H&E only” image. The tumors from patients 4 and 6 had two replicates each. To merge the data from the replicates, we subtracted the background from the “spots only” image and overlayed the spots from both replicates onto the “H&E only” image of each tissue in Adobe Illustrator.

As a tool for comparison, we also used a program, scReadCounts [11], which was designed to work with single-cell sequencing data, to count mutant reads in spots from spatial transcriptomic data. When spots were treated as single cells, scReadCounts (v1.3.2 default parameters) could be run on spatial transcriptomic data. The output of scReadCounts was not immediately compatible with our scripts, but it could be parsed to produce similar plots as shown in the manuscript. scReadCounts found the exact same spots with mutations as STmut. A small number of spots without mutant reads (i.e., with only reference reads) were detected by STmut but missed by scReadCounts.

Quantifying background signals on a Visium array

As part of the Space Ranger workflow, there is a step in which the user defines the spots overlying tissue. Removing non-tissue spots improves gene expression clustering and principal component analyses by eliminating data points without true signals; however, we sought to use the read coverage over non-tissue spots as a proxy of background signals that may arise from diffusion of mRNA during hybridization.

Towards this goal, we ran the Space Ranger workflow a second time and selected all spots as overlying tissue. UMI counts per gene per spot were exported using the mat2csv command (a function within the Space Ranger software distribution), producing a table from which we could count the number of reads per spot. A heatmap showing the number of reads per spot is shown in Additional file 2: Fig. S3A (note the exponential scale). We also split the aggregate bam file into individual bam files using the 10X Genomics subset-bam script and counted the number of somatic mutant reads per spot, as described above. A Loupe projection showing the localization of mutant spots is shown in Additional file 2: Fig. S3A.

We grouped spots into three categories—non-tissue spots, benign tissue spots, and tumor tissue spots as shown in Additional file 2: Fig. S3. After grouping, we calculated the total number of reads per spot, the number of mutant reads per spot, and the surface area of spots from each group. A table summarizing these statistics is shown in Additional file 2: Fig. S3B. We specifically highlight the number of mutant reads per square millimeter in benign tissue versus non-tissue areas in the bar graph to the right of Additional file 2: Fig. S3B. The error bars correspond to 95% confidence intervals (Poisson test).

Inferring somatic copy number alterations in spatial transcriptomics data

The copy number was inferred from each spot of the patient 6 tumor biopsy. We did not attempt copy number analyses of the patient 4 tumor because the DNA sequencing data did not predict there to be any alterations.

To infer copy number alterations from each spot, we first generated a matrix of unique molecular identifier (UMI) counts from each gene/spot using the mat2csv command from the spaceranger software distribution. We combined the data from replicates 1 and 2 of patient 6 into a single matrix to be processed together.

We used the import-RNA command [24] in the CNVkit (v0.9.9, default parameters) package [36] to convert the UMI counts to logarithmic ratios of gene expression (centered based on the median signal within the dataset itself). This command also filtered out genes with poor expression across the spots, and it assigned a weight to each gene, upweighting genes that are better able to provide copy number information. The weight is an important feature of CNVkit (v0.9.9) that differentiates it from other methods to infer copy numbers from RNA sequencing data. Briefly, CNVkit (v0.9.9) calculates a weight for each gene that is proportional to that gene’s correlation between expression and copy number from cancer genome atlas data—the net effect is that genes whose expression is known to concord with copy number in independent datasets are given more weight. CNVkit (v0.9.9) further modifies the weight based on the variability of gene expression and the absolute level of gene expression within the dataset being analyzed—genes with relatively stable expression and relatively higher expression are given more weight. Collectively, a gene with a high weight can provide a more reliable copy number estimate than a gene with a low weight.

The standard approach to inferring copy number information from RNA sequencing data is to calculate a moving average of expression over a window of genes [15, 22, 23]. We borrowed this concept, but we also sought to incorporate the weights, assigned by CNVkit (v0.9.9). When we originally developed the import RNA command for CNVkit (v0.9.9), we used pre-existing segmentation algorithms that were able to incorporate the weight values for each gene [24]. These segmentation algorithms worked well on bulk RNA sequencing data [24]; however, they did not test well on spatial transcriptomics data because they were originally designed for DNA sequencing data. Therefore, for this manuscript, we wrote an R-script to calculate the weighted median of expression from genes on the same chromosomal arm (https://github.com/limin321/stmut) along with an instructional video walking through them on YouTube (https://www.youtube.com/watch?v=QIDp9TLICuU), offering arm-level copy number inferences across the genome for each spot.

Before proceeding further, we filtered out spots with no UMIs on 2 or more chromosomal arms. We attempted to rescue these spots by combining data from groups of adjacent spots that had been filtered out. After combining data from adjacent spots that had been filtered out, we re-analyzed the data in a second pass. The groups of combined spots had more reads than the individual spots within each group and therefore were less likely to be filtered out on the second pass. When creating groups of spots, we only combined data from adjacent, contiguous spots. In addition, we only combined data from spots assigned to the same gene expression cluster to prevent combining spots encompassing dramatically different populations of cells. Individual spots were grouped together until their total UMI count exceeded 5000 UMIs—typically two to ten spots per group. We have included our grouping script in the GitHub software distribution: https://github.com/limin321/stmut/.

Next, we re-centered the copy number estimates. When CNVkit generated logarithmic ratios of gene expression, it used the median expression of a gene across all spots as its reference point. Consequently, without re-centering, a copy number alteration would appear as a low-level gain (or loss) in tumor spots and a concomitant low-level loss (or gain) in non-tumor spots. Non-tumor spots were inferred by their histology and the gene expression clusters for which they were assigned. For instance, spots assigned to “cluster 4” of patient 6 replicate 1 using the 10X Space Ranger software expressed immune-related genes and tended to overlie lymphocytes—thus, they were classified as non-tumor spots. Any spot with an ambiguous identity was left out of the reference pool. Once we settled upon a reference, we calculated the median copy number signal over each chromosomal arm from the reference pool and subtracted this signal from all spots.

Comparing the copy number alterations inferred from spatial transcriptomics data to the copy number alterations inferred from patient-matched bulk-cell DNA sequencing data

After inferring copy number alterations from spatial transcriptomics data, we sought to compare them to the copy number inferences from the matched DNA sequencing of the tumor. Below is a detailed description of how we performed this comparison for patient 6. Similar analyses were also carried out on the tumors from the extension cohort.

From the DNA sequencing data of patient 6, we identified gains of 1p, 3q, 8q, 9q, 11q, 14q, 17q, 20p, and 20q as well as losses of 3p, 4q, 5q, 10p, 10q, 13p, 13q, and 21q (Additional file 2: Fig. S1B). We calculated a score to identify the spots with copy number profiles that were more similar to the DNA sequencing reference point. The score was calculated as the sum of copy number signals over the regions of known gain minus the sum of copy number signals over regions of known deletions. We also weighed the copy number signals so that they were proportional to the number of genes on each arm—this reduced the influence of small chromosomal arms, whose signal often stemmed from a small number of genes and tended to have more variability.

$$\mathrm=\mathrm}_\mathrm$$

$$\mathrm=\mathrm/\mathrm$$

$$\mathrm= (\mathrm) - (\mathrm)$$

$$\mathrm= }_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}$$

$$\mathrm=}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}+}_}\times }_}$$

A spot with a copy number profile that is more similar to the DNA sequencing reference will have a positive score. However, a positive score can arise by random chance. Thus, to better put these scores in context, we permuted the copy number signals from each spot. Permuting the copy number signals from each spot effectively provides a random sampling of copy number alterations that could, in theory, be observed. After permuting the data, we calculated similarity scores on the permuted data to provide a theoretical distribution of scores that could occur by random chance. We produced 138,400 permuted scores (100-fold more data points than the observed data, which covered 1384 spots). The histogram of permuted scores and observed scores are shown in Additional file 2: Fig. S4A. Our permutation script is available on GitHub—https://github.com/limin321/stmut/blob/master/FigTableScripts/FigTables.md#figure-s4.

We further calculated a false discovery rate for each spot. We counted the number of permuted data points at a given spot’s score or higher and divided by 100 to normalize the size of the permuted dataset relative to observed data—this number was considered the number of false positives at a given score. The total positives were counted from the observed data at a given score or higher. The q-value was calculated by dividing the number of false positives by the number of total positives.

Benchmarking copy number inferences against InferCNV and STARCH

In addition to generating copy number calls with CNVkit-RNA, we also generated calls using InferCNV [15] and STARCH [17]. We ran InferCNV (v1.10.1) under default conditions. A previous study also used InferCNV to make copy number calls on the exact same dataset [16]. In that study, the authors used a reference pool of single-cell RNA sequencing data from patient-matched normal tissue to center their data. Under the default conditions, our data was centered relative to the median signal within the dataset itself. Given these differences in centering strategies, the amplitude of some copy number alterations differs between our analysis and those from Erickson and colleagues [16]. Nevertheless, the most prominent copy number inferences were similar in both our analysis as well as the Erickson analysis.

The highest amplitude copy number calls made by InferCNV (v1.10.1) were not made by CNVkit-RNA (v0.9.9) nor were they evident in the copy number inferred from the DNA sequencing data. We investigated the genes at the center of each alteration, and we noted that they tended to encode clusters of lineage-specific genes. For example, amplifications were predicted in keratinocyte cell populations over genes involved in keratinization. As another example, amplifications were predicted in immune cells over genes involved in immune functions. Given that these copy number alterations were not observed in the DNA sequencing data and that they can easily be explained by the high expression of these genes in certain cell types, we suggest that these are most likely false positives.

The main reason why CNVkit-RNA(v0.9.9) did not make these same calls is because CNVkit-RNA downweighted most lineage-specific genes when inferring copy numbers. Also, CNVkit-RNA only attempted chromosomal arm-level inferences. Of note, the typical spot from this sample only had ~ 1300 UMIs, which corresponds to ~ 700 detected genes (~ 15 genes per chromosomal arm). Given the sparse gene coverage, we elected to restrict our analyses to chromosomal arm-level inferences.

To be sure, there was a set of copy number alterations inferred in the DNA sequencing data as well as in the tumor spots by CNVkit-RNA(v0.9.9), InferCNV (our analysis), and InferCNV (Erickson et al. analysis). Examples include loss of 3p, gain of 3q, loss of 4q, loss of 5q, gain of 11p, loss of 13, and gain of 20. As such, we believe that InferCNV (v1.10.1) can be used to detect copy number alterations in spatial transcriptomics data; however, users should be aware of false positives induced by neighborhoods of co-regulated genes.

To benchmark STARCH, we created a virtual environment with Python 3 on UCSF C4 Cluster to run STARCH. No version information is available on STARCH GitHub. One of the inputs requires a gene mapping file. The GRCh38 reference was used to create this file by mapping the HUGO gene name to chromosomal positions. To better benchmark STARCH, we set n_clusters parameter from 2, 3, 4, and 5 and got outputs as expected. Then, we generated heatmaps from one of the outputs assigning each spot to one of n_clusters clones.

Measuring allelic imbalance in spatial transcriptomics data

To measure allelic imbalance in spatial transcriptomic data, it is imperative to generate a high-quality list of germline heterozygous SNPs to be interrogated. For instance, if a homozygous SNP were mistakenly input into the heterozygous SNP list, then 100% of reads in the spatial transcriptomic data would map to a single allele, implying that mono-allelic expression was occurring. Artifactual SNP calls also pose a challenge and must be removed. The RNA libraries are prepared for sequencing in a different manner than DNA sequencing libraries, and the RNA reads are aligned to the genome with different software. Consequently, artifactual SNPs, which were called in DNA sequencing data, will not necessarily be present in RNA sequencing data, which would, once again, imply mono-allelic expression was occurring. Using a highly specific list of heterozygous SNPs will alleviate these issues, but we nonetheless recommend users to manually inspect sequencing alignments supporting any notable results.

To ensure the quality of our heterozygous SNP calls, we required SNPs to have at least 10-fold coverage in the normal DNA sequencing data, to have variant allele frequencies between 40 and 60% mapping to each allele, and to have been observed in the 1000 Genomes Project in more than 1% of participants. The requirement for high coverage in our reference bam as well as the strict range of allowable allele frequencies ensured that the candidate variants from our data were well supported. The requirement that the variant also be observed in greater than 1% of 1 K genome participants ensured that the variant had been observed in another high-quality dataset, though we likely missed SNPs that are rare in the general population.

While the heterozygous SNPs were defined from the donor’s normal DNA sequencing data, we also counted the number of reads mapping to the ref and alt allele in the tumor DNA sequencing data, and we renamed the more abundant allele in the tumor DNA sequencing data as the “major allele.” This was a meaningful designation when there was a clear-cut allelic imbalance in the DNA sequencing data. However, for much of the genome, the allelic imbalance was not present, or it was too subtle to definitively identify the more abundant allele. Therefore, the “major allele” designation was arbitrary for many SNPs—an assignment based on whichever allele was randomly sampled at greater frequency during DNA sequencing of the tumor.

Once we generated a list of germline heterozygous SNPs, we counted the expression of each SNP’s allele in each spot’s bam file using the mpileup command in the samtools software distribution (v0.1.19, default parameters). Our approach to counting reads mapping to each SNP allele was the same as the approach we used to count reads mapping to mutant and wild-type alleles at somatic mutation sites, as described above. The specific scripts related to these analyses are available here: https://github.com/limin321/stmut/blob/master/FigTableScripts/FigTables.md#figure-4 along with an instructional video walking through them on YouTube (https://www.youtube.com/watch?v=diZDaFUahzc).

Most SNPs had no expression mapping to either allele because they did not reside in the sequenced portion of an expressed gene. Nevertheless, SNPs are relatively common, so there were 1772 SNPs from donor 4 and 2071 SNPs from donor 6 with at least one read of coverage over the SNP site in at least one spot. A list of SNPs and their coverage in each spot is available in the GitHub repository here: https://github.com/limin321/stmut/tree/master/ResourceFiles/Figure4SourceData.

For each SNP from each spot, we plotted the fraction of reads mapping to the major allele versus the total coverage. When coverage is low, one would expect a broader spread in allele frequencies, due to random sampling biases and transcriptional bursts [37], and this is indeed what we observed. At higher coverage, read ratios tended to stabilize at one-to-one ratios mapping to the major/minor alleles. We used these plots to identify SNPs with disproportionate expression of a single allele. A SNP from the immunoglobulin locus of patient 6 primarily expressed the minor allele (Fig. 4C–E), as discussed in the main text. In addition, two SNPs in S100A8 of patient 4 primarily expressed the major allele, but we concluded that these were most likely mapping artifacts. We discuss why these were most likely mapping artifacts in the “Mapping artifacts in SNPs from patient 4” section.

Coverage over most other SNPs was too low to recognize the allelic imbalance in the spatial transcriptomics data. Therefore, we explored allelic imbalance in a hypothesis-driven manner. We identified a region with allelic imbalance over chromosomal arm 3q from the DNA sequencing data of the patient 6 tumor. No tumor spots from patient 6 had greater than 32X coverage over a heterozygous SNP from this chromosomal arm; however, when we visualized the read distribution of all SNPs from this arm, there was a skew towards reads mapping to the major allele.

We also investigated allelic read coverage of SNPs on the X-chromosome of patient 4. Patient 4 was female, and therefore, one would expect mono-allelic expression over heterozygous SNPs on the X-chromosome due to inactivation. We observed mono-allelic expression of X-chromosome SNPs for all but two SNPs. The two outliers occurred in genes known to escape X-chromosome inactivation, as discussed in the main manuscript.

Of note, the tumor from patient 6 also came from a female donor, and we observed mono-allelic expression for all SNPs on the X-chromosome with coverage. However, read depths were extremely low, and coverage across spots was too sparse to perform similar analyses as shown for patient 4.

Mapping artifacts in SNPs from patient 4

In patient 4, there were two SNPs (Chr1:153419253[G/A] and Chr1:153418150[G/A]) that appeared to primarily express the major allele. Upon further inspection, these SNPs are most likely to be mapping artifacts. Both SNPs map to the S100A8 gene. S100A8 is one of 24 genes in the S100 gene family, most of which cluster on chromosome 1q. The genes in this family are extremely homologous, sharing approximately 50% similarity in amino acid sequences [38], making it challenging to unequivocally map sequencing reads to the appropriate genes in this family. This challenge is exacerbated by the 3′ sequencing strategy, utilized by 10X Genomics. Sequencing data consists of 120-bp single-end reads, but many reads are soft-clipped, reducing their effective length, because they extend into the template switching oligo or the poly-A tail. Considering these challenges, we noted that the reads mapping to the major allele of these SNPs mapped similarly well to other S100 genes. In addition, the Chr1:153419253[G/A] SNP was 112 base pairs away from the poly-A tail, yet there was only 112X coverage over the poly-A tail while there was 67,000 × coverage over the SNP site. We did not observe such precipitous drops in coverage over any other gene. Local spikes in read coverage, such as this, are common features of alignment artifacts in RNA sequencing data. Based on this body of evidence, we determined that further evidence was needed to conclude that mono-allelic expression was occurring in the S100A8 gene.

留言 (0)

沒有登入
gif