Field-crop transcriptome models are enhanced by measurements in systematically controlled environments

Systematic measurement of rice transcriptome dynamics across 73 growth chamber conditions

We conducted a massive transcriptome analysis of rice leaves grown in 73 conditions, combining photoperiod length and temperatures in light and dark periods (Fig. 1 and Additional file 2: Table S1). We developed a cost- and space-effective growth chamber (GC), enabling multi-environment tests in controlled conditions (Fig. 1a). By utilizing the GC, we grew two cultivars of rice (Koshihikari and Takanari), both of which were used for transcriptome analysis to construct our previous model [6] (Fig. 1b, c). Koshihikari is a leading japonica cultivar in Japan, while Takanari is a high-yield indica cultivar with some indica-specific features, such as a high photosynthetic rate [30]. Both cultivars are typically grown in the field under a photoperiod of approximately 12 to 14 h and temperatures ranging from 15 to 35 °C in Japan. Before the experiment, we grew all the rice plants for 14 days at 25/21 °C under 12L/12D photoperiod (Fig. 1c). Subsequently, we transferred the plants to the GCs with different conditions, where they were left to acclimate for 2 days. The leaves were sampled on the third day. Although having replicates at the time points would have been ideal, the number of samples was limited due to space limitations in the GCs. Given this limitation, we prioritized increasing the number of time points over biological replication, as this provides more valuable information and is more beneficial for model training. Therefore, in each condition, we sampled rice leaves from one plant per cultivar every 3 h for 8 times (1168 total samples: 8 time points × 2 cultivars × 73 conditions) and conducted RNA-Seq analysis (Additional file 2: Table S2). During the RNA-Seq analysis, replacement of the labeling of samples occurred. By predicting air temperature at the sampling time point from transcriptome data, we corrected the labeling of samples (Additional file 1: Figs. S1 and S2). After omitting samples with different genotypes from the labeling of the sample (Additional file 1: Fig. S1), 1167 samples were used for further analysis (Figs. 1c and 2a). By incorporating the transcriptome data across 73 conditions into the model, we aimed to dissect the environmental factors affecting rice transcriptome dynamics and to improve the prediction of transcriptome dynamics from meteorological data (Fig. 1b).

Fig. 1

Systematic approach analyzing rice transcriptome dynamics across 73 growth chamber conditions for plant environmental response insights. a GC and the preculture condition. Front view of GC (upper left), rice grown in GC (middle left), the parallel control of GC (right), and the preculture condition (bottom). b Schematic diagram of this study. In our previous field transcriptomics study, we developed a statistical model to predict the transcriptome dynamics of rice leaves in the field. The main factors considered in the model included circadian clock, temperature, solar radiation, and plant age. In this study, we obtained diel transcriptome data from the 73 controlled conditions where irradiance and temperature were independently varied. Training the model with the transcriptome data from controlled and field conditions allowed us to dissect causal environmental factors and more accurately predict transcriptome dynamics in the field. c Experimental design of sampling and transcriptome analysis across 73 growth chamber conditions

Fig. 2

Systematic measurement of rice transcriptome dynamics across 73 growth chamber conditions. a Expression levels of OsGI (Os01g0182600) in 73 conditions. In each subpanel, expression levels (log2(rpm + 1)) of Koshihikari and Takanari at each condition are shown in red and blue, respectively. White and gray zones indicate light and dark periods, respectively. The subpanels are arranged according to the light/dark temperature. Within each group, the 8L/16D, 12L/12D, and 16L/8D photoperiod conditions are arranged from left to right, whereas the 0L/24D and 24L/0D conditions are displayed separately. The x-axis, y-axis, and their scale values in the subpanels are the same as those in the legend. b Principal component analysis of transcriptome in 73 conditions. In each subpanel, PCA of Koshihikari and Takanari at each condition are shown in red and blue, respectively. Solid and transparent dots indicate samples at light and dark conditions, respectively. For each cultivar, the dots are connected by lines in the order of sampling time. The subpanels are arranged in the same order as in a. The x-axis, y-axis, and their scale values in the subpanels are the same as those in the legend. Genome-wide average amplitudes of diel oscillation of gene expressions at each condition, which is classified by c light period length, d light period temperature, and e dark period temperature. The number of differentially expressed genes (DEGs) between Koshihikari and Takanari at each condition, which is classified by f light period length, g light period temperature, and h dark period temperature. In c–h, conditions that share the same letter are not significantly different from each other, as determined by multiple comparisons using the Steel–Dwass test (p ≥ 0.05)

Independent effects of light period length and temperature on rice gene expression and cultivar-specific responses

Our massive transcriptome data under controlled conditions allow us to disentangle the independent effect of light period length and temperature on rice gene expression. To summarize the diel transcriptome dynamics, we conducted principal component analysis (PCA) (Fig. 2b). Transcriptomes in continuous dark conditions (0L/24D) were separated from the other conditions. Diel changes in the transcriptome became pronounced under short-day conditions (8L/16D) and with increasing dark temperatures. We also evaluated transcriptome similarity between conditions using t-SNE, another method of summarizing transcriptome data (Additional file 1: Fig. S3). Transcriptomes in continuous dark conditions (0L/24D) were separated from the other conditions, which is consistent with the results of PCA (Fig. 2b). The other samples were mainly classified by cultivar and light/dark (Additional file 1: Fig. S3a). Samples at similar time points (Additional file 1: Fig. S3b) and temperature (Additional file 1: Fig. S3c) were clustered closely. To evaluate the diel changes in the transcriptome, we calculated the amplitude of diel oscillation of gene expression (Fig. 2c–e and Additional file 1: Fig. S4). Under the typical conditions for Koshihikari and Takanari in the field (e.g., 25/20 °C, 30/20 °C, and 30/25 °C [light/dark temperatures] under 12L/12D conditions), the amplitudes (log2(rpm + 1)) of diel oscillation were 3.37, 3.48, 3.55 for Koshihikari and 3.49, 3.70, 3.53 for Takanari, respectively. Under 73 conditions, the amplitudes of diel oscillations varied more widely. The amplitude of diel oscillation tended to be lower under continuous dark (0L/24D) and continuous light (24L/0D) conditions, 2.60 and 2.72 for Koshihikari and Takanari under 0L/24D, and 1.99 and 2.16 under 24L/0D, respectively, compared with the other conditions (Fig. 2c). On the other hand, the amplitude of diel oscillation was significantly higher in 8L/16D (with median values of 3.61 and 3.71 for Koshihikari and Takanari, respectively) than in 12L/12D (3.21 and 3.30) and 16L/8D (3.14 and 3.20) for both cultivars; the amplitude was higher in short-day conditions. The amplitude of diel oscillation was neither different between cultivars nor affected by light period temperature (Fig. 2d). However, it was affected by dark period temperature, where the amplitude of diel oscillation tended to be lower at 15 °C (with median values of 2.80 and 2.94 for Koshihikari and Takanari, respectively) than in the other conditions and significantly lower than that at 35 °C (3.34 and 3.64) (Fig. 2e). This is consistent with the previous study, which showed that the amplitude of diel oscillation of gene expression was lower in winter than in summer in Arabidopsis halleri subsp. gemmifera [7]. Further investigation is required to clarify whether 15 °C during the daytime also decreases the amplitude of the diel oscillation of gene expression.

To clarify the difference in the environmental response of Koshihikari and Takanari, we compared the number of differentially expressed genes (DEGs) between the two cultivars at each condition. Because Koshihikari and Takanari belong to different subspecies (japonica and indica, respectively), some genes showed cultivar-specific gene expression [6, 31,32,33]. To elucidate the effect of environmental conditions, we excluded cultivar-specific genes from the DEGs (Fig. 2f–h). Cultivar-specific genes were defined as those where the between-cultivar difference in the average expression (log2(rpm + 1)) calculated across the 73 conditions was greater than 2, and the average expression in the cultivar with lower expression was less than 0.5. The number of cultivar-specific genes in Koshihikari and Takanari was 483 and 110, respectively. Under typical conditions for Koshihikari and Takanari in the field (25/20 °C, 30/20 °C, and 30/25 °C [light/dark temperatures] under 12L/12D), the number of DEGs was relatively low. The number of DEGs with higher expressions in Koshihikari was 291, 570, and 1048, while those with higher expressions in Takanari were 94, 206, and 350, respectively. The number of DEGs tended to be higher in 0L/24D (with median values of 1350 and 1195 for Koshihikari and Takanari, respectively) and 24L/0D (1364 and 1166) than in the other conditions (Fig. 2f). The absence of environmental fluctuations under constant conditions may exaggerate cultivar differences. The number of DEGs with higher expression in Koshihikari was significantly higher than that with higher expression in Takanari in 8L/16D (with median values of 657 and 315 for Koshihikari and Takanari, respectively) and 12L/12D (1269 and 508) conditions, but the difference between the cultivars was not seen in the other conditions (Fig. 2f). Although the number of DEGs higher in Koshihikari or Takanari was not significantly different at any light or dark temperatures (Fig. 2g, h), the number of DEGs tended to be higher in Koshihikari than in Takanari in the low-temperature conditions: 20 °C in the light period and 15 °C in the dark period. This difference might result from the cold tolerance of the japonica cultivar compared with indica cultivars [34]. Collectively, our results demonstrated that diel transcriptome dynamics and their cultivar differences were more pronounced under atypical field conditions. We also highlighted the effect of light period length on the amplitude of gene expression and the different responses to a low temperature between Koshihikari and Takanari.

Co-expression gene network analysis identifies genes responsive to temperature

We analyzed the co-expression gene network using WGCNA [35]. The co-expression gene network was composed of 22 modules (clusters of co-expressed genes) with the number of genes per module ranging from 20 to 1416 (Fig. 3a). To find modules that have different expression patterns between Koshihikari and Takanari, we compared the mean value of eigengenes between the two cultivars, which was defined as the first principal component of the genes within a module (Fig. 3b). Module 1 was preferentially expressed in Koshihikari. In module 1, genes annotated for defense response (GO:0006952) and tetrapyrrole binding (GO:0046906) were significantly enriched (Fig. 3a and Additional file 2: Table S3). This result is consistent with the previous studies showing the preferential expression of genes related to disease resistance [32, 33] and porphyrin and chlorophyll metabolism [33] in japonica rice, and the differences in blast resistance genes between Koshihikari and Takanari [36].

Fig. 3

Identification of genes responsive to environmental conditions. a Co-expression network constructed by WGCNA. Representative gene ontology (GO) enriched in each module is shown. b A histogram showing the differences in mean expression of module eigengenes of Koshihikari and Takanari. c A histogram showing Pearson’s correlation coefficient between the expression of module eigengenes and air temperature. Expression levels of a gene encoding Hsp 70 (Os01g0840100) in module 5 d at 35/20 °C (L/D) and 20/35 °C (L/D) conditions in Koshihikari and e 73 conditions in Koshihikari and Takanari at each air temperature or light/dark condition. f Gene ontology and KEGG pathway significantly enriched in the temperature-responsive module 5. g Gene network and hub genes of the temperature-responsive module 5. h A histogram showing Pearson’s correlation coefficient between the expression of each gene and air temperature

To detect specifically temperature-responsive modules, we calculated Pearson’s correlation coefficient (r) between temperature and the expression level of the eigengenes (Fig. 3c). Module 5 showed a strong positive correlation with temperature (r = 0.69), suggesting that module 5 is composed of temperature-responsive genes (Fig. 3c–e and Additional file 1: Figs. S5 and S6). On the other hand, the other modules showed a low correlation with temperature. The expression of the genes belonging to the modules tended to be constant (e.g., module 2) or regulated by the light/dark cycle (e.g., modules 4, 8, 17, and 18) (Additional file 1: Fig. S5).

To characterize the modules, we tested the enrichment of genes with gene ontology (GO) annotation and KEGG pathway in the modules. Genes annotated for chaperone binding (GO:0051087), protein folding (GO:0006457), and protein processing in the endoplasmic reticulum (KEGG pathway: dosa04141) were significantly enriched in module 5 (Fig. 3a, f and Additional file 2: Tables S3 and S4), indicating that genes belonging to module 5 is composed of heat stress-related genes (Additional file 2: Table S5). We identified three hub genes in module 5 that can have a central role in the regulation of the module (Fig. 3g). Os02g0115900 encodes binding immunoglobulin protein (BiP), which is an endoplasmic reticulum-localized Hsp70 chaperone [37]. Os03g0113700 encodes mitochondrial Hsp70 [38]. Os08g0525600 encodes FK506 binding protein, which is involved in stress response [39]. We also calculated the Pearson’s correlation coefficient (r) of gene expression and temperature to extract genes responsive to air temperature. r ranged from − 0.76 to 0.77 (Fig. 3h and Additional file 2: Tables S6 and S7). Among the 91 genes with positive correlation (r > 0.5), 38 genes were members of module 5, and these genes include heat shock protein genes (Additional file 2: Tables S5 and S6). Expressions of 60 genes were negatively correlated with air temperature (r < − 0.5) (Additional file 2: Table S7). Among the 60 genes with negative correlation (r < − 0.5), Os09g0551600 showed the lowest r. Os09g0551600 encodes an HMG protein (nucleosome/chromatin assembly factor D protein). The HMG protein is involved in various DNA-dependent processes, including transcription, recombination, and DNA repair [40]. Os05g0537400 showed the second lowest r. Os05g0537400 encodes protein phosphatase PP2C50, a major negative regulator of ABA signaling regarding stomata closing in rice [41]. These genes might be important for the adjustment to air temperature change in rice. Collectively, our results identified the temperature-responsive genes and the difference in gene expression between rice cultivars. Our approach, which uses diel transcriptome data across a comprehensive range of temperature conditions, allows for a more accurate evaluation of temperature effects by minimizing the influence of confounding factors such as time-of-day variation, compared to analyses based on single time point measurements. Further analysis of temperature-responsive genes will contribute to plant response to temperature.

Radiation is a better predictor of expression than temperature for the majority of genes

We then compared radiation and temperature as the predictors of gene expression dynamics (Fig. 1b). Under the field conditions, radiation and temperature are correlated with each other. Here we are interested in how the data from the GC, where radiation and temperature were systematically varied independently, enabled the choice of the better predictor. We thus trained statistical models for gene expression dynamics using different training data sets: GC data, field data, or a mixture of them (50% GC and 50% field). Field data were obtained over 3 years from rice plants grown under typical conditions for Koshihikari and Takanari in Japan (see Methods for details). We mainly report the results for Koshihikari in the main text, but the results for Takanari were qualitatively the same (Additional file 1: Figs. S7 and S8). We focused on 474 genes representing various expression patterns in the field. These genes were chosen based on the clustering of microarray data analyzed by Nagano et al. [8] and were representative of each cluster of genes with similar expression (see Methods for details). Among the 474 representative genes, 8 genes showed too low expression levels for model training. We thus trained models for the remaining 466 genes (one model for a gene). We used the R package FIT [16] for the model training. In the model of FIT, we consider plant’s age (days after planting), circadian clock (time of day), and environmental responses as the predictors of gene expression. The sub-model of environmental response consisted of the sum of a gate function (diel changes in environmental responsiveness) applied to environmental variables over a period of time in the past. Radiation or temperature, the two environmental variables most influential on gene expression [8], was chosen for each gene. In the present study, we did not consider the model to consist of both temperature and radiation due to constraints on computation time and parameter optimization (see Methods for details). Because we used different training data sets (GC, field, and the mixture of them), we randomly subsampled 512 data points from each set to standardize the size of different training data. We repeated the subsampling and model training 100 times. We can thus calculate how frequently radiation or temperature was chosen as the predictor for each gene (Fig. 4). When trained with field data, temperature was more frequently chosen than radiation for the majority of the genes (338 out of 466 genes with successful model training), corroborating a previous result with field data [8]. However, when the models were trained with GC or mixed data, radiation was more frequently used than temperature as the predictor of the majority of the genes (GC: 308/466 genes; mix: 378/466 genes) (Additional file 2: Table S8). The choice of the predictor was also more consistent across data subsampling when we used GC or mixed data than field data (Fig. 4b). To characterize the genes, we tested gene ontology (GO) annotation and KEGG pathway enrichment in the genes belonging to clusters whose representative genes were the 338, 308, or 378 genes. While no GO terms and KEGG pathways were significantly enriched in the genes belonging to the clusters of 308 genes, genes annotated for antenna proteins of photosynthesis (dosa00196) and the spliceosome (dosa03040) were significantly enriched in the clusters of 338 genes (Additional file 2: Table S9). Additionally, genes annotated for protein transport and localization (e.g., GO: 0015031 and GO: 0008104) were significantly enriched in the clusters of 378 genes (Additional file 2: Table S9). We also conducted GO and KEGG enrichment analyses on the gene sets better predicted by the other predictor (128, 155, and 85 genes in the field, the GC, and the mixed models, respectively). No significant GO terms and KEGG pathway were significantly enriched in the clusters of 128 genes, and only the ribosome pathway (dosa03010) showed significant enrichment in the clusters of 155 and 85 genes (Additional file 2: Table S9). Similar results were obtained for Takanari (Additional file 2: Tables S10 and S11).

Fig. 4

The number of times temperature or radiation (or neither) was chosen as the predictor of gene expression (Koshihikari). a A single vertical line represents a gene whose green, blue, and gray parts show the frequency where radiation, temperature, or neither was chosen (among the 100 data subsampling). The 466 genes are sorted according to the frequency at which radiation was chosen in the model training with mixed (GC + field) data. b Histograms showing how frequently temperature or radiation was chosen

To identify the genes for which the input data influenced the choice of predictors, we extracted the representative genes for which temperature was consistently chosen with field-data training and radiation was consistently chosen with mixed data training (criterion for consistency: ≥ 80% subsampling). The number of the representative genes extracted was 92 for Koshihikari and 101 for Takanari, where 38 genes were shared (Additional file 2: Table S12). Because individual genes represent clusters of different sizes (number of genes), we estimated the genome-wide impact. Among the 15,907 genes belonging to the 466 clusters, 2929 (18.4% of the total genes) were identified for Koshihikari and 3208 genes (20.2%) for Takanari, with 1167 genes (7.3%) shared. To characterize the genes, we tested the enrichment of genes with gene ontology (GO) annotation and KEGG pathway. While genes significantly enriched in Koshihikari were not observed, genes annotated for vesicle-mediated transport (e.g., GO: 0016192), 1,3-beta-D-glucan biosynthesis (e.g., GO: 0003843), and proteasome (e.g., dosa03050) were significantly enriched in Takanari (Additional file 2: Table S13).

We further analyzed genes for which the same predictor (radiation or temperature) was consistently chosen in both field and mixed data training. For radiation, we extracted 18 and 11 representative genes in Koshihikari and Takanari, respectively, with 4 shared (Additional file 2: Table S14), corresponding to 526 (3.3%) and 251 (1.6%) genes genome-wide, with 93 shared genes (0.6%). These were significantly enriched for flavonoid biosynthesis (e.g., dosa00941) in both cultivars, and for lipid transport (e.g., GO:0006869) in Takanari (Additional file 2: Table S15). For temperature, 4 and 19 representative genes were identified for Koshihikari and Takanari, respectively, with 1 shared (Additional file 2: Table S16), corresponding to 96 (0.6%) and 687 (4.3%) genes genome-wide, with 14 shared genes (0.1%). Enrichment analysis revealed significant associations with protein processing (e.g., dosa04141) in both cultivars and with protein biosynthesis/metabolism (e.g., GO:0043043, dosa03010) in Takanari (Additional file 2: Table S17).

Mixed data improves the prediction of gene expressions in the field

To examine whether model predictions improve quantitatively due to the use of GC results in model training, we evaluated the models’ performances by applying the models to field-grown plants (n = 615) that were not included in the training data. Because the results can depend on training data sizes, we also varied the sizes of subsampled training data (n = 64, 128, 256, and 512; subsampled 100 times for each size). We calculated mean absolute error (MAE) to measure the model performance by applying the trained models to the test data. Thus, we obtained 100 MAE values for 466 genes (Fig. 5a–c). Prediction performance varied substantially across genes, and these differences were relatively consistent across training data subsampling.

Fig. 5

Prediction performances of gene expression models trained with different data sets in Koshihikari. The performances were evaluated by applying the models to the field test data (not included in the training data). a–c Heat maps of mean absolute errors (MAE). The GC + field (c) models were trained with a mixture of GC and field data (50% each). Training data size (n) was also varied. d–h show summary statistics of a–c. d The numbers of genes with poor predictions (log2rpm ≥ 5). e Medians (across 466 genes) of the MAE values. In d and e, data points correspond to training data subsampling, and black horizontal lines represent means. f–h Gene-wise medians of the MAE values compared between models trained with different data sets

As a general trend, the use of mixed data (GC + field) resulted in better predictions than other data sets (Fig. 5d). We considered MAE ≥ 5 (in log2rpm) as poor predictions. The number of genes with poor predictions almost always exceeded 50 when we used GC data alone for model training (Fig. 5d). The genes with poor predictions were much fewer with field-alone or mixed data. The number of genes with poor prediction decreased with increasing training data size for field and mixed training data. Notably, the largest mixed data resulted in the virtual absence of genes with poor predictions. Larger training data sizes for GC models decreased model performances, suggesting overfitting. Medians of the MAEs calculated across genes, which reflect the average performances of the models trained with different data sets, were small when the models were trained with large mixed data (Fig. 5e). On average, MAE median was smaller for GC than for field models (Fig. 5e). Results in Fig. 5d and e suggest that in GC models, whereas many genes showed poor predictions, other genes showed good prediction performance; for some genes GC data appear more informative than field data to predict the field test data.

To identify such genes, we then summarized the MAE in Fig. 5a–c gene-wisely (rather than subsampling-wisely) by calculating medians for the largest training data of n = 512 (i.e., average performance of models for a particular gene) (Fig. 5f–h). When the mixed model performances are plotted against field model performances, 360 out of the 466 genes showed better predictive performance with the mixed model than with the field model, as indicated by points above the diagonal line in Fig. 5g (Additional file 2: Table S18). Similarly, when the GC model performances are plotted against field model performances, 202 out of the 466 genes showed better predictive performance with the GC model than with the field model, as indicated by points below the diagonal line in Fig. 5h (Additional file 2: Table S18). Of these 202 genes, 192 overlapped with the 360 genes that showed better performance in the mixed model than in the field model. As a result, similar GO terms or KEGG pathways were significantly enriched in the genes belonging to clusters of 202 genes (Additional file 2: Table S19). In particular, genes related to protein synthesis and metabolism, such as ribosome (GO: 0005840) and peptide metabolic process (GO: 0006518), as well as intracellular components, such as intracellular anatomical structure (GO: 0005622), were significantly enriched in the genes belonging to clusters of both the 360 genes and 202 genes (Additional file 2: Table S19). Similar results were obtained for Takanari (Additional file 1: Fig. S8; Additional file 2: Tables S20 and S21). These results suggest that genes better predicted by the GC model also contributed to the improved performance of the mixed model.

In the GC data, we created conditions with both positive and negative temperature-irradiance correlations (Figs. 1c and 2a), yet whether we used only positively or negatively correlated data (or both) did not affect the outcome (Additional data 1: Fig. S9).

View original article

GENOME BIOLOGY

Share Bookmark

0 0 0 0 0 0 0

More from this channel

Field-crop transcriptome models are enhanced by measurements in systematically controlled environments

Comments (0)