Applications of high-throughput reporter assays to gene regulation studies

Gene expression programs are controlled by the binding of transcription factors (TFs) to cis-regulatory elements (CREs) such as promoters, enhancers, and silencers [1]. Understanding the logic and grammar of these binding events is fundamental to determining the mechanisms of gene regulation and their roles in development, disease, and evolution. This requires multi-pronged studies that identify CREs active in particular conditions, determine the TF motifs contributing to their activity, identify the combinatorial mechanisms involved in regulation, and rationally design CREs with specific functionalities, either for bioengineering applications or to test hypotheses about regulatory mechanisms [2]. Although these have long been central goals of the gene regulation field, they remained largely out of reach for decades because of the large number of sequences that often need to be evaluated.

With the advent of high-throughput sequencing, several complementary approaches have been developed to fill this gap. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) [3], DNase-seq [4], and the assay for transposase-accessible chromatin with sequencing (ATAC-seq) [5] have identified regions bound by specific TFs and cofactors, regions of open chromatin, and different chromatin states. These studies have also provided insights into the DNA binding logic of hundreds of thousands of CREs based on combinations of TF binding peaks or footprints [6, 7, 8]. However, they do not directly measure transcriptional activity, and they rely on the natural sequence variability present in the studied genomes. Because this natural variability represents only a very small fraction of possible sequence space, these approaches are limited in the hypotheses they can test regarding regulatory logic, the effects of variants, and the design of novel sequences [9].
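
To give a sense of scale, a back-of-the-envelope count compares all possible sequences of a given length with the distinct windows of that length a single genome can contain. The numbers below are illustrative assumptions, not values from the cited studies: a 200-bp candidate element and a haploid human genome of about 3.1 × 10^9 bp, counted on both strands.

```latex
N_{\text{possible}} = 4^{200} = 2^{400} \approx 2.6 \times 10^{120}, \qquad
N_{\text{genome}} \lesssim 2 \times 3.1 \times 10^{9} \approx 6.2 \times 10^{9}, \qquad
\frac{N_{\text{genome}}}{N_{\text{possible}}} \lesssim 2.4 \times 10^{-111}.
```

Even after accounting for variation among many individuals and species, naturally occurring sequences sample only a vanishing fraction of this space.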

In the last 15 years, high-throughput reporter assays have emerged as major tools for directly measuring putative CRE activity across thousands of sequences simultaneously. They adapt the classical reporter assays that became popular in the 1980s and 1990s: candidate regulatory sequences are coupled to a reporter gene and to unique sequence barcodes, so that transcriptional activity can be measured quantitatively and reproducibly across tens of thousands of sequences. A pioneering 2009 study laid the groundwork with a saturation mutagenesis-based in vitro assay that measured the effect of mutations at every position of bacteriophage and mammalian core promoters, providing a powerful tool for annotating functional regulatory elements [10]. Since then, several high-throughput reporter assays, including massively parallel reporter assays (MPRAs) and Self-Transcribing Active Regulatory Region sequencing (STARR-seq) [10, 11, 12, 13, 14], have been developed to identify thousands of CREs and dissect their regulatory mechanisms, determine the effects of genetic variants, train statistical models that improve our understanding of regulatory mechanisms, and rationally design synthetic regulatory sequences. Given the breadth of this field, it is impossible to cover all of the important work performed. This review therefore provides a broad view of the main experimental approaches currently in use and of their applications to CRE studies.
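
As a rough illustration of how barcode counts become an activity measurement, the sketch below follows one common convention in MPRA-style analysis: reads are counted per barcode in the input DNA (plasmid) library and in the reporter RNA, each library is scaled to counts per million, barcodes are summed per CRE, and activity is reported as log2(RNA/DNA). The barcodes, CRE names, counts, and pseudocount are all hypothetical, and published pipelines differ in their normalization and statistical treatment.

```python
import math
from collections import defaultdict

# Hypothetical barcode-level counts: barcode -> (CRE it tags, DNA count, RNA count).
# DNA counts come from sequencing the plasmid library, RNA counts from
# sequencing reporter transcripts. All values are made up for illustration.
BARCODES = {
    "ACGTAC": ("CRE_1", 120, 480),
    "TTGCAA": ("CRE_1", 100, 410),
    "GGATCC": ("CRE_2", 150, 140),
    "CCTAGG": ("CRE_2", 130, 125),
    "AATTGC": ("CRE_3", 110, 30),
}


def cre_activity(barcodes, pseudocount=1.0):
    """Return log2(RNA/DNA) per CRE after scaling each library to counts per million."""
    dna_total = sum(dna for _, dna, _ in barcodes.values())
    rna_total = sum(rna for _, _, rna in barcodes.values())
    dna_by_cre = defaultdict(float)
    rna_by_cre = defaultdict(float)
    for cre, dna, rna in barcodes.values():
        # Library-size normalization makes DNA and RNA sequencing depths comparable.
        dna_by_cre[cre] += dna / dna_total * 1e6
        rna_by_cre[cre] += rna / rna_total * 1e6
    # The pseudocount guards against division by zero for undetected CREs.
    return {
        cre: math.log2((rna_by_cre[cre] + pseudocount) / (dna_by_cre[cre] + pseudocount))
        for cre in dna_by_cre
    }


if __name__ == "__main__":
    for cre, score in sorted(cre_activity(BARCODES).items()):
        print(f"{cre}\t{score:.2f}")
```

Because both libraries are scaled to the same total, these scores are relative to the library as a whole rather than absolute. Summing barcodes before taking the ratio is only one option; many analyses instead model barcode-level counts statistically to estimate uncertainty for each CRE.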
