Origin and functional role of antisense transcription in endogenous and exogenous retroviruses

Retroviruses (Retroviridae) are a family of positive-sense single-stranded RNA (ssRNA) viruses that infect vertebrates [1]. They are classified into two subfamilies (Orthoretrovirinae and Spumavirinae) and several genera. The defining feature of retroviruses is that they encode a reverse transcriptase (RT, an RNA- and DNA-dependent DNA polymerase) that converts the ssRNA genome into a dsDNA molecule, and an integrase (IN) that inserts the dsDNA molecule into the host genome.

All retroviral genomes include five essential genetic elements: two long terminal repeats (LTRs) and three genes, namely the structural genes gag and env and the enzymatic genes pol/pro. The two LTRs are identical direct repeat sequences located at the 5’ and 3’ ends of the genome; they are arranged in a tail-to-head orientation and have regulatory roles: the 5’ LTR contains the promoter driving expression of the viral genes, and the 3’ LTR contains the polyadenylation signal. Retroviral genera that contain only these five elements (alpha-, beta-, and gammaretroviruses) are said to have a “simple” genome. On the other hand, deltaretroviruses, epsilonretroviruses, lentiviruses, and all spumaviruses encode additional regulatory and accessory proteins, and thus have a “complex” genome. In many cases, these genes overlap each other and are encoded in different reading frames. Their expression involves two or three classes of transcripts produced from a single RNA molecule encompassing the entire viral genome: full-length and singly spliced transcripts for retroviruses with simple genomes, plus multiply spliced transcripts for retroviruses with complex genomes [1].

Over millions of years, now-extinct ancestral exogenous retroviruses infected and colonized their hosts and persist to this day as endogenous retroviruses (ERV) [2,3,4]. While present-day exogenous retroviruses only infect somatic cells, ancestral forms may have also been capable of infecting cells of the germline, which allowed them to be transmitted vertically and become fixed in the population [4, 5]. Today, ERVs occupy a large fraction of vertebrate genomes. In humans, endogenous retroviruses (HERV) account for ~8% of the genome with ~700,000 loci [6, 7]. HERVs are broadly grouped into Class I (similar to gamma- and epsilonretroviruses), Class II (similar to betaretroviruses), and Class III (similar to spumaviruses). While most ERVs present a simple genome that includes the five basic genetic elements [4], some (such as HERV-K) can under certain circumstances express additional proteins [8, 9].

Retroviral LTRs as bidirectional promoters

The retroviral 5’ and 3’ LTRs play regulatory roles in viral expression. Specifically, the U3 region of the 5’ LTR – organized in promoter, enhancer, and modulatory domains – contains binding motifs for a wide array of transcription factors that promote and regulate expression of the viral genes. Retroviruses such as HTLV-1 and HIV-1 also express regulatory proteins that function as transactivators and augment transcription from their own 5’ LTR through positive-feedback loops. The HTLV-1 Tax protein promotes transcription by recruiting to the 5’ LTR transcription factors of the ATF/CREB family as well as p300/CBP [10]. The HIV-1 Tat protein recruits P-TEFb to the 5’ LTR, which in turn increases processivity of RNA polymerase II and promotes transcription elongation [11]. At the other end of the proviral genome, the 3’ LTR contains the motifs required for polyadenylation of the pre-mRNA, which ensures its proper processing, nuclear export, stability, and translation [12].

As discussed above, the 5’ and 3’ LTRs are identical direct repeats and contain the same transcriptional regulatory elements. While the presence of a polyadenylation signal in the 3’ LTR is essential for proper processing of the viral transcripts, the presence of a polyadenylation signal in the 5’ LTR could interfere with viral expression. The location of the polyadenylation signals separates retroviruses into two groups [13]. On one hand, viruses such as HTLV-1, HTLV-2, BLV, RSV, and MMTV use bipartite polyadenylation signals: an AAUAAA sequence (recognized by the cleavage and polyadenylation specificity factor, CPSF) located in the U3 region upstream of the transcription start site (U3-R boundary), and a GU/U-rich sequence (recognized by the cleavage stimulation factor, CstF) positioned at the beginning of the U5 region. Therefore, the full polyadenylation signal is present only at the 3’ end of the viral transcripts (3’ LTR). Further, in RSV and MMTV, the R region is very short, thus keeping the two signals at a functional distance. However, in HTLV-1, the R region is 275 bp, and functional proximity of the two polyadenylation signals is achieved via secondary structure (looping) of the viral transcript [13]. In the second group of retroviruses (e.g., HIV-1, HIV-2, EIAV, MoMLV) both polyadenylation signals (AAUAAA and GU/U-rich sequences) are located in the R region, and therefore are transcribed at both ends of the viral RNA molecule. In the case of HIV-1, two mechanisms ensure exclusive utilization of the polyadenylation signals at the 3’ end of the transcripts: the presence of U3-derived sequences with “polyadenylation enhancer” activity (which are not present at the 5’ end) [14,15,16,17], and the formation of secondary structures that suppress the polyadenylation activity of the signals at the 5’ end of the transcript [18,19,20]. Suppression of the 5’ polyadenylation signal has also been shown to involve binding of the U1 snRNP splicing factor to the major splice donor site at the 5’ end of the viral RNA [14, 21, 22].
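The sequence logic of a bipartite polyadenylation signal described above (an AAUAAA hexamer followed by a GU/U-rich element) can be illustrated with a minimal Python sketch. The function names, the 60-nt downstream window, the 10-nt sliding window, and the 70% G/T threshold are illustrative assumptions and are not taken from the cited studies. The sketch works on the DNA (proviral) sequence, so AAUAAA is searched as AATAAA, and it ignores RNA secondary structure, enhancer elements, and U1 snRNP binding.

```python
import re
from typing import List, Tuple


def gt_fraction(segment: str) -> float:
    """Fraction of G or T bases in a DNA segment (0.0 for an empty segment)."""
    if not segment:
        return 0.0
    return sum(base in "GT" for base in segment) / len(segment)


def find_polya_signals(
    dna: str,
    window: int = 60,              # assumed search distance downstream of the hexamer
    gt_len: int = 10,              # assumed length of the sliding GT-rich window
    min_gt_fraction: float = 0.7,  # assumed threshold for calling a window GT-rich
) -> List[Tuple[int, bool]]:
    """Return (position, has_downstream_gt_rich) for every AATAAA hexamer.

    Positions are 0-based on the input sequence. The boolean is True when at
    least one gt_len window within `window` nt downstream of the hexamer has
    a G/T fraction of at least min_gt_fraction.
    """
    dna = dna.upper()
    hits = []
    # A lookahead pattern reports overlapping hexamers as well.
    for match in re.finditer(r"(?=AATAAA)", dna):
        pos = match.start()
        downstream = dna[pos + 6 : pos + 6 + window]
        has_gt = any(
            gt_fraction(downstream[i : i + gt_len]) >= min_gt_fraction
            for i in range(len(downstream) - gt_len + 1)
        )
        hits.append((pos, has_gt))
    return hits
```

Under these assumptions, a hexamer with no GT-rich stretch nearby is reported with `False`, which loosely mirrors the situation in which only one half of a bipartite signal is present on a transcript.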

Several studies have provided mounting evidence that retroviral LTRs are capable of bidirectional transcription. An early report by Larocca and colleagues showed that the 3’ LTR of HTLV-1 can direct antisense transcription in a Tax-independent manner [23]. A more recent study showed that sense and antisense transcription across the HTLV-1 provirus do not interfere with each other or with the expression of Tax or the HTLV-1 antisense protein, HBZ [24]. The same study also showed that, in the absence of Tax, antisense transcription predominates [24]. A negative sense promoter has also been identified within the 3’ LTR of HIV-1 [25,26,27]. Antisense transcription driven by the LTR of both HTLV-1 and HIV-1 is independent of – or inhibited by – their respective transactivators, Tax and Tat [25,26,27]. Interestingly, the ability to drive bidirectional transcription is not limited to complex retroviruses. Indeed, the LTRs of the simple retrovirus murine leukemia virus (MLV), as well as those of endogenous retroviruses, can direct sense and antisense transcription [26, 28,29,30,31].

Altogether, bidirectional transcriptional activity is widespread among simple and complex exogenous retroviruses as well as endogenous retroviruses. This suggests that this property has an ancestral origin, that it has been conserved in present-day retroviruses, and thus that it serves a purpose that benefits viral persistence or spread.

Antisense transcription in exogenous retroviruses

HIV-1 and other lentiviruses. The first retroviral antisense gene to be described was the antisense protein (asp) gene of HIV-1 [32]. The asp ORF was identified through sequence analysis of twelve HIV-1 viral isolates, and it maps in the same genomic region as the env gene straddling the gp120/gp41 junction (Fig. 1). The product of this gene – the antisense protein (ASP) – is a polypeptide of ~ 190 residues with high content of hydrophobic amino acids, which suggests an association with cellular membranes [32]. While the original study did not include experimental evidence that the asp ORF encodes an actual protein, it provided clues in support of that conclusion. First was the evidence that the ORF is longer than 100 codons, which is uncommon in DNA strands complementary to known genes [33]. Second, the sequences analyzed showed the presence of signals necessary for production of an antisense mRNA transcript, such as the promoter, poly-A addition signal and site, and the downstream G and T domains. Finally, conserved sequences necessary for protein translation were also detected, including canonical start and stop codons, and a codon periodicity of ‘G-nonG-N’ [32].
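The first clue mentioned above, an open reading frame longer than 100 codons on the strand complementary to a known gene, can be checked with a simple scan of the reverse complement. The following is a minimal Python sketch and not the method used in the original study [32]; the function names and the ATG-to-stop ORF definition are assumptions made for illustration.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Find ATG-to-stop ORFs in the three forward frames of seq.

    Returns (frame, start, end) tuples with 0-based coordinates on seq;
    end is exclusive and includes the stop codon. Only ORFs with at least
    min_codons codons (start codon included, stop codon excluded) are kept.
    ORFs that run off the end of the sequence without a stop are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i : i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


def find_antisense_orfs(sense_seq: str, min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Scan the strand complementary to sense_seq for long ORFs.

    Coordinates refer to the reverse-complemented (antisense) sequence.
    """
    return find_orfs(reverse_complement(sense_seq), min_codons)
```

Applied to a proviral sense-strand sequence, `find_antisense_orfs` would list candidate antisense ORFs above the chosen length threshold. The other criteria mentioned in the text (promoter and poly-A signals, codon periodicity) are outside the scope of this sketch.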

Fig. 1

The HIV-1 antisense gene, asp. The figure shows a schematic representation of the HIV-1 proviral genome with structural and enzymatic genes (gag, pol, env), regulatory genes (tat and rev), and accessory genes (vif, vpr, vpu, nef) expressed from the proviral 5’ LTR. The antisense gene asp is expressed from a negative sense promoter in the U3 region of the 3’ LTR in a manner independent of the viral transactivator, Tat. The negative sense promoter contains binding sites for USF, Ets-1, LEF-1, Sp1 and NF-κB. The antisense transcript Ast is a bifunctional RNA with both noncoding and protein-coding activities. The former is carried out in the nucleus: Ast acts as a lncRNA that promotes epigenetic silencing of HIV-1 by recruiting the histone methyltransferase (PRC2) to the 5’ LTR, leading to trimethylation of lysine 27 on histone H3 (H3K27me3), which in turn leads to assembly of the nucleosome Nuc-1 and inhibition of transcription. In addition, Ast is translocated to the cytoplasm, where it functions as an mRNA and leads to the expression of the antisense protein ASP. In non-productively infected cells, ASP accumulates in the nucleus, whereas in productively infected cells ASP localizes in the cytoplasm and on the cell membrane in close proximity to ENV. Further, upon viral budding and release, ASP is also detectable on the viral envelope.

Experimental evidence of antisense transcription in the HIV-1 genome first came in 1990 through Northern blot analysis of poly-A+ RNA extracted from acutely infected cells [34] (Table 1). Subsequently, RT-PCR made it possible to demonstrate HIV-1 antisense transcription in chronically infected T- and myeloid-derived cell lines, and also in clinical samples from early-stage, asymptomatic people living with HIV-1 (PLWH) [25, 35]. Further, the introduction of strand-specific RT-qPCR assays that avoid artifacts due to endogenous priming and/or self-priming provided stronger evidence of antisense transcription in the HIV-1 proviral genome [36,37,38]. Antisense transcription was also detected in studies employing high-throughput sequencing methods [39].

Table 1 Antisense transcription activity in endogenous and exogenous retroviruses

The structure of the HIV-1 antisense transcripts and the mechanisms that regulate their expression were the focus of several reports. The earliest identified an antisense RNA of 2242 nt that originated in the R region of the 3’ LTR and terminated in a poly-A tract [25]. These results were confirmed by a later report [40]. However, Landry et al. identified multiple transcription start sites in the U3 region as well as in the nef and env genes, and also a polyadenylation signal in the pol gene [36]. A more in-depth analysis by Kobayashi-Ishihara and colleagues described a major antisense transcript (ASP-L or Ast) of 2574 nt with a start site in the U3 region of the 3’ LTR and a termination site in the env gene (Fig. 1) [37]. Interestingly, the same study demonstrated that a large fraction of HIV-1 antisense transcripts has a predominantly nuclear localization [37].

The location of the 5’ terminus of the antisense transcripts suggested that their expression is directed by a negative sense promoter (NSP) within the 3’ LTR [25], which was shown to have 3- to 9-fold lower activity than the HIV-1 positive sense promoter (PSP) and to be inhibited by Tat expression, possibly because Tat directs the transcriptional machinery to the PSP [25, 27]. The report by Michael et al. showed that NSP is a TATA-less promoter, and that the NF-κB and USF binding sites are critical for its activity [25], and a subsequent report identified an Sp1 binding site that is essential for NSP function [41]. Bentley et al. described regions of the 3’ LTR with moderate, profound, and variable impact on NSP activity [27]. The segment of the 3’ LTR with profound impact on NSP activity was mapped in the U3 region, and it contains binding sites for Sp1, NF-κB, LEF-1, Ets-1, and USF (Fig. 1) [27]. Disruption of the TATA box in the positive strand of the U3 region in the 3’ LTR increased NSP activity, which supports the notion that NSP is a TATA-less promoter and suggests that antisense transcription is under the control of an initiator element (InR) [27, 37]. Indeed, two putative InRs were later identified within the U3 region of the 3’ LTR [42], and a third one within the R region (Fig. 1) [40]. A recent study from the Matsuoka group showed that HIV-1 antisense transcripts are inefficiently polyadenylated, which promotes their nuclear retention [43]. Our group investigated additional mechanisms involved in regulating the expression and possibly the function of HIV-1 antisense transcripts. First, we reported that the activity of the NSP within the HIV-1 3’ LTR is under epigenetic regulation [44]. In particular, we found the presence of a nucleosome over the U3 region and in close proximity to the nef-3’ LTR boundary (Fig. 1). Assembly and disassembly of this nucleosome is under the control of epigenetic modifications of lysine 9 and 27 on histone H3: acetylation of these residues increases transcriptional activity of NSP and promotes antisense transcription, whereas di- and trimethylation of these residues has the opposite effect [44]. In addition, we identified and precisely mapped post-transcriptional (epigenetic) base modifications deposited on HIV-1 antisense transcripts, which include primarily ribose methylation at multiple adenosine and guanosine residues, and pseudouridylation [45]. Studies are underway to address whether dynamic addition and removal of these modifications contributes to regulating the stability, sub-cellular localization, interaction with binding partners, and functional activity of HIV-1 antisense transcripts.

In line with that, studies from several groups including our own have shown that HIV-1 antisense transcripts act as bifunctional RNAs (Fig. 1). In addition to serving as mRNA for the expression of the HIV-1 antisense protein ASP, these transcripts also function as noncoding RNAs that regulate the expression of HIV-1 sense transcripts. An early report showed that HIV-1 antisense transcripts reduce the expression of HIV-1 Gag RNA, the levels of HIV-1 proviral DNA, and viral production in the culture supernatant [37]. Subsequently, Kevin Morris’ group provided evidence that HIV-1 antisense transcripts promote HIV-1 latency via epigenetic silencing of HIV-1 transcription [46]. In that report, Saayman et al. showed that knockdown of HIV-1 antisense transcripts resulted in a reduction in suppressive epigenetic marks (H3K9me2 and H3K27me3) at the 5’ LTR, and they also demonstrated that HIV-1 antisense transcripts interact with the DNA methyltransferases [46]. Our group reported that ectopic overexpression of HIV-1 antisense transcripts lacking protein-coding capacity suppressed basal HIV-1 transcription during latency, inhibited latency reversal, and accelerated re-establishment of latency [38]. In addition, overexpression of HIV-1 antisense RNA maintained high levels of PRC2 and the suppressive epigenetic mark H3K27me3 at the HIV-1 5’ LTR even after treatment with LRAs. We also showed that these effects involve interaction with subunits of the epigenetic silencer PRC2 (Fig. 1) [38]. A more in-depth discussion can be found in a recent review by Li et al. [47].

Sequence analyses and computer modeling suggest that the protein encoded by HIV-1 antisense transcripts (ASP) includes intracellular N- and C-termini, two transmembrane domains and an intervening extracellular loop [48, 49]. Additional features of interest are two closely spaced cysteine triplets in the N-terminal portion of the protein and a highly conserved PxxPxxP motif located between residues 40–50 of the protein (Fig. 1). Thorough and systematic experimental analyses are needed to assess the functional role of these and possibly other yet unidentified ASP domains [50]. A very recent study utilized molecular modeling and dynamics simulation to predict the 3D-structures of ASP [51]. The in silico analyses described in that study identified three possible functional domains in ASP, namely the Von Willebrand Factor Domain-A (VWFA), the Integrin subunit alpha-X (ITGSX), and the ETV6-Transcriptional repressor [51]. Wet lab molecular studies are required to confirm the validity of these findings, and to ascertain the role these domains play in the mechanisms of action of ASP. Despite being first identified more than 30 years ago, the role of ASP in the virus lifecycle remains largely a mystery, which is in part due to its low expression levels. Additionally, the hydrophobic properties of ASP make it exceptionally challenging to raise antibodies able to reliably detect it. Nevertheless, expression of ASP has been demonstrated in several cell systems. An early study used electron microscopy to show that ASP associates with plasma, mitochondrial and nuclear membranes [139]. A later report found that ASP localizes to the plasma membrane (with both polarized and unpolarized expression patterns) as well as to cell surface protrusions [49]. More recently, we used flow cytometry and confocal microscopy to study ASP expression in several chronically infected lymphoid and myeloid cell lines and also in acutely infected primary human CD4+ T cells and monocyte-derived macrophages (MDM) [48]. Using a mouse monoclonal antibody against an epitope mapping in the extracellular loop of ASP, we found that ASP displays a polarized nuclear distribution in unstimulated, non-productively infected lymphoid and myeloid cell lines [48]. After cell stimulation and reactivation of productive infection, ASP was transported into the cytoplasm and to the plasma membrane, where it colocalized with the HIV-1 envelope glycoprotein ENV (Fig. 1). We also showed that upon budding and release from infected cells, ASP is present on the viral envelope (Fig. 1) [48].

The role of ASP in viral replication has been explored to a much lesser degree. Clerc and colleagues generated HIV-1 molecular clones with a premature stop codon in the asp ORF, but they were not able to show any appreciable difference in viral replication compared to the wildtype clone [49]. Two reports by the Barbeau group showed that ASP expression can induce autophagy in infected cells [52, 53]. Many viruses utilize autophagy to their advantage during their replication cycle [54]. In the case of HIV-1, autophagy is required for GAG processing during infection of macrophages, and it significantly increases viral production [
