We performed a BLASTP search against the non-redundant protein database, excluding the Gramineae family (taxid: 4479).[47] We expected to find several polyQ sequences, since such imperfect polyQ repeats have been found in at least 17 eukaryotic proteomes.[53] The five top hits with high significance (E-value < 1e−9) and high sequence similarity (> 85% identity) to the celiac α-gliadin polyQ P4022 peptide included proteins from pathogenic organisms and are presented in Table 2, Panel 1. Interestingly, all five best BLASTP hits, with up to two mismatches in the P4022-matching sequence, belong to the protozoan genus Eimeria, phylum Apicomplexa. As of September 2020, the BLASTP search also retrieved other proteins, not only from Eimeria species but also from other organisms. The ones reported here consistently ranked among the top five hits over several BLASTP searches performed over time.
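The identity and mismatch figures quoted for these hits can be checked with a few lines of Python. The following is a minimal sketch, not a reimplementation of BLASTP scoring: it compares the P4022 query with two of the matching segments listed in Table 2, Panel 1, position by position and without gaps. The function name `ungapped_identity` and the assumption that each segment aligns to the query from its first residue are illustrative choices, not taken from the original analysis.

```python
# P4022 query (alpha-gliadin residues 120-139) and two matching segments
# from Table 2, Panel 1. Each starts with a run of 12 glutamines.
QUERY = "Q" * 12 + "ILQQILQQ"
HITS = {
    "Hit 1 (E. brunetti, CDJ52926.1)": "Q" * 12 + "LLQQLLQQ",
    "Hit 5 (E. maxima, XP_013336659.1)": "Q" * 12 + "VLQQVLQ",
}


def ungapped_identity(query: str, subject: str):
    """Return (percent identity, 1-based mismatch positions) over the overlap.

    Assumption: both segments start at the same aligned position and the
    alignment contains no gaps. This is a simplification of what BLASTP does.
    """
    n = min(len(query), len(subject))
    mismatches = [i + 1 for i in range(n) if query[i] != subject[i]]
    identity = 100.0 * (n - len(mismatches)) / n
    return identity, mismatches


for name, segment in HITS.items():
    identity, mismatches = ungapped_identity(QUERY, segment)
    print(f"{name}: {identity:.1f}% identity, mismatches at {mismatches}")
```

Under these assumptions, both segments differ from the query only at the two isoleucine positions. The 21-residue segment of Hit 4 implies a gapped alignment and is deliberately left out of this ungapped comparison.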
TABLE 2. BLASTP-search hits for the polyQ P4022 sequence (120QQQQQQQQQQQQILQQILQQ139) of α-gliadin

Panel 1: Eimeria proteins associated with chicken coccidiosis

| Hit # | Description | Species | NCBI access number | Matching sequence | Subcellular/topological localization (a) |
|---|---|---|---|---|---|
| 1 | tRNA-splicing endonuclease positive effector | E. brunetti | CDJ52926.1 | 422QQQQQQQQQQQQLLQQLLQQ441 | T (double-pass)/E |
| 2 | Hypothetical protein | E. maxima | XP_013334286.1 | 924QQQQQQQQQQQQLLQQLLQQ943 | T (single-pass)/I |
| 3 | Hypothetical protein | E. mitis | XP_013355649.1 | 26QQQQQQQQQQQQLLQQLLQQ45 | I/I |
| 4 | Hypothetical protein EPH 0046580 | E. praecox | CDI76686.1 | 87QQQQQQQQQQQQMLQQILLQQ107 | E/E |
| 5 | Translation initiation factor 3 subunit 10 | E. maxima | XP_013336659.1 | 264QQQQQQQQQQQQVLQQVLQ282 | I/I |

Panel 2: Proteins associated with human coccidiosis

| Hit # | Description | Species | NCBI access number | Matching sequence | Subcellular/topological localization (a) |
|---|---|---|---|---|---|
| 1 | Plectin | C. cayetanensis | XP_026192064.1 | 124QQHQQQQQQQQQPLQQLLQQ143 | I/I |
| 2 | Transcription factor kayak | C. cayetanensis | XP_026189818.1 | 258QQQQQQQQQQQQIQQQQIQQQ278 | E/E |
| 3 | Myb-like protein P | C. hominis | OLQ19473.1 | 2230QQQQQQQQQQQQQLQQ-LQQ2248 | I/I |
| 4 | Myb-like DNA-binding domain | C. hominis TU502 | XP_665879.1 | 1854QQQQQQQQQQQQQLQQ-LQQ1872 | I/I |
| 5 | Hypothetical protein | C. parvum Iowa II | QOY40252.1 | 2284QQQQQQQQQQQQQLQQ-LQQ2302 | I/I |
| 6 | Ubiquitin C-terminal hydrolase | C. parvum Iowa II | XP_626187.1 | 660QQQQQQQQQQQQQQQQQLQQ679 | I/I |

Note: Panel 1: Best five output proteins (∼ 90% amino acid identity, E-value < 1e-9) retrieved from a stringent BLASTP[47] search against the non-redundant protein database excluding the Gramineae family (taxid: 4479), using the P4022 sequence as a query in non-gluten related proteins. P4022-matching sequences are depicted indicating the flanking amino acid positions (numbers) and mismatch residues (underlines). Panel 2: Best two proteins (> 85% amino acid identity, E-value < 7e-10) from each of the human coccidian parasites C. cayetanensis (taxid: 88456), C. hominis (taxid: 237895), and C. parvum (taxid: 5807). The sequences were retrieved in September 2020.
Abbreviations: T, transmembrane; E, extracellular; I, intracellular. (a) Subcellular localization prediction was performed with Protter.[61]
Coccidia of the family Eimeriidae, such as Eimeria species, are monoxenous (one-host) parasites, a group of obligate intracellular parasites of great interest in vertebrates that cause acute enteritis and coccidiosis.[25-28] Seven Eimeria species are recognized to affect chickens: E. acervulina, E. brunetti, E. maxima, E. mitis, E. necatrix, E. praecox, and E. tenella.[54] The different Eimeria species have distinctive characteristics in prevalence, pathogenicity, site of infection in the intestine, and oocyst morphology.[28, 55] Eimeria tenella affects the paired caeca, leading to extensive bleeding. The lesions caused by the second generation of schizonts deeply compromise the intestinal epithelium within the lamina propria.[56] Eimeria maxima infects the mid-small intestine, leading to a thickening of the intestinal lining accompanied by a mucoid to bloody exudate. Eimeria mitis and E. praecox both infect the upper small intestine. Eimeria brunetti and E. necatrix affect the distal small intestine and the colon and can cause severe pathology.[28]
From the BLASTP query, only the tRNA-splicing endonuclease positive effector (Hit 1) and the eukaryotic translation initiation factor 3 (Hit 5) have known biological functions. The tRNA-splicing endonuclease positive effector contains a domain belonging to the P-loop containing nucleoside triphosphate hydrolases (InterPro Domain: IPR027417), which are DEAD-like helicases involved in ATP-dependent RNA or DNA unwinding.[57] Moreover, the tRNA-splicing endonuclease positive effector bears two DEXXQ-box helicase domains of the RNA/DNA helicase senataxin (SETX). SETX is involved in transcription, neurogenesis, and antiviral response. Mutations in SETX have been linked to two neurodegenerative disorders: ataxia with oculomotor apraxia type 2 and amyotrophic lateral sclerosis type 4.[58] The eukaryotic translation initiation factor 3, subunit 10, is a component of the eukaryotic translation initiation factor 3 (eIF-3) complex. It participates in several steps of the initiation of protein synthesis.[59] It may regulate cell cycle progression and cell proliferation,[60] which may be important steps for intervening in the parasite infection. Up to now, no pathogenic role has been reported for the identified proteins. However, this remains to be investigated, considering that only their gene sequences were uploaded to GenBank in October 2013 as part of a genomic analysis of the causative agents of coccidiosis in chickens.
Until now, only four coccidian parasites have been reported to infect humans: Cystoisospora belli, Cyclospora cayetanensis, and both Cryptosporidium hominis and Cryptosporidium parvum.[24, 27, 48-51, 62] Cyclospora cayetanensis is currently considered the “human Eimeria” that causes human coccidiosis.[27, 63] Accordingly, based on a reevaluation of the parasite molecular taxonomy, it has been suggested that the human-associated Cyclospora is closely related to Eimeria species and should be considered a mammalian Eimeria species associated with traveler's diarrhea.[64, 65] Cyclosporiasis has been reported worldwide in both developed and developing countries, but it is most common in tropical and subtropical areas,[24] with a high prevalence in Turkey (5.7%) and Peru (4.3%).[66] In 2010, the prevalence rate in endemic areas of 22 countries ranged from 0% to 13% (average 1.7%).[24] Notably, at least 30 outbreaks of cyclosporiasis, together with a second coccidian disease, cryptosporidiosis, were associated with contaminated water and food worldwide over the last two decades.[24] In North America, 11,500 cases of cyclosporiasis were registered between 2016 and 2019.[24] In 2011, Sweden reported the two most extensive cryptosporidiosis episodes ever in Europe, affecting around 47,000 people.[67] In the case of CeD, a recent meta-analysis showed that the pooled global seroprevalence is 1.4%.[43] Interestingly, among the European countries, Sweden reported a higher prevalence of patients with CeD (2.6%) than the European average.[24] In Peru, where cyclosporiasis is endemic, a recent study showed a CeD prevalence of 1.2%, which is one of the highest in South America.[68]
Next, a phylogenetic tree was built using the Eimeria P4022-like-containing proteins and 17 BLASTP-based homologous proteins. It revealed that three of the five Eimeria proteins (tRNA-splicing endonuclease positive effector from Eimeria brunetti; Hypothetical protein from Eimeria mitis; and eukaryotic translation initiation factor 3, subunit 10 from Eimeria maxima) clustered with their homologous proteins of the human-infecting parasite Cyclospora cayetanensis (Figure S1). This suggests that, in terms of phylogeny, they are likely orthologous proteins, consistent with the fact that Cyclospora cayetanensis is considered the “human Eimeria” causing human coccidiosis.[27, 63] Therefore, we performed a second search[47] with the polyQ P4022 sequence, restricting the BLASTP search to the reported human coccidian parasites: Cystoisospora belli (taxid: 482538), Cyclospora cayetanensis (taxid: 88456), and both Cryptosporidium hominis (taxid: 237895) and Cryptosporidium parvum (taxid: 5807). We found more than 100 different polyQ P4022-containing proteins, at least six of them with both high sequence identity (> 85%) and significance (E-value < 1e−10) to the gliadin polyQ peptide P4022 (Table 2, Panel 2). No protein belonging to Cystoisospora belli was retrieved from the BLASTP search, perhaps because of a lack of data: the NCBI protein database for this organism consists of only one protein.
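The two taxonomic filters used in the searches (excluding Gramineae in the first, restricting to human coccidia in the second) can be written as Entrez organism queries. The sketch below only builds the filter strings from the taxids given in the text. How the searches were actually submitted is not stated here, so the helper names and the `all[filter] NOT ...` form of the exclusion are assumptions. Such a string could, for instance, be passed as the `entrez_query` argument of Biopython's `NCBIWWW.qblast`.

```python
# Taxonomy identifiers quoted in the text.
GRAMINEAE_TAXID = 4479
HUMAN_COCCIDIA = {
    "Cystoisospora belli": 482538,
    "Cyclospora cayetanensis": 88456,
    "Cryptosporidium hominis": 237895,
    "Cryptosporidium parvum": 5807,
}


def restrict_to(taxids) -> str:
    """Entrez filter keeping only sequences from the given taxa (hypothetical helper)."""
    return " OR ".join(f"txid{taxid}[ORGN]" for taxid in taxids)


def exclude(taxid: int) -> str:
    """Entrez filter removing one taxon from the whole database (hypothetical helper).

    Assumption: the exclusion is expressed as 'all[filter] NOT ...'.
    """
    return f"all[filter] NOT txid{taxid}[ORGN]"


first_search_filter = exclude(GRAMINEAE_TAXID)
second_search_filter = restrict_to(HUMAN_COCCIDIA.values())

print(first_search_filter)
print(second_search_filter)
```

Keeping the taxids in one dictionary makes it easy to rerun the restricted search as new human coccidian genomes are deposited.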
Regarding their functions, none of the six selected proteins from human coccidia (Table 2, Panel 2) has so far been linked to enteric pathogenic processes. Plectin is considered a universal biological organizer that cross-links several elements of the cytoskeleton,[69] and the transcription factor kayak has been proposed to control circadian behavior in Drosophila.[70] The transcription factor Myb-like protein P, with its Myb-like DNA-binding domain, is part of a large gene family of transcription factors with highly conserved DNA-binding domains found in insects, higher plants, and vertebrates. These factors are often involved in regulating differentiation and proliferation and are implicated in many tumors.[71, 72] The ubiquitin C-terminal hydrolase of the cysteine proteinase fold seems to be involved in ubiquitin-dependent protein catabolic processes.[73] In August 2021, we performed a new BLASTP search for related proteins in an attempt to identify their functions, but no significant similarities were found.
Searching for structural characteristics of the coccidian polyQ P4022-like-containing proteins
The high sequence similarity of proteins of the coccidiosis-causing Eimeria species and proteins of human coccidian parasites with the α-gliadin polyQ P4022 peptide paves the way for a possible sequence-related mechanism. To interact with partner proteins (e.g., receptors), these sequences need to be exposed to the solvent (see Table 2). Considering the lack of structural information, we performed some initial bioinformatic analysis to search for evidence about structural similarities between gliadin and the discovered proteins. In particular, the localization of the polyQ sequence would support its alleged role in coccidia pathomechanism.
The primary sequence analysis of the P4022-like-containing proteins indicated that they are polyQ proteins, seven of them containing multiple polyQ repeats. According to the definition of Ramazzotti et al.,[53] most of them are classified as impure polyQ repeats. In this regard, the interruption of pure polyQ sequences with specific amino acids such as leucine (up to 25% of the total polyQ sequence) makes the structure less aggregation-prone.[53, 74, 75] A notable example is the delayed onset and attenuated severity of human neurodegenerative diseases, as seen for the ataxin 1 polyQ involved in spinocerebellar ataxia type 1.[76-78]
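To make the notion of an impure polyQ repeat concrete, the short sketch below counts the non-glutamine residues in a segment and compares their fraction with the 25% figure mentioned above. It is a simplification under stated assumptions: every non-Q residue is treated as an interruption, the cutoff is applied to the segment as given, and the function names are illustrative. The full definition of Ramazzotti et al.[53] is not reproduced here.

```python
def polyq_composition(segment: str) -> dict:
    """Count glutamines and interruptions (any non-Q residue) in a segment."""
    n_q = segment.count("Q")
    interruptions = len(segment) - n_q
    return {
        "length": len(segment),
        "Q": n_q,
        "interruptions": interruptions,
        "interruption_fraction": interruptions / len(segment),
    }


def is_impure_polyq(segment: str, max_fraction: float = 0.25) -> bool:
    """True if the segment is interrupted, but by no more than max_fraction.

    Assumption: the 25% cutoff is taken from the text and applied directly
    to the segment; a pure polyQ run (no interruptions) returns False.
    """
    fraction = polyq_composition(segment)["interruption_fraction"]
    return 0 < fraction <= max_fraction


P4022 = "Q" * 12 + "ILQQILQQ"          # alpha-gliadin residues 120-139
EIMERIA_HIT_1 = "Q" * 12 + "LLQQLLQQ"  # E. brunetti, Table 2, Panel 1

for name, seq in [("P4022", P4022), ("Eimeria Hit 1", EIMERIA_HIT_1)]:
    print(name, polyq_composition(seq), is_impure_polyq(seq))
```

With this simplified rule, both the gliadin peptide and the Eimeria segment carry four interruptions in 20 residues, which falls within the 25% limit.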
In the previously reported α-2-gliadin model (Figure 2), the region 120–139 that corresponds to P4022 is solvent-exposed, reinforcing the idea that the polyQ stretch could directly interact with target proteins.[79] For all eleven proteins obtained in the BLASTP searches, we performed structural modeling using the PHYRE2 server.[80] Only two proteins were modeled with high confidence (more than 90%): the eukaryotic translation initiation factor 3 subunit 10 from Eimeria maxima and the Myb-like DNA-binding domain from Cryptosporidium hominis. The three-dimensional modeling of these two P4022-containing proteins shows that the polyQ-matching sequences are located in a helical context and are partially exposed to the solvent (Figure 2).
FIGURE 2. Location of the polyQ region in the 3D molecular models. The α-2-gliadin from Triticum aestivum was previously modeled.[79] The eukaryotic translation initiation factor 3 subunit 10 from E. maxima and the Myb-like DNA-binding domain from C. hominis were modeled by the PHYRE2 server[80] in the intensive mode and energy minimized under the same conditions as we previously described[79] to remove any clashes. Models were prepared using VMD 1.9.3,[81] represented as surfaces and colored by secondary structure content as orange (helix), cyan (β-sheet), and white (random coil). The polyQ and P4022 homology regions are shown in balls and sticks for clarity.
Sequence-based disorder probability analysis of the P4022-like-containing proteins from Eimeria and human coccidia showed that the matching regions with the polyQ P4022 sequence are primarily located in disordered areas, making them accessible to the solvent (Figures 3B and 3C).
FIGURE 3. Disorder probability of target proteins found by BLASTP search of the P4022 peptide. Intrinsic disorder probability plots of (A) wheat α-2-gliadin from Triticum aestivum and the best-scored (B) Eimeria proteins and (C) human coccidian proteins were calculated using the PrDOS server[83] with a 5% threshold (false positive rate, horizontal line). The primary sequence is represented as the number of amino acids. Black marks the different polyQ sequences present in each protein; red marks the P4022 and polyQ P4022-matching sequences; and the grey bars depict coiled-coil regions when available in the UniProt database.
Five proteins have coiled-coil domains (α-helical super secondary structures) overlapping with the polyQ stretch of the P4022-like sequences (Figures 3B and 3C). Such structural overlap is known to regulate the aggregation, insolubility, and activity of polyQ proteins.[82] Thus, these structural features suggest that the P4022-matching sequences may be accessible to interact with the host intestinal epithelium directly, as gliadin does (Figure 3A).
There may be two scenarios in which the solvent-accessible P4022-matching sequences interact with the host intestinal epithelium without prior digestion of the P4022-like-containing proteins: (I) the P4022-like sequence is part of an extracellular domain of a transmembrane protein; and (II) the P4022-like sequence is part of a cytosolic protein that is secreted in the proximity of the host intestinal epithelium. The latter situation has been reported for many Eimeria invasion-related proteins secreted from parasite apical organelles to carry out the parasite invasion process, that is, adhesion/locomotion, invasion of the host cell, and intracellular multiplication in the host intestine.[28, 55, 63] In this regard, computational predictions showed that two of the Eimeria P4022-containing proteins (Hits 1 and 2 of Table 2, Panel 1) are transmembrane proteins, only one of which (Hit 1) exposes the P4022-matching sequence to the extracellular space of the protozoan (Figure S2). Moreover, one of the five Eimeria proteins (Hit 4) is predicted to be extracellular, exposing the P4022-matching sequence to the parasite's extracellular space. The other two Eimeria proteins (Hits 3 and 5) are likely to be cytosolic, with their P4022-like sequences located in the parasite cytoplasm (Table 2, Panel 1).
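The grouping of the five Eimeria hits described above can be reproduced directly from the localization column of Table 2, Panel 1. The sketch below is only a bookkeeping aid under one assumption: each entry is read as "protein localization / location of the P4022-matching sequence", with T, E, and I as defined in the table abbreviations. The category labels and function name are illustrative.

```python
# Localization codes from Table 2, Panel 1:
# (protein localization, location of the P4022-matching sequence)
EIMERIA_HITS = {
    1: ("T", "E"),  # transmembrane (double-pass), sequence extracellular
    2: ("T", "I"),  # transmembrane (single-pass), sequence intracellular
    3: ("I", "I"),
    4: ("E", "E"),
    5: ("I", "I"),
}


def classify(protein_loc: str, sequence_loc: str) -> str:
    """Assign a hit to one of three illustrative categories."""
    if sequence_loc == "E":
        return "exposed"              # sequence faces the extracellular space
    if protein_loc == "T":
        return "membrane, not exposed"
    return "cytosolic"                # would need secretion (scenario II)


groups = {}
for hit, (protein_loc, sequence_loc) in EIMERIA_HITS.items():
    groups.setdefault(classify(protein_loc, sequence_loc), []).append(hit)

print(groups)
```

Only the hits in the "exposed" group present the P4022-matching sequence outside the parasite without further processing. The "cytosolic" group corresponds to the proteins that would have to be secreted, as in scenario (II).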