Structural disorder predictions of native EWS and its oncogenic fusion proteins in relation to function

The intrinsic structural disorder (ISD) of native EWS and its oncogenic fusion proteins, including EWS/Fli1, EWS/ATF1 and EWS/ZSG, was estimated with different predictors. The ISD difference between the wild type and the oncogenic fusions, found in the CTD, is due to the fusion partner, usually a transcription factor (TF). A disordered region was found in the sequence AA 132-156 of the NTD (EAD) of EWS; it is the longest region free of Y motifs. The IQ domain (AA 258-280), a Y-free region flanked by two Y-boxes, is also predicted to be disordered by all predictors used. The EWS functional regions RGG1, RGG2 and RGG3 are predominantly disordered. A strong dependence was found between the structure of the EWS protein and its oncogenic fusions and their estimated ISD. The oncogenic function of the fusions is related to a decreased ISD in the CTD, due to the fused TF. The predictors showed that the different isoforms have similar profiles, shifted by a number of amino acids because of the translocations. On the basis of the prediction results, the EWS sequence and its functional regions with increased ISD were analyzed to establish a sequence-disorder-function relationship that could help in the design of antitumor agents against the corresponding malignancies.


INTRODUCTION
The Ewing's sarcoma oncogene (EWS) on chromosome 22q12 encodes an RNA-binding protein that is the target of tumor-specific chromosomal translocations in Ewing sarcoma, myxoid liposarcoma, malignant melanoma of soft parts, desmoplastic small round cell tumor, peripheral neuroectodermal tumor, angiomatoid fibrous histiocytoma, extraskeletal myxoid chondrosarcoma, rhabdomyosarcoma, locally destructive tumor, myoepithelioma of soft tissue, hidradenoma (eccrine acrospiroma), mucoepidermoid carcinoma, neuroblastoma, olfactory neuroblastoma, solid pseudopapillary tumor of the pancreas and acute myeloid leukemia. Around 85% of Ewing tumors carry the EWSR1/FLI-1 fusion. The cellular function of EWS, the mechanism by which it participates in multiple levels of gene expression, and its role in the pathogenesis of the resulting cancers are not well defined. EWS self-association and oligomerization involve N-terminal and centrally located amino acids, whereas optimal association requires full-length EWS molecules [1]. The EWS activation domain (EAD) lies within the N-terminal 286 amino acids and, like many chromatin-organizing proteins, is intrinsically disordered (ID) [2].
An attempt is made here to estimate the intrinsic structural disorder (ISD) of EWS and its reported oncogenic fusion proteins using different prediction methods. On the basis of the prediction results, the EWS sequence and its functional regions with increased ISD were analyzed to establish a sequence-disorder-function relationship that could be used to design antitumor agents against the corresponding malignancies.
2) EWS-Fli1 (526 AA); ADX41460.1.
The results of the different predictors are compared here for the first time, using native EWS protein and its oncogenic fusions as the example. A new approach is the analysis of the functional regions of EWS and its oncogenic fusions on the basis of the prediction results, in order to establish a sequence-disorder-function relationship.

Features of Protein Intrinsic Disorder
Intrinsically disordered proteins (IDPs) lack stable tertiary and/or secondary structure and participate in both one-to-many and many-to-one signaling, which is important for transient protein-protein and protein-nucleic acid interactions. Alternative splicing and posttranslational modifications of IDPs are linked to their functions in cellular regulation, recognition and signal transduction. Proteins associated with human diseases, including cancer, are enriched in ID and engage in high-specificity, low-affinity interactions through a one-to-many binding mode enabled by their plasticity [10]. Intrinsically disordered transcription factors (TFs) can offer significant advantages in responding to molecular targets, allowing interactions with multiple partners and fine control over binding affinity, and thus represent targets for therapeutic drugs. Some types of DNA-binding domains (DBDs) in TFs are highly unstructured in isolation and undergo a disorder-to-order transition upon binding to specific DNA [11]. Modular DBDs, such as zinc finger domains, recognize specific DNA sequences in the regulatory regions of their targets.
Chromosomal translocations represent a major genetic aberration leading to cancer. A high level of ISD enables fusion proteins to evade the cellular surveillance mechanisms that eliminate misfolded proteins. Predictions for translocation-related human proteins show 43.3% disorder versus 20.7% for all human proteins, as well as fewer Pfam domains, and translocation breakpoints tend to avoid splitting domains [12]. The vicinity of the breakpoint is significantly more disordered than the rest of the fusion protein. ISD enables long-range structural communication in the oncogenic function. Fusion of a DBD to a transactivator domain results in an aberrant TF, such as EWS/Fli1, EWS/ATF1, EWS/ZSG, EWS/ERG, EWS/WT1, EWS/CHOP and EWS/CHN.
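The 43.3% versus 20.7% comparison above is a fraction of residues predicted to be disordered. A minimal sketch of that calculation, assuming per-residue scores between 0 and 1 and a cut-off of 0.5 (the value conventionally used to call disorder from IUPred scores); the helper name, the threshold choice and the toy scores are assumptions for illustration, not data from this work:

```python
def percent_disordered(scores, threshold=0.5):
    """Return the percentage of residues whose disorder score exceeds `threshold`.

    Hypothetical helper: `scores` is one value in [0, 1] per residue,
    as produced by predictors such as IUPred (assumed input format).
    """
    if not scores:
        raise ValueError("empty score profile")
    disordered = sum(1 for s in scores if s > threshold)
    return 100.0 * disordered / len(scores)


# Invented 10-residue profile: 6 residues score above 0.5.
profile = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.55, 0.65, 0.1]
print(f"{percent_disordered(profile):.1f}%")  # 60.0%
```

Applied to whole proteomes, the same ratio pooled over all residues of a protein set gives figures of the kind quoted above; the exact values depend on the predictor and threshold chosen.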
The ID of EWS was estimated with the predictors IUPred (long disorder) (Figure 1(a)), RONN (Figure 2(a)), GlobPlot2, DisEMBL (Figure 2(b)), FoldIndex, PONDR and VL3H. A similar distribution of high ID was found in the NTD (EAD, transactivation domain (TAD)) and also in the CTD (RNA-binding domain, RBD) of EWS. Prediction of the NTD (AA 1-240) by Porter, PaleAle, BrownAle, XStout, XXStout, 3Distill, Porter+ and SCLpred indicated an animal nuclear localization, which has also been found experimentally in vivo and in vitro. Thus, the NTD of EWS consists of large disordered regions and a very small number of short ordered regions, while the CTD is almost completely disordered:
1) A common disordered region, AA 132-156 in the NTD, was found by all predictors; it is the longest region free of Y motifs and is functionally responsible for multiple interactions with different factors.
2) The IQ domain (AA 258-280, which binds calmodulin) contains the longest NTD region free of repetitive structures, flanked by two Y-containing boxes. It is predicted to be disordered by RONN, IUPred, FoldIndex and PONDR, whereas GlobPlot shows a break in the disorder in the connecting region.
5) The IQ domain (AA 258-280), containing the second long Y-free region flanked by two Y-boxes, is predicted to be disordered by all predictors used.
6) The RRM region (RNA recognition motif), consisting of about 100 conserved residues, is predicted to be ordered by GlobPlot2.
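The Y-free regions in items 1), 2) and 5) can be located directly from the sequence. A minimal sketch, under the simplifying assumption that a "Y motif" can be reduced to a single tyrosine residue (the motifs themselves are not defined here), scans the sequence once and reports the longest tyrosine-free stretch in 1-based, inclusive coordinates, matching the AA numbering used above. The helper name and the toy sequence are hypothetical:

```python
def longest_residue_free_region(sequence, residue="Y"):
    """Return (start, end) of the longest stretch of `sequence` lacking `residue`.

    Coordinates are 1-based and inclusive, like the AA ranges in the text.
    Returns None if every position is `residue` (or the sequence is empty).
    Ties are resolved in favour of the N-terminal-most stretch.
    """
    best_start, best_len = 0, 0
    run_start = 0  # 0-based index where the current residue-free run began
    for i, aa in enumerate(sequence):
        if aa == residue:
            run_start = i + 1  # the next run can only begin after this residue
            continue
        run_len = i - run_start + 1
        if run_len > best_len:
            best_start, best_len = run_start, run_len
    if best_len == 0:
        return None
    return best_start + 1, best_start + best_len


# Hypothetical toy sequence, not the EWS sequence.
print(longest_residue_free_region("AYGQSTPQSGYSQ"))  # (3, 10)
```

Running the same scan on the real EWS sequence, and then on the sequence with the longest stretch masked, would give the first and second longest Y-free regions to compare with the disordered segments reported by the predictors.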

EWS/ATF1
EWS/ATF1 is a fusion of EWS with the TF ATF1. The bZIP domain (AA 214-271), consisting of a basic region that directly contacts DNA and a leucine zipper (ZIP) that allows dimerization, is necessary and sufficient for dimerization and DNA binding [13]. Transactivation by EWS/ATF1 does not require dimerization [13]. Inhibition of bZIP TFs could be exploited therapeutically in cancer cells (clear cell sarcoma) [14].
IUPred predicted a globular domain in the last 39 AA (bZIP), at AA 498-537 in isoform 5) and AA 393-432 in isoform 4) of EWS/ATF1. The disorder profile of the isoforms is identical in the NTD (AA 1-264, EAD) and similar in the CTD.
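Because the two isoforms share the same C-terminal bZIP region, the coordinates above differ by a constant offset: 498 - 393 = 537 - 432 = 105 residues, and the GlobPlot2 domains reported for the same isoforms (AA 294-432 and AA 399-537) differ by the same 105 residues. As a hedged sketch of how such a shift could be recovered directly from two per-residue disorder profiles, the hypothetical helper `best_offset` tries each candidate shift and keeps the one with the smallest mean absolute difference over the overlapping residues; the profiles in the example are invented:

```python
def best_offset(ref, other, max_shift, min_overlap=1):
    """Return the shift k (0 <= k <= max_shift) that best aligns ref[i] with other[i + k].

    The cost of a shift is the mean absolute difference between the two
    per-residue profiles over their overlapping residues; the lowest cost wins.
    """
    min_overlap = max(min_overlap, 1)  # avoid dividing by an empty overlap
    best_k, best_cost = None, float("inf")
    for k in range(max_shift + 1):
        overlap = min(len(ref), len(other) - k)
        if overlap < min_overlap:
            break  # larger shifts only shrink the overlap further
        cost = sum(abs(ref[i] - other[i + k]) for i in range(overlap)) / overlap
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Invented profiles: `longer` is `shorter` with three extra residues in front.
shorter = [0.9, 0.8, 0.2, 0.1, 0.7]
longer = [0.5, 0.5, 0.5] + shorter
print(best_offset(shorter, longer, max_shift=5))  # 3
```

A mean over a very short overlap can look deceptively good, which is why `min_overlap` should be set to a sizeable fraction of the profile length in practice; aligning the isoform sequences first is the more rigorous alternative.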

EWS/Fli1
EWS/Fli1 (ETS type, AA 281-361) is a fusion of the NTD of the EWSR1 protein (AA 1-265) with the DBD of the human FLI1 protein (452 AA) at AA 260. Self-association of EWS and of EWS-FLI1 (but not of FLI1), as well as interaction of EWS-FLI1 with EWS and FLI1, was observed in vivo. The NTD (EAD) contributes to homo- and heterotypic interactions, and EWS-FLI1 self-associates and binds to FLI1 via the DBD in its CTD [1].

EWS/ZSG
EWS/ZSG is a zinc-finger-type oncogenic protein (EFT) that occurs rarely in tumors. The IPD of EWS/ZSG estimated by IUPred showed a long disordered region originating from EWS, followed by a globular domain comprising residues 357-584 (long B isoform), 357-696 (long A isoform) and 357-604 (short isoform), followed by a short disordered region at the C-terminal end of the molecule (Figure 1(d)). The ID of the EAD in EWS/ZSG is analogous to that in EWS/Fli1 and EWS/ATF1. The profile is the same in the NTD (EAD) and similar in the CTD, where the isoforms differ. The IPD of EWS/ZSG differs from that of the other EWS oncogenic fusions in the total disorder of the CTD, similar to native EWS. Predictions of ISD by DisEMBL showed similar profiles for all isoforms (Figures 3(c)-(e)). No potential globular domains (GlobDoms) according to the Russell/Linding definition (GlobPlot2) were found in any EWS/ZSG isoform.
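The IUPred result for EWS/ZSG is described as a series of segments: a long disordered region, a globular domain and a short C-terminal disordered region. As a simplified sketch (not IUPred's own globular-domain procedure), a per-residue profile can be split into such runs by thresholding at an assumed 0.5 cut-off and reporting each maximal run of residues with the same label; the hypothetical `segments` helper and the profile below are illustrative only:

```python
def segments(scores, threshold=0.5, min_len=1):
    """Split a per-residue disorder profile into maximal runs of equal label.

    Residues scoring above `threshold` are labelled "disordered", the others
    "ordered". Each run is returned as (label, start, end), 1-based inclusive.
    Runs shorter than `min_len` residues are omitted (not merged).
    """
    runs = []
    start = 0
    for i in range(1, len(scores) + 1):
        at_end = i == len(scores)
        # Close the current run at the end of the profile or when the label flips.
        if at_end or (scores[i] > threshold) != (scores[start] > threshold):
            label = "disordered" if scores[start] > threshold else "ordered"
            if i - start >= min_len:
                runs.append((label, start + 1, i))
            start = i
    return runs


# Invented profile: disordered N-terminus, ordered core, disordered C-terminus.
profile = [0.8, 0.9, 0.7, 0.2, 0.1, 0.3, 0.2, 0.6, 0.9]
for run in segments(profile):
    print(run)
```

Passing `min_len` discards short runs instead of merging them into their neighbors, which is cruder than the smoothing that real predictors apply before reporting domains.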

EWS/WT1
EWS/WT1 is a zinc-finger-type oncogenic protein, frequent in desmoplastic small round cell (DSR) tumors. EWS/WT1 self-association maps to the fusion junction (and is negatively influenced by phosphorylation), where the DBD and self-association domains overlap, but DNA binding does not depend on self-association. The binding of several EWS/WT1 molecules leads to homotypic associations that translate into transcriptional effects [17]. In EWS/WT1, the NTD (EAD) was disordered, while the rest of the protein, containing the WT1 zinc fingers, is globular (AA 15-89), as estimated by IUPred.

EWS/ERG
ERG is an ETS-type TF, similar to Fli1. In EWS/ERG (type 1e), the NTD (EAD) residues are disordered, and the rest of the protein, containing ERG (AA 84-247), forms a globular domain (IUPred). The breakpoint is around AA 80. In EWS/ERG (type 9e), the globular domain is composed of AA 23-189. The complete sequence is not available, but the profiles of the different isoforms are similar. Erg/Ets-2 dimer formation may prevent Ets-2 from acting as a TF [18,19]: the monomer is functionally active, while the heterodimeric complexes are inactive.

EWS/CHOP
CHOP is a bZIP-type TF, similar to ATF1. According to IUPred, EWS/CHOP consists of two highly disordered regions connected by a globular linker comprising AA 91-137. The complete sequence is not available.

Comparison of the Different Protein Disorder Prediction Methods
Compared with one another, the EWS oncogenic fusions show similar ID in the NTD (AA 1-264, EAD). The CTD disorder of the fusions differs from that of the native protein and among the fusion proteins. All predictors used showed a strong relationship between the structure and the estimated ID of EWS and its oncogenic fusions. Finally, the oncogenic function is related to a decreased IPD in the CTD, due to the fused partner, a TF. The different isoforms showed similar profiles, shifted by a number of amino acids because of the translocations. All methods follow the same shape, with small differences that do not influence the overall profile. Thus, predictors could be used to study the structure-function-disorder relationship of native EWS and its oncogenic fusions. A structure/disorder/function relationship was found in some regions of the proteins. The disordered region AA 132-156 of the EAD is the longest region free of Y motifs. The IQ domain (AA 258-280), a Y-free region flanked by two Y-boxes, is also predicted to be disordered by all predictors used. The EWS functional regions RGG1, RGG2 and RGG3 are predominantly disordered. This is consistent with the finding that the particular AA composition of the EAD creates an enabling structure with several critical Tyr residues dispersed in a polar/neutral environment, favoring hydrogen bonding interactions and flexibility [20].
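A statement such as "disordered by all predictors used" amounts to a per-residue vote across methods. As a hedged sketch, assuming each predictor's output has already been converted to one boolean per residue (True = disordered), the hypothetical helper `consensus_regions` reports the 1-based, inclusive intervals supported by all predictors, or by a chosen minimum number of them; the predictor names and calls are placeholders, not results from this study:

```python
def consensus_regions(calls, min_agree=None):
    """Return 1-based inclusive intervals called disordered by at least
    `min_agree` predictors (default: every predictor in `calls`).

    `calls` maps a predictor name to a list of booleans, one per residue,
    where True means the residue is predicted to be disordered.
    """
    lengths = {len(profile) for profile in calls.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one predictor, all with equal-length profiles")
    need = len(calls) if min_agree is None else min_agree
    if need < 1:
        raise ValueError("min_agree must be at least 1")
    n = lengths.pop()
    votes = [sum(profile[i] for profile in calls.values()) for i in range(n)]
    regions, start = [], None
    # A trailing zero vote closes any region still open at the C-terminus.
    for i, v in enumerate(votes + [0]):
        if v >= need and start is None:
            start = i
        elif v < need and start is not None:
            regions.append((start + 1, i))
            start = None
    return regions


# Placeholder binary calls for a 6-residue toy protein.
calls = {
    "predictor_1": [False, True, True, True, False, True],
    "predictor_2": [False, True, True, False, False, True],
    "predictor_3": [True, True, True, True, False, False],
}
print(consensus_regions(calls))               # [(2, 3)]
print(consensus_regions(calls, min_agree=2))  # [(2, 4), (6, 6)]
```

Requiring unanimity gives the most conservative regions, such as AA 132-156 in EWS; relaxing `min_agree` shows where the methods diverge, for example the GlobPlot break in the IQ domain.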

Relationship between IPD and Multimerization in EWS and Its Oncogenic Fusions
The common structural features restricted to TET family members suggest that they bind RNA and/or ssDNA in a unique way [15]. TBP dimerization inhibits DNA binding, thereby regulating the TBP-DNA interaction [21]. The formation of inactive homo- or heterodimers could be a general mode of regulating transcription factor activity in vivo [22]. Multimerization is characteristic of EWS and many EWS fusion oncoproteins. Self-association may be important for the function of EWS and its oncogenic fusions and may be realized in different ways. The mechanism may be related to the function of EWS in normal cells and of EWS fusions in cancer cells. Homo- and hetero-association may affect TF regulation as well as transactivation in vivo. The capability of these molecules to associate is closely related to their high level of IPD, flexibility and accessibility, which allow them to interact with regions inside the same molecule and with other molecules of the same or a different nature, thus allowing structural adaptation and multi-partner interactions in their regulatory and oncogenic functions. ISD is possibly linked to the functional regions and the specific interactions they undertake. One example is the interaction found in vitro between the EAD (AA 1-57) and hsRPB7 [23].
Disorder predictors can help identify local domains within longer regions of disorder. Combining ID prediction with other techniques provides an alternative strategy for protein structural characterization, drug target identification, small-molecule design and assay development. Disordered proteins are potential targets of small-molecule therapeutics. Small molecules that disable EWS-FLI1 function with minimal toxicity (sparing hematopoietic stem cells) could potentially provide a therapy for patients with ESFT and other related sarcomas [24].

CONCLUSION
In summary, ID predictions using several predictors have shown a relationship between the functional characteristics of the proteins, their structure and their amino acid composition. This function-structure-disorder relationship could be used in the design of potential antitumor agents against tumors related to EWS fusions. A contribution of this work is to show that the oncogenic potential of the fusions can be studied through the relationship between functional regions and intrinsic disorder.

Figure 1.
The results show an identical distribution of high ID in the N-terminal domain (NTD) of EWS (EAD, transactivation domain). The ID difference lies in the C-terminal domain (CTD), which is RNA-binding in native EWS and DNA-binding in the EWS fusions, where it originates from different TFs. The CTD of EWS showed full disorder, whereas the CTDs of the fusions also showed globular domains characteristic of the fused TFs. The DisEMBL predictions of the IPD are shown in Figure 3.

Figure 2. Predictions of ISD for native EWS protein: (a) predictions of ID for native EWS by RONN; (b) predictions of ID for native EWS by DisEMBL.

For EWS/ATF1, in the CTD, where the isoforms differ (Figure 1(c)), the isoforms have increased ID in the regions flanking the bZIP domain (AA 457-550). The CTD is almost entirely ordered, including the bZIP domain, which is folded and linked by highly conserved, mobile and unstructured sequences. The results obtained by the different methods are similar: the critical elements (particularly the breakpoint) are connected by long segments of structural disorder. The calculated distance/disorder between the oncogenic elements TAD and bZIP was 280/265 AA [12]. The region AA 340-353 is predicted to be structured by DisEMBL (Figure 3(b)). GlobPlot2 predicts potential globular domains (GlobDoms) in EWS/ATF1 at AA 294-432 in isoform 4) and AA 399-537 in isoform 5). The disordered region AA 136-152 was used to generate antibodies against the EWS protein [15].