Differential Expression of Genes Involved in Cell Polarity , EMT and Cell-Fate in Breast Cancer and Corresponding Normal Tissue

Objectives: Cell polarity and epithelial morphology are peculiar features of cells forming the terminal ductal lobular unit, and they are early lost during neoplastic transformation because of an epithelial-mesenchymal transition (EMT). To understand these early events we analyzed a set of 125 genes related to cell polarity, EMT and cell-fate decision in 26 breast cancer specimens and corresponding patient-matched normal tissue. Methods: The difference of gene expression was explored by t-paired test. In addition, to evidence latent variables accounting for genes correlations, a Factor Analysis was applied as exploratory technique. Results: Among the 90 differentially expressed genes, those coding for cell-polarity complexes, apical-junctional components and luminal cytokeratins were overexpressed in tumor samples (suggesting a terminally differentiated phenotype) whereas those coding for stemness-associated features or related with EMT were expressed in normal tissues but not in tumor samples, suggesting the persistence of stem/progenitor cells. Factor analysis confirmed these findings and indicated that the difference between tumors and normal tissues can be synthesized in three main features representative of specific molecular/morphological alterations. Conclusions: The a priori definition of a selected panel of genes and the application of an exploratory statistical approach, greatly contribute to reduce the intrinsic biological complexity of tumor specimens and to describe the difference between tumor specimens and corresponding histologically normal tissues.


Introduction
Breast cancer is a biologically heterogeneous disease arising at the terminal ductal lobular units of the mammary gland.According to the widely accepted model of breast cancer progression, normal epithelial cells of the terminal ductal lobular unit acquire increasingly aberrant molecular and morphological abnormalities, resulting in different subtypes of invasive cancer [1].
To better understand breast cancer biology we focused on a peculiar feature of cells forming the terminal ductal lobular unit: cell polarity.Indeed, epithelial mammary cells are characterized by an asymmetric distribution of cytoplasmic and membrane proteins, termed apico-basolateral cell polarity that is essential to maintain the normal tissue specificity and a correct directional secretion of milk into the lumen [2].During neoplastic transformation, cell polarity and epithelial morphology are early lost because of an epithelial-mesenchymal transi-tion (EMT).Physiologically activated during embryo development, EMT is a process in which polarized epithelial cells, generally organized in cells layers, convert into single fibroblastoid cells characterized by a reduced cell-cell adhesion, loss of apico-basolateral polarity and epithelial markers, and by the acquisition of motility, spindle-cell shape and mesenchymal markers [3,4].Recently, EMT activation has been proposed for the generation of cells with stem-like features including selfrenewal ability [5,6].Such a hypothesis has fuelled the exciting suspect that the disruption of the mechanisms deputed to cell polarity control should somehow be involved in the emergence of the so-called tumor initiating cells.Due to the acquired stem-like features, these cells should be able of reinitiating growth and form distant metastases with relevant clinical implications in terms of prognosis and responsiveness to therapy [7,8].Given the relation between EMT and loss of cell polarity, and the proposed hypothesis that EMT may generate cell with stem cells properties, we investigated the pattern of expression of a selected set of genes related to cell polarity, EMT, and cell-fate decision.To this aim, we interrogated a previously published microarray dataset consisting of 26 primary breast cancers with a patientmatched histologically normal tissue, concomitantly taken from the clinically free of tumor ipsilateral mammary gland [9].

Statistical Analysis
As some genes were recognized by more than a single probe-set, each of which characterized by an individual specificity and sensitivity that differently contribute to gene expression value, a gene expression mean value was calculated after weighting each probe-set for its own sensitivity and specificity score.Specifically, each expression value (already log2-transformed in the original data-set) was multiplied for the semi-sum of sensitivity and specificity scores of the corresponding probe-set.Given the patient-matched samples study design, all statistical analyses were performed considering the difference of gene expression between tumor and corresponding noncancerous adjacent tissue as a new variable.The difference of gene expression was explored by t-paired test.To correct for multiple testing, the false discovery rate (FDR) was estimated [26] and an estimated FDR < 0.01 was considered statistically significant.To evidence latent variables accounting for genes correlations, a Factor Analysis was applied as exploratory technique [27].The number of retained factors was selected according to the scree test [28].To facilitate the interpretation of the factors, Varimax rotation was applied.Loading values lower than 0.3 were not considered.All analyses were performed using open source software R 2.11.1 packages HDMD (http://www.R-project.org).
When we applied a factor analysis to investigate the latent variables intrinsically associated with the 125 genes, three factors were retained according to the scree plot.Genes associated with the epithelial phenotype principally characterized the first factor (Figure 1).In fact, among the genes with a positive loading value we found those coding for tight (CGN, CLDN1, CLDN3, CLDN4, CLDN7, CLDN8, CLDN12, CXADR, MAR-VELD2 and OCLN) and adherens (CDH1, CDH3, CTNND1, JUP and MLLT4) junction components, cell polarity complexes (DLG3, DLG5, INADL, LLGL2, PRKCI and PRKCZ) and coding for a surface marker specifically expressed by well-differentiated mammary (CD24).The concomitant presence of luminal (GATA3, KRT7, KRT8, KRT18 and KRT19) and basal (KRT5, KRT14 and KRT17) markers clearly suggests the coexistence, in tumor specimens, of luminal and myoepithelial cells.Notably, among the genes with a negative loading value we found ASAM, ESAM, and VEGFB (all involved in endothelial cell proliferation), NOTCH1 and NOTCH4 (associated with cell-fate decision), and AKT2 and TWIST2 (involved in EMT).
The second factor (Figure 2) was characterized by genes associated with EMT.In fact, among the genes with a positive loading value we found ID3, ID4, SMAD3, SNAI2, TGFBR2, TWIST1, TWIST2, ZEB1, and ZEB2, all involved in the activation of EMT via the canonical TGF-β pathway.Such a transition towards a more aggressive mesenchymal phenotype is supported by the presence of VIM coding for vimentin, a typical mesenchymal marker, and by the presence of genes involved in endothelial cells proliferation (AMOTL1, JAM2, JAM3 and VEGFC) or cell-fate decision (ALDH1A1, ALDH1A3, JAG1, JAG2, NOTCH2, NOTCH3, NUMB).
Finally, the third factor (Figure 2) was principally characterized by genes associated with a terminal luminal differentiation of the tumor.In fact, among the genes with a positive loading value, in addition to those coding for the apical-junctional complex (MAGI2, MAGI3 MARVELD2 and OCLN) and polarity complexes (DLG1, DLG5, INADL, LLGL2, PARD3B and PARD6B), we found genes associated with an estrogen-dependent phenotype (ESR1, GATA3, and PGR) that provided a more specific characterization of tumor specimens with respect to the "generic" epithelial phenotype described by the first factor.According to the terminal luminal differenttiation, we found the negative loading value of EGFR gene, whose expression is generally negatively associated with that of ESR1.Furthermore, in agreement with the positive role played by TGF-β in mammary gland development, we found several genes (SMAD2, SMAD3, SMAD4 and TWIST1) coding for transcription factors involved in the canonical TGF-β pathway.

Discussion
In the original paper, Pedraza et al. [9] performed largescale transcription profiling on 58 breast cancer specimens, 52 of which corresponding to 26 patient-matched samples.Using significance analysis of microarrays [29], they found 8088 differentially expressed sequences between tumor and corresponding histologically normal tissue, most of which coding for proteins with "binding activity" and involved in various and no better-specified "metabolic processes".To better understand tumor biology, in the present study, e considered only the 26 w atie p set of genes involved in cell polarity and apical junction assembly [11,12], EMT activation [3,4] cell fate-decision and maintenance of a self-renewal state in tumorigenic adult tissues [13,14].Our results indicated that 72% of the selected 125 genes were differentially expressed between tumor and corresponding histologically normal tissue with an estimated FDR < 0.01.In particular, we found overexpressed the genes associated with a welldifferentiated luminal phenotype; complementally we found underexpressed those associated with a basal phenotype, EMT activation or stemness-like morphology.With respect to the corresponding histologically normal tissue, tumor samples differentially overexpressed also the genes coding for estrogen receptor and for the regulators of cellular polarity and adhesion.Conversely, histologically normal tissues expressed the genes coding for some transcription factors involved in EMT and stemness-associated features.Such an apparently paradoxical finding, i.e., the positive association between histologically normal tissues and EMT-and stemness-related genes (more likely expected in neoplastic samples) is not a peculiar feature of breast epithelium since we recently to pleural mesothelioma [30].On the contrary, it suggests, in agreement with the physiological remodeling of the mammary gland, the presence of resident stem/progenitor cells in normal tissue [31,32].Under this light, is not surprising the expression, in normal tissues, of ALDH1A and its paralog ALDH1A3, both coding for proteins recently identified as putative stem-cell markers in several tissue including mammary gland [33].Similarly, is not surprising the expression of JAG1 and its paralog JAG2, as well as of Notch family members (NOTCH1, NOT-CH2, NOTCH3 and NOTCH4) considering the critical role that they play in cell-fate decision in forcing normal mammary stem cells to stay in an undifferentiated state [34,35].The hypothesis is further corroborated by the underexpression of BMI-1 and CD24.In fact, BMI-1 codes for a member of Polycomb group required to maintain the transcriptionally repressive state of many genes throughout embryo development [36] and adult tissues differentiation including mammary gland [37], whereas CD24 codes for a mucin-like cell-adhesion molecule positively associated with a terminally differentiated luminal phenotype, exploited to isolate breast cancer cells with stem/progenitor cell properties [38,39].
Normal mammary gland development and differentiation imply tissue remodeling.Therefore, the expression, in histologically normal samples, of genes involved in EMT and, in particular, the panel of genes related to TG Sørlie, C. M. Perou, R. Tibshirani, et al., "Gene Exs of Breast Carcinomas Distinguish Tumor Subclasses ," Proceeding of the Nationa f the United States F-β signaling pathway is not at all surprising.TGF-β, in fact, is the recognized major inducer of EMT during embryogenesis and cancer progression [40,41].According to the conventional model, TGF-β binds to a heteromeric complex formed by type I and type II receptors and leads to the activation of Smad2 and Smad3 that, after forming a trimer with Smad4, translocate to the nucleus where they bind to Smad-binding elements and activate/repress a plethora of target genes.Through Snail, ZEB and bHLH transcription factors the Smad complex represses the expression of several epithelial markers, including claudins, ZOs proteins, occludin, E-cadherin and cytokeratins, and induces the expression of mesenchymal markers including N-cadherin, fibronectin and vitronectin.Consistent with such a model, in normal tissue we found expressed genes coding for TGF-β receptors (TGFBR2 and TGFBR3), Smad proteins (SMAD2 and SMAD3), ZEB (ZEB1 and ZEB2) and bHLH (TWIST1, TWIST2, ID1, ID3 and ID4) transcription factor family members.Concomitantly, we found underexpressed all genes coding for epithelial markers.In agreement with recent acquisition providing evidence of a negative transcriptional regulation of CD24 expression by Twist [42] we also found a negative association between TWIST1/TWIST2 and CD24 expression as well as a negative association between TWIST1/TWIST2 and CDH1, the gene coding for E-cadherin [43].The observation that in tumor samples TGFB1 and TGFBR1 genes, coding respectively for TGF-β and its type I receptor, were differentially overexpressed, can be explained by considering the well-known dual role of TGF-β that, in addition to its tumor promoter activity, has provided to be a potent inhibitor of epithelial cell proliferation and able to maintain tissue homeostasis [44,45].Consistently, we found overexpressed all genes associated with a well-differentiated luminal phenotype including ESR1, in agreement with the notion that ERα and TGF-β signaling pathways are major regulators during mammary gland development [46].Factor analysis corroborated this hypothesis providing evidence that one of latent variables accounting for genes correlations (namely, factor 3) included genes associated with an estrogen-dependent phenotype and several genes coding for transcription factors involved in the canonical TGF-β pathway.Interestingly, factor analysis clearly indicated that many genes, selected for this study, provided redundant information (as in the case of CLDN3 and CLDN4, CDH1 and CHD3, KRT5, KRT6B, KRT7, KRT8 and KRT18, KRT14 and KRT17, MARVELD2 and OCLN).In addition, it in-dicated that the difference between tumor specimens and corresponding histologically normal tissues are synthesizable in few main features representative of specific molecular and morphological alterations occurring during neoplastic transformation, thus suggesting the application of this statistical approach to reduce, at least partially, the intrinsic biological complexity of tumor specimens.

Figure 1 .
Figure 1.Factor analysis.Schematic representation of genes characterizing the first factor and principally associated with epithelial phenotype.Solid color indicates a positive loading value whereas dashed color indicates a negative loading value.TJs, tight junction components; AJs, adherens junction components; PCs, polarity complexes components; Ang, angiogenesis; CF, cell-fate decision; LMs, luminal markers; BMs, basal markers; EMT, epithelial-mesenchymal transition.

Figure 2 .
Figure 2. Factor analysis.Schematic representation of genes characterizing second and third factors.Second factor (on the