HIV-1 Env gp120 C2V5 Potential N-Linked Glycosylation Site(s) (PNGs) Variations and Amino Acid Length Polymorphisms among Infected Family Members

Objective: To ascertain the role of HIV-1 gp120 env PNGs variations and sequence length polymorphism following transmission events as possible supporting forensic evidence to determine the directionality of HIV transmission. Method: An observational study of HIV-1 infected family members, in which median and range values of the amino acid lengths and PNGs for the genotyped C2V5 region were calculated. The Wilcoxon rank-sum test was used to determine differences in these parameters between family members. Results: For heterosexual transmission, two mothers had longer C3 sequences than their spouses (p = 0.006 and p = 0.025), whilst the opposite was observed for one mother (p = 0.028). No clear trends were observed for PNGs. In three families, index children had longer C2V5 amino acid sequences than their mothers (p = 0.013, 0.040 and 0.043). Second siblings' V4 and V5 sequences were generally shorter than the maternal ones (p = 0.039 and 0.028, respectively). Adults generally had longer V3 amino acid sequences than the children (p = 0.018). Similar trends were also observed regarding PNGs within the entire C2V5 region and the C3 and V4 sub-regions (p = 0.0025, 0.005 and 0.008, respectively). First siblings' C2V5 and C3 sequences were significantly longer than those of the second siblings (p = 0.005 and 0.007, respectively). Conclusion: Our results suggest that HIV-1 env C2V5 amino acid length polymorphism and PNGs tend to increase with age and HIV disease progression. Although the issue is sensitive and should be handled cautiously, it is tempting to propose the directionality of the HIV transmission events on the basis of C3 sequence length polymorphisms. Correlating HIV-1 env C2V5 amino acid length polymorphism with age of infection may be the first step towards a valuable piece of forensic evidence that could be useful in the criminalisation of willful HIV infection. However, bigger studies are warranted to substantiate the authenticity of this potentially useful application.


Introduction
Human immunodeficiency virus type 1 envelope glycoprotein 120 (HIV-1 env gp120) is composed of relatively conserved constant (C) regions, C1 to C5, and variable (V) regions, V1 to V5 [1]. It ranks among the most heavily glycosylated proteins known in nature, with N-linked glycans constituting over 55% of its molecular weight [2,3]. This extensive glycosylation is known to play an important role in evasion of the host immune response by masking key neutralization epitopes and presenting the glycosylated env (glycan shield) to the immune system as "self" [4-9]. Under the selection pressure of the host immune response or antiretroviral therapy, it is postulated that HIV-1 evolves towards a denser glycan shield [10], an observation also supported by others [11-13]. According to this hypothesis, shorter variants with fewer glycans are expected at earlier time points of infection, whilst longer V1-V5 forms with more glycans are expected to evolve at a later stage in response to prolonged immune pressure [14,15], making it possible to assign the directionality of transmission.
Regardless of the HIV-1 subtype, the number of potential N-glycosylation site(s) (PNGs) within the gp120 env gene (env) is conserved at about 25 [10,16]. However, there is a tendency for heterosexually and perinatally transmitted viruses to have shorter env variable loops and fewer PNGs [14,17-21]. Changes in the number of PNGs and env gp120 V1-V5 amino acid lengths have been associated with striking a balance between transmission competence and resistance to immune challenges [14,21-23]. Whilst glycosylation within the HIV-1 env V3 loop may change the viral co-receptor usage [24], loss of PNGs within the V4 region has been correlated with dementia [25]. Intra-host HIV-1 PNGs diversity varies: some PNGs are evolutionarily conserved, others are present in some hosts but absent in others, and some may even appear or disappear during the course of infection [3,10]. The HIV-1 subtype B env gp120 V4 loop has been shown to appear as swarms of heterogeneous N-linked glycosylation variants due to mutations and insertions and deletions (indels), which consequently affect amino acid sequence lengths and PNGs distribution patterns [26,27].
Although glycosylation of HIV-1 remains the main obstacle to viral control and eradication, it is possible to exploit the protective glycosylation profiles of gp120 against the virus for the development of an env-based vaccine candidate [26]. For this reason, understanding glycan shield glycosylation patterns is critical. There is a paucity of data in Zimbabwe regarding the number and distribution patterns of HIV-1 subtype C gp120 env PNGs and sequence length polymorphisms in adults following heterosexual transmission, let alone in vertically infected children. Moreover, the gp120 env characteristics of HIV-1 infected but drug-naïve pediatric patients with unusually prolonged survival are not well documented. Our study population of HIV-1 infected family members provided a unique opportunity to investigate gp120 env PNGs patterns and amino acid sequence length polymorphisms following heterosexual and subsequent parent to child transmission (PTCT) events, a scenario that has not been reported elsewhere.

Study Population and Procedures
This study describes HIV-1 transmission events among close contacts for whom the time and directionality of vertical transmission were known but those of heterosexual transmission were not. The unit of analysis was a family. Four families were willing to participate in our study. The study population consisted of HIV-1 infected families, labeled 205, 366, 375 and 567, each comprising biological parent(s) and children: an index child (older/first sibling) and, where applicable, the index child's sibling (younger/second sibling). The index child was defined as the first child to be recruited into our study. Two families each comprised both parents and a biological index child. The other two families were composed of parent(s) and two successive biological children, the first and second siblings. For Family 567 the father was unavailable as he was working in another country in the region. Each family member was HIV-1 infected and none had received antiretroviral therapy at the time of sample collection.
Consent was obtained from the pregnant mother of each family. The mothers were participating in the national PMTCT programme at peri-urban mother and child clinics in Harare and were known to be HIV-1 positive at 36 weeks gestation. Spouses also consented to participate in the family HIV genetic study. Recruitment and procedures were as previously described [27]. All the infants were breastfed for at least nine months. First siblings' blood samples were collected at 60 ± 10 months of age as there were insufficient sample volumes from their respective first available HIV-1 positive samples. For second siblings, the first available HIV-1 positive samples were genotyped, and sampling was at 15 ± 3 months of age. Nucleic acid extraction, PCR amplification, cloning and DNA sequencing of the HIV-1 env gp120 C2V5 region, as well as HIV-1 subtype determination, were done as previously described [28].

Data Analysis
Data were entered and analysed using Stata version 10. The number and distribution of PNGs and the C2V5 sequence length polymorphisms were determined for both the parent(s) and children. To compare sequence lengths and numbers of PNGs between family members, the non-parametric Wilcoxon rank-sum (Mann-Whitney) test was used. Tests of statistical significance included the 95% confidence interval of relative risks and two-sided p-values. Similar analysis was done after stratifying the study population by age and gender for both heterosexual and vertical transmission events. Indels along the sequences were assessed manually. The 520 base pair nucleotide sequences were translated to amino acid sequences using the GeneDoc program. The amino acid sequences in FASTA format were entered into a glycosylation analysis site, http://www.hiv.lanl.gov/content/hiv-db/GLYCOSITE/glycosite, where PNGs were marked and counted. Variations in the number and location of PNGs within the entire C2V5 region and sub-regions were analysed. Median and range values of sequence lengths and PNGs were calculated for each family member as previously described by others [14].
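The counting step described above can be illustrated with a short sketch. It is not the code behind the glycosylation analysis site used in this study; it assumes the standard N-X-S/T sequon definition (asparagine, then any residue except proline, then serine or threonine), and the function names `pngs_positions` and `median_range` are hypothetical, introduced only for illustration.

```python
import re
from statistics import median

# Assumption: a PNGs is an N-X-[S/T] sequon where X is any residue except
# proline. The lookahead lets overlapping sequons (e.g. "NNTS") both count.
SEQUON = re.compile(r"N(?=[^P][ST])")


def pngs_positions(aa_seq):
    """Return 1-based positions of asparagines that start an N-X-S/T sequon.

    Alignment gaps ('-') are removed first, so positions refer to the
    ungapped amino acid sequence.
    """
    ungapped = aa_seq.replace("-", "").upper()
    return [m.start() + 1 for m in SEQUON.finditer(ungapped)]


def median_range(values):
    """Summarise per-clone values as (median, minimum, maximum)."""
    values = list(values)
    return median(values), min(values), max(values)
```

For one family member, the per-clone PNGs counts would be `[len(pngs_positions(s)) for s in clones]`, and `median_range` applied to that list gives the median (range) summary used in the tables.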

Ethical Consideration
The study was approved by the Medical Research Council of Zimbabwe (MRCZ) and the Ethical Review Committee in Norway. Written consent to participate in the HIV-1 genetic research study was obtained from the parent(s), who also consented on behalf of their minor children. Study participants were free to discontinue at any time without prejudice. Parents also consented to the use of their blood samples and those of their children in future HIV-related research studies.

Characteristics of the Four Families
Three of the four couples were in monogamous marriages; Mother 375 was in a polygamous relationship. All parents had at least seven years of schooling and were of low economic status. The mean ages of the mothers and their spouses were 34.8 ± 3.2 years and 38 ± 2 years, respectively.

HIV Infections
Parents of the four families were HIV-1 positive at enrolment but did not know when or how they had become infected. HIV-1 was most likely acquired heterosexually, as none reported any history of blood transfusion, drug abuse or homosexuality, except for Mother 366, who had a history of blood transfusion; hence it could not be established whether she was infected heterosexually or through blood transfusion. Index children of families 366, 375 and 567 and second siblings of families 205 and 567 were HIV-1 DNA PCR negative at delivery and at six weeks postpartum but were later infected through breastfeeding. Index child 205 was HIV-1 DNA PCR negative at delivery. However, he did not attend his six-week postpartum visit; hence his time of infection was not definite.

Global Analysis of Families' C2V5 Sub-Regions Nucleotide Sequences
A consistent feature of the families' sequence analysis was that the highest variation occurred within the env gp120 V4 and V5 and, interestingly, also within the C3 sub-regions. Indels characterized the genetic diversity of these three sub-regions.
In contrast, amino acid substitutions constituted the main cause of variability within the constant domains C2 and C4 and, to a lesser extent, the V3 region; see Figure 1. Of note was a 12 base pair nucleotide insertion observed in the Family 366 mother-infant pair viral sequences that was absent in the purported father's HIV sequence.

Glycosylation Patterns and Amino Acid Lengths of the Variable Loops, V3-V5
Of the variable regions, V3 was relatively conserved regardless of family member age or sex, with none of the insertions, deletions or shifts in PNGs that characterized the V4 and V5 domains. V3 amino acid lengths for Families 205, 366 and 375 were all 35; see Tables 1-3. In contrast, most sequences of Family 567 members were 33 amino acids in length; consequently they had the fewest PNGs. Despite having the optimal amino acid length within the V3 region, Family 375 had just a single PNGs; see Tables 1-6. Remarkable was the presence of the conserved GPGQ motif within the V3 crown of all family members' viral sequences, which was preserved following both heterosexual and vertical transmission events. The exception was Mother 366, 2/5 of whose HIV-1 clones had the GPGR motif instead. The V4 region showed extensive amino acid length polymorphism and PNGs variation, with median (range) values of 28 (24-35) amino acids and 3 (2-6) PNGs, respectively; see Tables 1 and 6.

Glycosylation Patterns and Amino Acid Lengths of the Constant Regions

Heterosexual Transmission
There were no significant gender differences between fathers and mothers in amino acid lengths for the entire C2V5 region: median (range) 207 (199-215) and 208 (198-216), respectively; p = 0.734. However, after stratifying by sub-regions, there was a general tendency for mothers of families 366 and 375 to have longer C3 region amino acid lengths than their spouses. C2 and C4 region sequence lengths were exactly the same for all families' heterosexual partners, but the partners differed with respect to the C3, V4 and V5 sub-regions. Applying the postulated hypothesis to C3 and, to a lesser extent, V5 sub-region sequence lengths [14], it is tempting to propose the direction of heterosexual transmission, although this is a very sensitive issue that should be handled carefully. A male to female (MTF) heterosexual transmission was suspected for Family 205, with the father probably infecting his wife, whilst female to male (FTM) transmission events were possible for families 366 and 375. Interestingly, though not statistically significant, the exact opposite trend was observed for the V4 variable region; see Table 1.
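Comparisons such as the one above rely on the Wilcoxon rank-sum (Mann-Whitney) test named under Data Analysis. The following is a minimal sketch of that test, not the Stata routine used in the study. It assumes a two-sided normal approximation with tie correction and no continuity correction; an exact test may be preferable for the small numbers of clones involved. The function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test.

    Assumption: normal approximation with tie correction and no continuity
    correction. Returns (U statistic for x, z score, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p
```

Here `x` and `y` would be the per-clone amino acid lengths (or PNGs counts) of the two family members being compared, for example a father's and a mother's C3 clone lengths.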

PTCT and Sequence Lengths and PNGs
There were no significant amino acid length differences between mothers and first siblings in the V4 region or in the C2, V3 and, to some extent, C4 sub-regions. However, for the entire C2V5 region there was a tendency for the index children to have longer amino acid lengths compared to the mothers: p = 0.013, 0.040 and 0.043 for families 205, 375 and 567, respectively; see Table 2. Differences were also observed within the V5 sub-region, with p = 0.007, 0.040, 0.040 and 0.456 for Families 205, 366, 375 and 567, respectively, but without any clear trend. The V4 and V5 sub-region amino acid lengths of the Family 205 second sibling were both significantly shorter relative to the maternal ones; p = 0.039 and 0.028, respectively. Similar observations were also noted for the Family 567 second sibling sequences, which were significantly shorter relative to the maternal ones; p = 0.034. Significant differences were also observed in the C3 region but with no clear trends; see Table 2.
Interestingly, a trend similar to the one observed between the mothers and first siblings was also observed with paternal sequences with respect to the C2V5, C3 and V5 regions. However, there were no differences in sequence lengths between paternal sequences and those from second siblings.
For both the entire C2V5 region and the C3 sub-region, the first siblings had significantly longer amino acid lengths relative to the second siblings: 207 (201-216) versus 200 (198-202) for C2V5 and 51 (49-54) versus 48 (47-52) for C3; p = 0.005 and 0.007, respectively. However, the opposite was true for the V5 region, where the second siblings had much longer amino acid lengths, 13 (12-14), 11 (14-19); p = 0.0006. After gender stratification no differences were observed regarding amino acid lengths, whilst female siblings tended to have a denser glycan shield relative to their male counterparts, 14 (11-16) versus 12 (10-13); p = 0.039. Glycosylation patterns found in siblings were similar to those found in the parents; see Table 6.

Discussion
N-glycosylation is an essential co-translational modification process in which a sugar (glycan) is covalently attached to the amide group of asparagine (N) residues [29]. The N-linked glycosylation of HIV-1 env gp120 is one of the most important viral protective mechanisms to overcome in order to develop an effective neutralizing antibody-inducing vaccine. Since several PNGs are relatively constant across HIV-1 subtypes, there is a great deal of interest in developing a carbohydrate-based antigen designed to elicit a humoral immune response. Studies have shown a subtype dependency of env sequence length following transmission, with subtypes A and C significantly shifted towards shorter lengths, although this does not hold for subtype B [14,20,30]. Our results are similar to those of previous subtype A and D heterosexual transmission studies, where trends were observed for HIV-1 env lengths but no significant differences were observed with PNGs [19].
For PTCT, our study observed that children were infected with shorter sequences, as demonstrated by significantly shorter sequences in the second siblings relative to the maternal or paternal sequences. Sequence lengths tended to increase with age or disease progression, probably due to immune selection, as shown in the first siblings, who had significantly longer sequences and more PNGs relative to the parents. Our results suggest that a child acquires more or less the same number of PNGs as the mother and that these tend to decrease with time or disease progression, as evidenced by the first V4 region PNGs, which was present amongst the second siblings, who were about 14 months old, but was absent or lost in 3 out of 4 of the first siblings, who were about five years old. This observation may relate to functional elements of the V4 region that exist solely to facilitate viral escape as the glycan shield evolves [41]. Differences in the number of PNGs were also observed in chronic HIV-1 infection [31-33]. Similar to our findings, some pediatric HIV-1 env subtype C studies have also demonstrated no clear pattern of change in the number of PNGs [34,35]. Mother 567, who had the fewest PNGs, died at the beginning of this year, suggesting that loss of PNGs may be associated with advanced disease and may be a risk factor for death. It would be interesting to correlate PNGs with markers of disease progression.
A consistent feature of the families' sequence analysis was that the highest levels of variation occurred within the env gp120 V4 and V5 variable regions and also the C3 region, which calls into question whether the so-called "constant region" defined for subtype B is really also constant in subtype C. Moreover, atypically, our families' HIV-1 subtype C V3 region was relatively constant, again questioning the common practice of extrapolating subtype B findings to non-subtype B viruses. Traditionally, the V3 region has been considered a variable domain based on analysis of subtype B sequences. However, the small number of clones sequenced may not have been sufficient for statistically sound conclusions. A larger sample size is recommended to substantiate these findings. Interestingly, similar results of a relatively constant subtype C V3 region have been obtained elsewhere [36-39]. Conserved PNGs within the V3 region amongst all family members, regardless of age or sex, point to their potentially important functional role, which may be primarily to mask the large number of neutralizing antibody epitopes defined in this region. As previously reported by others, emergence of deletions within the V4 region was also observed among all family members, both adults and children [40]. These, coupled with amino acid substitutions, affected the number and distribution of PNGs, resulting in the coexistence within some family members of V4 variants, each characterized by different amino acid sequences and PNGs patterns. Previous subtype C studies have shown extensive V4 length polymorphism, where shorter lengths have been shown to enhance infectivity at the cost of exposing neutralizing epitopes [41].
Some countries including Zimbabwe now seek to institute legislation to hold persons criminally responsible for willfully infecting others with HIV.This law has been unsuccessfully applied as it is very difficult to prove criminal intent on the part of the alleged transmitter and currently there is no supporting forensic evidence to prove directionality of infection.For successful prosecution of willful transmission, ascertaining the association of number of PNGs or the HIV-1 env domain length polymorphism and age of infection may be the first step to provide a line of evidence to that effect, hence the need for bigger studies.

Conclusions
The highest env gp120 variation, characterised by indels, was observed within the C3, V4 and V5 sub-regions. Our study supports the observation that HIV-1 env C2V5 sequence lengths and PNGs tend to increase with age and disease progression in children. Although the issue is sensitive and should be handled carefully, with respect to C3 and V5 sequence lengths it is tempting to propose possible male to female and female to male heterosexual transmission events for Family 205 and for Families 366 and 375, respectively. Ascertaining the association of PNGs or env sequence length polymorphism with period of infection may be the first step towards a possible line of forensic evidence. Hence bigger studies are warranted to substantiate the authenticity of this potentially useful application.

Table 1. Mother 375 consistently had lower numbers of PNGs compared to the spouse, yet the opposite was true regarding amino acid lengths.

Table 2. No clear trend was observed regarding the number of PNGs.