Corpus Linguistics Representations on Age Groups in Light of Google Books

This research aims to identify social representations around words denoting human beings in the Google Books BYU Corpus over a period of 208 years, from 1800 to 2008. This paper reports the main findings of a corpus-based investigation focused on the adjectives preceding the words man, woman, adolescent, boy, girl, child, and teenager in the database. By verifying patterns of association between each of these words and its immediate collocates, it is possible to infer how these concepts have been represented over time. First, queries were conducted in the corpus. Second, adjectives were selected. Third, these were classified into semantic categories. Fourth, collocates were classified through sentiment analysis. Finally, major representations were inferred based on semantic categories and sentiment analysis scores. The word “children” showed different representations: medical (with collocates such as asthmatic and disabled), religious, and evaluative. We conclude that, over time, representations of age, health and race increased, while representations of innocence decreased. The collocates that appeared in the last half-century of the period, compared with the first half-century, can be taken as an indication of current representations. For children, these include hyperactive and disadvantaged, indicating a shift toward a “problematic” representation of children.

for the past 208 years) uses such a classification to determine how times pass by.
A term now circulating in various media is "ageless": someone who, although in their twenties, thirties or forties, is not defined by their age. This raises issues for discussion, because society tends to value more positively those who seem younger, whether in physical appearance or in the way they express themselves. Such valuations are reflected in this study in the social representations found for each age group.

Research Questions
The research questions were defined so as to make the sections of this paper clear to the reader. The present research seeks to answer the following questions:
1) What representations can be identified in relation to the terms surveyed?
2) Is there a difference between the representations of the masculine and feminine terms, and among those of children, teenagers and adults?
3) Is there a difference between the terms in relation to valuation (positive and negative charge)?
Only three questions were considered, mainly because gender terms proved, throughout the study, to be a vast field that warrants further research. Likewise, the different stages between early childhood and young adulthood are a major subject that deserves deeper study.
The methodology is explained in the following section, and the questions are answered in the results section.

Methods
This section explains the method of the present research in light of the research questions above. In this study, we analyzed the language patterns of a set of English words referring to human beings and, from these patterns, sought to verify the representations associated with the different ways of referring to human beings over time. A pattern can be identified if a combination of words occurs relatively frequently and if there is an associated meaning. The data used in this research are listings of occurrences of bigrams found in English-language publications indexed by Google Books. Google Books is a collection of millions of publications digitized by Google from library collections around the world.
The format of this database, as well as its size, is explained in the description of procedures below. Bigrams are sequences of two words placed side by side in a text, for example "Brazilian poet", "young women" and "American men". Bigrams are available on the Google Books Ngrams website, which also allows the user to search and produce graphs of the use of these bigrams. Our data were therefore bigrams, along with their frequency of occurrence in publications in English indexed between 1800 and 2008. The following tools, described in the procedures, were used in this research:
1) The Brigham Young University interface for Google Books N-Grams, which helps with searches on Google Books N-Grams.
2) The Google Books N-Gram Viewer. This online interface allows graphical display of N-gram occurrences in the Google Books N-Gram data.
3) The USAS semantic tagger. This online tagger assigns semantic tags to each lexical item submitted.
4) The lexical valuation/sentiment analysis lists by Hamilton et al. (2016a). Hamilton et al. (2016b) made these lists available on the web. They contain the polarity (positive or negative) of thousands of lexical items, distributed across the decades in which they occurred in the corpora surveyed by these authors.
5) A script developed by the supervisor, Professor Tony Berber Sardinha. This script, written in Unix Shell, performed all the data preparation and processing for the study.
As for the procedure, the survey data come from the Google Books N-Gram database, which is available free of charge at: http://storage.googleapis.com/books/ngrams/books/datasetsv2.html. An N-gram is a sequence of words: a unigram is one word, a bigram is a sequence of two words, a trigram of three words, a quadrigram of four words and a pentagram of five words. Google Books offers these five options. The Google Books search interface provided by Brigham Young University (through Professor Mark Davies's online corpora project) makes searches faster and more efficient than searching directly through Google Books Ngram. However, this interface does not provide access to the complete collection "English 20120701", but only to American English (items 4 and 5 above) and British English (items 6 and 7 above). As the American database is almost five times larger than the British one, we opted for the American database, which corresponds to the "American English (155 billion)" option on the BYU interface for Google Books. This option refers to item 5 above, American English version 20090715, and not to item 4, American English version 2012071.
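The definition of N-grams given above can be illustrated with a minimal Python sketch. It is not the script used in the study (which was written in Unix Shell); the function name `ngrams`, the toy sentence and the whitespace tokenization are assumptions made only for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every sequence of n adjacent tokens, in order of occurrence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy sentence; a naive whitespace split stands in for real tokenization.
tokens = "the young women met the young poet".split()

unigrams = ngrams(tokens, 1)   # single words
bigrams = ngrams(tokens, 2)    # pairs of adjacent words
pentagrams = ngrams(tokens, 5) # sequences of five words

# Frequency of each bigram, analogous to the occurrence counts in the data.
bigram_counts = Counter(bigrams)
```

In this toy example the bigram ("the", "young") occurs twice, which is the kind of count that the bigram listings record for each year of publication.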
Although the help pages of the BYU interface do not make clear which of the two databases is used, the results indicate that it is the 2009 database and not the 2012 one, because they extend only to the 2000s; if the 2012 database had been used, the results would include the 2010s. The 2009 and 2012 versions differ not only in that the 2012 version covers three more years of publication, but mainly in that the 2012 version considerably increased the number of publications, and consequently of words, for earlier years, as shown in Table 1. For example, the 2009 version indexed just over 3 billion words for the year 2005; in the 2012 version, that number more than tripled, to more than 10 billion words. The same occurred, to different degrees, for the other years. The "Total" line gives the totals from 1810 to 2009. As noted there, the database doubled in size between the 2009 version and the 2012 version. Table 1 presents this comparison of word counts in the Google Books N-Gram data.
With regard to Table 1, it is important to note that the names of the databases on the Google Books N-Gram website and on the BYU interface reflect the place of publication and not exactly the variety of English. Thus, the publications listed under the "American English" option are not necessarily written in American English, nor by native speakers of American English; likewise, those listed under "British English" do not reflect purely British English or British authors. What these databases represent, in fact, are publications indexed as having been published in the USA or in Britain. This indexing is itself based on the data recorded by Google Books from the libraries in which the publications were automatically scanned, and it may contain inaccuracies, as there is no information on whether the data were later checked manually. Furthermore, even if the bibliographic record is reliable, a book published in Great Britain may have been written by an American author and vice versa, which calls into question the representativeness of the text as an example of one of these variants. Thus, we do not refer to "American English" as the variant studied in this research, but only to the "English language". Each of the terms was analyzed using the steps described below:
1) Search for the term, immediately preceded by an adjective, in Brigham Young University's interface for the Google Books N-Gram database. In the thesis, we designate the results of this search with the label (adj +). During the development of the research, searches were also made with nouns and verbs, but these forms were not incorporated in the final version of the thesis because they fell outside the scope of the research.
2) Search for the term in the Google Books N-Gram Viewer, in order to view its occurrence and distribution over the period studied. When relevant, the resulting graph was saved and incorporated into the thesis.
3) Calculation of the normalized frequency of the collocates (the adjectives associated with the search term).
4) Semantic tagging of the collocates, using the USAS tagger from Lancaster University. This tagging was used for a first classification of the collocates, in order to help visualize their semantics. The instrument supported the qualitative analysis of the representations in the data. It must be emphasized that semantic tagging does not automatically indicate representations; the identification of representations was made qualitatively, using some tag categories but not limited to them.
5) Analysis of the temporal variation of the collocates. In this stage, we identified the collocates whose frequency increased and decreased the most between the first 50 years (i.e. 1810-1850) and the last 50 years (i.e. 1960-2000) covered by the data. We also identified the collocates that did not occur in the first 50 years but did occur in the last 50 years (that is, collocates that did not necessarily first appear in the last 50 years, but at some point from 1860 onward).
6) Analysis of the valuation of the collocates. In this last stage, the collocates were scored on a rating scale, that is, by means of a number that represents their positive or negative charge. The lists of Hamilton et al. (2016a) were employed for this purpose.
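Steps 3, 5 and 6 above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Unix Shell script used in the study: the function names, the counts, the period totals and the polarity values are all hypothetical, and the frequency-weighted mean used to summarize valuation is an assumed aggregation, since the text does not specify how collocate scores were combined.

```python
def per_million(count, total_words):
    """Step 3: normalized frequency, i.e. occurrences per million words."""
    return count * 1_000_000 / total_words

def temporal_variation(early_counts, late_counts, early_total, late_total):
    """Step 5: compare collocates between an early and a late period.

    Returns (changes, emergent). `changes` maps each collocate attested in
    the early period to its late-minus-early normalized frequency;
    `emergent` lists collocates absent early but present late.
    """
    changes = {}
    for adj, early_count in early_counts.items():
        early = per_million(early_count, early_total)
        late = per_million(late_counts.get(adj, 0), late_total)
        changes[adj] = late - early
    emergent = sorted(adj for adj in late_counts if adj not in early_counts)
    return changes, emergent

def weighted_valuation(counts, polarity):
    """Step 6 (assumed aggregation): frequency-weighted mean polarity of the
    collocates that have a score in the polarity list; None if none do."""
    scored = {adj: n for adj, n in counts.items() if adj in polarity}
    total = sum(scored.values())
    if total == 0:
        return None
    return sum(polarity[adj] * n for adj, n in scored.items()) / total

# Hypothetical counts of "<adjective> child" and period sizes (not real data).
early_counts = {"innocent": 400, "young": 100}
late_counts = {"innocent": 200, "young": 3000, "autistic": 500}
early_total, late_total = 2_000_000, 10_000_000

# Hypothetical polarity scores (positive values = positive charge).
polarity = {"innocent": 1.5, "young": 0.5, "autistic": -1.0}

changes, emergent = temporal_variation(early_counts, late_counts,
                                       early_total, late_total)
late_valuation = weighted_valuation(late_counts, polarity)
```

With these invented numbers, "innocent" falls and "young" rises in normalized frequency, while "autistic" is reported as an emergent collocate, mirroring the three kinds of lists (increase, decrease, emergence) used in the analysis.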

Results of the Research
This section reports the results of the research in order to address the research questions; the final answers to the questions are given in the conclusion. Further tables, graphs and results can be found in the full version of the study. The questions to be answered are repeated below:
1) What representations can be identified in relation to the terms surveyed?
2) Is there a difference between the representations of the masculine and feminine terms, and among those of children, teenagers and adults?
3) Is there a difference between the terms in relation to valuation (positive and negative charge)?
The terms searched were adolescent(s), adult(s), boy(s), child/children, elderly, girl(s), kid(s), man/men, teen(s), teenager(s) and woman/women. The section below describes the results for these terms, including their frequencies of use.

Description of Results
This section presents the most frequent semantic categories found in the study, together with some other key ones. The first is A, GENERAL & ABSTRACT TERMS, with 18 collocates such as "average", "defective", "dependent", "disadvantaged", "exceptional", "good", "hyperactive", "mere", "minor", "natural", "normal", "other", "particular", "perfect", "real", "specific", "true", "typical". The second category is T, TIME, with 10 collocates such as "eldest", "modern", "new", "newborn", "old", "older", "oldest", "young", "younger", "youngest". The third category is N, NUMBERS & MEASUREMENT, with 10 collocates such as "additional", "eighth", "fourth", "ninth", "seventh", "single", "tenth", "tiny", "whole". The fourth category is E, EMOTIONAL ACTIONS, STATES & PROCESSES, with 10 collocates such as "aggressive", "battered", "beloved", "dear", "dearest", "favorite", "happy", "precious", "shy", "unhappy". From these categories, we can suggest that the main representations of "child" are: 1) evaluative-normativity (beloved, dear, dearest, defective, dependent, exceptional, favorite, good, natural, new, perfect, precious, real, specific, whole); 2) quantification (additional, eighth, eldest, fourth, ninth, seventh, single, tenth); 3) age gradation (minor, newborn, old, older, oldest, young, younger, youngest); 4) behavior (aggressive, happy, shy, unhappy); 5) normativity (average, normal, typical). In short, "child" is a figure that is fundamentally evaluated, counted and measured. Regarding temporal variation, the numbers indicate that both the increase and the decrease are related to a representation of an evaluative nature. However, the increase is related to two aspects of the gradation of the concept of child: small child and older/youngest child, which practically did not exist at the beginning of the 19th century.
Table 2 illustrates these results, showing the increase in the two aspects of gradation noted above. Table 3 lists the adjectives placed immediately to the left of child (adj +) whose normalized frequency per million decreased the most between 1810-1850 and 1960-2000. Table 4 shows the collocates that emerged between these two periods. These collocates indicate the emergence of representations related to health (retarded child, autistic child, disabled child, exceptional child); there was therefore a rise in the medical representation of the child.
The data suggest that the concept of "child" has changed from being monolithic to somewhat graded in terms of size and/or age. There was also an increase in representations of the child related to abortion (unborn child) and physical issues (handicapped child). On the other hand, there was a decrease in the representation of the child as innocent, as in "innocent child", "beloved child" and "sweet child". Even "poor child" refers to a value judgment not always related to financial condition. The results for "children" reflect those for "child", insofar as the collocates whose frequency has increased comprise gradation (young child, younger child, older child) and health aspects (handicapped, retarded). However, a racial dimension has also emerged in the representation of "children" that was not apparent for "child" (black children, white children). In other words, "children" is more readily associated with racial aspects than "child" is. At the same time, the collocates that became rarest in the comparison are those focused on the idea of innocence and purity (beloved children, good children, lovely children, innocent children), including religious aspects (spiritual children). It is interesting to note that the frequency of "fatherless children" decreased greatly in the recent period, which suggests that "children" are no longer characterized by the absence of the father. Among the collocates that emerged in the comparison of the two periods, there is clearly a medical-clinical representation, through "autistic children", "disabled children", "psychotic children", etc.
There is also the emergence of a representation of social problems (disadvantaged children).
The following representations seem to be related to "  (single, unmarried). Attention is drawn to the joint representation of physical appearance and evaluative-normativity, which is quite numerous, in bigrams such as "attractive girl", "beautiful girl", "blond/e girl", "thin girl", "tall girl", etc. Therefore, we can suggest that the fundamental representation of "girl" is of a physical-evaluative nature.
These categories indicate the following possible representations: 1) evaluative normativity, 2) physical appearance, 3) age range and 4) intellectual ability. Table 7 lists the adjectives placed immediately to the left of girl (adj +). Table 8 lists the adjectives placed immediately to the left of girl (adj +) whose normalized frequency per million decreased the most between 1810-1850 and 1960-2008. Table 9 lists the adjectives placed immediately to the left of boy (adj +) whose normalized frequency per million grew the most between 1810-1850 and 1960-2000. As with the other terms discussed above, there is a strong evaluative component linked to the concept of "boy", which is judged on physical appearance and intellectual ability. There is also a component of gradation of this concept, in which the idea of "boy" is dissected into nuances of age and size. Regarding temporal variation, the collocates whose frequency grew the most reflect representations related to age (old boy, older boy), physical appearance (small boy), which in turn is also related to age, ethnicity (white boy), value judgment (bad boy), nationality (American boy) and spirituality (Jewish boy). In turn, the representations that have declined the most over time are related to ideals such as "brave boy" and "noble boy", as well as to value judgments (idle boy, lovely boy).
Table 10 lists the adjectives placed immediately to the left of boy (adj +) whose normalized frequency per million decreased the most between 1810-1850 and 1960-2000. The terms analyzed in relation to adolescence are not marked by gender because, as seen in the analysis presented above, gender marking occurs by joining the term referring to adolescence to the generic term (e.g. teenage girls). Regarding the temporal variation of the terms themselves (outside the bigrams), the graph in Figure 1 shows that "teen" and "teens" appeared first; however, searches in the texts revealed that these occurrences do not refer to adolescence, but rather to the numeral ending "-teen" (e.g. thirteen). In the sense of "adolescent", the terms emerged in the early twentieth century. The most frequent term is adolescent(s). The graph shows a gradual increase in the use of these terms in the 20th century, to different degrees (with the exception of "teen").
The representation of adult life through normativity is quite prominent, being reflected in many bigrams. The gradation of the adult phase is less prominent, unlike that of adolescence, shown in the sections above, which is finely graded in nuances of time. In fact, the range of representations of adult life through this term is restricted compared with the other terms analyzed here.
Other minority representations are of a political-ideological character, e.g. "liberal adult", and of physical appearance, e.g. "obese adult". Regarding temporal variation, the bigrams that became more frequent in the most recent period (Table 12) point to representations already identified, such as age gradation (young adult, early adult, mature adult), evaluative-normativity (average adult, responsible adult), health (healthy adult) and gender (male adult). Among these, the most notable is "young adult", which grew markedly compared with the other bigrams. This indicates a widening of the spectrum of adult life, incorporating a "young" phase that was not common in the 19th century. There was no decrease in frequency.
Table 12 lists the adjectives placed immediately to the left of adult (adj +) whose normalized frequency per million grew the most between 1810-1850 and 1960-2000.
The main representations reflected in these categories seem to include: 1) evaluative-normativity, e.g. "average woman", "fine woman", "remarkable woman"; 2) spirituality, e.g. "Christian woman", "Jewish woman"; 3) physical appearance, e.g. "attractive woman", "charming woman", "blond woman"; and 4) age group, e.g. "elderly woman", "mature woman", "old woman". Thus, the representation of "woman" seems to be, in general, of a physical-evaluative-spiritual nature. Regarding temporal variation, the representations that grew the most between the periods compared were age (young woman, old woman, elderly woman), ethnicity (black woman, white woman), pregnancy (pregnant women), aesthetics (beautiful woman) and nationality (American woman). In turn, those that declined the most in frequency were the evaluative and/or social class representations (poor woman, excellent woman, good woman, lovely woman), the idealized ones (virtuous woman) and those of disposition/mood (happy woman, unhappy woman). In other words, there was a decrease in subjective, idealized and mood-related representations and an increase in age, ethnic and aesthetic representations. Table 13 lists the adjectives placed immediately to the left of woman (adj +) whose normalized frequency per million grew the most between 1810-1850 and 1960-2000.
Table 14 lists the adjectives placed immediately to the left of woman (adj +) whose normalized frequency per million decreased the most between 1810-1850 and 1960-2000. The analysis identified no bigrams that emerged between 1810-1850 and 1960-2000. The representations associated with "women" mainly include the following: 1) evaluative-normativity: "good women", "great women", "normal women"; 2) spirituality: "Catholic women", "Christian women", "holy women", "Jewish women", "Muslim women"; 3) age group: "mature women", "old women", "young women"; 4) origin: "foreign women", "immigrant women", "native women"; 5) medical or clinical aspects: "diabetic women", "disabled women", "infertile women"; 6) body or eroticism: "naked women"; 7) temporality: "contemporary women", "modern women"; and 8) gender identification: "bisexual women". In short, the representation of "women" seems to be based on age, body and spirituality. Further details on the results omitted here can be found in the full study and can be added to the current manuscript on request. With this disconnection, it became necessary to specify marital status when needed. Among the representations that have become rarer are old age (old women), idealization (virtuous women), evaluation (helpless women), spirituality (holy women), appearance (fair women) and origin (Turkish women, Roman women). The main findings are illustrated in Table 15.
Table 18 lists the adjectives placed immediately to the left of man (adj +) whose normalized frequency per million decreased the most between 1810-1850 and 1960-2000. The analysis of the plural form of the term can be found in the thesis on which this research is based. To comply with the word limit of this manuscript, further data are provided in the supplementary attachment to the submission. In the next analysis, the collocates of the female terms were compared with those of the male terms. The feminine terms are girl, girls, woman and women; the masculine ones are boy, boys, man and men. The comparison was made using a script developed by the supervisor. There were 120 unique collocates (i.e., not repeated) in the bigrams compared according to gender. Of these, 31 occurred only in the female bigrams, 51 only in the male ones and 38 in both groups. This indicates that, although the collocates are specialized, one-third of them do not distinguish between one gender and the other (N = 38, 31.7%). Table 19 presents the collocates resulting from the comparison of bigrams based on gender. The results indicate that the collocates of the female terms mainly reflect location (foreign, immigrant, local, native, rural, urban), physical condition (attractive, beautiful, fair, obese, diabetic, disabled), reproduction (childless, infertile, pregnant), virtue (respectable, virtuous), spirituality (Catholic, Muslim), marital status (married, widowed) and gender identification (bisexual, lesbian). In turn, the collocates of the male bigrams reflect a large component of representation of superiority/success, with collocates such as ambitious, best, brave, civilized, distinguished, eminent, greatest, honest, illustrious, important, influential, mighty, powerful, principal, reasonable, remarkable, sensible, thoughtful, true, wise, wisest and worthy.
There is also a representation of the male through occupation (literary, medical, military, public, scientific), financial success (rich, richest, wealthy), physical appearance (big, red), gender identification (gay), kindness (pious) and warmongering (armed), among others. As can be seen, there is a very large difference between the representations of female and male human beings in the data. The feminine is generally represented from the point of view of physical appearance, origin/location, reproductive (in)capacity, idealized virtue and spirituality.
linked to physical or intellectual work. At this stage of the analysis, the terms related to childhood, those related to adolescence and those related to adult life were compared. The terms related to childhood are boy, boys, child, children, girl, girls, kid and kids; those related to adolescence are adolescent, adolescents, teen, teenager and teenagers. Finally, those related to adult life are adult, adults, man, men, woman and women. The comparison was made using a script developed by the supervisor. A total of 150 unique collocates were computed in the bigrams compared according to age group. Of these, 40 occurred only in the bigrams related to childhood, 43 only in those related to adolescence, 53 only in those related to adulthood and 14 in all three groups. This indicates a marked specialization of the collocates, as only a small number (9.3%) occur in all three categories, unlike in the bigrams related to gender. Table 20 presents the result of the comparison of bigrams based on age groups. The results of the representation analysis show three dramatically different patterns for the three groups. In childhood, physical representations are predominant (barefoot, beardless, beautiful, bigger, gallant, handsome, large, larger, small, smaller). There are also representations of behavior (aggressive, happy, idle, merry, nice, mischievous, rough, rude, wanton), age itself (oldest, preschool, senior, teenage, youngest) and superiority/virtue (popular, promising, smart, stable). In adolescence, the  (disadvantaged, drunk, drunken, runaway, unemployed, vulnerable), clinical (deaf, depressed, disabled, handicapped, ill, suicidal), gender identification (bisexual, female, lesbian, male) and inferiority (impressionable, inexperienced, troubled).
In adulthood, the dominant representation is superiority/success (ambitious, civilized, distinguished, eminent, famous, great, greatest, honest, illustrious, important, influential, mighty, powerful, principal, prominent, reasonable, remarkable, sensible, successful, thoughtful, true, wise, wisest, worthy), followed by representations of occupation (literary, medical, military, professional, public, scientific) and inferiority (desperate, lesser, primitive, strange, wicked). Thus, based on these categories, there seems to be a pattern of representation of the three phases that follows a trajectory from physical appearance, behavior and age classification in childhood, to behavioral, clinical, gender identification and inferiority issues in adolescence, to a representation of success and superiority in adult life. There is, therefore, strong evidence of an appreciation of childhood and especially of adult life, to the detriment of adolescence.
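The group comparisons reported in this section can be illustrated with a short Python sketch. It is not the supervisor's script: the function `partition_collocates` is a hypothetical name, and the toy sets only borrow a few adjectives cited in the text, with the shared ones assumed for illustration. The last two lines simply recompute the proportions reported above from the stated counts.

```python
def partition_collocates(groups):
    """Split collocates into those exclusive to one group and those shared by all.

    `groups` maps a group name to the set of collocates found in its bigrams.
    Returns (all_collocates, exclusive, shared).
    """
    all_collocates = set().union(*groups.values())
    shared = set.intersection(*groups.values())
    exclusive = {}
    for name, collocates in groups.items():
        # Collocates found in any group other than the current one.
        others = set().union(*(c for other, c in groups.items() if other != name))
        exclusive[name] = collocates - others
    return all_collocates, exclusive, shared

# Toy sets (hypothetical; the overlap is assumed, not taken from the data).
groups = {
    "female": {"pregnant", "virtuous", "old", "young"},
    "male": {"wise", "brave", "old", "young"},
}
total, exclusive, shared = partition_collocates(groups)
share_pct = 100 * len(shared) / len(total)

# Proportions reported in the text, recomputed from the stated counts.
gender_shared_pct = round(100 * 38 / 120, 1)  # 31.7
age_shared_pct = round(100 * 14 / 150, 1)     # 9.3
```

The same function accepts three groups (childhood, adolescence, adulthood), in which case `shared` contains only the collocates found in all three.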

Conclusion
This research aimed to identify the representations associated with terms that designate human beings in the English language, using the Google Books N-Grams database available online and covering a period from the beginning of the 19th century to the beginning of the 21st century. A total of twenty terms were investigated, divided among terms related to childhood, feminine (girl, girls), masculine (boy, boys) and unmarked by gender (child, children, kid, kids); terms related to adolescence (all unmarked by gender: adolescent, adolescents, teen, teens, teenager, teenagers); and terms related to adulthood, feminine (woman, women), masculine (man, men) and unmarked by gender (adult, adults). The research questions asked were: 1) What representations can be identified in relation to the terms surveyed? 2) Is there a difference between the representations of the masculine and feminine terms, and among those of children, teenagers and adults? 3) Is there a difference between the terms in relation to valuation (positive and negative charge)? To answer the research questions, computational tools were used, both tools available online and one developed specifically for this research. In addition, an interpretative-qualitative analysis of the data was carried out, because no tool is able to extract representations automatically in a satisfactory way. Regarding the first research question, the results showed a wide range of representations (more than 30). The evaluative representation of the human being was the most constant among the terms, realized through adjectives such as "bad", "difficult", "favorite", "fine", "good", "lovely", etc. Another quite common representation is that of superiority, a more pronounced type of evaluation realized through adjectives such as "best", "greatest", "wisest", etc. Physical representations were also very frequent ("attractive", "beautiful", "gallant", "handsome", etc.).
Age grading also proved to be a recurrent representation ("earliest", "early", "later", "old", "older", etc.). Clinical conditions were also a frequent representation ("autistic", "crazy", "deaf", "depressed", "ill", etc.), as were types of behavior ("aggressive", "angry", "bored", etc.), social issues (e.g. "battered", "disadvantaged", "homeless", "unemployed", etc.), inferiority (e.g. "desperate", "inexperienced", "lesser", "primitive", etc.) and gender identification (e.g. "bisexual", "heterosexual", "homosexual", etc. the singular and plural forms of the term. For example, gender identification is conveyed more through the plural form than through the singular "adolescent". This confirms once again the finding of Corpus Linguistics that language in use avoids true synonyms, as each form tends to assume a different role. In answer to the second research question, the analysis showed that there are marked differences between representations from a historical point of view as well as by age and gender. From a historical point of view, there has been an increase in the representations of stages of life as "elastic" categories, which can be graded by means of adjectives such as "young", "old", "senior" and even "adolescent". Furthermore, in comparison with the beginning of the 19th century, there was an increase in representations of gender identification that go beyond the binary classification, through "homosexual", "lesbian", "bisexual" and related adjectives. There was also an increase in some ethnic-racial representations, such as "black", and a decrease in others, such as "colored". From the point of view of age, marked differences were identified: childhood is marked by physical representations (barefoot, beardless, beautiful), behavior (aggressive, happy, idle, merry, etc.), age (oldest, preschool, senior, etc.) and superiority/virtue (popular, promising, smart, etc.
life, passing from a figure classified by physical appearance, age nuances and potential, to a generally conflicted figure, marked by medical, identification and behavioral issues, and ending in a figure again classified by physical attributes, but seen through occupation and success. Finally, in relation to the third research question, the analysis showed great variation between terms. Regarding gender, the results suggest that the generic masculine terms (man and men) tend to have a more positive valuation than all their feminine counterparts. At the same time, feminine terms are more positively represented than the masculine terms that designate youth (boy and boys). Regarding age progression, the analysis suggests, in general, that adult life is more positively represented than childhood, which in turn is more positively represented than adolescence. However, there is disparity between the terms within each group: among the adult terms, "man" is the best rated, but "adults" is four positions (15 out of 20) from the last position in the ranking. The same occurred with the other groups, to a greater or lesser degree. In all cases, however, the best-valued item tended to be singular, while the worst-valued tended to be plural. The research presented here has limitations, such as the impossibility of dealing directly with the texts from Google Books.
The data used, provided by Google, are restricted to lists of bigrams, which include the year of publication and the number of occurrences in that year. The text itself is not available. Because of this, it was not possible to retrieve the textual occurrences and base the analysis on them, which limited the analysis to bigrams instead of the entire text of the works. Another limitation is that some terms are not exclusively age or gender indicators, such as "man", which is also used to identify the human being in general (male and female), and "boy" and "girl", which can be used figuratively to refer to people of various ages. The survey offered, for the first time, an overview of how life is historically represented in the English language, with respect to its phases and the differentiation between genders. The research showed the impossibility of broad generalizations: each term has its own range of representations, which distinguishes it from other terms, even those closest to it conceptually or morphologically. Language in use resists broad generalizations, and there are many nuances between terms. The generalization we can make, based on the results, is that the passage of life is marked by a constant classification of the human being in terms of a finite set of representations, with a notably evaluative and normative bias. Moreover, that passage is marked historically, with pronounced changes over the last 200 years.
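To make the data constraint described above more concrete, the following is a minimal sketch of how records consisting of a bigram, a year of publication and a number of occurrences might be aggregated into adjective counts for an earlier and a later period. It is an illustration under stated assumptions, not the tooling developed for this research: the record layout, the function names, the split year of 1900 and all counts are hypothetical.

```python
from collections import defaultdict

# Hypothetical records in the shape (bigram, year, occurrences).
# The counts are invented for illustration; they are not Google Books data.
RECORDS = [
    ("innocent child", 1810, 40),
    ("innocent child", 1990, 10),
    ("hyperactive child", 1985, 25),
    ("disadvantaged child", 1975, 30),
    ("good child", 1820, 50),
    ("good child", 1980, 45),
]


def collocates_by_period(records, noun, split_year):
    """Sum occurrences of each adjective preceding `noun`,
    separately for years before `split_year` and from `split_year` on."""
    totals = defaultdict(lambda: [0, 0])  # adjective -> [early, late]
    for bigram, year, count in records:
        adjective, head = bigram.split()
        if head != noun:
            continue  # keep only bigrams whose second word is the target noun
        period = 0 if year < split_year else 1
        totals[adjective][period] += count
    return dict(totals)


def new_collocates(totals):
    """Adjectives attested only in the later period, in alphabetical order."""
    return sorted(
        adjective
        for adjective, (early, late) in totals.items()
        if early == 0 and late > 0
    )


# Assumed split year; the study compares earlier and later periods,
# but the exact boundary used here is only an example.
totals = collocates_by_period(RECORDS, "child", 1900)
print(totals)
print(new_collocates(totals))
```

Because only the bigram, the year and the count are available, a sketch like this can show when a collocate appears or fades, but it cannot recover the sentence in which the bigram occurred, which is the limitation noted above.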