Trends in Biodiversity Research — A Bibliometric Assessment

Research on biodiversity has grown considerably during the last decades. The present study applies bibliometric methods to evaluate efforts in this field of study. We retrieved roughly 69,000 bibliographic records from the Web of Science database that matched the word biodiversity (and derivatives) in keywords, title or abstract. The number of articles, and of the authors and journals involved, has increased exceptionally fast since the 1980s, when the term biodiversity was coined. Since 2008, however, growth has decelerated to a rate typical of scientific literature in general. Using the frequency of terms extracted from publication titles, we inferred that the community-level focus in biodiversity studies has increased, while molecular biodiversity is still not strongly represented. Climate-related topics are rapidly gaining importance in biodiversity research. The geographical imbalance between allocation of research efforts and distribution of biological diversity is apparent.


Introduction
Massive human-induced species extinctions [1] [2] and habitat deterioration [3] have led, in the last decades, to the emergence of biodiversity research as a wide interdisciplinary field [4] [5]. The portmanteau word biodiversity was introduced into biology in 1986 by Walter G. Rosen, during the preparation of a conference on biological diversity [6]. Its use was further promoted by the signing of the Convention on Biological Diversity in 1992 [7]. Biodiversity refers to the variety of life in every possible respect [8], usually regarding species diversity (within and between species), ecosystem diversity, genetic diversity, etc.
Bibliometrics applies quantitative methods to analyze academic publications as an information process, using the identified patterns and dynamics in scientific publication efforts as a proxy for the development of the analyzed discipline [9]-[11].
The present bibliometric study analyzes the development of biodiversity research. We are aware of two articles focusing on global, taxon-independent bibliometric analysis of biodiversity [5] [12]. These date from 2008 [5] (considering data up to 2004) and 2011 [12] (considering data up to 2009), respectively. Given how fast the field of biodiversity evolves, the relatively "early" study of Hendriks and Duarte [5] could analyze only a fifth of the data that we retrieved using almost the same search criteria. The publication by Liu and colleagues [12] works with a larger dataset (~76,000 records). However, the composition of this dataset differs considerably from ours. While we collected all bibliographic records for biodiversity and the word's derivatives (biodivers*), Liu et al. added five more terms denoting subsets of biodiversity (genetic, species, landscape diversity, etc.). They also added all papers published in six selected journals specializing in the field. In our opinion, the latter introduces a bias into the analysis. And while using a wider array of search terms can be helpful depending on the target of a bibliometric analysis, this was not an option for our purpose: since no set of additional terms can cover all facets of biodiversity, the chosen terms would be over-represented. Our approach was therefore to influence the dataset thematically as little as possible, to avoid constraining the quantitative evaluation of the scientific orientation of research on "biodiversity".
This was crucial for the special focus of the present study, which lies on the analysis of frequently occurring words in titles of biodiversity publications. Apart from this, the core bibliometric questions are addressed: development of publication numbers, differential journal contributions, authors, co-authorships and citations.

Methods
A dataset containing bibliographic records for biodiversity-oriented journal articles (99.6%) and series articles (0.4%) was compiled using the Web of Science (WoS) version 5.13.1 citation indices by Thomson Reuters [13]. We conducted the search in all Web of Science databases in February 2014, using the search string biodivers* OR bio-divers* and querying the WoS fields Title, Abstract, Author Keywords, and Keywords Plus. After deletion of 243 duplicate entries, we obtained 68,799 records, each referring to an individual article.
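As a rough illustration of the retrieval step, the query and the duplicate removal can be approximated locally on exported records. This is a sketch under stated assumptions, not the WoS search engine: records are assumed to be Python dicts with hypothetical keys (`title`, `abstract`, `author_keywords`, `keywords_plus`, and `ut` for the WoS accession number used as the duplicate key), and the trailing `*` is treated as a wildcard over word characters.

```python
import re

# Local approximation of the query "biodivers* OR bio-divers*":
# optional hyphen, trailing wildcard over word characters, case-insensitive.
QUERY = re.compile(r"\bbio-?divers\w*", re.IGNORECASE)

# Hypothetical field names for the four searched WoS fields.
SEARCHED_FIELDS = ("title", "abstract", "author_keywords", "keywords_plus")


def matches_query(record):
    """Return True if any searched field contains biodivers* or bio-divers*."""
    return any(QUERY.search(record.get(field, "") or "") for field in SEARCHED_FIELDS)


def deduplicate(records):
    """Keep only the first record per accession number (hypothetical key 'ut')."""
    seen = set()
    unique = []
    for record in records:
        if record["ut"] not in seen:
            seen.add(record["ut"])
            unique.append(record)
    return unique
```

A missing field is treated as empty text, so records with no abstract can still match through their title or keywords.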
Using Microsoft Excel 2010, Google Refine version 2.5 [14] and text editors, we searched the retrieved dataset to determine 1) the number of publications per year, 2) the journals involved and their contribution to the field, 3) authors, joint authorships and their contributions, 4) citations per article, and 5) article pages.
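The per-year counts listed above (publications, distinct journals, distinct authors) reduce to grouping records by year. A minimal sketch in Python rather than the spreadsheet tools actually used, assuming each record is a dict with hypothetical keys `year` (int), `journal` (str) and `authors` (list of already normalized author names):

```python
from collections import Counter, defaultdict


def yearly_summary(records):
    """Count publications, distinct journals and distinct authors per year."""
    publications = Counter()
    journals = defaultdict(set)
    authors = defaultdict(set)
    for record in records:
        year = record["year"]
        publications[year] += 1
        journals[year].add(record["journal"])
        # A set keeps each author once per year, i.e. "distinct authors".
        authors[year].update(record["authors"])
    return {
        year: {
            "publications": publications[year],
            "journals": len(journals[year]),
            "authors": len(authors[year]),
        }
        for year in sorted(publications)
    }
```

Author name disambiguation is not addressed here; the sketch assumes it has already been done, which in practice is the hardest part of counting "individual workers".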
In addition, frequently occurring words in titles and abstracts were extracted, grouped by year and counted with a Perl script. For that purpose, we first removed special characters, punctuation, etc. from the dataset and defined an extensive blacklist of frequent words with low information content for the purpose of identifying scientifically relevant topics, for example a, about, absence, absent, across, after, all, among, an, also, although.
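The word-frequency step can be sketched as follows. This is a Python illustration, not the Perl script used for the study. It assumes records carry hypothetical `year` and `title` keys, uses only the blacklist excerpt quoted above (the real list was much longer), and counts every occurrence of a word rather than once per title; the original counting rule is not specified.

```python
import re
from collections import Counter, defaultdict

# Excerpt of the blacklist quoted in the text; the full list was more extensive.
BLACKLIST = {"a", "about", "absence", "absent", "across", "after",
             "all", "among", "an", "also", "although"}


def title_word_counts(records, blacklist=BLACKLIST):
    """Return {year: Counter(word -> hits)} for title words not on the blacklist."""
    by_year = defaultdict(Counter)
    for record in records:
        # Replace special characters and punctuation (except hyphens) by spaces.
        text = re.sub(r"[^\w\s-]", " ", record["title"]).lower()
        words = (word.strip("-") for word in text.split())
        by_year[record["year"]].update(w for w in words if w and w not in blacklist)
    return by_year
```

Hyphens inside words are kept so that spellings such as bio-diversity survive as single tokens.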
For those analyses that considered developments in publication history, we usually excluded records for the years 2013 and 2014 to avoid skew, as Thomson Reuters is still in the process of collecting publications from the previous and current years for WoS.

Number of Publications
We retrieved 68,799 bibliographic records for articles that used the term biodiversity (and derivatives) directly in title, abstract, or author-defined keywords, or for documents that were classified as biodiversity articles in the Keywords Plus category through the WoS ontology.
These almost 69,000 articles were published between 1966 and February 2014. The first publication listed in WoS that explicitly mentions biodiversity appeared in 1987: "An urgent need to map biodiversity" by E. O. Wilson [15]. This is the fourth publication by date in our dataset and the only record for 1987. The two following years contribute 13 articles each, 1990 contributes 30, and 1991 already 79. For 1992, we list more than 200 records, and in 1999, for the first time, more than a thousand articles matching our search criteria were published. As the most recent year fully updated in WoS, 2012 contributes 8204 documents, almost 12% of all retrieved records. Figure 1 shows how the records accumulated non-linearly over time. More than half of the studies were published during the last five years.
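Statements such as "almost 12% of all records in 2012" or "more than half of the studies in the last five years" follow from the yearly counts. A small sketch, assuming a hypothetical dict of records per year; the example input uses only the early yearly counts quoted above as a toy dataset, not the full series:

```python
def share_in_years(counts_by_year, years):
    """Fraction of all records that were published in the given years."""
    total = sum(counts_by_year.values())
    return sum(counts_by_year.get(year, 0) for year in years) / total


def median_year(counts_by_year):
    """First year by which at least half of all records have accumulated."""
    total = sum(counts_by_year.values())
    running = 0
    for year in sorted(counts_by_year):
        running += counts_by_year[year]
        if 2 * running >= total:
            return year
    return None


# Toy input: the first yearly counts mentioned in the text only.
early = {1987: 1, 1988: 13, 1989: 13, 1990: 30, 1991: 79}
```

With the full yearly counts, the "more than half in the last five years" statement corresponds to `median_year` falling within that five-year window.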

Journals
The 68,799 articles in our dataset were published in a total of 3888 journals. Of these, a very small number, around 100 journals (2.7%), contribute 50% of all articles. The 50 periodicals containing the highest number of articles on biodivers* and bio-divers* are listed in Appendix 1.
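The concentration figure (about 100 of 3888 journals carrying half of all articles) can be obtained by ranking journals by article count and accumulating until the desired share is reached. A minimal sketch, assuming a hypothetical dict mapping journal names to article counts:

```python
def journals_for_share(articles_per_journal, share=0.5):
    """Smallest number of top-ranked journals whose articles reach `share` of all articles."""
    counts = sorted(articles_per_journal.values(), reverse=True)
    total = sum(counts)
    running = 0
    for n_journals, count in enumerate(counts, start=1):
        running += count
        if running >= share * total:
            return n_journals
    return len(counts)
```

Dividing the result by the total number of journals gives the percentage reported in the text.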
Since 2007, more than 1000 different journals have published biodiversity articles every year, with a maximum of over 1500 distinct periodicals in 2012 (the last fully represented year at the time of writing). Since 2000, the number of journals publishing biodiversity articles has grown by 11% per year on average, but this growth rate is currently declining. The six journals containing the highest number of articles on biodivers* are plotted in Figure 2. Biodiversity and Conservation, currently still the journal with the most articles (1780) on the topic, has existed since 1992. It has published more than 100 articles in this field annually since 2005. PLoS One was launched in 2006 and has accumulated biodiversity articles very quickly (1642; 497 in 2012 alone). PLoS One is likely to become the journal with the highest number of articles on biodiversity soon. Conservation Biology and Science were already publishing documents on biodiversity in 1988.
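Average annual growth figures such as the 11% for journals can be computed in two common ways, and the text does not state which was used. The sketch below shows both under that caveat: the arithmetic mean of year-over-year rates and the compound (geometric) annual growth rate. It assumes a hypothetical dict mapping consecutive years to counts.

```python
def yoy_growth(series):
    """Year-over-year growth {year: x[year] / x[previous year] - 1}.

    Assumes `series` maps consecutive years to counts (no gaps).
    """
    years = sorted(series)
    return {year: series[year] / series[prev] - 1
            for prev, year in zip(years, years[1:])}


def mean_growth(series, start, end):
    """Mean annual growth from `start` to `end` (inclusive), in two variants.

    Returns (arithmetic mean of the yearly rates, compound annual growth rate).
    """
    rates = [rate for year, rate in yoy_growth(series).items()
             if start < year <= end]
    arithmetic = sum(rates) / len(rates)
    compound = (series[end] / series[start]) ** (1 / (end - start)) - 1
    return arithmetic, compound
```

The compound rate never exceeds the arithmetic mean, so the two variants diverge most when yearly rates fluctuate, as they do when growth is slowing.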

Authors
A total of 68,602 articles in the dataset (after removal of 197 anonymous publications) were authored by 124,984 individual researchers. Of these, about a third (35%) authored multiple publications within our dataset. A hypothetical 'median author' in our dataset has published one paper on biodiversity. The most productive author in our dataset published 176 articles. A list of the 50 most frequent authors in the dataset is given in Appendix 2.
Figure 3 shows the number of distinct authors per year. Since 2003, more than 5000 authors have published on biodiversity each year. New authors are attracted to the field quickly: since 2000, the annual number of authors has increased by 22% per year on average. A maximum of 36,905 authors was reached in 2012.
Most biodiversity studies involve more than one author. The most common authorship model is two authors per article (more than a fifth of all cases: 14,536 articles), closely followed by three authors per article (13,409). Single-authored studies (10,567) and those written by four authors (10,560) are almost equally represented. Together with publications by five authors (7032), these five models of authorship (1-5 authors) make up more than 80% of the total referenced literature.
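The shares quoted for the authorship models can be checked directly from the counts given above (68,602 non-anonymous articles). A short arithmetic sketch using only those reported numbers:

```python
# Articles per number of authors, as reported in the text.
TOTAL = 68_602
by_author_count = {1: 10_567, 2: 14_536, 3: 13_409, 4: 10_560, 5: 7_032}

shares = {n: count / TOTAL for n, count in by_author_count.items()}
share_1_to_5 = sum(by_author_count.values()) / TOTAL

for n in sorted(shares):
    print(f"{n} author(s): {shares[n]:.1%}")
print(f"1-5 authors combined: {share_1_to_5:.1%}")
```

Two-author papers come out just above a fifth of all articles, and the five models together slightly above 80%, consistent with the statements in the text.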
Figure 4 shows the average number of authors per publication in each year, which rises almost linearly from 1.5 authors in 1988 to 4.7 in 2012. This contrasts with earlier findings of a stagnating number of co-authors [5]. As our figures were obtained by dividing the total number of authors in a given year by the total number of publications in that year, one might argue that individual publications with exceptionally many co-authors could skew this estimator. For example, the publication in our dataset with the highest number of authors was produced by 81 workers. We therefore also looked at the median number of authors: it starts at 1 (1966-1993) and increases through 2 (1994-2002) and 3 (2003-2009) to 4 (2010-2014). For the entire dataset, the median number of co-authors is three.
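The contrast between the yearly mean (total author slots divided by publications) and the more robust median can be sketched with the standard library. The input format, a hypothetical dict mapping each year to the list of author counts of its articles, is an assumption; the example year is invented to show how one 81-author paper pulls the mean but not the median.

```python
from statistics import mean, median


def coauthor_stats(author_counts_by_year):
    """Return {year: (mean, median)} of the number of authors per article.

    The mean equals total author slots / number of articles, the estimator
    used for Figure 4; the median is insensitive to single papers with
    very many authors.
    """
    return {year: (mean(counts), median(counts))
            for year, counts in sorted(author_counts_by_year.items())}


# Hypothetical year with one very large author team.
example = {2000: [1, 2, 2, 3, 81]}
```

For this invented year the mean is 17.8 authors while the median stays at 2.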

Number of Pages
We evaluated article length in pages for 63,289 articles, after removal of 5510 publications with missing or ambiguous page information. Figure 5 shows the average number of pages per publication over the period 1977 to 2014. Together, the publications comprise 827,895 pages, with a median of nine pages per publication. After an initially lower number of pages per article, page numbers increased in the late 1980s and early 1990s, possibly reflecting a shift from an initial focus on political questions toward empirical studies. Since then, as the annual sample sizes have grown, the average number of pages per year has oscillated around a value close to ten pages.
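Page counts can be derived from the beginning and ending page of each record, discarding unusable entries. A minimal sketch under explicit assumptions: the hypothetical keys `bp` and `ep` hold the beginning and ending page as strings, and a record counts as "missing or ambiguous" if either value is absent, non-numeric (e.g. electronic article numbers) or the range is inverted. The exact rules applied in the study are not stated.

```python
from statistics import median


def page_count(begin, end):
    """Number of pages from begin/end page strings, or None if unusable."""
    if not begin or not end or not (begin.isdigit() and end.isdigit()):
        return None
    first, last = int(begin), int(end)
    return last - first + 1 if last >= first else None


def page_stats(records):
    """Return (usable articles, total pages, median pages per article)."""
    pages = [p for p in (page_count(r.get("bp"), r.get("ep")) for r in records)
             if p is not None]
    return len(pages), sum(pages), median(pages)
```

Counting both end pages inclusively (`last - first + 1`) matters for short articles, where an off-by-one error would noticeably shift the median.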

Citations
Appendix 3 lists the 50 most frequently cited publications as identified by WoS until February 2014. The most-cited article on biodivers* so far, with 5800 citations, is "Biodiversity hotspots for conservation priorities" by Myers et al. (2000) in Nature. It is followed by three studies with between 2000 and 3000 citations each (published in 1997, 2000 and 2004) and a group of 30 publications with more than 1000 citations each, the most recent of which was published in 2009.

Most Frequently Used Meaningful Words in Publication Titles
From all 60,433 titles present in the dataset (until 2012), we extracted the most common "meaningful" words, i.e., words that carry at least some information on the scientific content of the associated article. Table 1 details the 50 most common of these terms and their development over time. The development of the "top ten" terms in individual years is shown in Figure 6.
The search term used to generate this study's dataset, biodiversity and derivatives, is the most common term in article titles, with roughly 9900 hits overall. It is followed by diversity and derivatives (~8300) and species (~7500). Forest, community and conservation (and respective derivatives) each scored between 5000 and 6000 hits. While biodiversity had substantially more hits per year than other terms in early years (until around 2003), the other common terms have since caught up, and some are growing faster. Some of the analyzed terms showed particularly noticeable increases or decreases in growth rate. Increase: bacterial, Brazil, China, climate (steep increase), community, fish, water. Decrease: conservation, ecology, landscape, populations, richness (Liu et al., however, observed an increase in the use of species richness [12] up to 2009), structure, genetic, sea.
The dataset was also partly investigated beyond the 50 most common terms. Figure 7 shows tendencies for pooled terms from connotation groups we considered interesting: a comparison of aquatic vs. terrestrial title terms and of animals vs. plants, plus a curve for title terms indicating molecular biodiversity research. Terrestrial studies (as derived from title word hits) prevail over aquatic ones in number, but not in rate of increase. Molecular biodiversity publications are increasing (2012 in particular could indicate an incipient steepening of the slope), but growth is moderate. Plant studies on biodiversity far outnumber animal studies in both total hits and growth rate (but see [16] on the prevalence of animal studies in Colombian biodiversity research). However, it has to be kept in mind that the search is based on very generic terms and should in principle be conducted using a taxonomic thesaurus.
Table 2 lists the pooled hits for different continents, obtained from hits for individual countries among the 1000 most frequent title words in our database. The country names mentioned in titles suggest a strong focus on Asian biodiversity (very roughly twice as many hits as for South America, Europe, or North America). The focus on Africa, especially in relation to the continent's size, seems disproportionately small.
Table 2. Occurrences of the most frequently mentioned country names, pooled by continent. Terms were obtained from a list of the 1000 most frequent title words in our database (only nouns considered, no narrower or wider geographic terms, e.g., Africa, Caribbean, England, Ghats). Individual countries: Africa (Kenya, Madagascar, Tanzania), Asia (China, India, Japan, Indonesia, Philippines, Thailand, Turkey), Europe (Finland, France, Germany, Italy, Norway, Poland, Portugal, Spain, Sweden), North America (Canada, Costa Rica, Mexico, USA), Oceania (Australia, New Zealand), South America (Argentina, Brazil, Chile, Colombia, Ecuador).
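The pooled aquatic vs. terrestrial searches behind Figure 7 can be sketched as one regular expression per connotation group, using the 17 terms per group listed in the Figure 7 note. Assumptions, since the original matching rules are not described: matching is case-insensitive, a trailing `*` matches any word ending, and all other terms must match whole words only (so "lakes" is not counted for "lake"). Titles are given as a hypothetical dict mapping years to lists of title strings. The same pooling would apply to summing country hits into continents for Table 2.

```python
import re
from collections import Counter

AQUATIC = ["aquatic", "basin", "benthic", "estuary", "freshwater", "hydrolog*",
           "lagoon", "lake", "limnic", "marine", "ocean", "plankton", "pond",
           "river", "sea", "water", "watershed"]
TERRESTRIAL = ["alpine", "canop*", "continent", "desert", "forest", "grassland",
               "hill", "land", "lowland", "meadow", "mountain", "plane", "prairie",
               "savanna", "steppe", "terrestrial", "wood"]


def group_pattern(terms):
    """Compile a term list into one regex; a trailing * becomes a word-ending wildcard."""
    parts = [re.escape(t[:-1]) + r"\w*" if t.endswith("*") else re.escape(t)
             for t in terms]
    return re.compile(r"\b(?:" + "|".join(parts) + r")\b", re.IGNORECASE)


def pooled_hits(titles_by_year, terms):
    """Count title words matching any term of one connotation group, per year."""
    pattern = group_pattern(terms)
    return Counter({year: sum(len(pattern.findall(title)) for title in titles)
                    for year, titles in titles_by_year.items()})
```

Because the whole group is one pattern, each title word is counted at most once even when it could match several terms.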

Most Frequently Used Meaningful Words in Abstracts
Out of 55,950 collected abstracts (until 2012), the word species is by far the most common, with 164,712 hits, more than twice as many as the next most frequent word complexes, diversity/diverse, or the search term used to generate this study's dataset, biodiversity/biodiverse. This relation is also evident in Figure 8, which illustrates the development of the ten most used terms in scientific abstracts since 1988. For 2012, species scored 19,529 hits, while diversity/diverse had 7104. The curve for species is much steeper than those of the other nine terms, which show broadly similar rates of increase.

Discussion
The present bibliometric study analyzes articles containing biodiversity (or derivatives of the word), collected from the WoS databases. How representative can such a dataset be? Of course, not all biodiversity research features biodiversity as a keyword or mentions the word in title or abstract. Also, WoS obviously does not index all biodiversity-relevant journals. However, we assume that the large dataset we retrieved is sufficient to mirror the tendencies a hypothetical complete dataset would show, while avoiding the danger of including false positive hits for biodiversity research. The results of Hendriks and Duarte [5], who compared their data with a manually screened reference dataset, corroborate this assumption. Arguably, however, global biodiversity research could be more species-oriented than presented here, as genetics- and ecology-oriented journals seem more likely to be indexed by WoS. On the other hand, WoS reportedly misses an increasing proportion of new publication channels (e.g., conference proceedings or open archives) [17] [18], whose influence on the data can only be hypothesized.
In the published literature (as logged in WoS), biodiversity appears for the first time in 1987. Multiple publications on the topic first appeared in 1988. In the following years, the publication rate increased strongly and exponentially. Since 2008, however, this rate of increase has been dropping (14% in 2010, 10% in 2011, 6% in 2012) and currently lies around the standard growth level of scientific literature, below 10% per year [19] [17] [20].
This finding from publication output (see Figure 1 and Figure 9) is mirrored in the number of authors active in biodiversity research each year (Figure 3 and Figure 9): in 2010, the number of authors grew by 19% compared to 2009. In 2011, it increased by 15%, and in 2012 by only 10%. For 2013 (not yet complete in WoS), the increase currently still stands at 0%, so an increase below 10% is likely for this year.
The number of journals publishing biodiversity research each year also shows more moderate growth, with annual rates of 7% in 2010, 6% in 2011 and 2% in 2012 (Figure 9). This tendency, i.e., the potential normalization of global biodiversity research growth down to the average rate of increase of scientific output, has to our knowledge not been shown before in the bibliometric literature. It is as yet difficult to predict whether this "micro-trend" will persist, as the time frame serving as evidence for our findings is limited to four years. The development of biodiversity literature should be followed carefully. However, before looking for extrinsic causes, one should keep in mind that the apparent deceleration may partly reflect an incipient saturation in the use of the term biodiversity and a terminological shift. The fact that Google Trends [21] has shown neither an increase nor a decrease in interest in the term biodiversity since 2008 favors this explanation.
In the analyzed title words at least, the conspicuous initial growth rate of biodivers* is surpassed by that of other terms after roughly the first 10 years (in terms of rate of increase; in terms of uses per year, after almost 20 years; see Figure 6). This could be interpreted as a normalization in the use of biodiversity as a buzzword. It probably also reflects the consolidation of the biodiversity discipline(s) and a stronger focus on empirical work that does not mention the term in every title (but rather in the keywords or abstract).
Moving away from the term biodiversity, Figure 6 also shows another interesting result: the absolute use of conservation decreased noticeably in the last analyzed year, 2012. This is especially relevant as three of the four journals with the most hits for the search term biodivers* bear the word conservation in their title. Only one of these journals shows a very marked downward trend in biodivers* occurrences since 2010. Curiously, a publication released in 2011 [22] pointed out the increasing use of the term conservation (in research on whales and their relatives).
In contrast to conservation, the term community (and derivatives) rose noticeably relative to other title words during the last years. While hits for species are still rising rapidly, the increasing number of community matches indicates a growing parallel focus on biodiversity research above the species level (which [5] considered underrepresented until 2005). Within the abstracts, however, the word species occurred at least three times as often as any other term, probably because the concepts elaborated in a paper often require repeated mention of the word (also in publications above species level, which mention species in the abstract although they often do not bear it in the title). The use frequency of the word climate also increased steeply, suggesting growing interaction between climatology and biodiversity research (possibly correlated with the later reports of the Intergovernmental Panel on Climate Change).
Molecular aspects of biodiversity research (Figure 7) remain limited so far (see [5]), especially considering the advent of a wide array of fundamentally new molecular technologies in recent years. However, the last analyzed year could mark the beginning of a steeper increase for molecular biodiversity studies.
Liu and colleagues [12] identified the USA as the single most productive country for biodiversity articles, followed, among the "top 10", by several European countries, Australia, Canada, China and Brazil. A few years earlier, Hendriks and Duarte [5] had found that the USA and the EU were conducting nearly 90% of all biodiversity research. Using the title words extracted from our dataset, we roughly complemented the sociological dimension of these bibliometric analyses with data on the geographic areas targeted by the researchers (see Table 2). The country names mentioned in the dataset's titles suggest a very strong focus on Asian biodiversity, followed by South America, Europe, North America, Oceania, and finally Africa. The focus on Africa, especially in relation to the continent's size, seems disproportionately small. These numbers point to a partial geographic imbalance between the allocation of research efforts and the actual distribution of biological diversity.
Much research is still needed, not only in under-studied areas, but also in threatened habitats and for the basic understanding of biodiversity. Regular monitoring of biodiversity research, including focused (sub)topics or geographic areas, can be a helpful instrument for the effective management of and research on biodiversity. "Biological diversity is a global asset of incalculable value to present and future generations" (K. Annan, cited in [23]).

Figure 1. Number of publications per year. While the first decade of this century saw an average annual increase of 19% in publication output, the second decade (2011 and 2012) started with a mean growth rate of 8%, as indicated by the terminal flattening of the curve.

Figure 3. Number of authors per year.

Figure 4. Average number of co-authors per publication. Note: we use the term "co-author" without differentiating between first author and associated author/s.

Figure 5. Average number of pages per publication per year.

Figure 6. The ten most frequently used "meaningful" words in titles over the years 1987 to 2012.

Figure 7. Selected terms from the retrieved titles until 2012. *: To roughly assess aquatic vs. terrestrial focus (Hendriks and Duarte [5] had noticed a strong focus on terrestrial biota), a list of 17 terms was compiled for each of the two connotation groups and subjected to pooled searches: aquatic, basin, benthic, estuary, freshwater, hydrolog*, lagoon, lake, limnic, marine, ocean, plankton, pond, river, sea, water, watershed vs. alpine, canop*, continent, desert, forest, grassland, hill, land, lowland, meadow, mountain, plane, prairie, savanna, steppe, terrestrial, wood. Some of these terms are not exclusive to one of the groups (e.g., meadow, forest, basin), but were assigned to the group in which they are presumably used much more frequently.

Figure 8. Development of the ten most frequently used "meaningful" words in the collected abstracts.

Figure 9. Annual growth (relative to the respective previous year) in the number of publications, authors and journals.

Table 1. The 50 most frequent terms and their occurrences, collected from the titles in the dataset until 2012. *: For 2011 and 2012, the number of hits was normalized to allow comparability in five-year units, assuming (conservatively) linear growth; original hit numbers are given in parentheses.