The Consistency Measurement in Document Publications and Citation Using t-Index

In recent times, handling uncertainty and measuring it has been considered a major issue by researchers in data science and applied mathematics. It becomes more complex when the data sets are dynamic. A suitable example is the Scopus data set, which changes continually. In this case, precise measurement of consistency in document publications and citations is an open issue. It becomes more complex because parameters such as the h-index and document count can also be manipulated over time. To resolve this issue, a time-based index called the “t-index” is illustrated in this paper with an example. This method measures the randomness in document publications and citations using the average h-index and its entropy measurement.


Introduction
In recent times, analyzing the research performance of a researcher or institute has become a crucial task. It becomes more complex when the analysis depends on a large database such as Scopus. Parameters such as the h-index try to address this issue [1]. However, several limitations of the h-index were found later: with multiple co-authors or a large number of published documents, the h-index can be manipulated [2]. This paved the way for other metrics such as the i10-index [3], g-index [4], e-index [5], s-index [6], and others [7] [8] [9] [10]. Recently, these metrics were analyzed using the Scopus data set [11] [12] [13] and were found to affect the impact factor in significant ways [14] [15]. In this process, a problem arises when measuring the consistency of an institute beyond its document publications and citations. The reason is that every institute has a fixed number of authors, labs, and infrastructure, which can produce only a certain number of publishable results in a given period. However, some institutes have tried to manipulate the system to earn more money or to run education as a business rather than as research. To achieve this, papers are published with multiple authors, including co-authors who have no expertise in the topic. In this case, a problem arises when identifying the founding institute or author of a given concept in a technical paper. Review papers also create an issue, because a review paper receives more citations, which are credited equally to all of its authors. As a result, it is difficult to compare institutes or authors that have the same h-index, a low h-index, or a high h-index. It is reported that many authors with a low h-index and few citations have received the Nobel Prize.
Hence, the precise measurement of consistency is a major issue for research communities. This paper focuses on addressing this issue using the average h-index and its entropy measurement.
To measure the randomness in document publications and citations, entropy theory [16] is used in this paper. One reason is that this theory is considered an effective method for analyzing randomness and uncertainty [17] [18]. This paper connects entropy theory to the measurement of uncertainty and randomness in document publications and citations by monkey and ghost researchers [19]. These types of researchers try to manipulate the system using co-authors [20]. This can be observed from the number of papers and co-authors: as reflected in Scopus, such authors have a large number of co-authors but a small number of papers. At the same time, they list more than 100 distinct areas of expertise, which seems impossible. Sometimes they try to increase their citations dynamically through conferences that they organize themselves. This can be observed from the trend of document publications and citations over time [21]. For example, such authors may publish more than 200 papers in Scopus per year, which means almost one paper per weekday and looks infeasible. This becomes more complex when the names of posthumous or honorary authors are added to gain document count and citations. One reason for such acts is that every author receives the same document count, citations, and h-index, which affects their intellectual measurement [22] [23]. It becomes more crucial when papers are retracted from Scopus, because a problem then arises with the document count and citations used for intellectual measurement. Hence, the impact of the work is more important than the impact of the journal for intellectual measurement. This issue becomes more complex when analyzing current research trends or identifying domain experts for multi-decision processes to stop brain drain [24]. These things happen because the quality of document publications and citations is a matter of Turiyam [25].
The reason is that document publications and citations increase or decrease based on the domain rather than the technicality of the papers [26] [27] [28]. They also depend on the type of paper and on the domain; for example, review papers receive more citations than technical papers [26]. Hence, the consistency of work should be measured rather than the impact of the journal, citations, document count, or h-index [29]. It becomes more crucial when papers are retracted from Scopus [30]. These studies motivated the author to introduce a time-based method built on Shannon entropy and the average h-index. The objective is to find an alternative way to measure the randomness in document publications and citations, as shown in Figure 1.
One significant outcome of the proposed method is that it provides a way to characterize the consistent and inconsistent performance of any institute.
The remaining part of the paper is structured as follows: Section 2 provides preliminaries about the h-index and other metrics related to this paper. Section 3 contains the proposed method, with its illustration in Section 4, followed by conclusions, acknowledgement, and references.

Preliminaries
In this section, some metrics related to the t-index are explained for better understanding.
Definition 1: (h-index) [1]: It is defined as the largest number n such that n research papers of an author have at least n citations each; it can be computed using the algorithm shown in Table 1. A limitation of this index arises when there are multiple co-authors. At the same time, a highly cited paper may become irrelevant after some time. This means the h-index does not provide a precise time-based analysis of citations and their influence. To resolve this issue, the mock h-index was introduced.
Definition 2: (Mock h-index) [9]: It is introduced to measure a quantity that is statistically similar to the h-index and has the same dimensions as the h-index.
It can be observed that this index also does not provide any time-based analysis or measurement of randomness.
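The h-index of Definition 1 can be sketched in a few lines of Python. This is a minimal illustration of the standard definition (the largest h such that h papers have at least h citations each), not the algorithm of Table 1; the function name `h_index` and the citation counts are assumptions made for the example.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # later papers are cited even less
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

Note that the result depends only on the sorted citation counts, so it carries no information about when the citations were received.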
Definition 3: (m-quotient) [7]: It is defined as m ≈ h/n, where n is the number of years since the author's first publication. This index tries to correct for the large citation counts that accumulate over a long career. However, a small change in the h-index causes a large change in the m-quotient. At the same time, this index is unable to measure the randomness in citations. To deal with this, Shannon entropy is considered useful [6]. This paper uses Shannon entropy to measure the randomness in citations.
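The sensitivity of the m-quotient can be seen in a small sketch. The function name `m_quotient` and the two authors below are hypothetical; the formula is m = h/n from Definition 3.

```python
def m_quotient(h, years_since_first_publication):
    """m-quotient: h-index divided by career length in years."""
    if years_since_first_publication <= 0:
        raise ValueError("career length must be a positive number of years")
    return h / years_since_first_publication


# Hypothetical authors: the same one-step rise in h moves m far more
# for a short career than for a long one.
young_before, young_after = m_quotient(2, 2), m_quotient(3, 2)
senior_before, senior_after = m_quotient(20, 20), m_quotient(21, 20)
print(young_after - young_before)    # 0.5
print(senior_after - senior_before)  # about 0.05
```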
Definition 4: (Entropy) [16]: It measures the randomness or uncertainty in a given data set as the average information content, based on the uniformity of a distribution, as follows: $H = -\sum_{i} P(x_i) \log P(x_i)$, where $P$ is the probability distribution function of the random variable taking the values $x_i$. Recently, it has been applied to uncertainty measurement in data analysis. This paper focuses on measuring the randomness in citations over a time window. To achieve this goal, a method is proposed in the next section.
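As a minimal sketch of Definition 4, the following Python function computes the Shannon entropy of a discrete distribution. The natural logarithm and the name `shannon_entropy` are assumptions of this example; another logarithm base only rescales the result.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy (natural log) of a discrete distribution."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing, by the convention 0 * log 0 = 0.
    return sum(p * math.log(1.0 / p) for p in probabilities if p > 0)


uniform = shannon_entropy([0.25, 0.25, 0.25, 0.25])  # maximal: log 4
skewed = shannon_entropy([0.97, 0.01, 0.01, 0.01])   # much lower
certain = shannon_entropy([1.0, 0.0, 0.0, 0.0])      # no uncertainty: 0
print(uniform, skewed, certain)
```

The more evenly the probability mass is spread, the higher the entropy; this is the property the proposed method relies on when it treats yearly citation shares as a distribution.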

Proposed Method (t-Index)
In this section, a method is proposed to measure the randomness in citations using entropy theory. Suppose an author has received $c_i$ citations, over $N$ years, for the papers published in the $i$-th year of their research career. In this case, the entropy can be computed as:

$$T = -\sum_{i=1}^{N} p_i \log p_i, \qquad p_i = \frac{c_i}{C_t},$$

where $C_t$ is the total number of citations received by the author. Although entropy characterizes the uniformity of the distribution, its value must be normalized to make it comparable across different distributions, so it is divided by the factor $\log N$:

$$T' = \frac{T}{\log N},$$

where $N$ is the number of years in the academic career of a researcher, characterized by the difference in years between the first and the last publication of the author. Since $T \le \log N$, the value of $T'$ is very small, so it is scaled up using the inverse of the logarithm, that is, the natural exponential function. This gives a quantity that measures the uniformity of the yearly distribution of citations, i.e.,

$$T_u = e^{T'}$$
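The computation described above can be sketched as follows. This is a hedged illustration, not the author's implementation: it assumes the entropy is taken over yearly citation shares p_i = c_i / C_t, that the normalizing factor is log N, and that N can be taken as the number of yearly entries supplied. The function names and the sample counts are hypothetical, and returning 1.0 when N < 2 (where log N = 0) is a convention chosen for this sketch.

```python
import math


def citation_entropy(yearly_citations):
    """Entropy T of the yearly citation shares p_i = c_i / C_t."""
    total = sum(yearly_citations)
    if total == 0:
        return 0.0  # no citations at all: treated as zero entropy here
    return sum((c / total) * math.log(total / c)
               for c in yearly_citations if c > 0)


def uniformity(yearly_citations):
    """T_u = exp(T / log N), with N assumed to be the number of years given."""
    n_years = len(yearly_citations)
    if n_years < 2:
        return 1.0  # log N = 0, so normalization is undefined; sketch convention
    t_normalized = citation_entropy(yearly_citations) / math.log(n_years)
    return math.exp(t_normalized)


even = uniformity([50, 50, 50, 50])    # citations spread evenly: close to e
bursty = uniformity([170, 20, 5, 5])   # one dominant year: lower value
single = uniformity([200, 0, 0, 0])    # all citations in one year: 1.0
print(even, bursty, single)
```

Under these assumptions the value ranges from 1 (all citations tied to a single publication year) to e (citations spread evenly across all years), regardless of how many citations there are in total.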
It can also be interpreted as the research consistency of an individual over the years. It can then be refined over a time frame to obtain the t-index, denoted t, in whose definition 4 is an arbitrary constant of choice. This constant is used to scale the value of t and can be changed based on user requirements. The reason is that experts most often want to measure performance over the last 3 to 4 years. In this way, anyone can evaluate distinct values $t_1$ and $t_2$ for distinct time frames, in which two cases arise:
i) $t_1 = t_2$: the performance of the chosen author or institute is consistent.
ii) $t_1 > t_2$ (or vice versa): the performance is better in the $t_1$ time frame (or vice versa).
The t-index will be higher when the citations for each research paper exceed the number of papers published in that year, whereas the h-index remains unaffected. In this way, the lower bound of the t-index can be approximated as zero in the case of zero publications. The upper bound can be approximated from the fact that the entropy of a distribution with n possible values has an upper bound of $\log n$; therefore $T \le \log N$, as shown in Figure 2. In the next section, the proposed method is illustrated using computer science data sets collected from Scopus for some institutes. A comparison between the t-index and the h-index for the same institutes is also given for better understanding.
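The entropy bound used above can be made explicit with a short derivation. It assumes that the entropy is taken over the yearly citation shares $p_i = c_i / C_t$, that the normalized entropy is $T' = T / \log N$ with $N > 1$, and that $T_u = e^{T'}$; it concerns only these three quantities, not the full t-index.

```latex
% Assumes p_i = c_i / C_t, T' = T / \log N with N > 1, and T_u = e^{T'}.
\begin{align*}
T &= \sum_{i:\,p_i>0} p_i \log\frac{1}{p_i}
   \;\le\; \log\Bigl(\sum_{i:\,p_i>0} p_i \cdot \frac{1}{p_i}\Bigr)
   \;\le\; \log N
   && \text{(Jensen's inequality, $\log$ is concave)} \\
0 &\le T' = \frac{T}{\log N} \le 1 \\
1 &\le T_u = e^{T'} \le e
\end{align*}
```

The upper bounds are attained when citations are spread evenly, $p_i = 1/N$ for every year, and the lower bounds when all citations belong to a single publication year.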

Illustrations
This paper introduces the measurement of citations using entropy theory and the time-based h-index on the data set shown in [12]. The data analysis is done using the pandas library for Python, as discussed in detail in [26]. The motivation is that the h-index can be manipulated using multiple co-authors and random citations [29] [30]. To resolve this issue, a method called the t-index is proposed in Section 3. The h-index of some Indian institutes and the computation of their t-index are shown in Table 1. It can be observed that the t-index is higher even for lower values of h. This means that institutes that perform consistently over the years, without randomness in citations or document publications, have a higher t-index, which cannot be identified via the h-index. The following information can be extracted from Table 1 and Figure 3.
1) The t-index is higher when there is less randomness and uncertainty in document publications, even if the h-index is low. This means the t-index measures the consistency of document publications and citations. It is not affected by the older-versus-younger issue that arises with the h-index.
2) It can be observed that IIIT Hyderabad and IISC Bangalore have almost equal t-index values. This means IIIT Hyderabad is as consistent in research as IISC Bangalore in the given span of time. However, IISC Bangalore has a higher h-index than IIIT Hyderabad.
3) For old universities such as BHU, AMU, Mumbai, Madras, and Allahabad University, the t-index is low even though their h-index is above the country's average h-index. This means these universities have not worked consistently in the given academic span.
4) It can be observed that IIT Delhi has a lower document count but the highest t-index. This means it is consistent over the period. However, IIT Kanpur has a lower t-index, which means IIT Kanpur is not consistent over the period.
It produced some good-quality papers in the given period. Similarly, the private institutes VIT, Amity, and Thapar are not consistent as per the t-index, whereas Amrita, SRM, and Sathyabama tried to be consistent. The performance of other institutes can be analyzed with the t-index in the same way. The data can be taken from Scopus.
5) The proposed method shows that the consistency of research publications in a given period can be measured based on authors per publication and its outcome.
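The kind of per-institute comparison described in the points above can be sketched as follows. The paper's own pandas code is not shown, so this is not the author's implementation: it is a plain-Python sketch that computes only the entropy-based uniformity value exp(T / log N), under the assumption that the normalizing factor is log N, and not the full t-index. The institute names and citation counts are invented for illustration and are not taken from Table 1.

```python
import math
from collections import defaultdict


def uniformity_score(yearly_citations):
    """exp(T / log N) for yearly citation counts (N = number of years)."""
    total = sum(yearly_citations)
    n_years = len(yearly_citations)
    if total == 0 or n_years < 2:
        return 1.0  # sketch convention for degenerate inputs
    entropy = sum((c / total) * math.log(total / c)
                  for c in yearly_citations if c > 0)
    return math.exp(entropy / math.log(n_years))


def scores_by_institute(records):
    """Group (institute, year, citations) rows and score each institute."""
    per_institute = defaultdict(dict)
    for institute, year, citations in records:
        by_year = per_institute[institute]
        by_year[year] = by_year.get(year, 0) + citations
    return {name: uniformity_score([by_year[y] for y in sorted(by_year)])
            for name, by_year in per_institute.items()}


# Hypothetical records: B has far more citations, but almost all in one year.
records = [
    ("Institute A", 2018, 40), ("Institute A", 2019, 42),
    ("Institute A", 2020, 38), ("Institute A", 2021, 40),
    ("Institute B", 2018, 900), ("Institute B", 2019, 30),
    ("Institute B", 2020, 40), ("Institute B", 2021, 30),
]
scores = scores_by_institute(records)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.3f}")
```

In this toy data the institute with fewer but evenly spread citations scores higher than the one with a single burst, which mirrors the observation that consistency is not captured by citation volume alone. With pandas, the same grouping could be expressed as a group-by on the institute column.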
In this way, the proposed method is able to identify consistent document publication and citation even at a lower h-index, as shown in Figure 3. It may help in controlling brain drain [24]. However, it fails to account for citations of retracted papers, the weightage of multiple authors, out-of-domain papers, papers with posthumous authors, journal-to-journal citation, conference-to-conference citation, within-organization citation, and their consistency [26] [30]. Hence, the author will focus on solving the following problems in the near future:
1) Sometimes a paper not indexed in Scopus is of higher quality than indexed ones. This can be measured by the novelty of the work or perhaps by citations. The author will therefore focus on measuring non-indexed papers and their quality for performance measurement in the future.
2) Some conference papers are of higher quality than journal papers. In this case, precise measurement of conference papers and their content for intellectual measurement is a crucial task.
3) Many high-quality papers are written in languages other than English, such as Russian, Chinese, German, Hebrew, Hindi, Parsi, Sanskrit, Bengali, Tamil, and other languages of the world. These papers are not indexed in Scopus, so measuring their quality and performance is another issue. This means that measuring linguistic diversity and indexing it in Scopus, rather than accepting a monopoly of English, is another issue for intellectual measurement.
4) Regional, gender, and other factors in document and citation measurement are distinct issues that need to be addressed. Papers published in the same journal by a scholar from MIT and by a scholar from a small college in India cannot be considered equal in intellectual measurement. A new metric is required to account for regional, gender, and other factors when measuring the performance of an individual or institute.
5) The diversity of citations, awareness of citations and the cited work, content-based citation, influenced citations, and measurement of the technicality of work are another challenge for researchers. The reason is that a review paper receives more citations, whereas a technical paper may receive fewer. Whether the author of a review paper or the author of a technical paper should be considered more intellectual is another challenge. A new metric is required to characterize citations in terms of acceptance, rejection, and uncertain regions, since citation depends on the awareness of researchers.
6) Domain-wise intellectual measurement of an institute or author is another issue for researchers. One reason is that publishing papers in mathematics is harder than in chemistry or biology. At the same time, the number of journals, the number of working researchers, or the demand in some domains is lower than in others. In this case, precise intellectual measurement based on document count, journal ranking, or citations is a difficult task.
7) The impact of funded projects, authors, and collaboration on the measurement of performance and intellect is another issue. The reason is that conflicts arise when identifying the founding author or institute of a given work.
8) Inconsistency in document publications and citations drives brain drain. The reason is that researchers prefer the impact of the work over the impact of the journal. Measuring the quality of the work rather than the journal is therefore another issue for researchers.
9) The precise measurement of retracted papers and their citations is another concern for researchers in intellectual measurement.
10) The impact factor measures only two years of document publications and citations rather than the general picture. It predicts the current trend rather than the quality of work. The reason is that the citation of a paper depends on the awareness of experts rather than on its quality. It is entirely up to authors whether they cite the base paper of a given area. Hence, citation goes beyond the ranking and indexing of a journal. In this case, performance measurement based on alternatives to the impact factor, such as Altmetrics, is another issue.
11) Unwanted citations and their measurement are another issue for research communities. Sometimes researchers cite unwanted papers rather than the founding or base papers. They do not reference breakthrough results because those papers are old and do not help the journal's current impact factor. As a result, many researchers cite papers from the most recent two years. Measuring and characterizing unwanted citations is another issue in intellectual measurement. One reason is that an understanding of the founding or breakthrough papers comes only after hard work. It depends entirely on human Turiyam rather than on acceptance of a keyword, rejection of a keyword, or uncertainty. Another issue arises when an author does not want his or her paper to be cited, but wants people to read the method and be inspired to use it in various applications. In this case, intellectual measurement is another difficult task that needs to be addressed.
It is believed that the current paper will be helpful for research organizations, accreditation bodies such as NAAC and NBA, and other agencies in measuring the consistency and impact of research.

Conclusion
This paper focused on measuring randomness and uncertainty in document publications and citations using Scopus data sets. To achieve this goal, a method was proposed that hybridizes the time-based h-index and Shannon entropy.
It is shown that the proposed method measures the consistency of two or more institutes in a given period, unaffected by a low or high h-index, as shown in Table 1. In future, the author will focus on introducing other metrics for in-depth analysis of the performance of any author or institute using the Scopus data set.