
The motivation of this paper is to show how to use the information from given distributions and how to fit distributions in order to confirm models. Our examples are aimed especially at disciplines slightly away from mathematics. One minor result is that the mean and standard deviation of the data are at best a rough approximation to the parameters of the best Gaussian fit. In our first example we scrutinize the distribution of the intelligence quotient (IQ). Because it is an almost perfect Gaussian distribution and correlated with the parents’ IQ, we conclude with mathematical arguments that IQ is inherited only, as is assumed by mainstream psychologists. Our second example is income distributions. The number of rich people is much higher than any Gaussian distribution would allow. We present a new distribution consisting of a Gaussian plus a modified exponential distribution. It fits the fat tail perfectly. It is also suitable to explain the old problem of fat tails in stock returns.

In finance, economics, and many social sciences distributions are important. However, there are two closely connected puzzling items. Firstly, there is an almost dogmatic assumption that there are Gaussian distributions only (with few exceptions). Secondly, there are partly strange methods to prove that something must have a Gaussian distribution. The mathematics of distributions is essentially a product of the 19^{th} century, for an overview consider e.g. [

A typical paper from this time is [^{th} century in finance.

Another reason for the wrongly assumed fat-tail in finance is that stock prices are fluctuating chaotically rather than randomly. The mathematical description of chaos is pretty old, and a modern summary (and application to physics) can be found in [

The name “fat-tail” originates from physics. However, the wrong doings mentioned above are not present there, for a quite recent example see [

Dealing with distributions other than Gaussian rarely causes problems similar to the one mentioned here. As stated above, the social sciences in particular assume a Gaussian distribution for almost everything. It is abandoned only if there is a proof for another distribution. As an example consider the exponential distribution, mostly applied to describe queues [

So much for a brief summary of the use of distributions, especially in non-mathematical sciences. The purpose of this paper is not to fix the mentioned problems, especially the use of a Gaussian distribution wrongly or without justification. This is hardly possible. Our goal is to explain the use of distributions in two general situations:

· An experiment or reality shows a strictly known distribution (e.g. Gaussian).

· An experiment or reality shows a strange situation (e.g. Gaussian with fat tail).

To give a complete answer to the two points would mean writing a textbook. Of course, such books have existed for roughly 100 years, see e.g. [

Our first example is the distribution of the intelligence quotient normally referred to as IQ. There are lots of data worldwide, and they have one thing in common: They are perfectly Gaussian (unfortunately without fat tail). From this it is almost trivial to show in chapter 2 that the IQ is inherited or at least not changed by conscious action like training. This gives the first mathematical proof for an old and almost religious argumentation about nature versus nurture.

In chapter 3 we will scrutinize income distributions. We have chosen this example for two reasons. Firstly, it is an often debated subject especially since the best-selling book of Piketty [

As a result we get a narrower distribution for the not-very-rich if the super-rich are allowed to have a wider distribution. In other words, without the super-rich there would be a less equal distribution within the “normal” people.

Fitting the data within subchapter 3.3 is extremely complicated. It shows the frontiers of numerical mathematics. Therefore we are deferring some of the mathematical derivations to chapter 4.

Chapter 5 gives a summary and ideas for further research.

In

Even much more universal than the average IQ in the developed world is the width of the distribution. It is wider for men than women. Again, this is no proof for nature or nurture. However, setting on nurture would mean that there is a universal difference in the education of boys and girls. Though there are differences in education, it would be at least puzzling that this difference is persistent in so many societies.^{1 }

Though this paper is on mathematics rather than psychology, there is by now agreement among academic psychologists that IQ is inherited. In a review [

Nevertheless, there are hardliners (even in academic psychology) sticking to nurture instead of nature. Because the authors of this paper cannot read minds, we can only speculate about the reason for it. It looks like ideology takes over science. Of course, an inherited IQ has political consequences. It would be much harder to argue for a more equal income within countries and especially between developed countries and the rest of the world where average IQ is partly significantly lower. In order to find counterarguments against the mainstream of academic psychologists it is sometimes said that IQ is a poorly defined measure. One has to say that there have been many advances since Binet and Simon developed the first IQ tests in the early 20^{th} century. There are also broader measures like “fluid intelligence” which include the ability of abstract thinking and problem solving. Furthermore, many other data like the scores of America's standardized entrance exams for university (e.g. GMAT) correlate nicely with IQ. So we have very many sets of data which all point to nature instead of nurture.

Sometimes it is also stated that one can learn how to get a higher score in an IQ test. It would make the entire concept ridiculous. It is even easy to “prove” it by taking an IQ test several times. However, this is nothing but corrupting the system. Taking the same or a very similar math exam many times will also improve results.

Up to now we have just summarized the debate about nurture versus nature in IQ. The motivation for this chapter was to find a clear-cut mathematical proof that IQ or fluid intelligence is (essentially) inherited. To our own surprise it is extremely simple. However, we are not sure whether it will convince the above-mentioned hardliners.

If IQ increases by training, it should be identical to learning in the sense of learning curves. Learning curves are used in (industrial) engineering and especially production management. Maybe first noted in [

$$t^{-\alpha} \tag{1}$$

where α is a critical positive exponent. Purely numerically it fits most situations fairly well. Though widely used, it is wrong. Equation (1) is a result of a random walk [

Instead of Equation (1), learning takes the following form:

$$e^{-t/\tau} - 1 \tag{2}$$

where τ is a typical learning time. If IQ is essentially acquired by “learning,” one would have the same picture in the IQ distribution. The more IQ points one has already gained, the more difficult it becomes to gain an additional one. We have a differential equation of the form $\dot{\mathrm{IQ}} \propto -\mathrm{IQ}$ leading to an exponential distribution of IQ:

$$p(\mathrm{IQ}) = \lambda \cdot e^{-\lambda \cdot (\mathrm{IQ} - \mathrm{IQ}_0)} \quad\text{with}\quad \mathrm{IQ} \ge \mathrm{IQ}_0 \tag{3}$$

where λ is a parameter determining the quality of the education program. A small λ means intense education for everybody and a big λ means no education. IQ_{0} is the IQ at birth which may have a very narrow Gaussian distribution. The plot in

The difference between an inherited distribution of IQ (_{0}. There should not be an IQ below it. Unfortunately this is in contrast to all observations of human IQ.
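The contrast between a learned and an inherited IQ distribution can be illustrated numerically. The following is a minimal sketch only: the values of `IQ0` and `LAM` are hypothetical and not taken from the paper. It samples from the shifted exponential of Equation (3) and from a Gaussian and compares the lower bound and the skewness.

```python
import numpy as np

# Hypothetical illustration values (assumptions, not from the paper)
IQ0 = 85.0        # assumed IQ at birth
LAM = 1.0 / 15.0  # assumed rate parameter lambda of Equation (3)

def exponential_iq_pdf(iq, lam=LAM, iq0=IQ0):
    """Density of Equation (3): zero below iq0, exponential decay above it."""
    iq = np.asarray(iq, dtype=float)
    return np.where(iq >= iq0, lam * np.exp(-lam * (iq - iq0)), 0.0)

def skewness(x):
    """Sample skewness: third central moment over variance to the power 3/2."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(0)
learned = IQ0 + rng.exponential(scale=1.0 / LAM, size=200_000)  # Equation (3)
inherited = rng.normal(loc=100.0, scale=15.0, size=200_000)     # Gaussian

print("smallest learned IQ:", learned.min())
print("skewness learned:   ", skewness(learned))
print("skewness inherited: ", skewness(inherited))
```

The exponential sample never falls below `IQ0` and is strongly skewed, whereas the Gaussian sample is symmetric. This is the qualitative difference the argument relies on.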

Some may say that there is both: nurture and nature. Of course there may be a Gaussian distribution of IQ_{0} and some IQ achievements due to education. However the decision whether nature or nurture has (by far) the upper hand is easy. Does our measured IQ distribution look more like

The above-mentioned hardliners may say that the skill to educate (i.e. λ) is Gaussian distributed among the parents. But this would still not lead to a Gaussian distribution. Though the distribution for high IQ values would look almost Gaussian, most strikingly this new distribution would still be very asymmetric and never show values with $\mathrm{IQ} < \mathrm{IQ}_0$, in contrast to observations in the real world. Of course it is possible to destroy IQ either by physical or mental injury. As an example of the latter, one may consider the tragic figure Kaspar Hauser^{2} living in Germany around Nuremberg from about 1812 till 1833. However, there is never enough abuse to explain the symmetry.

If it is possible to increase IQ (massively) by education or training, such methods could be applied to a centralized child education. It should have led to a massive increase in IQ in systems like the Soviet Union or mainland China.

All the above clearly states that IQ cannot be improved. It is a result of a random mix of the genes of the two parents. Due to the central limit theorem such randomness leads to a Gaussian distribution. It is very hard to imagine that any other mechanism creates a Gaussian distribution. Actually, mankind is not able to create (complete) randomness by e.g. computer programs. Astrophysicists are sometimes in need to scrutinize signals for exact randomness. They still have to rely on natural sources like radioactive decay in order to have a precise reference. Therefore the Gaussian distribution proves randomness and no conscious actions.

Some may argue now that the IQ is not inherited but a result of randomness having nothing to do with nature or nurture. It would also lead to a perfect Gaussian distribution. However, there are correlations between the parents’ and offspring’s IQ. These can only be there if either nature or nurture plays a major role.

However it is impossible to judge whether this randomness is finished by conception. Something during the embryonic growth may contribute. At least for the trait homosexuality in women there seems to be strong evidence for it. Early childhood may also have an influence on IQ as long as it cannot be influenced consciously. To judge whether such unconscious influences exist is impossible to decide because the statistics are identical^{3}.

In animal breeding, genetic selection is by now quite common. It is done by producing many embryos from one pair of parents in a Petri dish. The genes are scrutinized in order to find e.g. the embryo with the highest potential for a cow giving lots of milk. In humans such selection should theoretically be possible too. Though the genes for high IQ are not discovered yet, it will be possible someday. If done massively (despite moral and ethical concerns), it would lead more and more to an exponential distribution of IQ. If done only by the rich who can afford it, it would lead to a mixture which is a Gaussian with a fat tail. It would be essentially the same model as we will suggest in subchapter 3.3.

As a start consider the monthly net household income of 2017 in Germany as given in

Though this paper is on mathematics rather than economics, it is an interesting question especially in the age of globalization and Piketty’s book [

varies very much over the years. Furthermore, the ratio of income from wealth to work is often overstated.

Here we will consider income only, be it from wealth or work. Within our accuracy goal this distinction is unimportant. We will also always go for net income. Of course the net income depends on the political system and things like minimum wages and social support. On the other hand, people try to increase their net income. Again within our accuracy it does not matter very much.

In subchapter 3.1 we will quickly state how values like the one from

We will show that the classical approach using the mean and standard deviation is wrong for principle reasons. A least square fit or better least absolute value fit [

In subchapter 3.2 we explain why one should not take values from the columns of ^{th} degree were nothing can be learned from. But even with the results from subchapter 3.2, households of a monthly income of ?0,000 and over should not exist.

In subchapter 3.3 we will construct a new model by using the historical results of Chapman [

Finding a Gaussian describing

$$p(E) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-\frac{(E-\mu)^2}{2\sigma^2}} \tag{4}$$

Please note that we have only an income range in each column of

μ ≈ 2664.03 € , σ ≈ 1686.12 € (5)

Though these results are pretty simple to get, a least square fit of

$$\left(\frac{3792}{40205} - \int_{-\infty}^{900\,\text{€}} dE\, p(E)\right)^2 + \cdots + \left(\frac{2276}{40205} - \int_{6000\,\text{€}}^{\infty} dE\, p(E)\right)^2 \to \min \tag{6}$$

with respect to μ and σ contained in p ( E ) of Equation (4). The (numerical) solution of the minimization yields:

μ ≈ 2275.75 € , σ ≈ 1302.15 € (7)

As one sees there is quite some difference between the values in Equation (5) and Equation (7). Please note that this has nothing to do with the assumption of 450 € and 7000 € as averages of the first and last column, respectively. It is easily possible to show that any assumption for the averages of the first and last column of

There is obviously a mistake in finding μ and σ in a Gaussian distribution by using the mean and standard deviation of the given data, even if the raw data (not clustered) were used. And this mistake can be quite big. We stress it here because this (wrong) procedure is standard for finding values of μ and σ in most non-mathematical sciences. The reason behind it is quite simple. If the given data are for sure exactly Gaussian, it is correct to assume that μ equals the mean and σ the standard deviation. However, this is something which will (almost) never be the case. The standard deviation is a non-linear function of the data. Although approximately Gaussian distributed data will be nicely fitted by a Gaussian, the standard deviation of these data is not necessarily an approximation for σ. It can be quite different as this example shows. Though the mean is a linear function of the data, it will not be identical to μ either. This has to do with the fact that σ and μ cannot be fitted independently.

Just for completeness we note that the least square fit is an approximation only as has been shown in [

$$\left|\frac{3792}{40205} - \int_{-\infty}^{900\,\text{€}} dE\, p(E)\right| + \cdots + \left|\frac{2276}{40205} - \int_{6000\,\text{€}}^{\infty} dE\, p(E)\right| \to \min \tag{8}$$

This minimization is numerically quite challenging. Maybe that is the reason why the (wrong) least square fit and not the (correct) least absolute value fit is normally used. Of course, in many cases least square fit and least absolute value fit will lead to similar results. However, here it is not the case. Though numerically tough, it is a well-defined problem with a unique solution. For our values we will get:

μ ≈ 2096 € , σ ≈ 1228 € (9)

The deviation of Equation (9) from Equation (7) is far from being negligible. And Equation (9) is even more different from Equation (5) than Equation (7). Though it is not the main part of this paper, we have two statements especially for non-mathematical sciences using statistics:

· Taking the standard deviation for σ and the mean for μ to fit a Gaussian distribution like in Equation (4) is generally wrong.

· The least square fit is an approximation only. The correct least absolute value fit will lead to quite different results especially in non-linear fits where the data vary over orders of magnitude.
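The two statements can be made tangible with a small numerical experiment. The sketch below does not use the paper's data: the income sample is synthetic (a Gaussian core plus an assumed heavier upper tail), and all class boundaries except 900 and 6000 are hypothetical. It fits a Gaussian to class frequencies once by least squares as in Equation (6) and once by least absolute values as in Equation (8), and compares the result with the plain mean and standard deviation. The Nelder-Mead optimizer is a choice made here, not the authors' method.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Synthetic, hypothetical incomes: Gaussian core plus a heavier upper tail
rng = np.random.default_rng(1)
core = rng.normal(2000.0, 700.0, size=90_000)
tail = 2000.0 + rng.exponential(2500.0, size=10_000)
income = np.concatenate([core, tail])

# Income classes; only 900 and 6000 appear in the text, inner edges are assumed
edges = np.array([-np.inf, 900, 1300, 1500, 2000, 2600, 3200, 4500, 6000, np.inf])
counts = np.array([np.sum((income > lo) & (income <= hi))
                   for lo, hi in zip(edges[:-1], edges[1:])])
observed = counts / counts.sum()  # relative frequency per class

def class_probabilities(mu, sigma):
    """Probability mass of a Gaussian (Equation (4)) in each income class."""
    return np.diff(norm.cdf(edges, loc=mu, scale=sigma))

def objective(params, power):
    """power=2: least squares as in (6); power=1: least absolute values as in (8)."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    return np.sum(np.abs(observed - class_probabilities(mu, sigma)) ** power)

opts = {"xatol": 1e-3, "fatol": 1e-12, "maxiter": 4000}
start = np.array([income.mean(), income.std()])  # the "classical" estimate
lsq = minimize(objective, start, args=(2,), method="Nelder-Mead", options=opts)
lad = minimize(objective, lsq.x, args=(1,), method="Nelder-Mead", options=opts)

print("mean / std of data:  ", start)
print("least square fit:    ", lsq.x)
print("least absolute fit:  ", lad.x)
```

For data that are only approximately Gaussian, the fitted σ comes out clearly smaller than the standard deviation of the sample, because the standard deviation is dominated by the tail while the fit is dominated by the well-populated classes.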

Some critics might say that our Gaussian approach is faulty from the beginning. This is because a Gaussian distribution runs from minus to plus infinity. And negative incomes are impossible. Please note that this is always the case because nothing runs from minus to plus infinity. The IQ shows a perfect Gaussian distribution though there is no negative IQ. With income it is not absurd to assume negative values. With e.g. very low IQ and/or very poor health it is not possible to survive without support from the community which is nothing but a negative income. But be it as it may, of course one can start with a Gaussian running from zero to infinity. Because it needs a new normalization, Equation (4) will read now

$$q(E) = \frac{1}{\sigma\sqrt{\pi/2}\,\left(1 + \operatorname{erf}\left(\frac{\mu}{\sqrt{2}\,\sigma}\right)\right)} \cdot e^{-\frac{(E-\mu)^2}{2\sigma^2}} \tag{10}$$

where erf ( x ) denotes the error function defined as

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x dt\, e^{-t^2} \tag{11}$$

Within this approach it is also possible to get μ and σ from the mean and standard deviation. However, the mean and standard deviation are given by Equation (25) and Equation (26), respectively. Now we have to solve two coupled non-linear equations:

$$2664.03\,\text{€} = m(\mu,\sigma) \;\wedge\; (1686.12\,\text{€})^2 = s(\mu,\sigma)^2 \tag{12}$$

The coupled equations (12) can be solved numerically only. As a result one will get

μ ≈ 1946.86 € , σ ≈ 2180.27 € (13)

Please note that getting μ and σ this way is incorrect for the same reason as the result in Equation (5) is wrong.
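A minimal sketch of how the coupled equations (12) can be solved numerically is given below. It is an illustration, not the authors' code. Instead of Equation (25) and Equation (26) it uses the textbook closed forms for the mean and variance of a Gaussian restricted to $E \ge 0$ (expressed through the inverse Mills ratio), which should be equivalent. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import norm

def truncated_moments(mu, sigma):
    """Mean and standard deviation of a Gaussian restricted to E >= 0."""
    t = mu / sigma
    r = norm.pdf(t) / norm.cdf(t)              # inverse Mills ratio
    mean = mu + sigma * r
    var = sigma**2 * (1.0 - t * r - r**2)
    return mean, np.sqrt(var)

def residuals(params, target_mean, target_std):
    """Both residuals vanish when the two conditions of Equation (12) hold."""
    mu, sigma = params
    mean, std = truncated_moments(mu, sigma)
    return [mean - target_mean, std - target_std]

# Mean and standard deviation of the data as quoted in Equation (5)
target_mean, target_std = 2664.03, 1686.12
mu_fit, sigma_fit = fsolve(residuals, x0=[target_mean, target_std],
                           args=(target_mean, target_std))
print(mu_fit, sigma_fit)
```

Starting from the values of Equation (5), the root finder should end up close to the values quoted in Equation (13).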

As stated above, the correct way of finding μ and σ is a least square fit. It takes the form

$$\left(\frac{3792}{40205} - \int_{0}^{900\,\text{€}} dE\, p(E)\right)^2 + \cdots + \left(\frac{2276}{40205} - \int_{6000\,\text{€}}^{\infty} dE\, p(E)\right)^2 \to \min \tag{14}$$

The solution can be obtained numerically only:

μ ≈ 2104.98 € , σ ≈ 1620.90 € (15)

As explained above and in [

$$\left|\frac{3792}{40205} - \int_{0}^{900\,\text{€}} dE\, p(E)\right| + \cdots + \left|\frac{2276}{40205} - \int_{6000\,\text{€}}^{\infty} dE\, p(E)\right| \to \min \tag{16}$$

A numerical solution yields:

μ ≈ 1625.60 € , σ ≈ 1707.45 € (17)

This last result can be considered “exact” within our fit procedure. In the next subchapter we will learn that the fit procedure does not necessarily give a result from which one can learn something.

With Equation (17) we have the best possible Gaussian fit for the distribution of

The last two columns in ^{4} are the statistical measures we have chosen. The median is the “middle income.” It is the income to choose if one has only one number instead of the entire distribution. Some people take the mean as an alternative measure for it. Why this is wrong can be found in [

Country | # of households | mean income | median income | P90/P10 |
---|---|---|---|---|
GER | 40,749,525 | 2148.83 € | 1888.50 € | 3.7 |
USA | 126,220,000 | 2957.65 € | 2394.29 € | 6.1 |
UK | 27,800,000 | 1915.89 € | 1567.63 € | 4.2 |
DK | 2,686,035 | 2213.79 € | 2006.73 € | 2.9 |

One might argue that the standard deviation is also a global measure. So fitting with it like in subchapter 3.1 should not be a bad idea. Firstly, we have to note that the standard deviation is a quite complicated expression as given in Equation (26) or even more complicated as indicated below Equation (26). Secondly, the standard deviation has a very limited meaning here. In measurements like the mass of an elementary particle one will expect one value. In several measurements one will get different results though they should be equal. It does make sense to build a mean. And one should test whether the measured data have a Gaussian distribution. (If not, something is systematically wrong.) The Gaussian distribution should be narrow if the measurements are accurate. As a measure for accuracy one may take the easily obtainable standard deviation. However, in income distributions such a parameter does not make sense at all. There is a reason why we have an income distribution. It is not an error in measurement. Here we assumed that it has to do with the distribution of skills such as IQ. The deviations will teach us (in subchapter 3.3) what effects other than skills contribute.

In an extreme socialist country, it may be stated that everybody should have the same income as ordered by a socialist income committee. In such a country it would be a reasonable idea to measure the real income. The mean should be the value set by the committee and the standard deviation tells how well the socialist ideology has been implemented. This shows another misunderstanding of statistics in non-mathematical sciences. But as stated in the very beginning of the introduction, 100-year-old books like [

The fitting with median income and P90/P10 from ^{5}. Therefore we have to take the relative deviation. Furthermore, the exact mean is a constraint. Putting this together, we have to minimize the following with respect to μ and σ:

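$$\left|\frac{n(\mu,\sigma) - \text{median}}{\frac{1}{2}\left(n(\mu,\sigma) + \text{median}\right)}\right| + \left|\frac{p_{90/10}(\mu,\sigma) - \text{P90/P10}}{\frac{1}{2}\left(p_{90/10}(\mu,\sigma) + \text{P90/P10}\right)}\right| \to \min \tag{18}$$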

with the constraint

m ( μ , σ ) = mean (19)

$n(\mu,\sigma)$, $p_{90/10}(\mu,\sigma)$, and $m(\mu,\sigma)$ must be taken from Equation (31), Equation (33), and Equation (25), respectively. The values for median, P90/P10, and mean come from

Nevertheless, Equation (18) and Equation (19) are a well-defined problem with a solution. The constraint of Equation (19) makes it a one-dimensional problem in two dimensions. In

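$$\left|\frac{n(\mu,\sigma) - \text{median}}{\frac{1}{2}\left(n(\mu,\sigma) + \text{median}\right)}\right| + \left|\frac{p_{90/10}(\mu,\sigma) - \text{P90/P10}}{\frac{1}{2}\left(p_{90/10}(\mu,\sigma) + \text{P90/P10}\right)}\right| \tag{20}$$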

with the constraint of Equation (19) and the values of Germany from ^{6}. The minimum in a correspondingly changed

The similarity makes two things likely. Firstly, our fit procedure is not just luck. Secondly, the income distribution is essentially Gaussian as long as global measures like median or Gini are concerned. It is also intriguing to compare the result of the fit procedure of subchapter 3.1^{7} with our results here. It will show how wrong the approach of subchapter 3.1 is. The first column of

The results for all four countries are summarized in

Measure | OECD.Stat | fit 3.2 | fit 3.1 |
---|---|---|---|
mean | 2148.83 € | 2148.83 € | 2147.55 € |
median | 1888.50 € | 2127.46 € | 1993.36 € |
P90/P10 | 3.7 | 3.7000 | 8.1030 |
Gini | 0.294 | 0.2496 | 0.3508 |

Country | μ | σ | Gini this fit | Gini OECD.Stat |
---|---|---|---|---|
GER | 2106.37 € | 992.902 € | 0.250 | 0.294 |
USA | 2631.14 € | 1918.25 € | 0.315 | 0.391 |
UK | 1852.56 € | 967.013 € | 0.267 | 0.351 |
DK | 2202.08 € | 848.398 € | 0.213 | 0.261 |

μ and σ are just fit parameters. They do not have the meaning of mean and standard deviation like in the Gaussian distribution of Equation (4). Due to the constraint of Equation (19) we have in effect only one fit parameter. With one parameter only, the Gaussian model used in this subchapter describes reality nicely. This is especially surprising as an income distribution is not a result of one or a few constants as it is often the case in e.g. physics. Here many million people interact, and all have a free will. Obviously we have a quite just world. As most skills (especially IQ) are Gaussian distributed, the income is in accordance with it.
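The one-parameter character of this fit can be sketched in code. The block below is an illustration under assumptions, not the procedure the authors used: it eliminates μ through the constraint of Equation (19) with a root finder and then minimizes the expression of Equation (18) over σ alone. Mean and quantiles of the Gaussian restricted to $E \ge 0$ are computed from the standard normal distribution functions rather than from the closed forms of chapter 4. The search interval for σ and all function names are choices made here. The input values are the German figures quoted in the table of OECD data.

```python
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import norm

def mean_trunc(mu, sigma):
    """Mean of a Gaussian restricted to E >= 0."""
    t = mu / sigma
    return mu + sigma * norm.pdf(t) / norm.cdf(t)

def quantile_trunc(p, mu, sigma):
    """Income below which a fraction p of the restricted Gaussian lies."""
    t = mu / sigma
    return mu + sigma * norm.ppf(norm.cdf(-t) + p * norm.cdf(t))

def mu_from_mean(sigma, mean):
    """Constraint (19): find mu so that the model mean equals the given mean."""
    return brentq(lambda mu: mean_trunc(mu, sigma) - mean, -10.0 * sigma, mean)

def objective(sigma, mean, median, ratio):
    """Sum of relative deviations of median and P90/P10 as in Equation (18)."""
    mu = mu_from_mean(sigma, mean)
    n = quantile_trunc(0.5, mu, sigma)
    p = quantile_trunc(0.9, mu, sigma) / quantile_trunc(0.1, mu, sigma)
    return (abs(n - median) / (0.5 * (n + median))
            + abs(p - ratio) / (0.5 * (p + ratio)))

# German values: mean, median and P90/P10
mean, median, ratio = 2148.83, 1888.50, 3.7
res = minimize_scalar(objective, bounds=(200.0, 2000.0),
                      args=(mean, median, ratio),
                      method="bounded", options={"xatol": 1e-6})
sigma_fit = res.x
mu_fit = mu_from_mean(sigma_fit, mean)
print(mu_fit, sigma_fit, quantile_trunc(0.5, mu_fit, sigma_fit))
```

The minimum sits at a kink of the objective where the P90/P10 term vanishes, which is why the fitted P90/P10 reproduces the input exactly while the median does not. For the German input the sketch should land close to the μ and σ listed for GER.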

Having a closer look at

In the previous subchapter we have shown how to fit an income distribution with one effective fit parameter. The results are quite fine. However, they deviate for the rich. With the numbers of

$$\#HH_{GER} = 3.86 \times 10^{-8}, \quad \#HH_{USA} = 8437, \quad \#HH_{UK} = 5.14 \times 10^{-10}, \quad \#HH_{DK} = 5.23 \times 10^{-14} \tag{21}$$

A net household income of 10,000 € is certainly not common in any of the four countries, as the median is roughly four times lower in each of them. However, it does exist^{8}, and it is certainly possible without any income from wealth. In contrast, Equation (21) says that such households should not exist in three of the four countries considered. And even in the USA the number is incredibly tiny. It should be bigger by at least a factor of $10^{3}$.
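The numbers in Equation (21) can be checked with a few lines. This is a sketch under the assumption that Equation (21) is the number of households times the probability of an income above 10,000 € in the Gaussian restricted to $E \ge 0$, with μ and σ from the fit of subchapter 3.2 and household counts from the OECD table. The German count is read as 40,749,525. The dictionary and function names are hypothetical.

```python
from scipy.stats import norm

# country: (number of households, mu, sigma), values as quoted in the tables
FITS = {
    "GER": (40_749_525, 2106.37, 992.902),
    "USA": (126_220_000, 2631.14, 1918.25),
    "UK": (27_800_000, 1852.56, 967.013),
    "DK": (2_686_035, 2202.08, 848.398),
}

def households_above(threshold, households, mu, sigma):
    """Expected number of households with income above the threshold."""
    # Upper tail of the Gaussian, renormalized because incomes start at zero
    tail = norm.sf((threshold - mu) / sigma) / norm.cdf(mu / sigma)
    return households * tail

expected = {country: households_above(10_000.0, *params)
            for country, params in FITS.items()}
for country, value in expected.items():
    print(f"{country}: {value:.3g}")
```

Using the survival function `norm.sf` instead of `1 - norm.cdf` matters here, because the tail probabilities are far below the floating-point resolution of numbers close to one.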

Though the necessary skills to create income are fairly well Gaussian distributed, at least higher incomes are much more likely than any Gaussian distribution would predict. One thing is the income of a leader. This is a person who has subordinates. And part of the value created by these subordinates will contribute to the income of the leader. Generally this is just because only the leader has the skills enabling the subordinates to create so much value. It is an explanation why leaders may have hourly wages several times the wages of their subordinates. However, these leadership skills will also show a Gaussian distribution. It will not lead to the observed fat tail in income distributions from work. It is also impossible that the households with lower income are betrayed by their bosses. It would be possible in totalitarian states but for sure not in the four OECD countries considered here. Democracy and a working labor market will always lead to justice. If a boss pays too little, the most skilled workers will leave making the company less profitable. It is the same as with ordinary goods. The market determines the price.

That people will allow for a redistribution, be it by tax or even free giving is not impossible even in market economies. Using an extended Edgeworth cube [

However, without a working labor market everything is possible. Considering median incomes it is hard to imagine that the labor market is not working in this area. This is probably even accepted by trade unions and the like. They demand minimum wages, child support, etc. just because the free market is creating a too broad spread of incomes. A labor market works if there are many similar positions and many potential people able to fill the position. For people having several times the median income, there are less and less potential positions. Even if they imagine that they are creating much more value for their bosses, it will be much more difficult to find an alternative. One chance will be a spinoff. But it is rarely realistic. Rightly, there is also no lobby for people having several times the median income. They may suffer from injustice but not from financial hardship.

Now we have shown why a labor market may partly not work. It is another question who takes why an advantage of it. The answer to this question is only at first glance obvious. Unlike e.g. chimpanzees, humans are not homines oeconomici. They are not altruist either. They go for more money in order to become richer than their peers.^{9} It is the case especially within rich people. This is neither new nor is it just a gut feeling. Recently historic data from the second half of the 19^{th} century has been analyzed in detail [

Putting this into our Gaussian distribution of Equation (10) would mean a sigma growing with income. Though we do not say that this ansatz is not worth pursuing, it has two disadvantages. Firstly, it comes a little bit unmotivated. Secondly, it bears technical problems. If $\sigma = \sigma(E)$ in Equation (10), the leading power of $\sigma(E)$ must be less than linear. In other words, $\sigma(E) \propto E^a$ with $a < 1$ in order to make normalization possible. With such low powers the effect is pretty tiny (besides making the math complicated especially for $1/2 < a < 1$). This technicality can be fixed by introducing an income cutoff. Having a maximum possible income in the world is even realistic. Our income distribution like in Equation (10) is in that sense unrealistic because it will give a (very small) probability that someone has ten times the world income. On the other hand, setting a cutoff value seems arbitrary. It looks like an unmotivated fit parameter. Therefore we did not pursue this ansatz.

Our model goes back to the observation that richer people tend to be leaders getting their money from advising subordinates. Going up the hierarchy, the number of people becomes smaller and smaller. This alone would lead to an exponential distribution like in Equation (3). To make the number of assumptions as small as possible, we say that everybody tries to get money through subordinates. However, the will to do it and the possibility are proportional to income. This leads to an $E^2$ term in front of the exponential distribution:

$$E^2 \cdot e^{-E/\lambda}$$

We have to add this modified exponential distribution to our Gaussian one of Equation (10). After normalization we will have

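$$q(E) = \frac{1}{2\sigma\sqrt{\pi/2}\,\left(1 + \operatorname{erf}\left(\frac{\mu}{\sqrt{2}\,\sigma}\right)\right)} \cdot e^{-\frac{(E-\mu)^2}{2\sigma^2}} + \frac{E^2}{4\lambda^3} \cdot e^{-E/\lambda} \tag{22}$$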

Formally we have now an identical optimization as given in Equation (18) and Equation (19). Instead of q ( E ) from Equation (10) we have to use Equation (22) now. Our problem is the following:

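$$\left|\frac{n(\mu,\sigma,\lambda) - \text{median}}{\frac{1}{2}\left(n(\mu,\sigma,\lambda) + \text{median}\right)}\right| + \left|\frac{p_{90/10}(\mu,\sigma,\lambda) - \text{P90/P10}}{\frac{1}{2}\left(p_{90/10}(\mu,\sigma,\lambda) + \text{P90/P10}\right)}\right| \to \min \tag{23}$$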

with the constraint

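$$\lambda = \frac{2}{3}\,\text{mean} + \frac{\mu}{3} \cdot \frac{\Gamma_r\left(-\frac{1}{2}, \frac{\mu^2}{2\sigma^2}\right) - 2}{1 + \operatorname{erf}\left(\frac{\mu}{\sqrt{2}\,\sigma}\right)} \tag{24}$$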

$n(\mu,\sigma,\lambda)$ and $p_{90/10}(\mu,\sigma,\lambda)$ are given in Equation (39) and Equation (40), respectively. This minimization problem is well defined. The constraint is even simpler than in Equation (19). However, the highly non-linear functions must be determined numerically, which consumes quite some CPU time and RAM. The reason behind it is stated in the next chapter. There we also explain why it is virtually impossible to use the Gini $g(\mu,\sigma,\lambda)$ in Equation (23).
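The ingredients of this optimization can be sketched numerically. The block below is an illustration under assumptions and not the authors' implementation: it treats Equation (22) as an equal-weight mixture of the Gaussian restricted to $E \ge 0$ and a gamma density with shape 3 and scale λ (which is what $E^2/(4\lambda^3) \cdot e^{-E/\lambda}$ amounts to), obtains λ from the mean (the constraint of Equation (24) written as mean = m/2 + 3λ/2), and finds the median and P90/P10 by inverting the cumulative distribution with a root finder instead of the closed forms referenced in the text. The German values of μ, σ, mean and household count are the ones quoted in the tables. All function names are hypothetical.

```python
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import gamma, norm

def trunc_mean(mu, sigma):
    """Mean of the Gaussian restricted to E >= 0."""
    t = mu / sigma
    return mu + sigma * norm.pdf(t) / norm.cdf(t)

def lam_from_mean(mean, mu, sigma):
    """Constraint (24), rewritten as mean = m(mu, sigma)/2 + 3*lam/2."""
    return (2.0 * mean - trunc_mean(mu, sigma)) / 3.0

def mixture_pdf(E, mu, sigma, lam):
    """Equation (22): half restricted Gaussian, half gamma(shape 3, scale lam)."""
    gauss = norm.pdf(E, loc=mu, scale=sigma) / norm.cdf(mu / sigma)
    return 0.5 * gauss + 0.5 * gamma.pdf(E, a=3, scale=lam)

def mixture_cdf(E, mu, sigma, lam):
    gauss = (norm.cdf((E - mu) / sigma) - norm.cdf(-mu / sigma)) / norm.cdf(mu / sigma)
    return 0.5 * gauss + 0.5 * gamma.cdf(E, a=3, scale=lam)

def quantile(p, mu, sigma, lam):
    """Invert the cumulative distribution numerically."""
    return brentq(lambda E: mixture_cdf(E, mu, sigma, lam) - p, 0.0, 1.0e6)

# German values as quoted in the tables (mu, sigma rounded to full euros)
mu, sigma, mean, households = 1719.0, 539.0, 2148.83, 40_749_525
lam = lam_from_mean(mean, mu, sigma)

# Normalization check, split so the peak is resolved; mass beyond 60000 is negligible
total = (quad(mixture_pdf, 0.0, 5000.0, args=(mu, sigma, lam))[0]
         + quad(mixture_pdf, 5000.0, 60000.0, args=(mu, sigma, lam))[0])
median = quantile(0.5, mu, sigma, lam)
ratio = quantile(0.9, mu, sigma, lam) / quantile(0.1, mu, sigma, lam)
rich = households * (1.0 - mixture_cdf(10_000.0, mu, sigma, lam))
print(lam, total, median, ratio, rich)
```

With the German parameters, λ, the median and P90/P10 should come out close to the tabulated inputs, and the number of households above 10,000 € is now of order $10^4$ instead of practically zero.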

Making a 3D plot of Equation (23) (with λ substituted via Equation (24)) shows the areas of local minima. One should not just calculate points and connect them. Gradients should be considered too. This will make sure that there is really a minimum. It will increase the number of points to be calculated by perhaps a factor of ten times ten. But it is necessary because the minimum will typically be at a non-analytic point due to the absolute values in Equation (23). Software like Mathematica is very helpful here. As the problem lies in the inverse functions, Mathematica analyses the original functions and tries to get at least piece-wise analytic inverse functions. This is of course not always possible. So one has to choose by hand which interval should be considered. Even this way it costs quite some CPU time. And it is neither straightforward nor can it be automated. Having identified the area with the smallest local minimum, it has proven practical to find its value iteratively by guessing the value for σ and then making a one-dimensional plot over μ which will yield a minimum at a certain value of μ. With this value of μ one can plot over σ, and so forth until sufficient accuracy has been reached.

In

Country | μ | σ | λ | # of households |
---|---|---|---|---|
GER | 1719 € | 539 € | 859 € | $1.44 \cdot 10^{4}$ |
USA | 1772 € | 928.2 € | 1361 € | $1.43 \cdot 10^{6}$ |
UK | 1341 € | 416 € | 830 € | $6.96 \cdot 10^{3}$ |
DK | 1955 € | 340 € | 824 € | $6.27 \cdot 10^{2}$ |

in footnote 7, we do not know the exact number of these households. Else it would be smart to use it as a quantity to be fitted directly.

In ^{10}. The total distribution is the sum of both curves. The exponential part clearly enlarges the peak at the median income. The number of households getting about the median income also increases. In addition one gets a “fat tail.”

In subchapter 3.1 we defined in Equation (10) a “Gaussian” distribution $q$ which runs from zero to infinity. Of course it is straightforward to calculate the mean $m(\mu,\sigma)$ and variance (= square of standard deviation) $s(\mu,\sigma)^2$:

$$m(\mu,\sigma) = \int_0^\infty dE\, E \cdot q(E)$$

$$s(\mu,\sigma)^2 = \int_0^\infty dE\, \left(E - m(\mu,\sigma)\right)^2 \cdot q(E)$$

These integrals are tedious but straightforward to solve. A lengthy calculation yields

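$$m(\mu,\sigma) = \frac{\mu\left(2 - \Gamma_r\left(-\frac{1}{2}, \frac{\mu^2}{2\sigma^2}\right)\right)}{1 + \operatorname{erf}\left(\frac{\mu}{\sqrt{2}\,\sigma}\right)} \tag{25}$$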

s ( μ , σ ) 2 = 1 ( 1 + erf ( μ 2 σ ) ) 3 ( e − μ 2 2 σ 2 2 π μ σ ( 1 + erf ( μ 2 σ ) ) 2 + 2 ( μ 2 + σ 2 + erf ( μ 2 σ ) ( 2 ( σ − μ ) ( μ + σ ) + ( μ 2 + σ 2 ) erf ( μ 2 σ ) + μ 2 ( 4 − Γ r ( − 1 2 , μ 2 2 σ 2 ) ) Γ r ( − 1 2 , μ 2 2 σ 2 ) ) ) − ( 5 μ 2 + σ 2 + ( μ 2 + σ 2 ) erf ( μ 2 σ ) ( 2 + erf ( μ 2 σ ) ) − 4 μ 2 Γ r ( − 1 2 , μ 2 2 σ 2 ) + μ 2 Γ r ( − 1 2 , μ 2 2 σ 2 ) 2 ) Γ r ( − 1 2 , μ 2 2 σ 2 ) ) (26)

The error function erf has been defined in Equation (11) already. $\Gamma_r$ is the regularized incomplete gamma function with

$$\Gamma_r(a, x) = \frac{\Gamma(a, x)}{\Gamma(a)} \tag{27}$$

where $\Gamma(a, x)$ and $\Gamma(a)$ are the incomplete and “normal” gamma function, respectively, with

$$\Gamma(a, x) = \int_x^\infty dt\, t^{a-1} e^{-t} \quad\text{and}\quad \Gamma(a) = \int_1^\infty dt\, t^{a-1} e^{-t} + \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,(k+a)} \tag{28}$$

Please note the sum in Equation (28). Normally, the gamma function is displayed by an integral only. But this only works for positive arguments. In the entire paper the first argument of $\Gamma_r$ is −1/2. So we have

$$\Gamma_r\left(-\tfrac{1}{2}, x\right) = -\frac{1}{2\sqrt{\pi}}\, \Gamma\left(-\tfrac{1}{2}, x\right) = -\frac{1}{2\sqrt{\pi}} \int_x^\infty dt\, t^{-3/2} e^{-t} \tag{29}$$

Equation (29) does not lead to much simplification in the numerical calculations.
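As a side remark that is not part of the original derivation, one integration by parts reduces the integral in Equation (29) to an exponential and a complementary error function for $x > 0$. This may be convenient where no library routine for negative first arguments is at hand:

$$\Gamma\!\left(-\tfrac{1}{2}, x\right) = \Bigl[-2\,t^{-1/2} e^{-t}\Bigr]_x^\infty - 2\int_x^\infty dt\, t^{-1/2} e^{-t} = \frac{2\,e^{-x}}{\sqrt{x}} - 2\sqrt{\pi}\,\operatorname{erfc}\!\left(\sqrt{x}\right)$$

$$\Gamma_r\!\left(-\tfrac{1}{2}, x\right) = \operatorname{erfc}\!\left(\sqrt{x}\right) - \frac{e^{-x}}{\sqrt{\pi x}}, \qquad\text{so that}\qquad \Gamma_r\!\left(-\tfrac{1}{2}, \tfrac{\mu^2}{2\sigma^2}\right) = \operatorname{erfc}\!\left(\tfrac{\mu}{\sigma\sqrt{2}}\right) - \sqrt{\tfrac{2}{\pi}}\,\frac{\sigma}{\mu}\, e^{-\mu^2/(2\sigma^2)} \quad (\mu > 0)$$

Here $\Gamma\!\left(\tfrac{1}{2}, x\right) = \sqrt{\pi}\,\operatorname{erfc}(\sqrt{x})$ has been used, which follows from the substitution $t = u^2$.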

To be consistent with an absolute value fit, one should consequently write

$$s(\mu,\sigma) = \int_0^\infty dE\, \bigl|E - m(\mu,\sigma)\bigr| \cdot q(E)$$

This integral is easily solved by splitting it into one part running from 0 to $m(\mu,\sigma)$ and one running from $m(\mu,\sigma)$ to ∞. Though the solution is straightforward, it is much more complicated than Equation (26) and has about double its length.
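The moments above can be cross-checked numerically. The following Python sketch assumes that Equation (10) is a Gaussian restricted to $E \ge 0$ and normalized to one; the parameter values, the Simpson rule, and all function names are illustrative choices, not taken from the paper. The closed form used for $\Gamma_r(-\tfrac{1}{2}, x)$, namely $\operatorname{erfc}(\sqrt{x}) - e^{-x}/\sqrt{\pi x}$, follows from one integration by parts and is likewise an addition.

```python
import math

def q_half_gauss(E, mu, sigma):
    """Gaussian restricted to E >= 0, normalized to one (form assumed for Equation (10))."""
    norm = (sigma * math.sqrt(2.0 * math.pi)
            * 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0)))))
    return math.exp(-((E - mu) ** 2) / (2.0 * sigma ** 2)) / norm

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def gamma_r_minus_half(x):
    """Regularized incomplete gamma function for a = -1/2 and x > 0 (elementary form)."""
    return math.erfc(math.sqrt(x)) - math.exp(-x) / math.sqrt(math.pi * x)

def mean_closed_form(mu, sigma):
    """Mean according to Equation (25)."""
    x = mu ** 2 / (2.0 * sigma ** 2)
    return (mu * (2.0 - gamma_r_minus_half(x))
            / (1.0 + math.erf(mu / (math.sqrt(2.0) * sigma))))

mu, sigma = 1000.0, 800.0    # illustrative values only, not fitted parameters
upper = mu + 12.0 * sigma    # the Gaussian tail beyond this point is negligible

m_num = simpson(lambda E: E * q_half_gauss(E, mu, sigma), 0.0, upper)
var_num = simpson(lambda E: (E - m_num) ** 2 * q_half_gauss(E, mu, sigma), 0.0, upper)
# Absolute deviation: split at the mean because |E - m| has a kink there
abs_dev = (simpson(lambda E: (m_num - E) * q_half_gauss(E, mu, sigma), 0.0, m_num)
           + simpson(lambda E: (E - m_num) * q_half_gauss(E, mu, sigma), m_num, upper))
```

For these illustrative values the numerical mean agrees with Equation (25), and the square root of `var_num` is clearly smaller than σ, which illustrates why the standard deviation should not be equated with σ for such a distribution.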

In subchapter 3.2 we used the median, which we will denote $n$ here (because $m$ is already used for the mean). For any distribution $q(E)$ normalized to one the median is

$$n = Q^{-1}\!\left(\tfrac{1}{2}\right) \quad\text{with}\quad \frac{\partial Q(x)}{\partial x} = q(x) \qquad (30)$$

As always, the exponent −1 denotes the inverse function with $f^{-1}(f(x)) = x$. Applying this to the $q(E)$ of Equation (10) (Gaussian from zero to infinity) leads to

$$n(\mu,\sigma) = \mu + \sigma\sqrt{2} \cdot \operatorname{erfc}^{-1}\!\left(1 - \tfrac{1}{2}\operatorname{erfc}\!\left(\tfrac{\mu}{\sigma\sqrt{2}}\right)\right) \qquad (31)$$

where erfc is the complementary error function with $\operatorname{erfc}(x) = 1 - \operatorname{erf}(x)$.

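For a general (normalized) distribution $q(E)$, the P90/P10 ratio (abbreviated as $p_{90/10}$) follows from

$$\int_{-\infty}^{P_{10}} dE\, q(E) = \frac{1}{10} \;\Rightarrow\; P_{10} = Q^{-1}\!\left(\frac{1}{10}\right)$$

and

$$\int_{P_{90}}^{\infty} dE\, q(E) = \frac{1}{10} \;\Rightarrow\; P_{90} = Q^{-1}\!\left(1 - \frac{1}{10}\right)$$

as

$$p_{90/10} = \frac{Q^{-1}\!\left(1 - \frac{1}{10}\right)}{Q^{-1}\!\left(\frac{1}{10}\right)} \qquad (32)$$

where $Q$ is the cumulative distribution defined as in Equation (30). Applying this to the $q(E)$ of Equation (10) (Gaussian from zero to infinity) leads to

$$p_{90/10}(\mu,\sigma) = \frac{\mu + \sigma\sqrt{2} \cdot \operatorname{erfc}^{-1}\!\left(\frac{1}{10} + \frac{1}{10}\operatorname{erf}\!\left(\frac{\mu}{\sigma\sqrt{2}}\right)\right)}{\mu + \sigma\sqrt{2} \cdot \operatorname{erfc}^{-1}\!\left(\frac{9}{10} + \frac{9}{10}\operatorname{erf}\!\left(\frac{\mu}{\sigma\sqrt{2}}\right)\right)} \qquad (33)$$

Here erf and erfc are defined as in Equation (11) and Equation (31), respectively.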
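Equations (31) and (33) can be evaluated with the Python standard library alone. The sketch below assumes that Equation (10) is a Gaussian restricted to $x \ge 0$ and normalized to one; it obtains $\operatorname{erfc}^{-1}$ from the standard normal quantile function, and the function names as well as the parameter values are illustrative.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)

def erfc_inv(y):
    """Inverse complementary error function for 0 < y < 2, via the normal quantile."""
    return -NormalDist().inv_cdf(y / 2.0) / SQRT2

def cdf_half_gauss(x, mu, sigma):
    """Cumulative distribution Q(x) of the Gaussian restricted to x >= 0 (assumed form)."""
    a = mu / (sigma * SQRT2)
    return 1.0 - math.erfc((x - mu) / (sigma * SQRT2)) / (1.0 + math.erf(a))

def median_half_gauss(mu, sigma):
    """Median according to Equation (31)."""
    a = mu / (sigma * SQRT2)
    return mu + sigma * SQRT2 * erfc_inv(1.0 - 0.5 * math.erfc(a))

def p90_p10_half_gauss(mu, sigma):
    """P90/P10 ratio according to Equation (33)."""
    a = mu / (sigma * SQRT2)
    p90 = mu + sigma * SQRT2 * erfc_inv(0.1 * (1.0 + math.erf(a)))
    p10 = mu + sigma * SQRT2 * erfc_inv(0.9 * (1.0 + math.erf(a)))
    return p90 / p10

# Illustrative values only, not fitted parameters
median = median_half_gauss(1000.0, 800.0)
ratio = p90_p10_half_gauss(1000.0, 800.0)
```

Inserting the median back into `cdf_half_gauss` should return one half, which is a quick consistency check of Equation (31).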

Though it is not used here, a few words about the Gini coefficient $g$ are in order. It is defined only for distributions $q(E)$ running from zero to infinity:

$$g = 1 - 2\int_0^1 df\, \frac{\int_0^{E(f)} d\eta\, \eta \cdot q(\eta)}{\int_0^\infty d\eta\, \eta \cdot q(\eta)} \qquad (34)$$

with

$$E = E(f) \text{ as the inverse function of } f(E) = \int_0^E d\eta\, q(\eta) \qquad (35)$$

Taking the $q(E)$ of Equation (10) leads, after a tedious but straightforward calculation, to

g ( μ , σ ) = 1 − ( μ + 2 σ 2 π e − μ 2 2 σ 2 + μ ⋅ erf ( μ σ 2 ) + 2 σ ⋅ erf ( 2 erfc − 1 ( 1 + erf ( μ σ 2 ) ) ) − 2 σ π ( 1 + erf ( μ σ 2 ) ) ) ( erfc ( μ σ 2 ) − 2 ) μ ( 1 + erf ( μ σ 2 ) ) ( Γ r ( − 1 2 , μ 2 2 σ 2 ) − 2 ) (36)

$\Gamma_r$ is defined as in Equation (27) and Equation (28). Though this expression looks pretty clumsy, it consists of functions which can be evaluated with arbitrary accuracy. This is in contrast to the Gini coefficient we would need in subchapter 3.3.

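Calculating the mean $m(\mu,\sigma,\lambda)$ for the distribution of Equation (22) is simple because an integral over a sum is the sum of the integrals:

$$m(\mu,\sigma,\lambda) = \frac{\mu\left(1 - \tfrac{1}{2}\Gamma_r\!\left(-\tfrac{1}{2}, \tfrac{\mu^2}{2\sigma^2}\right)\right)}{1 + \operatorname{erf}\!\left(\tfrac{\mu}{\sqrt{2}\,\sigma}\right)} + \frac{3}{2}\lambda \qquad (37)$$

The constraint from Equation (19), $m(\mu,\sigma,\lambda) = \text{mean}$, can be solved for λ:

$$\lambda = \frac{2}{3}\,\text{mean} + \frac{\mu}{3} \cdot \frac{\Gamma_r\!\left(-\tfrac{1}{2}, \tfrac{\mu^2}{2\sigma^2}\right) - 2}{1 + \operatorname{erf}\!\left(\tfrac{\mu}{\sqrt{2}\,\sigma}\right)} \qquad (38)$$

With Equation (38) the additional parameter λ can be eliminated.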
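Eliminating λ via Equation (38) is a one-line computation once $\Gamma_r$ is available. The following sketch is an illustration only: the function names are hypothetical, and the elementary expression $\Gamma_r(-\tfrac{1}{2}, x) = \operatorname{erfc}(\sqrt{x}) - e^{-x}/\sqrt{\pi x}$ (obtained by one integration by parts, not stated in the text) is used in place of a library routine.

```python
import math

def gamma_r_minus_half(x):
    """Regularized incomplete gamma function for a = -1/2 and x > 0 (elementary form)."""
    return math.erfc(math.sqrt(x)) - math.exp(-x) / math.sqrt(math.pi * x)

def mean_mixture(mu, sigma, lam):
    """Mean according to Equation (37)."""
    x = mu ** 2 / (2.0 * sigma ** 2)
    erf_a = math.erf(mu / (math.sqrt(2.0) * sigma))
    return mu * (1.0 - 0.5 * gamma_r_minus_half(x)) / (1.0 + erf_a) + 1.5 * lam

def lambda_from_mean(mean, mu, sigma):
    """Lambda according to Equation (38) for a prescribed mean."""
    x = mu ** 2 / (2.0 * sigma ** 2)
    erf_a = math.erf(mu / (math.sqrt(2.0) * sigma))
    return 2.0 * mean / 3.0 + (mu / 3.0) * (gamma_r_minus_half(x) - 2.0) / (1.0 + erf_a)

# Round trip with the fitted values listed for GER in the table above
mean_ger = mean_mixture(1719.0, 539.0, 859.0)
lam_back = lambda_from_mean(mean_ger, 1719.0, 539.0)   # should reproduce 859
```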

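The median $n(\mu,\sigma,\lambda)$ for the distribution of Equation (22) is formally given as in Equation (30):

$$n = Q^{-1}\!\left(\tfrac{1}{2}\right) \quad\text{with}\quad Q(x) = 1 - \frac{e^{-x/\lambda}}{4\lambda^2}\left(x^2 + 2x\lambda + 2\lambda^2\right) - \frac{\operatorname{erfc}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)}{4 - 2\operatorname{erfc}\!\left(\frac{\mu}{\sigma\sqrt{2}}\right)} \qquad (39)$$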

Unfortunately there is no closed-form inverse of $Q(x)$. Building an inverse function numerically is simple, but the amount of data is very large in our problem. Of course, we can insert λ from Equation (38) into $Q(x)$ of Equation (39). This leaves us with the two parameters μ and σ, which are to be determined eventually. In the sense of a Monte Carlo simulation one may assume 10^{3} different values for each parameter, so we have 10^{6} different functions $Q(x)$. In order to build an inverse function, we may have to assume 10^{3} different values for x. This leaves us with 10^{9} values which must be calculated. As each calculation contains integrals, this will consume quite some computing power.

Building the P90/P10 ratio for the distribution from Equation (22) causes the same problem. Formally, $p_{90/10}(\mu,\sigma,\lambda)$ is given as in Equation (32):

$$p_{90/10}(\mu,\sigma,\lambda) = \frac{Q^{-1}\!\left(1 - \frac{1}{10}\right)}{Q^{-1}\!\left(\frac{1}{10}\right)} \quad\text{with}\quad Q(x) = 1 - \frac{e^{-x/\lambda}}{4\lambda^2}\left(x^2 + 2x\lambda + 2\lambda^2\right) - \frac{\operatorname{erfc}\!\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)}{4 - 2\operatorname{erfc}\!\left(\frac{\mu}{\sigma\sqrt{2}}\right)} \qquad (40)$$

The parameter λ can be eliminated with Equation (38). Again, we need the inverse function of the $Q(x)$ given in Equation (40). In order to find it we also have to calculate 10^{9} data points. (However, it is mostly an identical calculation.)
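If only a few quantiles are needed for a given parameter set, one alternative to tabulating the full inverse function is to solve $Q(x) = p$ directly by root finding. The sketch below does this by bisection for the $Q(x)$ of Equation (39). It is not the authors' procedure (which relied on Mathematica), and the function names, bracket, and tolerance are assumptions made for illustration.

```python
import math

def Q(x, mu, sigma, lam):
    """Cumulative distribution of Equation (39)."""
    s2 = sigma * math.sqrt(2.0)
    expo = (math.exp(-x / lam) * (x * x + 2.0 * x * lam + 2.0 * lam * lam)
            / (4.0 * lam * lam))
    gauss = math.erfc((x - mu) / s2) / (4.0 - 2.0 * math.erfc(mu / s2))
    return 1.0 - expo - gauss

def quantile(p, mu, sigma, lam, tol=1e-9):
    """Solve Q(x) = p for x >= 0 by bisection; Q increases monotonically from 0 to 1."""
    lo, hi = 0.0, mu + sigma + lam
    while Q(hi, mu, sigma, lam) < p:     # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if Q(mid, mu, sigma, lam) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Fitted values listed for GER in the table above
mu, sigma, lam = 1719.0, 539.0, 859.0
median = quantile(0.5, mu, sigma, lam)
p90_p10 = quantile(0.9, mu, sigma, lam) / quantile(0.1, mu, sigma, lam)
```

Each quantile costs a few dozen evaluations of $Q$, which contains only elementary and error functions. Whether this reduces the total effort of the fit depends on how many parameter sets (μ, σ) have to be scanned.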

Just for completeness we will also show how to calculate the Gini coefficient for the distribution given in Equation (22). The Gini coefficient $g(\mu,\sigma,\lambda)$ is formally given in Equation (34). The necessary function $f(E)$ of Equation (35) is easily determined to be

$$f(E) = 1 - \frac{e^{-E/\lambda}}{4\lambda^2}\left(E^2 + 2E\lambda + 2\lambda^2\right) - \frac{\operatorname{erfc}\!\left(\frac{E-\mu}{\sigma\sqrt{2}}\right)}{4 - 2\operatorname{erfc}\!\left(\frac{\mu}{\sigma\sqrt{2}}\right)} \qquad (41)$$

For obvious reasons $f$ and $Q$ are identical. Putting it together we have

$$g(\mu,\sigma,\lambda) = 1 - \frac{2}{m(\mu,\sigma,\lambda)} \int_0^1 df \int_0^{E(f)} d\eta\, \eta \cdot q(\eta) \qquad (42)$$

$m(\mu,\sigma,\lambda)$ is given in Equation (37). λ can be eliminated with Equation (38). $E(f)$ is the inverse function of the $f(E)$ given in Equation (41), and $q(\eta)$ is defined in Equation (22). Inserting all this makes Equation (42) look much more complicated, but this is not the real problem. To get the function $E(f)$ one needs to make 10^{9} calculations as stated above. Furthermore we have to take two integrals in Equation (42). Even when going for a not too high accuracy, we may divide each integration interval into 100 pieces. This leaves us with 10^{9+2+2} datasets. Even storing these 10^{13} data is critical. Therefore, a fit with the Gini coefficient instead of or in addition to the median is impossible. Making some simplifications was not possible; at least the authors did not find any way. Finding the inverse function $E(f)$ of $f(E)$ may involve far fewer than the mentioned 10^{9} data. Smart software like Mathematica is able to find gradients in $f(E)$ and also proves continuity. With this, it is (mostly) able to construct an $E(f)$ in a much simpler way with the required accuracy. However, this $E(f)$ is useless for the two (numerical) integrations in Equation (42).
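The estimate above assumes that $E(f)$ is tabulated. For a single parameter set, one possible way around the inverse function is the substitution $f = f(E)$ with $df = q(E)\,dE$, which turns Equation (42) into $g = 1 - \tfrac{2}{m}\int_0^\infty dE\, q(E) \int_0^E d\eta\, \eta\, q(\eta)$. The Python sketch below evaluates this with cumulative trapezoidal sums. It is a suggestion added here and has not been validated against the authors' data; the density `q_mixture` is taken as the derivative of $f(E)$ in Equation (41), and the grid size and upper cut-off are assumptions.

```python
import math

def gini(pdf, upper, n=20000):
    """Gini coefficient of a normalized density on [0, upper].

    Uses g = 1 - (2/m) * int_0^upper dE q(E) * int_0^E d(eta) eta*q(eta),
    i.e. Equation (34) after the substitution f = f(E), df = q(E) dE.
    """
    h = upper / n
    partial_mean = 0.0      # running value of int_0^E eta*q(eta) d(eta)
    outer = 0.0             # running value of the outer integral
    prev_eq = 0.0           # eta*q(eta) at eta = 0
    prev_integrand = 0.0    # q(E) * partial_mean at E = 0
    for i in range(1, n + 1):
        E = i * h
        qE = pdf(E)
        eq = E * qE
        partial_mean += 0.5 * (prev_eq + eq) * h
        integrand = qE * partial_mean
        outer += 0.5 * (prev_integrand + integrand) * h
        prev_eq, prev_integrand = eq, integrand
    return 1.0 - 2.0 * outer / partial_mean   # partial_mean now equals the mean m

def q_mixture(E, mu, sigma, lam):
    """Density obtained as the derivative of f(E) in Equation (41)."""
    a = mu / (sigma * math.sqrt(2.0))
    gauss = (math.exp(-((E - mu) ** 2) / (2.0 * sigma ** 2))
             / (sigma * math.sqrt(2.0 * math.pi) * (1.0 + math.erf(a))))
    expo = E ** 2 * math.exp(-E / lam) / (4.0 * lam ** 3)
    return gauss + expo

# Fitted values listed for GER in the table above; the cut-off 60*lambda is an assumption
g_ger = gini(lambda E: q_mixture(E, 1719.0, 539.0, 859.0), upper=60.0 * 859.0)
```

One evaluation needs on the order of $n$ density evaluations and no stored inverse. The cost of a full fit still scales with the number of (μ, σ) combinations that are scanned.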

We have shown how to use distributions and what conclusions can be drawn from them. We have taken two examples and used mathematics which has been well known for over 100 years. This alone would disqualify our work as a journal publication. But we have chosen two particular examples: IQ distribution and income distribution. These examples belong to psychology and neighboring fields, and to economics, finance, and the like. These disciplines have in common that they use distributions frequently though they are not too close to mathematics. Oversimplified, two statements are prevalent there:

· Every distribution is a Gaussian distribution.

· The mean and the standard deviation determine μ and σ, respectively.

At most a χ^{2} test is applied in order to prove Gaussian behavior. Applying the central limit theorem wrongly sometimes produces Gaussian distributions which are by no means justified. These statements make life easy, but they are wrong and may lead to false conclusions. Showing examples of this was the main motivation for writing this paper.

Our first example attacks the first statement. IQ is distributed almost perfectly Gaussian. At first glance that seems to confirm the statement. However, if distributions are always Gaussian, a Gaussian IQ distribution is a tautology. A Gaussian distribution appears only if something happens by chance. It is very difficult to produce such a distribution otherwise, as everybody knows who once tried to fake lab data with the tools of the early 1980s. The accidental mixing of genes generates a certain IQ. In chapter 2 we concluded from this that IQ must be inherited or at least is not created by conscious actions. Though this is assumed by the vast majority of academic psychologists, we have presented mathematical proof for it.

As our proof is clear-cut, it is hard to imagine any further research. However, two things may be worth scrutinizing. One is the width of the IQ distribution. It differs for men and women. But even the total IQ distribution most likely does not have the same width in every country. The big problem is finding sources of data for it. Even for the average IQ in different countries there are no completely reliable data. In developing countries, the data situation is particularly dim. Data for the width are not available, at least not to the authors. However, many factors might contribute to IQ and especially its distribution. Suspects are the frequency of marrying cousins, fidelity, religion, and many more.

In this paper we came also across cultural inheritance [

Our second example is income distributions. Income is based on skills such as IQ. Many of them show a Gaussian distribution. Therefore it is a reasonable assumption that income shows a Gaussian distribution. Subchapters 3.1 and especially 3.2 confirm it in most points. What is puzzling is a “fat tail”: there are far more rich people than any Gaussian distribution can predict. Before solving this problem, we showed in subchapter 3.1 that even when assuming a Gaussian distribution from minus to plus infinity, fitting μ by the mean and especially σ by the standard deviation can be tremendously wrong. In subchapter 3.2 we assumed a Gaussian distribution with positive incomes only. Such a “half” Gaussian is a much more realistic assumption in many cases ranging from infection rates to certain measurement errors. There the standard deviation has nothing to do with σ. In Equation (26) we have displayed a lengthy formula for the standard deviation of such a distribution. It is a (complicated) function of μ and σ. Though we got a decent fit with our half Gaussian in subchapter 3.2, it is still far from explaining the fat tail.

In subchapter 3.3 we introduced in Equation (22) a distribution consisting of a Gaussian part and a (modified) exponential distribution. We were motivated by the evaluation of historical data by Chapman [^{16} (DK) to 10^{3} (USA) to much more realistic values. With monthly net incomes of ?0,000 and especially above, the income from (inherited) wealth becomes more and more important. Therefore it does not make sense to scrutinize very high incomes in detail within our model.

As further research on income distributions, one could extend the procedure of subchapter 3.3 to other countries and to other measures used for fitting. As the (numerical) mathematics is very complicated, the authors will not pursue work in this direction. We also see no way to simplify or automate the numerical calculations, though that would be very welcome. Maybe tools of big data can help.

It would be worthwhile investigating in other areas where the results of [

An obvious extension of our results is fat tails in finance. The general mechanism must be the same. The income from stocks is nothing more than the sum of the values created by workers. Furthermore, companies are neither homines oeconomici nor “machinae oeconomici”. Companies are always led by humans. Therefore it comes as no surprise that many bosses want to make their company more profitable than a rival company, even if the total profit of both companies is reduced this way. Quite a few (proud) stockholders will accept it.

The reason why we have not taken the fat tail from finance as an example has many sources. There are fundamental errors in the work of Fama [

This publication is dedicated to the late Thomas Dierks, a distinguished scientist and a high school classmate and comrade of M.G.

The authors declare no conflicts of interest regarding the publication of this paper.

Grabinski, M. and Klinkova, G. (2020) Scrutinizing Distributions Proves That IQ Is Inherited and Explains the Fat Tail. Applied Mathematics, 11, 957-984. https://doi.org/10.4236/am.2020.1110063