
Estimating causal effects is a principal goal in epidemiology and other branches of science. Nonetheless, what constitutes an effect and which measure of effect is preferred are unsettled questions. I argue that, under indeterminism, an effect is a change in the tendency of the outcome variable to take each of its values, and then present a critical analysis of commonly used measures of effect and the measures of frequency from which they are calculated. I conclude that all causal effects should be quantified using a unifying measure of effect called the log likelihood ratio (which is the log probability ratio when the outcome is a discrete variable). Furthermore, I suggest that effects should be estimated for all causal contrasts of the causal variable (i.e., exposure), on all values of the outcome variable, and for all time intervals between the cause and the outcome. This goal should be kept in mind in practical approximations.

A great many disagreements in science originate in the debate between determinism and indeterminism, and that is also the case when considering measures of effect. I shall begin the article with a preliminary section (Section 2) that provides a quick explanation of determinism and indeterminism, but rather than revisiting the debate, I will develop this article from an indeterministic viewpoint and leave a critique of determinism for another day.

Section 3 begins with a trivial question: What is a measure of effect? The answer: A measure of effect is a way to quantify a change in tendency. Much of Section 3 specifies more precisely which tendencies and which changes in them are of interest. Section 4 contains a thorough discussion of measures of frequency and their use in quantifying tendency. The discussion culminates in one satisfactory measure capable of generically quantifying the tendency of interest.

Section 5 focuses on measures of effect, as derived from measures of frequency. Two common measures will be considered: The ratio and the difference. One key argument decides between the two, which is then applied to all measures of effect. That argument results in many closely related measures, all equally capable of quantifying changes in tendency. Among them one measure has nicer mathematical properties, which lead me to consider it the ideal measure of effect. Lastly, in Section 6, I consider a few complementary points that are not stressed earlier in the article, including philosophical interpretations of probability as they relate to quantifying tendency.

There are several schools of thought regarding how scientific knowledge should be advanced; at the heart of each is a set of axioms, not always clearly articulated. For now, let’s consider three major schools of thought: determinism, indeterminism, and individualized-indeterminism. These three viewpoints diverge at the very essence of causation: How do causes influence their effects? Specifically, what happens to an effect (variable G) once all of its causes (variables A, B, and C) take values? (

Let A, B, and C represent the amount of vitamins A, B, and C in the blood, respectively, and let G be blood glucose level. Suppose Mr. White, Ms. Black, Mrs. Brown, and Dr. Green all have identical amounts of those vitamins: A = a, B = b, and C = c.

・ According to determinism, once G realizes, it must take the same value, say g_{0}, for any of these people. Thus, Mr. White will have G = g_{0}, and Ms. Black will have G = g_{0}, as will Mrs. Brown and Dr. Green. In fact, anyone who has A = a, B = b, and C = c must have G take the value g_{0}.

・ According to indeterminism, everyone who has A = a, B = b, and C = c shares the same tendency of having G = g for any value g. (I will use the notation _{0}, T(G = g_{0} | Being Mr. White) = T(G = g_{0} | Being Ms. Black), and for G = g_{1}, T(G = g_{1} | Being Mrs. Brown) = T(G = g_{1} | Being Dr. Green). Furthermore, it is standard to assume that

・ Individualized-indeterminism, contrary to the other two viewpoints, does not accept a universal rule of causation. Rather, everyone has his or her own tendency of having G = g. That is, T(G = g_{0} | Being Mr. White) need not equal T(G = g_{0} | Being Ms. Black). Under an individualized model, effects are person-specific.

Under indeterminism, there are only two types of measures of effect: those that deal with tendencies and those that deal with averages of such tendencies. Since indeterminism asserts that causation is built from underlying tendencies, any measure of effect that deals with tendencies describes the building blocks of causation. Other measures, such as arithmetic means, do not quantify the underlying tendencies. They are functions of many such tendencies, and they may not even exist, as with the mean of a Cauchy distribution.

Even worse, averages can indicate null effects when clearly something has changed. To illustrate the point, let’s join Mrs. Brown on her visit to Dr. Green, who recently conducted a study of the effect of A on G, while assuming null effects of B and C (_{0}, G has a bimodal distribution with a mean of 85 mg/dL that peaks at 10 mg/dL and at 300 mg/dL. He also estimated that when A = a_{1} (a_{1} > a_{0}), the probability distribution of G is approximately Gaussian with a mean of 85 mg/dL.

“Mrs. Brown, your current value of A is a_{0},” starts Dr. Green. “If you were contemplating increasing it to a_{1}, know the effect I found in my study was null on a mean difference scale. As such, I see no reason for you to add more vitamin A to your diet.”

“Null on a mean difference scale?” wonders Mrs. Brown. “I’ve heard that outside of our hypothetical world some guy is writing a story about why the mean difference is not a good measure of effect. Would you mind telling me the effect on another scale?”

“On other scales the effect is very large, and according to them you should increase your vitamin A level from a_{0} to a_{1}. But don’t worry. The mean difference scale is just fine.”

“Great, I think I’m going to go eat a giant carrot.”
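Dr. Green's findings can be sketched numerically. The snippet below is a minimal illustration, not his analysis. It assumes that the bimodal distribution under A = a_{0} is a two-component Gaussian mixture with peaks at 10 and 300 mg/dL, and it assumes all three standard deviations (5, 30, and 15 mg/dL), none of which appear in the story. The mixture weight is chosen so that the mean is 85 mg/dL, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a Gaussian distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

# Weight on the 10 mg/dL peak so that the mixture mean is 85 mg/dL:
# w * 10 + (1 - w) * 300 = 85  =>  w = 215 / 290.
W_LOW = 215.0 / 290.0

def density_a0(g):
    """Assumed bimodal density of G when A = a0 (peaks at 10 and 300 mg/dL)."""
    low_peak = W_LOW * normal_pdf(g, 10.0, 5.0)
    high_peak = (1.0 - W_LOW) * normal_pdf(g, 300.0, 30.0)
    return low_peak + high_peak

def density_a1(g):
    """Assumed Gaussian density of G when A = a1 (mean 85 mg/dL)."""
    return normal_pdf(g, 85.0, 15.0)

mean_a0 = W_LOW * 10.0 + (1.0 - W_LOW) * 300.0
mean_a1 = 85.0
mean_difference = mean_a1 - mean_a0  # null on the mean difference scale

def log_density_ratio(g):
    """Log ratio of the two densities at G = g (a1 vs. a0)."""
    return math.log(density_a1(g)) - math.log(density_a0(g))

print(f"mean difference = {mean_difference:.6f}")
for g in (10.0, 85.0, 300.0):
    print(f"g = {g:5.1f} mg/dL: log ratio = {log_density_ratio(g):8.2f}")
```

Under these assumed shapes the mean difference is zero, yet the log ratio is strongly positive near 85 mg/dL and strongly negative near both peaks, which is the sense in which something has clearly changed.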

Let’s leave Mrs. Brown to her lunch and jump into the framework of measures of effect that deal with tendencies. Under this framework, an effect is a change in tendency, and a measure of effect quantifies that change.

Referring back to

Suppose we hold a theory that A is the only variable among A, B, and C, that has a non-null effect on G. Thus, we proceed to estimate

I would prefer a grey approach. One in which we can salvage

Moreover, suppose we are supplied with the value of

Before discussing how to quantify the change in tendency or even the tendency itself, we must answer a basic question: Tendency of what? As the example in Section 2 illustrates, the tendency is always of some variables to take some values. The tendency of a single variable taking one of its values will be called an individual tendency. If multiple variables (or multiple values) are being considered, the tendency will be referred to as a group tendency.

When the variables in question do not depend on one another, group tendencies are just a function of individual tendencies. As such, they suffer from the same drawbacks as averages. In this case, there is no point in considering group tendencies. When the variables do depend on each other, group tendencies are a function of the individual tendencies along with other tendency-like quantities. Since in most cases, we cannot know a priori whether variables depend on one another, we cannot guarantee that we are estimating a legitimate group tendency.

I have just argued that the tendency is of a single variable having a given value. The important thing to remember is that a variable exists at a time point: G_{1} (G at time 1) is not the same variable as G_{2} (G at time 2). Yet the tendency of having a value within a time interval is often calculated in research. For example, the tendency of having a stroke within the next ten years; that is, the tendency of at least one stroke status in the next ten years having the value “stroke”. But there are an infinite number of stroke status variables during that time interval: stroke status one minute into the study, after a year, as well as after 2 years 6 months 6 days 50 minutes and 8 seconds. In the end, the only tendencies that matter are those of individual variables―variables at a time point.

Furthermore, tendencies over time intervals, like averages, can indicate null effects when clearly something has changed. Consider two food items, food A and food B, that lead to heartburn. Suppose people who eat food A are likely to start having heartburn five minutes after eating, whereas people who eat food B are equally likely to get heartburn, but only after three hours. Suppose further that in both cases, heartburn will last for about an hour (
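The heartburn example can be made concrete with a small sketch. The numbers are assumptions for illustration only: the probability of developing heartburn (0.5) is hypothetical, and onset and duration are treated as exact (5 minutes or 180 minutes after eating, lasting 60 minutes).

```python
P_HEARTBURN = 0.5                 # assumed probability, the same for both foods
ONSET_MIN = {"A": 5, "B": 180}    # minutes after eating at which heartburn starts
DURATION_MIN = 60                 # heartburn lasts about an hour

def prob_at_time(food, minute):
    """Probability of having heartburn at a single time point."""
    start = ONSET_MIN[food]
    in_episode = start <= minute < start + DURATION_MIN
    return P_HEARTBURN if in_episode else 0.0

def prob_within_interval(food, end_minute):
    """Probability of having heartburn at some time in [0, end_minute]."""
    return P_HEARTBURN if ONSET_MIN[food] <= end_minute else 0.0

# Over a five-hour interval the two foods look identical ...
print(prob_within_interval("A", 300), prob_within_interval("B", 300))
# ... but at the 30-minute and 200-minute time points they differ.
print(prob_at_time("A", 30), prob_at_time("B", 30))
print(prob_at_time("A", 200), prob_at_time("B", 200))
```

In this sketch a comparison of food A with food B over the whole interval would suggest a null effect, while the comparison at individual time points would not.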

Lastly, recall that the tendency is fixed only when all of the causes take specified values. Furthermore, all of the causes must be concurrent (i.e., at the same time point). Otherwise, we are asking about the tendency of the outcome to occur given the current state of the world and a future state that may not be realized. Although such hypothetical tendencies may be calculated, they do not correspond to the building blocks of causation.

Having just argued that the tendency is of a variable at a given time point taking a single value, let’s add time indices to the causal diagram in

It is common practice to estimate the tendency of an event [

To begin, we should distinguish between two types of variables: natural variables and derived variables. Natural variables are properties of physical objects; they make up the causal structure of the universe. Derived variables, in contrast, are variables whose values are determined mathematically. When treated as causes or effects of interest, derived variables account for a bias (termed “thought bias”) that arises when a “causal parameter” is estimated and no such parameter exists [

An event is the value of a derived variable, not a natural variable, as shown next. Allison ( [

Although there are several arguments as to why derived variables are not natural variables [

In _{1}, but knowing the value of X at time t_{1} does not determine whether an event occurred at that time. In fact, as far as event status is concerned, it makes no difference what value X takes at time t_{1}. Since X = 0 before t_{1} and X = 1 after t_{1}, the event occurs at t_{1} regardless of the value X takes at time t_{1}. Therefore, the so-called event at t_{1} does not describe the status of the world at time t_{1}. It follows that an event is not a value of a natural variable.

That completes my reasoning for not considering the tendency of the occurrence of events. Rather, I would like to study effects (changes in tendency) where the tendency is of a natural variable taking one of its values. Even still, Dr. Green and Dr. Black insist that events are the proper way to study the world. (I have reached an impasse with fictional characters!) I should probably stop talking to them and make a declaration.

The Declaration of Values:

When in the course of scientific events it becomes necessary for one scientist to dissolve the bands which have connected him with an inappropriate methodology and to assume another, a decent respect to the opinions of other scientists requires that he should declare the reasoning which impels him to the separation.

I hold these truths to be self-evident, that all variables are created equal, that they are endowed with certain values, and that among these no value may be demarcated an event.―That the tendency I wish to consider is of a natural variable having a given value at a specified time point, not the occurrence of an event.

Finally, let me formally answer the question, “Tendency of what?”, in one brisk sentence using

The tendency of interest is the tendency of a natural variable (G) at a given time (t_{0} + ∆t) having a value (g) conditional on all of its causes (A, B, C) at a prior time (t_{0}) having specified values (a, b, c). (Notation:

That tendency can vary with ∆t, the time between the causes (

My answer is short: on every value of the outcome. But prevailing answers to that question make quite a list. For continuous variables, most researchers estimate the arithmetic mean difference, or rarely, the geometric mean ratio. They should direct their attention to the story of Mrs. Brown’s vitamin A deficiency in subsection 3.1.

For binary variables, some propose to estimate the effect on the desired value, whereas others advocate for estimating the effect on the unwanted value [

Another solution avoids the issue of desired and undesired values altogether: Use a measure of effect that contains the information for the effect on all values, such as the proportion difference or the odds ratio [

For categorical (non-binary) variables, Allison and others discuss a modeling method―multinomial regression―that estimates the effect on all values of the outcome [

To illustrate why you might be interested in all such effects, consider the effect of a binary variable E on a trinary variable D. Let 0 and 1 be the two values of E and suppose that D has one value which is considered good, another value which is considered bad, and a third value to which we are indifferent. I will creatively name the good value good and the bad value bad. Suppose that on some scale the effect of E changing from 0 to 1 on D = good is very large. That is, D is much more likely to take its good value if E = 1 than if E = 0. Should you prefer that E takes the value 1 rather than the value 0? Not necessarily. E = 1 might also make it much more likely for D to take the value bad, which is mathematically possible for a trinary D. So the preferred value of E depends on the magnitude of the effects on both D = good and D = bad.
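A minimal numerical sketch of that situation follows. The probabilities are made up, and the probability ratio is used only as one possible scale; the names are hypothetical.

```python
# Hypothetical distributions of the trinary outcome D given the binary cause E.
P_D_GIVEN_E = {
    0: {"good": 0.10, "bad": 0.05, "indifferent": 0.85},
    1: {"good": 0.40, "bad": 0.30, "indifferent": 0.30},
}

def probability_ratio(value):
    """Effect of E (1 vs. 0) on D taking the given value, on a ratio scale."""
    return P_D_GIVEN_E[1][value] / P_D_GIVEN_E[0][value]

# One effect per value of the outcome, not a single summary.
effects = {value: probability_ratio(value) for value in P_D_GIVEN_E[0]}
for value, effect in effects.items():
    print(f"effect on D = {value}: {effect:.2f}")
```

Here E = 1 makes the good value about four times as probable and the bad value about six times as probable, so looking only at the effect on D = good would be misleading.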

In general, it is necessary to consider the effect on all values of the outcome when making decisions based upon effects, just as one would consider possible side effects of some treatment along with possible benefits. The last point is usually obscured for binary variables since an increase in the probability of the good value is always accompanied by a decrease in the probability of the bad value. Nonetheless, even for binary variables we should estimate the effect on both values of the outcome. To paraphrase Sheps, since the two effects may give very different impressions, it is always well advised to consider both comparisons [

The effect of _{1} vs. a_{0} is _{1} vs. a_{0}); the value of the outcome on which the effect is calculated (g); and the time interval between the cause and the outcome (∆t). We are ultimately interested in the set of such effects for all causal contrasts, all values of the outcome, and all time intervals between the cause and effect. This goal should be kept in mind in unavoidable, practical approximations.

How should tendency be quantified? Naturally, by some measures of frequency. I propose, however, that a single measure of frequency should be used to quantify the tendency of interest for any variable having any of its values at any time point. From a causal standpoint, there is no fundamental difference between different variables, different values, and different time points. To clarify, by “a single measure of frequency” I mean a measure of frequency that can be fully described by one mathematical framework as it applies to all variables, values, and time points. Furthermore, that mathematical framework should prescribe only one way of quantifying each tendency of interest; it should not allow for two different ways to quantify the same tendency.

In this section, I consider numerous measures of frequency and argue for or against them based upon whether they are capable of quantifying the tendency of interest in all cases. For now, let’s consider quantifying the tendency for discrete variables. Later, I will discuss how the arguments may be extended to continuous variables.

For the rest of Section 4, if M is a measure of frequency, _{0} < t, and c is a vector representing the values they take. For example, probability, denoted by P, is a measure of frequency. So

Several measures of frequency arise in survival analysis. In one way or another, all of them describe the tendency of an event to occur. Since event-status is not a natural variable (Subsection 3.5), none of them can generically quantify the tendency of a natural variable taking one of its values. Fortunately, measures of frequency in survival analysis can be modified slightly so as not to refer to events, and such measures may be able to quantify the tendency of a natural variable taking one of its values.

First, let’s consider the cumulative distribution function, _{0}, the beginning of the study, and time t. That is,

Next, I’ll modify _{1}, after a period of time during which

In the previous equation, the only reference to the event is the condition “for some time

The measure of frequency,

There is an exception, however, when events―forgive me for using the word―are nearly irreversible. In that case, almost everyone who has had X = x_{1} prior to time t will have X_{t} = x_{1}. Therefore,

The survivor function,

Again there is an exception, although with more constraints than before. If events― cringe!―are nearly irreversible and X is a binary variable, whose other value is denoted

Even when the exceptions hold, neither _{t} having the value_{t} having the value

Next in survival analysis is the probability density function,

As before,

The above expression does not consider events; it describes X having a value. And the limit guarantees that X is considered at a time point. Great! Too bad the limit doesn’t exist for every value of the outcome.

Proposition:

Proof: Let X be a discrete variable with values

Since

exists. Furthermore,

Still,

Since the numerator of

doesn’t approach zero while the denominator does,

does not exist. QED

The above proof shows that at every time point there is at least one value of a discrete variable for which the tendency of interest cannot be quantified by

The hazard function,

The hazard function is the probability density function conditional on not having had the event by time t. That is,

Upon removing “all” of the references to the event in the hazard function, we are left with the following measure of frequency:

It is debatable whether the condition “

Suppose, instead, that

Still,

So far I have mentioned in passing only one, possibly suitable, measure of frequency: The probability at a time point.

For discrete variables, it most certainly can. We will need, however, to accept axiomatically that

For continuous variables,

Two common measures of frequency are related to the probability at a time point: the probability over a time interval and the rate. Those measures primarily differ in their consideration of time. Both, however, consider the probability of having a value within a time interval and therefore, cannot generically quantify the tendency at a time point. You might think to take their limits as the time interval shrinks to zero; those limits either don’t exist or equal the probability at a time point. As such, they offer no new potential candidates for quantifying the tendency. Nonetheless, I think it is worthwhile to discuss their relation to the probability at a time point.

The probability over a time interval is

For larger time intervals,

To illustrate that relation, let’s consider why that measure of frequency is used. It is used when the probability at a time point is very small (

For argument’s sake, let’s consider an idealized model. Assume that every participant in the study is observed for the entire duration of the study; that

The idealized model is similar to a study in which the outcome of interest is rare and the duration (D) during which a study participant has the outcome is more or less identical. For example, consider a study in which X is heart attack status, and

In general,

If the two probabilities,

The incidence rate can be altered to fix the above problems. I will call the resulting measure of frequency the event-less rate. First, the number of people who have the event needs to be replaced by

In the idealized model discussed above, the event-less rate would be approximately equal to

Most of the measures of frequency discussed so far cannot quantify the tendency of interest at all, because they do not consider the outcome at a single time point. Only the hazard function applies to a time point, but it introduces irrelevant conditioning. Whatever continuous analogs those measures may have, they may at best serve to approximate a relevant measure of frequency.

So far only two measures are applicable to quantifying tendency: the event-less probability density function,

Before discussing that extension, I will mention a probability density function of a different sort. Instead of the probability density function_{t} takes real values and

After a quick inspection, you will see that

That extension is best explained with some new concepts. Consider a generic variable,

We might then consider the following measure of frequency:_{t} takes a value belonging to the set_{t} to take one of many values, not a single value. Taking the limit as r approaches zero would ensure that only a single value is considered, but the limit is zero for continuous variables:

That problem can be avoided if we first divide

For those familiar with measure theory, the likelihood will be a representative of the Radon-Nikodym derivative, dP/dμ, under mild restrictions (i.e.,

At first glance, the likelihood at a time point suffers three problems. First, the limit below need not exist:

We can, however, accept axiomatically that it exists just as we accepted that

It remains to show how _{t} taking real values.

For discrete

and let μ be the counting measure. That is, μ(A) is the number of elements in A. Then,

for discrete

For continuous X_{t} taking real values, let d be the Euclidean distance on

where we viewed r as being

In summary, the likelihood combines

Among the measures of frequency discussed so far, only the likelihood at a time point has been shown to possibly quantify the tendency of interest. All other measures of frequency either cannot quantify the tendency of interest, or are merely approximations of the likelihood at a time point under certain conditions. And although there may be an infinite number of ways to quantify tendency, I will consider only one more common measure of frequency: the odds.
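The construction of the likelihood can be checked numerically. The sketch below rests on my reading of Subsection 4.7: the likelihood at x is taken to be the limit, as r shrinks to zero, of the probability of a ball of radius r around x divided by the measure μ of that ball. The standard normal variable, the three-valued discrete variable, and all function names are hypothetical examples.

```python
import math

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def continuous_ratio(x, r):
    """P(|X - x| < r) / mu(ball), with mu the length of the interval (2r)."""
    return (normal_cdf(x + r) - normal_cdf(x - r)) / (2.0 * r)

# Hypothetical discrete variable with values 0, 1, and 2.
PMF = {0: 0.2, 1: 0.5, 2: 0.3}

def discrete_ratio(x, r):
    """P(|X - x| < r) / mu(ball), with mu the counting measure."""
    ball = [v for v in PMF if abs(v - x) < r]
    return sum(PMF[v] for v in ball) / len(ball)

# Standard normal density at x = 1, for comparison with the limit.
normal_density_at_1 = math.exp(-0.5) / math.sqrt(2.0 * math.pi)

for r in (1.5, 0.5, 0.001):
    print(r, continuous_ratio(1.0, r), discrete_ratio(1, r))
print("density at 1:", normal_density_at_1, "probability of 1:", PMF[1])
```

As r shrinks, the continuous ratio approaches the density and the discrete ratio equals the probability once the ball contains only the value of interest, which matches the two special cases described above.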

Many articles claim that the odds is a bad measure of frequency because it approximates the probability only in certain cases, or more specifically that the odds ratio is a bad measure of effect because it approximates the probability ratio only in certain cases [

The odds at a time point is defined as

We can instead try to extend an odds-like measure that has been used for non-binary outcomes [_{t} whose values are

The partial odds, however, introduces a new problem not present in the odds. Consider a discrete variable X_{t} with values _{t} taking the value_{t} taking the value

As promised I now return to discuss the likelihood at a time point, the only contender still quantifying tendency. As I have not done so before, let’s explicitly verify that the likelihood,

Being a measure of frequency, _{0}) having specified values (c).”

In order to quantify the tendency, the likelihood must satisfy a few more properties. First, it must be defined for all variables, all values, and all time points. I have axiomatically accepted in subsection 4.7 that the likelihood is defined in all such cases. Second, unlike the time point probability of continuous variables, the likelihood must never be fixed a priori. That is, it should vary in all cases with the tendency of the outcome. In constructing the likelihood,

One issue remains. There might be yet another measure of frequency capable of quantifying the tendency of interest; the notions of tendency and measure of frequency are not explicit enough to ensure that no such measure exists. If there were such a measure, would it be preferred to the likelihood, and how are we to decide? To solve such a problem, perhaps we should not only accept that the likelihood can quantify the tendency of interest, but rather accept axiomatically that the likelihood at a time point is, in fact, the tendency of interest.

Having concluded that the likelihood is the “ideal” way to quantify tendency, let’s move on to deciding which measure of effect to use. Numerous functions may be proposed to quantify the change in tendency. I will consider just two, the ratio and the difference, and then argue generically against almost all others.

To begin, let me explicitly define the ratio and the difference. Consider the effect of E_{0} on D_{1}. Specifically, consider the effect of E_{0} (e_{1} vs. e_{0}) on D_{1} taking the value d, and let C_{0} be a vector of all the causes of D_{1} at time 0 except E_{0}, and let c be a vector representing the values they take. On a ratio scale that effect is

The debate between proponents of the ratio and proponents of the difference is filled with comments that I find insignificant. I will note previous arguments and explain why they matter little in deciding between the ratio and the difference. Finally, I will present the only substantive argument I have found capable of deciding between the two measures.

The choice of a measure of effect depends, in part, on the goal in mind, and mine is simply to quantify the change in tendency. The arguments I will discuss in this section are irrelevant in the sense that they arise from other goals―namely, estimating averages in a population, and enhancing people’s understanding of the data.

One branch of determinism occupies itself with target populations (i.e., estimating effects in finite populations). To that end, the difference is thought to be more applicable than the ratio. Greenland, for instance, notes that the difference of the average probability in a population equals the average probability difference, whereas the ratio of the average probability is not the average probability ratio [
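The arithmetic behind that observation is easy to verify. The sketch below uses three hypothetical people with made-up probabilities; it illustrates only the averaging property and takes no side in the argument.

```python
# Hypothetical probabilities of the outcome for three people,
# unexposed (P0) and exposed (P1).
P0 = [0.10, 0.20, 0.40]
P1 = [0.30, 0.30, 0.44]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Difference scale: averaging commutes with taking the difference.
difference_of_averages = mean(P1) - mean(P0)
average_difference = mean([p1 - p0 for p0, p1 in zip(P0, P1)])

# Ratio scale: it does not.
ratio_of_averages = mean(P1) / mean(P0)
average_ratio = mean([p1 / p0 for p0, p1 in zip(P0, P1)])

print(difference_of_averages, average_difference)
print(ratio_of_averages, average_ratio)
```

With these numbers the two differences agree, whereas the ratio of the averages is noticeably smaller than the average of the individual ratios.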

Some people prefer a measure of effect that “enhances people’s understanding” or is “easily interpretable” [

Cook and Sackett present another related argument against the ratio [

The ratio and the difference have different mathematical properties that are relevant to the debate, but those mathematical properties are of little value without substantiating their necessity on philosophical grounds. I have listed below the mathematical arguments of which I am aware. All of them, except for the first, have no philosophical grounds; the first is based on philosophical grounds against which I have already argued.

1. The effect on a difference scale is symmetric for binary variables. That is, if you know the effect on one value of the outcome you know the effect on both. For binary variables, the symmetry of the difference scale solved the dilemma people had: On which value to estimate the effect [

2. Some people prefer that effects approach extreme values as they approach deterministic limits [

3. Earlier in the article, I reviewed a few measures of frequency related to the probability at a time point. Under certain conditions, those measures are approximately proportional to the probability at a time point. Furthermore, the constant of proportionality may be the same in exposed and unexposed, in which case it will cancel on the ratio scale. Thus, ratios of those measures may be used to estimate the probability ratio under certain conditions. On a difference scale, however, the constants of proportionality will not cancel. As such, we cannot estimate the probability difference from related measures of frequency unless the constants of proportionality are known. Nonetheless, if the difference were found to be the preferred measure of effect, it would not matter that the ratio can be estimated by those measures.

4. For continuous variables, the difference scale has units while the ratio scale does not. The units can be said to make it difficult to appreciate the magnitude of an effect. Once the units are understood, however, the difference scale for continuous variables is not much more difficult to interpret than the difference scale for discrete variables. (Note: different units correspond to different functions d and μ in the definition of the likelihood. See Subsection 4.7.)
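Items 1 and 3 can be illustrated with a few lines of arithmetic. All numbers below are hypothetical, including the constant of proportionality k, which is assumed to be the same in exposed and unexposed.

```python
def ratio(p1, p0):
    """Effect on a ratio scale."""
    return p1 / p0

def difference(p1, p0):
    """Effect on a difference scale."""
    return p1 - p0

# Item 1: binary outcome with hypothetical probabilities of the "bad" value.
p_bad_exposed, p_bad_unexposed = 0.02, 0.01
p_good_exposed = 1.0 - p_bad_exposed
p_good_unexposed = 1.0 - p_bad_unexposed

diff_bad = difference(p_bad_exposed, p_bad_unexposed)
diff_good = difference(p_good_exposed, p_good_unexposed)  # equals -diff_bad
ratio_bad = ratio(p_bad_exposed, p_bad_unexposed)
ratio_good = ratio(p_good_exposed, p_good_unexposed)      # not fixed by ratio_bad alone

# Item 3: a related measure assumed to equal k times the probability.
k = 7.0  # arbitrary, and unknown in practice
m_exposed, m_unexposed = k * p_bad_exposed, k * p_bad_unexposed

print(diff_bad, diff_good)
print(ratio_bad, ratio_good)
print(ratio(m_exposed, m_unexposed), difference(m_exposed, m_unexposed))
```

The difference on one value is the negative of the difference on the other, while the two ratios carry separate information. The constant k cancels in the ratio of the related measure but scales its difference.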

The arguments noted so far are in no way definitive. There is only one argument that I consider to be important in deciding between the ratio and difference. To explain that argument we first need to consider the way in which effects are estimated.

To estimate an effect, auxiliary causal theories must always be invoked. Those theories can be represented in a causal diagram. For example, _{0}→D_{1}; the auxiliary theories are represented by the arrows C_{−1}→I_{0}, I_{0}→D_{1}, and C_{−1}→E_{0}. Those theories are essential for estimating effects unbiasedly. For example, according to _{−1} or I_{0} to remove confounding bias. In general, the causal diagram to which we hold indicates on which variables we must condition to remove bias.

Removing bias, however, is not the only way to improve a study. In general, a study is better the more likely it is to produce estimates closer to the true effect. As such, we must strike a balance between minimizing bias and minimizing variance. Therefore, it is not necessary to condition on all of the variables mandated by a causal diagram. If we don’t condition on all the relevant variables, then the magnitude of bias may be described by a bias term, which depends, in part, on the effects specified in auxiliary theories.

In general, the bias term is a function of effects, tendencies, and other likelihoods. The bias depends on the effects in auxiliary theories, and occasionally it depends on the effect being estimated. It may also depend on various tendencies. Lastly, the bias term may contain marginal likelihoods. These are likelihoods which do not include conditioning on the causes of the outcome. As such, they are not assumed fixed between studies.

Taking _{0}→D_{1} without conditioning on C_{−1} or I_{0} as needed to remove bias. Then, the bias term depends on the effects C_{−1}→I_{0}, I_{0}→D_{1}, and C_{−1}→E_{0}; on various tendencies; and on the marginal likelihood_{−1}.

We were not able to say, however, on which various tendencies the bias term depends because the answer depends on the measure of effect. In general, different measures of effect have different bias terms. On the difference scale, for instance, the bias term is another term added onto the effect. That is, every study that estimates an effect on the difference scale actually estimates the effect plus a bias term. And when that term equals zero, the study is said to be unbiased. On the ratio scale, the bias term is actually a bias factor. That is, every study that estimates an effect on the ratio scale actually estimates the effect multiplied by a bias factor. And when that factor equals one, the study is said to be unbiased.

Note that I am not using the term “bias” to mean that the expected value of the estimator differs from the parameter being estimated. Rather, I use it to mean that the parameter being estimated differs from the parameter we wish to estimate [_{0}, then we are estimating_{0}, then we are estimating

Now that we have described the bias term, we return to our earlier comment about striking a balance between minimizing bias and minimizing variance. Since the bias term depends on effects that are not known a priori, the balancing act between bias and variance is qualitative. We are not interested in the exact amount of bias, but whether the bias is large enough to be of concern, or small enough to be ignored. Such a description of bias requires that the auxiliary theories be qualitative, not quantitative. It also requires that we know the relation between the bias term and different effects. I will refer to those relations as the rules of causal diagrams.

In many cases, the rules of causal diagrams can be described as follows. When certain relevant effects are null, the bias term indicates that no bias is present. For example, if any of the effects C_{−1}→I_{0}, I_{0}→D_{1}, or C_{−1}→E_{0} is null, then no bias is present when estimating E_{0}→D_{1} without conditioning. As the magnitude of these effects increases, the bias term usually indicates that more and more bias is present. We say “usually” because the various tendencies and marginal likelihoods could lead to less or even no bias. When precisely no bias is present because of the tendencies and likelihoods, we say that a lucky cancellation has occurred.
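The rule can be illustrated numerically. The following is a minimal sketch and is not taken from the article: it collapses the diagram to a single hypothetical binary variable C that affects both E and D directly (the intermediate I_{0} is omitted), and every probability in it is invented for illustration. It computes the bias factor on the ratio scale and shows that the factor equals one when the effect of C on E is null.

```python
import math

def crude_ratio(p_c, p_e_given_c, p_d_given_ec):
    """Unconditional probability ratio P(D=1 | E=1) / P(D=1 | E=0).

    p_c          : P(C=1)
    p_e_given_c  : {c: P(E=1 | C=c)}
    p_d_given_ec : {(e, c): P(D=1 | E=e, C=c)}
    """
    p_d_given_e = {}
    for e in (0, 1):
        joint_d_e = 0.0  # accumulates P(D=1, E=e)
        marginal_e = 0.0  # accumulates P(E=e)
        for c in (0, 1):
            pc = p_c if c == 1 else 1 - p_c
            pe = p_e_given_c[c] if e == 1 else 1 - p_e_given_c[c]
            joint_d_e += pc * pe * p_d_given_ec[(e, c)]
            marginal_e += pc * pe
        p_d_given_e[e] = joint_d_e / marginal_e
    return p_d_given_e[1] / p_d_given_e[0]

# Hypothetical tendencies: E doubles and C triples the probability of D = 1,
# with no effect modification on the ratio scale.
P_D = {(e, c): 0.05 * (2 ** e) * (3 ** c) for e in (0, 1) for c in (0, 1)}
TRUE_RATIO = 2.0  # effect of E (1 vs. 0) on D = 1 at either value of C

def bias_factor(p_e_given_c):
    """Crude ratio divided by the conditional (causal) ratio."""
    return crude_ratio(0.5, p_e_given_c, P_D) / TRUE_RATIO

# C has a non-null effect on E: the bias factor differs from one.
confounded = bias_factor({0: 0.2, 1: 0.8})
# C has a null effect on E: the bias factor is one (log bias term is zero).
null_c_to_e = bias_factor({0: 0.5, 1: 0.5})

print(confounded, math.log(confounded))
print(null_c_to_e, math.log(null_c_to_e))
```

On the log ratio scale the same bias appears as an additive term, `math.log(bias_factor)`, which is zero exactly when the factor is one.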

Since auxiliary theories are qualitative, such precise cancellations are usually not of importance. Instead, we need a general description of the rules of causal diagrams as illustrated above. There should be some magnitudes of effects for which no bias is present; the more the effects differ from those special cases, the more bias should usually be present. Furthermore, the measure of effect used in these rules should be the measure of effect we decide upon. That is, we should not write the bias term for the difference in terms of the ratio nor should we write the bias factor for the ratio in terms of the difference. Under that constraint, it is possible that what one measure considers to be a lucky cancellation could be explained by another measure of effect. It seems strange that we could know about bias on one scale purely from the effects on another scale. Therefore, I would prefer a measure of effect for which lucky cancellations cannot be explained by another measure.

In many cases, there is no problem because on both the difference scale and the ratio scale, bias is often absent when certain effects are null, and the bias usually increases as the effects strengthen. Since a null effect on the two scales is equivalent, both the difference and the ratio are often capable of explaining their respective bias. There is, however, an exception pertaining to colliding bias, specifically bi-path colliding bias.

Consider Diagram A of _{0} and R_{0} are two marginally independent causes of K_{1}. Q_{0} and R_{0} are dependent conditional on K_{1} = k if and only if they modify each other’s effects on K_{1} = k on the probability ratio scale [_{0}, R_{0}, and K_{1} are all discrete, and is suspected to hold for any Q_{0}, R_{0}, and K_{1}. The result of conditioning on K_{1} = k when Q_{0} and R_{0} modify each other’s effects on a probability ratio scale is depicted in Diagram B of _{0} and r_{0} on the arrows; conditioning on K_{1} = k is denoted by a box around K_{1}; the two lines over the adjacent arrows denote that the arrows no longer contribute to associations; and the dashed line between Q_{0} and R_{0} indicates that the association between them has changed after conditioning.
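The claim about conditioning on a collider can be checked numerically when all three variables are binary. The sketch below is an illustration under assumed numbers, not the article's derivation: the function names and all probabilities are hypothetical. It builds the joint distribution of Q and R given K = k from two marginally independent causes and compares two sets of tendencies, one without and one with effect modification on the probability ratio scale.

```python
from itertools import product

def conditional_joint(p_q, p_r, p_k_given_qr):
    """Joint distribution of (Q, R) given K = k for independent binary Q and R.

    p_q, p_r     : P(Q=1), P(R=1)
    p_k_given_qr : {(q, r): P(K=k | Q=q, R=r)}
    """
    weights = {}
    for q, r in product((0, 1), repeat=2):
        pq = p_q if q else 1 - p_q
        pr = p_r if r else 1 - p_r
        weights[(q, r)] = pq * pr * p_k_given_qr[(q, r)]
    total = sum(weights.values())  # equals P(K=k)
    return {qr: w / total for qr, w in weights.items()}

def association_ratio(joint):
    """P(R=1 | Q=1, K=k) / P(R=1 | Q=0, K=k); equals 1 under independence."""
    p_r1_q1 = joint[(1, 1)] / (joint[(1, 0)] + joint[(1, 1)])
    p_r1_q0 = joint[(0, 1)] / (joint[(0, 0)] + joint[(0, 1)])
    return p_r1_q1 / p_r1_q0

def modification_ratio(p_k_given_qr):
    """Ratio-scale effect of Q on K=k when R=1 divided by that effect when R=0."""
    effect_r1 = p_k_given_qr[(1, 1)] / p_k_given_qr[(0, 1)]
    effect_r0 = p_k_given_qr[(1, 0)] / p_k_given_qr[(0, 0)]
    return effect_r1 / effect_r0

# Hypothetical tendencies of K = k, keyed by (q, r).
no_modification = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.6}
with_modification = {(0, 0): 0.1, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.4}

assoc_without = association_ratio(conditional_joint(0.4, 0.3, no_modification))
assoc_with = association_ratio(conditional_joint(0.4, 0.3, with_modification))

print(modification_ratio(no_modification), assoc_without)
print(modification_ratio(with_modification), assoc_with)
```

In the first set, Q doubles the probability of K = k at both values of R, and Q and R remain independent after conditioning. In the second set the effect of Q is constant on the difference scale (0.1 at both values of R) but not on the ratio scale, and conditioning on K = k induces an association.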

If the diagram in _{0} and R_{0} can produce bias. Consider the M-structure depicted in _{1} = k when estimating the effect E_{1}à_{0} and R_{0} do not modify each other’s effect on K_{1} = k on the ratio scale (_{0} and R_{0} on the ratio scale, the larger the bias usually is on both the difference scale and the ratio scale (

Therefore, the rule governing bias for either scale in

Nonetheless, it is not clear from the above reasoning that the ratio is an acceptable measure of effect. In the next subsection, however, I show that by the above reasoning the ratio is essentially the only possible, appropriate measure of effect. Still it might be the case that a lucky cancellation on the ratio scale could be explained by another measure of effect. If so, the above reasoning does not give an absolute preference for the ratio. To verify that that is not the case, we would need to check all the rules of causal diagrams as they apply to the ratio.

So far I have concluded that the ratio is preferred to the difference. Nonetheless, there are many other possible measures of effect. In general, a measure of effect is a function,

First, I would like effects to be quantified by a real number (or possibly a real number with units). It may be denoted by other things, but it seems strange to quantify a change between real numbers (possibly with units) with a complex number, a 19-dimensional vector, or anything else.

Second, effects are intuitively considered to be similar if the relevant tendencies are similar. As such, I would like such effects to be denoted by similar real numbers. That is, I would like

Note that both the difference and the ratio scale satisfy that property. (For the ratio, this holds so long as the likelihood is never zero; see Subsection 4.7.)

Third, there should be a real number representing a null effect on the new scale. Why? Because the null is unique! While we may argue what is considered a large effect or a small effect, the null represents the special case when the outcome is indifferent to the value of the cause. Even the ratio and the difference agree as to what is considered a null effect. I would like the new measure of effect to share in that consensus. To make the previous statement mathematically rigorous, there should be a real number C representing a null effect such that

Lastly, we consider the rules of causal diagrams. As shown in _{0} and R_{0} as quantified on the ratio scale may change upon conditioning on the collider K_{1}. Whether such a change will occur depends on effect modification on the ratio scale. Yet a non-null association on the ratio scale is accompanied by a non-null association on the new scale, because associations and effects are quantified in the same way. It will follow that whether bias will be present on the new scale in

To make the last statement mathematically rigorous, consider the causal diagram in _{0} when R_{0} = 1 and the effect of Q_{0} when R_{0} = 2. On the ratio scale, these two effects are

To be able to tell whether effect modification is present on the ratio scale based solely upon effects on the new scale, there must be some relation, R, such that

A quick check will show that R is an equivalence relation. In fact, we will prove that R is just equality on the range of

Next, we prove the claims made in the last paragraph.

Proposition:

Proof: First, we will prove by contradiction that R is equality on the range of

For each λ, _{0} such that

For each u,

Next, let M be the minimum of

Then

Next, we will prove the statement of the proposition. By Equation (26),

Until now, I have listed some of the properties I want in a measure of effect. The result is that only continuous, injective functions of the ratio have all of those properties. Among them I have yet to give a reason to prefer one over the rest. Since they all contain the same information, the only reason to prefer one over the others has to do with their mathematical properties. In particular, I consider the mathematical properties that pertain to bias and variance, the two integral parts of a study that are related to the chosen measure of effect.

I would like the new measure of effect,

Next, let’s consider the issue of the variance. To begin, let Λ be a random variable of the estimates of the effect on a ratio scale of E_{0} (e_{1} vs. e_{0}) on D_{1} = d. That is, Λ is a random variable of the estimates of

Furthermore, the same variance on the ratio scale is intuitively interpreted as a smaller spread for larger effects. For example, a variance of 1 for an effect of 24 is considered a smaller variance than a variance of 1 for an effect of 1.5. That is illustrated in the practice of quantifying the spread on a ratio scale by dividing the variance by the estimated effect. Logarithms, which squish larger values closer together, would naturally have the variance exhibit such a property.
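The effect of the logarithm on spread can be sketched with a first-order (delta-method) approximation, Var(ln Λ) ≈ Var(Λ) / λ², where λ is the value around which the estimates are concentrated. This approximation is an assumption introduced here for illustration and is not part of the article's argument; it is rough when the spread is large relative to the effect, as it is for the second case below. The function name is hypothetical.

```python
def approx_log_scale_variance(effect, variance):
    """First-order (delta-method) approximation of Var(ln Λ).

    Assumes the ratio-scale estimates are concentrated near `effect` (> 0),
    so that ln Λ ≈ ln(effect) + (Λ - effect) / effect.
    """
    if effect <= 0:
        raise ValueError("a ratio-scale effect must be positive")
    return variance / effect ** 2

# The same ratio-scale variance of 1 at the two effects used in the text.
large_effect = approx_log_scale_variance(24, 1.0)   # 1 / 576
small_effect = approx_log_scale_variance(1.5, 1.0)  # 1 / 2.25

print(large_effect, small_effect)
```

Under this approximation, equal variances on the ratio scale translate into a much smaller variance on the log scale for the larger effect, which matches the intuition described above.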

There is one bizarre practice regarding the variance that I haven’t mentioned. When estimating the effect on the ratio scale (

Lastly, how do we choose the constant c? Once again, consider the variance. I mentioned before that the variance on a ratio scale is sometimes divided by the estimated effect as a measure of spread. If the effect is estimated to be null that would just be the variance. Furthermore, if Λ has a very tight distribution about the null, then

Therefore,

Therefore, the natural logarithm of the ratio (or log ratio, for short) is the only measure of effect that satisfies all the nice mathematical properties I have outlined. It will take time to get used to describing effects on the log ratio scale as opposed to the ratio or any other scale. But for all of its advantages I think the effort is worthwhile.

And so we have reached the climax of the article. The change in tendency should be quantified by the natural logarithm of the ratio of likelihood. For example, the effect of _{1} vs. e_{0}) on

what is generally considered the effect of E on D, we would have to estimate the previous quantity for all causal contrasts e_{1} vs e_{0}, all values d, and all time intervals ∆t. That is the ultimate meaning of causal knowledge.

In practice, we can estimate the log likelihood ratio by relying on the assumption that the likelihood

where

To illustrate the log ratio scale, consider the effect of a binary variable E_{0} on all five values of a discrete variable D_{1}. The top of _{1} is discrete. The bottom of _{0} (e_{1} vs. e_{0}) on D_{1} taking each of its values. I have included in the bottom of

| | D_{1} = d_{1} | D_{1} = d_{2} | D_{1} = d_{3} | D_{1} = d_{4} | D_{1} = d_{5} |
|---|---|---|---|---|---|
| E_{0} = e_{1} | 0.16 | 0.29 | 0.35 | 0.12 | 0.08 |
| E_{0} = e_{0} | 0.16 | 0.35 | 0.29 | 0.01 | 0.19 |
| Log ratio | 0 | −0.19 | 0.19 | 2.485 | −0.86 |
| Ratio | 1 | 0.83 | 1.21 | 12 | 0.42 |

A few things are worth noting about the effects in the table. The null effect on D_{1} = d_{1} is denoted by the number 0 on the log ratio scale. The effects on D_{1} = d_{2} and D_{1} = d_{3} are reciprocals of each other on the ratio scale and additive inverses of each other on the log ratio scale. Furthermore, for effects close to the null, the effect on the log ratio scale is approximately the effect on the ratio scale minus one. For example, the effect on D_{1} = d_{3} on the log ratio scale is 0.19 ≈ 0.21 = 1.21 − 1, which is the effect on the ratio scale minus one. Finally, very large effects on the ratio scale are denoted by much smaller numbers on the log ratio scale. For example, the effect on D_{1} = d_{4} is 12 on the ratio scale, but only 2.485 on the log ratio scale.
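The entries of the table can be reproduced directly from the two rows of probabilities. The following short sketch uses only those probabilities; the variable names are illustrative.

```python
import math

# Probabilities of D_1 = d_1, ..., d_5 given E_0 = e_1 and E_0 = e_0 (from the table).
p_e1 = [0.16, 0.29, 0.35, 0.12, 0.08]
p_e0 = [0.16, 0.35, 0.29, 0.01, 0.19]

# Effect of E_0 (e_1 vs. e_0) on each value of D_1, on both scales.
ratios = [a / b for a, b in zip(p_e1, p_e0)]
log_ratios = [math.log(x) for x in ratios]

for i, (ratio, log_ratio) in enumerate(zip(ratios, log_ratios), start=1):
    print(f"d_{i}: ratio = {ratio:.2f}, log ratio = {log_ratio:.3f}")

# Reciprocal ratios (d_2 and d_3) become additive inverses on the log scale.
inverse_gap = log_ratios[1] + log_ratios[2]
# Near the null, the log ratio is close to the ratio minus one (d_3).
near_null_gap = (ratios[2] - 1) - log_ratios[2]
```

Here `inverse_gap` is zero up to floating-point error, and `near_null_gap` is about 0.02, the difference between 0.21 and 0.19 noted above.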

“I would like...” and “I would prefer...” are how I began some of my sentences in this article. But why do my preferences matter? It’s not that my preferences matter, so much as that they detail how I propose to study science. To explain further, we must first answer: what is science?

I have yet to encounter a good definition of science, scientific knowledge, or scientific theories. A rudimentary “definition” tells us that science is the study of how the universe works, a statement that is clearer than most definitions but otherwise incredibly vague. What is the universe? What is meant by “how it works”? And what about it are we to study exactly?

The answers to the first two questions arise from axioms of science. I partially answered the third at the beginning of this article: I argued in favor of estimating an effect (a change in tendency). That is the knowledge in which we are interested. Well, at least I’m interested in it; you may not be. Still, the answer I supplied is vague. So I have filled this article with statements explaining more thoroughly what I want to study about the universe and how I think it should be studied. Those statements were indicated with the phrases “I would like...”, “I would prefer...”, and “should”. Now I would like to move on to the next subsection.

Several measures of frequency may be used, under certain circumstances, to approximate the probability ratio (i.e., the likelihood ratio for discrete outcomes), and therefore the log likelihood ratio as well [

For example, the odds ratio from a cohort study need not be viewed as a separate measure of effect, but rather as a probability ratio with a bias factor. Specifically, the odds ratio in a cohort study equals the probability ratio times a bias factor, which is the effect of the exposure (0 vs. 1) on not having the disease. If E and D are binary variables with values denoted 0 and 1, then the effect of E (1 vs. 0) on having the disease (D = 1) on the odds ratio scale (OR) is related to the probability ratio (PR) of having the disease as follows:

$$\mathrm{OR} = \mathrm{PR} \times \frac{\Pr(D = 0 \mid E = 0)}{\Pr(D = 0 \mid E = 1)}$$

Therefore, the log of the odds ratio equals the log likelihood ratio plus a bias term as follows:

$$\ln(\mathrm{OR}) = \ln(\mathrm{PR}) + \ln\!\left(\frac{\Pr(D = 0 \mid E = 0)}{\Pr(D = 0 \mid E = 1)}\right)$$
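The relation between the odds ratio and the probability ratio can be verified numerically. The sketch below assumes binary E and D and uses invented probabilities; the function name is hypothetical. It checks that the odds ratio equals the probability ratio times Pr(D = 0 | E = 0) / Pr(D = 0 | E = 1), and that the corresponding bias term on the log scale is close to zero when the disease is rare.

```python
import math

def cohort_measures(p1, p0):
    """Return (PR, OR, bias factor) for binary E and D.

    p1 : P(D=1 | E=1)
    p0 : P(D=1 | E=0)
    """
    probability_ratio = p1 / p0
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    # Effect of E (0 vs. 1) on D = 0 on the ratio scale.
    bias_factor = (1 - p0) / (1 - p1)
    return probability_ratio, odds_ratio, bias_factor

# Common disease (hypothetical numbers): the bias factor is far from one.
pr, odds, bias = cohort_measures(0.3, 0.1)
# Rare disease (hypothetical numbers): the bias factor is close to one.
pr_rare, odds_rare, bias_rare = cohort_measures(0.003, 0.001)

log_bias = math.log(bias)            # additive bias term on the log scale
log_bias_rare = math.log(bias_rare)  # nearly zero

print(pr, odds, bias, log_bias)
print(pr_rare, odds_rare, bias_rare, log_bias_rare)
```

In both cases `math.log(odds)` equals `math.log(pr) + log_bias` up to floating-point error, which is the additive form of the same identity.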

The same bias term can be developed for the log odds ratio from a case-control study in which the exposure (E) affects selection (S) only through the disease (D). That is, under the causal structure E→D→S.

The tendency of interest includes conditioning on all causes of the outcome. But without an explicit definition of the word “cause”, that may be confusing. The word “cause” has multiple meanings. In Section 2, for example, I used “cause” to mean one thing in determinism and another in indeterminism. Within indeterminism, there may be two definitions of the word:

Definition 1: A cause of a natural variable _{ }such that the effect of _{1} vs. x_{0}) on

Definition 2: A cause of a natural variable _{0} < t_{1}.

Definition 1 specifies that a cause must have a non-null effect on the outcome, whereas Definition 2 omits that requirement. I prefer Definition 2 since it defines cause so as to differentiate between “having a null effect” and “having no effect”. Although the two phrases are often considered synonymous, they need not be. Note that “null” is the German word for zero, and in math a distinction is often made between “zero” and “no”. “No” generally refers to something not existing, whereas “zero” refers to something that exists and takes the value zero.

Consider the two variables _{0} < t_{1}. The effect of

Variables that have null effects are qualitatively different from variables that have no effect. We may estimate null effects, and in fact, they are very similar to effects that are near the null. As such, I do not want to use Definition 1 that lumps together variables with null effects and variables with no effects under the title “not a cause”. I suggest that within indeterminism Definition 2 be taken as the definition of the word “cause”.

Just like the word “cause”, probability has several definitions, which are related to its philosophical interpretations [

In indeterminism, tendencies are objective. Furthermore, they apply only to single cases. For example, consider tossing a coin repeatedly. The tendency of having the coin land on heads applies to each and every coin toss. That is, we may speak of the tendency of the first toss landing on heads, the third toss landing on heads, and the seventh toss landing on tails. Thus, our interpretation of probability must be objective and apply to single cases.

For such a philosophical interpretation, I prefer a view similar to Miller’s formulation of Popper’s propensity theory [

Gillies claims that such a philosophical interpretation does not allow for empirical estimation of probability, which is necessary to test scientific theories [

Even with such an approach, it may not be feasible to estimate effects from empirical data. That, however, is the consequence of a bias-centered approach. Conditioning on everything that produces even the smallest amount of bias may prevent the estimation of an effect, but removing bias is not the only matter of importance in a study. The goal of any study is to be as likely as possible to produce estimates that are as close as possible to the true causal parameter. As such, it is beneficial in some cases to allow bias in exchange for a “small enough” variance. In every study, researchers need to decide how to balance bias and variance. It is then possible to estimate effects while holding to a single-case propensity theory.

The above discussion clarifies that probability (or likelihood in general) may be interpreted so as to quantify the tendency of interest. The discussion, however, is incomplete without a philosophical definition of probability. In all of its definitions, probability is somehow related to proportions. In an objective interpretation, probability is related to the proportions in a sample that must be infinite in some sense. A few methods introduce an infinite sample as a finite sample becoming larger and larger. As far as I know, none successfully describes single-case probabilities.

I propose that probability be defined as a proportion in an infinite sample. In such a definition, I do not consider a finite sample getting larger but the infinite sample in and of itself. For example, a single-case probability of a particular coin toss landing heads is the proportion of heads in an infinite number of theoretical replications of that exact coin toss.

The proportion in an infinite sample (i.e., over infinite replications) cannot be calculated by simple division as is the case in finite samples. Rather, it will be taken as a primitive notion, a notion that is not defined but understood fundamentally. (Primitive notions are common and necessary. For example, “time point” is a primitive notion.)

Finally, the axioms of probability can be derived from some of the definitions of probability [

This article began with a thorough discussion of measures of effect from an indeterministic viewpoint. Under indeterminism, a measure of effect quantifies a change in tendency; that tendency is of a natural variable at a given time having a given value, conditional on all of its causes at a prior time having specified values. I then argued in favor of estimating effects on all values of the outcome.

Starting with Section 4, the majority of the article was devoted to finding an appropriate measure of effect based upon properties I wanted such a measure to possess. After considering all of those properties, I found only one acceptable measure of effect: the natural logarithm of the ratio of the likelihood at a time point. Although best among contenders, it is still just a helpful human-made computation on likelihoods, which I proposed to equate with tendencies. Under indeterminism, the causal structure contains only tendencies, not contrasts between tendencies (Subsection 3.2).

The question remains, though, whether there are other possible measures of effect besides the log likelihood ratio. The short answer is no; I think that one generic measure of effect should be used to quantify effects in all cases. The long answer is that there is still work to be done on the matter. If we do not accept axiomatically that the likelihood at a time point equals the tendency, then a rigorous definition of a measure of frequency must be given in order to consider all other measures of frequency. Lastly, all the rules of causal diagrams for the log likelihood ratio need to be developed in order to verify that a lucky cancellation on the log ratio scale cannot be explained by another measure of effect.

Shahar, D.J. (2016) Deciding on a Measure of Effect under Indeterminism. Open Journal of Epidemiology, 6, 198-232. http://dx.doi.org/10.4236/ojepi.2016.64022