The Partial Jacobian Matrix and the Chain Rule for Functions on Products of Real n-Spaces

Abstract

The usual chain rule is for compositions of functions from $\mathbb{R}^n$ to $\mathbb{R}^m$, for some $m$ and $n$. We consider functions whose domains and codomains are products of various $\mathbb{R}^n$'s, and introduce the notion of a partial Jacobian matrix, with which we formulate a chain rule in this setting that should be useful for calculations as well as for some theoretical purposes.

Share and Cite:

Agashe, A. (2025) The Partial Jacobian Matrix and the Chain Rule for Functions on Products of Real n-Spaces. Applied Mathematics, 16, 420-427. doi: 10.4236/am.2025.164022.

1. Introduction

The goal of this article is to fill some potential gaps in the literature and introduce some notation that should be useful in applications. To motivate the main result, which may look abstract, we consider a concrete example. The reader who just wants to see and use the main result of this paper may jump directly to Section 3.

Let $f: \mathbb{R}^2 \to \mathbb{R}^2$ be given by $x(u,t) = u + t$, $y(u,t) = u - t$, and let $g: \mathbb{R}^2 \to \mathbb{R}$ be given by $z = x^2 + y$. Thinking of $z$ as a function of $(u,t)$ via $f$, if we want to find the rate of change of $z$ with respect to $u$ at the point $(1,1)$, we can use the chain rule

$$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial u}, \qquad (1)$$

to get $\frac{\partial z}{\partial u} = 2x \cdot 1 + 1 \cdot 1 = 2(u+t) + 1 = 2u + 2t + 1$, which evaluated at $(1,1)$ gives the answer 5. Note that we found the answer without computing the function $g \circ f$, which is $(u+t)^2 + (u-t)$ and which we could have differentiated with respect to $u$ to get the same answer. The chain rule is especially useful if one wants to find the rate of change of $z$ with respect to $u$ knowing the other rates of change in Equation (1); this happens often in real-life situations, where one may not know the functions $f$ and $g$ explicitly, but knows the instantaneous rates of change on the right side of Equation (1). Such problems are called related rates problems.
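As a quick sanity check, the computation above can be reproduced numerically. The following sketch (ours, not part of the paper) compares the chain-rule value of $\partial z/\partial u$ against a central finite-difference derivative of the composite $g \circ f$:

```python
# Numerical check of the example in the text (an illustration, not part of
# the paper): f(u, t) = (u + t, u - t) and g(x, y) = x**2 + y.
def f(u, t):
    return u + t, u - t

def g(x, y):
    return x ** 2 + y

def dz_du_chain(u, t):
    """Right side of the chain rule (1) at the point (u, t)."""
    x, y = f(u, t)
    dz_dx, dz_dy = 2 * x, 1   # partials of g with respect to x and y
    dx_du, dy_du = 1, 1       # partials of the components of f with respect to u
    return dz_dx * dx_du + dz_dy * dy_du

def dz_du_numeric(u, t, h=1e-6):
    """Central finite difference of the composite g(f(u, t)) with respect to u."""
    return (g(*f(u + h, t)) - g(*f(u - h, t))) / (2 * h)

assert dz_du_chain(1, 1) == 5            # matches the value found in the text
assert abs(dz_du_numeric(1, 1) - 5) < 1e-4
```

Both routes give 5 at $(1,1)$, as in the text.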

Now suppose we are given functions $f: \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}^4 \times \mathbb{R}^5$ and $g: \mathbb{R}^4 \times \mathbb{R}^5 \to \mathbb{R}$. We can compose the two to get the function $g \circ f: \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$. Let $z$ denote the coordinate of the codomain $\mathbb{R}$, and suppose we want to find the directional derivative of $z$ relative to a change in the coordinates of the $\mathbb{R}^2$ in the domain of $g \circ f$ (keeping the coordinates of $\mathbb{R}^3$ fixed) at some given point in the domain $\mathbb{R}^2 \times \mathbb{R}^3$. We could, of course, write down the function $g \circ f$ in terms of the coordinates of $\mathbb{R}^2 \times \mathbb{R}^3$ and then take partial derivatives, etc., but that is more work than needed. In this article, we derive a chain rule (formula (8) below) with which the calculation can be done in a manner similar to the chain rule in the previous paragraph, thus making the calculation much easier (see Remark 3.3). Note that in the example above, we did not give formulas to define $f$ and $g$ since the number of variables is too large; the reader can easily make up an explicit example. Our chain rule is also useful for related rates problems in this context, either for computations or for theoretical use. In fact, the reason for the author to write up this article was that a colleague in the Economics department asked for a chain rule in a context similar to the one above, and neither of us could find it in the literature in the form that was needed. So it is hoped that this article fills a potential gap in the literature, and can be quoted for the sake of completeness in papers that need this version of the chain rule.

The organization of the rest of this article is as follows. In Section 2, we recall the classical setting alluded to two paragraphs above, and introduce a notation for the Jacobian matrix that may be new, and is very convenient. In Section 3, we consider a more general setting of the type discussed in the previous paragraph, and give the more general chain rule alluded to above, which involves a new object that we call a partial Jacobian matrix. Finally, in Section 4, we give the proof of our generalization of the chain rule.

2. The Classical Setting

Suppose $f: \mathbb{R}^n \to \mathbb{R}^m$ is differentiable. Let $x$ denote the vector of coordinate functions on the domain (so $x_1, \ldots, x_n$, the components of $x$, are the coordinates of $\mathbb{R}^n$) and let $y$ denote the vector of coordinate functions on the codomain. Standard notations for the Jacobian matrix of $f$ are $\frac{\partial (f_1, \ldots, f_m)}{\partial (x_1, \ldots, x_n)}$ and $J_f$; let us introduce another notation:

Definition 2.1. Let

$$\frac{dy}{dx}$$

denote the Jacobian matrix of $f$.

The notation above is clearly related to the earlier notation (our notation may not be new, but we have not seen it before), and also makes notational sense: the $(j,i)$-th entry of $\frac{dy}{dx}$ is $\frac{\partial y_j}{\partial x_i}$.

Let $g: \mathbb{R}^m \to \mathbb{R}^k$ be differentiable, and let $z$ denote the vector of coordinate functions on the codomain. Letting $D$ denote the derivative as usual, the chain rule says that for $a \in \mathbb{R}^n$, we have

$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a). \qquad (2)$$

This is elegant and very convenient for theoretical purposes, but it has to be unraveled in applications and may be too abstract for people working in applied areas. A more concrete restatement is the matrix version:

$$J_{g \circ f}(a) = J_g(f(a)) \, J_f(a), \qquad (3)$$

where the product on the right is matrix multiplication. Equation (3) still has to be unraveled, as we shall do soon, but first note that in the notation of Definition 2.1, the equation becomes

$$\frac{dz}{dx} = \frac{dz}{dy} \, \frac{dy}{dx}, \qquad (4)$$

where again the product on the right is the matrix product. The rule looks much nicer with the new notation, and moreover, if $n = m = k = 1$, then it becomes the usual chain rule in the Leibniz notation, namely

$$\frac{dz}{dx} = \frac{dz}{dy} \, \frac{dy}{dx}.$$

As usual, for $j = 1, \ldots, m$, let $f_j$ denote the $j$-th component function of $f$; thus $f = (f_1, \ldots, f_m)$. Recall that for $i = 1, \ldots, n$, and for $a \in \mathbb{R}^n$, the symbol $D_i f_j(a)$ denotes the partial derivative of $f_j$ at $a$ with respect to the $i$-th coordinate; thus $D_i f_j(a)$ is $\frac{\partial y_j}{\partial x_i}$ evaluated at $a$. The following version of the chain rule is very explicit; it is well known, and the only reason that we are stating it as a result is that it does not seem to be easy to find the statement in the literature (for example, in ([1], Theorem 2.9) and ([2], pp. 61-62), it is only stated for $k = 1$, and in [3], it is only mentioned in the proof of Proposition 3.10).

Proposition 2.2. For $i = 1, \ldots, n$ and $\ell = 1, \ldots, k$, we have for $a \in \mathbb{R}^n$,

$$D_i (g \circ f)_\ell (a) = \sum_{j=1}^{m} D_j g_\ell (f(a)) \, D_i f_j (a), \quad \text{and} \qquad (5)$$

$$\frac{\partial z_\ell}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial z_\ell}{\partial y_j} \, \frac{\partial y_j}{\partial x_i}. \qquad (6)$$

Proof. This follows from Equation (3), the definition of the Jacobian matrix, and the definition of matrix multiplication. □

Equation (6) above may be the most useful version of the chain rule in applications.
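For readers who want to experiment, the matrix form (3) of the chain rule is easy to verify numerically. The sketch below (our illustration; the maps $f$, $g$ and the point $a$ are made up) compares a finite-difference Jacobian of $g \circ f$ against the product of the Jacobians of $g$ and $f$:

```python
import numpy as np

# Numerical illustration (not from the paper) of the matrix chain rule (3):
# J_{g∘f}(a) = J_g(f(a)) @ J_f(a), for made-up maps f: R^2 -> R^3, g: R^3 -> R^2.
def f(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def g(y):
    return np.array([y[0] + y[1] * y[2], np.exp(y[0])])

def jacobian(func, a, h=1e-6):
    """Central finite-difference Jacobian: the (j, i) entry is d(func_j)/d(a_i)."""
    a = np.asarray(a, dtype=float)
    cols = []
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = h
        cols.append((func(a + e) - func(a - e)) / (2 * h))
    return np.column_stack(cols)

a = np.array([0.5, -1.0])
lhs = jacobian(lambda x: g(f(x)), a)          # J_{g∘f}(a), computed directly
rhs = jacobian(g, f(a)) @ jacobian(f, a)      # J_g(f(a)) J_f(a)
assert np.allclose(lhs, rhs, atol=1e-5)
```

The two $2 \times 2$ matrices agree up to finite-difference error.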

3. A More General Setting

For $i = 1, \ldots, n$ and $j = 1, \ldots, m$, let $X_i$ and $Y_j$ each be $\mathbb{R}^p$ for some $p$ (so, for example, we could have $X_1 = \mathbb{R}^2$, $Y_2 = \mathbb{R}^5$, etc.), and let $u_i$ and $v_j$ denote the dimensions of $X_i$ and $Y_j$ respectively (so $X_i = \mathbb{R}^{u_i}$ and $Y_j = \mathbb{R}^{v_j}$). Let $f: X_1 \times \cdots \times X_n \to Y_1 \times \cdots \times Y_m$ be differentiable (regarding the domain and the codomain as finite-dimensional vector spaces in the natural way). Note that if each of the $X_i$ and $Y_j$ is $\mathbb{R}$, then $X_1 \times \cdots \times X_n = \mathbb{R}^n$ and $Y_1 \times \cdots \times Y_m = \mathbb{R}^m$ as vector spaces, so we are in the classical setting of Section 2; thus our setup generalizes the classical setting. The function $f$ above can be thought of as a function whose inputs and outputs, instead of being tuples of numbers, are tuples of vectors (not necessarily in the same vector space). Such functions arise often in applied areas such as Economics and Engineering.

For $i = 1, \ldots, n$ and $j = 1, \ldots, m$, let $x_i$ denote the vector of coordinate functions on $X_i$ (so $x_{i,1}, \ldots, x_{i,u_i}$, the components of $x_i$, are the coordinates of $X_i$), and let $y_j$ denote the vector of coordinate functions on $Y_j$.

Definition 3.1. For each $i, j$, denote by

$$\frac{\partial y_j}{\partial x_i}$$

the matrix whose $(r,s)$-th entry, for $s = 1, \ldots, u_i$ and $r = 1, \ldots, v_j$, is $\frac{\partial y_{j,r}}{\partial x_{i,s}}$; we call it a partial Jacobian matrix.

The reason for this notation and terminology is that while this object looks like the Jacobian matrix in the classical setting of Section 2, which might suggest the notation $\frac{d y_j}{d x_i}$ that was used for the Jacobian matrix therein, it plays the role of a partial derivative in the chain rule analog of (6), as we shall soon see (Equation (8) below).
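Concretely, a partial Jacobian matrix is just a block of the full Jacobian: the columns for the coordinates of $X_i$ and the rows for the coordinates of $Y_j$. The following sketch (ours, with a made-up map) illustrates the slicing for $X_1 = \mathbb{R}^2$, $X_2 = \mathbb{R}^3$, and a single codomain factor $Y_1 = \mathbb{R}^4$:

```python
import numpy as np

# Illustration (ours, not from the paper): the partial Jacobian ∂y_j/∂x_i is a
# block of the full Jacobian. Here X1 = R^2, X2 = R^3 (n = 2) and Y1 = R^4,
# so f maps R^5 = R^2 x R^3 to R^4.
def f(w):                       # w = (x1, x2) flattened into R^5
    x1, x2 = w[:2], w[2:]
    return np.array([x1[0] * x2[0],
                     x1[1] + x2[1] * x2[2],
                     np.sin(x1[0] + x2[0]),
                     x1[0] * x1[1]])

def jacobian(func, a, h=1e-6):
    """Central finite-difference Jacobian of func at a."""
    a = np.asarray(a, dtype=float)
    cols = [(func(a + h * e) - func(a - h * e)) / (2 * h)
            for e in np.eye(a.size)]
    return np.column_stack(cols)

a = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
J = jacobian(f, a)              # full 4 x 5 Jacobian
dy1_dx1 = J[:, :2]              # partial Jacobian ∂y_1/∂x_1 (4 x 2 block)
dy1_dx2 = J[:, 2:]              # partial Jacobian ∂y_1/∂x_2 (4 x 3 block)
assert dy1_dx1.shape == (4, 2) and dy1_dx2.shape == (4, 3)
# e.g. the (1,1) entry of ∂y_1/∂x_1 is ∂(x1[0]*x2[0])/∂x1[0] = x2[0] = 0.5:
assert np.isclose(dy1_dx1[0, 0], 0.5, atol=1e-5)
```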

For $\ell = 1, \ldots, k$, let $Z_\ell$ each be $\mathbb{R}^p$ for some $p$, and let $z_\ell$ denote the vector of coordinate functions on $Z_\ell$. Let $g: Y_1 \times \cdots \times Y_m \to Z_1 \times \cdots \times Z_k$ be differentiable. Thinking of each domain and codomain as real vector spaces in the natural way and letting $D$ denote the derivative as usual, the chain rule again says that for $a$ in the domain of $f$,

$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a). \qquad (7)$$

This equation is mostly of theoretical use, and is difficult to use in practice. Also, it does not distinguish between the components of each domain (e.g., the components $X_1, \ldots, X_n$ of the domain of $f$) in any way or treat any individual component as special. A similar criticism holds for the analogs of Equations (3) and (4), of which Equation (3) holds verbatim even in this situation.

We have the following version of the chain rule, which we prove in the next section:

Theorem 3.2. In the setup above, for $\ell = 1, \ldots, k$ and $i = 1, \ldots, n$, we have

$$\frac{\partial z_\ell}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial z_\ell}{\partial y_j} \, \frac{\partial y_j}{\partial x_i}. \qquad (8)$$

The equation above distinguishes between the components of each domain, unlike Equation (7) above, and is the analog of Equation (6) from the classical setting, with which it agrees when each of the $X_i$, $Y_j$, and $Z_\ell$ is $\mathbb{R}$. Equation (8) may be the most concrete form of the chain rule in this general setting, and should be especially useful for calculations, but also for some theoretical purposes.
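Equation (8) can be checked numerically by comparing blocks of finite-difference Jacobians. The sketch below (our illustration; the maps and the point are made up) takes $X_1 = \mathbb{R}^2$, $X_2 = \mathbb{R}^3$, $Y_1 = Y_2 = \mathbb{R}^2$, and a single codomain factor $Z_1 = \mathbb{R}^3$:

```python
import numpy as np

# Numerical check (our illustration, not from the paper) of the block chain
# rule (8) with X1 = R^2, X2 = R^3, Y1 = R^2, Y2 = R^2, Z1 = R^3: the block
# ∂z_1/∂x_1 of the composite equals the sum over j of (∂z_1/∂y_j)(∂y_j/∂x_1).
def f(w):                                    # R^5 -> R^4 (made up)
    return np.array([w[0] * w[2], w[1] + w[3], np.sin(w[4]), w[0] - w[1] * w[4]])

def g(y):                                    # R^4 -> R^3 (made up)
    return np.array([y[0] * y[1], y[2] + y[3] ** 2, np.exp(y[0] - y[3])])

def jacobian(func, a, h=1e-6):
    """Central finite-difference Jacobian of func at a."""
    a = np.asarray(a, dtype=float)
    return np.column_stack([(func(a + h * e) - func(a - h * e)) / (2 * h)
                            for e in np.eye(a.size)])

a = np.array([0.3, 1.2, -0.7, 2.0, 0.4])
Jf, Jg, Jgf = jacobian(f, a), jacobian(g, f(a)), jacobian(lambda w: g(f(w)), a)

# Split columns/rows into the blocks corresponding to X1, X2 and Y1, Y2.
dz1_dx1 = Jgf[:, :2]                                   # left side of (8)
rhs = Jg[:, :2] @ Jf[:2, :2] + Jg[:, 2:] @ Jf[2:, :2]  # sum over j = 1, 2
assert np.allclose(dz1_dx1, rhs, atol=1e-5)
```

The $3 \times 2$ block of the composite Jacobian matches the sum of the block products, as (8) asserts.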

Remark 3.3. As an example of an application, if $u \in X_i$ is a unit vector, then the directional derivative¹ of the composite map $X_i \to X_1 \times \cdots \times X_n \xrightarrow{g \circ f} Z_1 \times \cdots \times Z_k \to Z_\ell$ (where the first map is the natural inclusion, and the last map is the natural projection) at a point $a \in X_1 \times \cdots \times X_n$ in the direction of $u$ is

$$\frac{\partial z_\ell}{\partial x_i} \, u = \left( \sum_{j=1}^{m} \frac{\partial z_\ell}{\partial y_j} \, \frac{\partial y_j}{\partial x_i} \right) u = \sum_{j=1}^{m} \left( \frac{\partial z_\ell}{\partial y_j} \, \frac{\partial y_j}{\partial x_i} \, u \right)$$

evaluated at $a$ (this follows from Theorem 3.2 and ([2], Theorem 5.1), for example).
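The directional-derivative computation in the remark above can also be checked numerically. In the sketch below (ours, with a made-up composite map playing the role of $g \circ f$), the directional derivative along a unit vector in $X_1$ is compared against the partial Jacobian block applied to that vector:

```python
import numpy as np

# Illustration (ours, not from the paper) of Remark 3.3: the directional
# derivative of the composite X_i -> Z_ell in the direction of a unit vector
# u in X_i equals the partial Jacobian block times u. Here X1 = X2 = R^2 and
# the codomain is R^3; F plays the role of the composite g∘f.
def F(w):                                   # made-up map R^4 -> R^3
    return np.array([w[0] * w[1], w[2] + w[3] ** 2, np.sin(w[0] + w[3])])

def jacobian(func, a, h=1e-6):
    """Central finite-difference Jacobian of func at a."""
    a = np.asarray(a, dtype=float)
    return np.column_stack([(func(a + h * e) - func(a - h * e)) / (2 * h)
                            for e in np.eye(a.size)])

a = np.array([1.0, -2.0, 0.5, 0.25])
u = np.array([0.6, 0.8])                    # unit vector in X1 = R^2

# Directional derivative along u inside the X1 slot, coordinates of X2 fixed:
h = 1e-6
numeric = (F(a + h * np.r_[u, 0, 0]) - F(a - h * np.r_[u, 0, 0])) / (2 * h)
via_block = jacobian(F, a)[:, :2] @ u       # (∂z/∂x_1) u, as in Remark 3.3
assert np.allclose(numeric, via_block, atol=1e-5)
```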

Next we give a more theoretical version of Theorem 3.2, for which we need a generalization of the usual partial derivative.

Definition 3.4. For each $j$, let $f_j$ denote the composite function

$$X_1 \times \cdots \times X_n \xrightarrow{f} Y_1 \times \cdots \times Y_m \to Y_j,$$

where the second map is the natural projection. Let $a$ be an element of $X_1 \times \cdots \times X_n$. For each $i$, let $\iota_{a,i}$ denote the map $X_i \to X_1 \times \cdots \times X_n$ that takes $x \in X_i$ to the element of $X_1 \times \cdots \times X_n$ whose $p$-th component is the $p$-th component of $a$, except for $p = i$, when it is $x$. Denote by $D_i f_j(a)$ the derivative at the $i$-th component of $a$ of the composite map

$$X_i \xrightarrow{\iota_{a,i}} X_1 \times \cdots \times X_n \xrightarrow{f} Y_1 \times \cdots \times Y_m \to Y_j,$$

where the last map is the natural projection; we call it the $i$-th partial derivative of $f_j$ at $a$.

An alternate definition, which is perhaps more conceptual, is given after Definition 4.1 in the next section. Note that the matrix of $D_i f_j(a)$ in the standard bases is $\frac{\partial y_j}{\partial x_i}$ evaluated at $a$. If each of the $X_i$ and $Y_j$ is $\mathbb{R}$, then it is the usual partial derivative $D_i f_j(a)$; we remark that even in this situation, our conceptual definition above (or the equivalent one in the next section) in terms of maps seems to be missing in the literature (e.g., it is not there in [1] [2], and [3]), and could be useful for theoretical purposes (e.g., it is immediate that the partial derivative of a sum is the sum of the partial derivatives, since the partial derivative inherits that property from the derivative, by our definition). Note that in what follows, the definitions of $D_j g_\ell$ and $D_i (g \circ f)_\ell$ are analogous to that of $D_i f_j$.

Theorem 3.5. In the setup above, for $\ell = 1, \ldots, k$ and $i = 1, \ldots, n$, we have for $a$ in $X_1 \times \cdots \times X_n$,

$$D_i (g \circ f)_\ell (a) = \sum_{j=1}^{m} D_j g_\ell (f(a)) \circ D_i f_j (a). \qquad (9)$$

The equation above also distinguishes between the components of each domain, unlike Equation (7) above, and is the analog of Equation (5) from the classical setting, with which it agrees when each of the $X_i$, $Y_j$, and $Z_\ell$ is $\mathbb{R}$.

Remark 3.6. As mentioned in the second proof of Theorem 3.5 in the next section, the proof does not use the fact that the $X_i$, $Y_j$, and $Z_\ell$ are finite-dimensional vector spaces over the real numbers, just that they are normed vector spaces; so in particular, Theorem 3.5 applies even if any of them are normed vector spaces of infinite dimension and/or over the complex numbers (over the complex numbers, the derivative is in the Fréchet sense, for example, as in ([4], Section VIII.1)).

Remark 3.7. Recall that given a smooth map $f: X \to Y$ between two smooth manifolds, the derivative induces a map $df: TX \to TY$ between the tangent bundles of the manifolds. The classical chain rule translates to a chain rule for such maps. Similarly, our chain rule (9) applies to maps between products of manifolds, and gives a chain rule for products of manifolds, which we now state more precisely. For each $i$, $j$, $\ell$ as above, let $\mathcal{X}_i$, $\mathcal{Y}_j$, and $\mathcal{Z}_\ell$ be manifolds² and let $f: \mathcal{X}_1 \times \cdots \times \mathcal{X}_n \to \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_m$ and $g: \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_m \to \mathcal{Z}_1 \times \cdots \times \mathcal{Z}_k$ be morphisms. Then the classical chain rule says that $d(g \circ f) = dg \circ df$. Now let $d_i f_j$ denote the map induced by $\mathcal{X}_i \to \mathcal{X}_1 \times \cdots \times \mathcal{X}_n \xrightarrow{f} \mathcal{Y}_1 \times \cdots \times \mathcal{Y}_m \to \mathcal{Y}_j$ (where the first map is the natural inclusion and the last map is the natural projection) on the corresponding tangent bundles; similar definitions apply to $d_j g_\ell$ and $d_i (g \circ f)_\ell$. Then our discussion above gives the following application and analog of Theorem 3.5: $d_i (g \circ f)_\ell = \sum_{j=1}^{m} d_j g_\ell \circ d_i f_j$. In fact, by the previous remark, this will apply to certain infinite-dimensional manifolds, in particular to Banach manifolds.

4. Proofs of Theorems 3.2 and 3.5

We continue using the notation of the previous section. First note that Theorems 3.2 and 3.5 follow from each other by considering the matrix of a linear transformation in the standard bases. We now prove Theorem 3.2; later, we shall give an independent proof of Theorem 3.5.

Let $p$ be an indexing variable for the entries of $z_\ell$ and $q$ an indexing variable for the entries of $x_i$. By definition, the $(p,q)$-th entry in the matrix on the left side of Equation (8) is $\frac{\partial z_{\ell,p}}{\partial x_{i,q}}$. Now $z_{\ell,p}$ is a function of $x_{i,q}$ via the intermediate variables $y_{j,r}$ for $j = 1, \ldots, m$ and $r = 1, \ldots, v_j$ for each $j$. So by the classical version (6) of the chain rule, we see that the left side of Equation (8) is

$$\sum_{j=1}^{m} \sum_{r=1}^{v_j} \frac{\partial z_{\ell,p}}{\partial y_{j,r}} \, \frac{\partial y_{j,r}}{\partial x_{i,q}}. \qquad (10)$$

The $(p,q)$-th entry of the term $\frac{\partial z_\ell}{\partial y_j} \, \frac{\partial y_j}{\partial x_i}$ on the right side of Equation (8) is the product of the $p$-th row of $\frac{\partial z_\ell}{\partial y_j}$ and the $q$-th column of $\frac{\partial y_j}{\partial x_i}$. Now the $p$-th row of $\frac{\partial z_\ell}{\partial y_j}$ is

$$\begin{bmatrix} \frac{\partial z_{\ell,p}}{\partial y_{j,1}} & \cdots & \frac{\partial z_{\ell,p}}{\partial y_{j,v_j}} \end{bmatrix}$$

and the $q$-th column of $\frac{\partial y_j}{\partial x_i}$ is

$$\begin{bmatrix} \frac{\partial y_{j,1}}{\partial x_{i,q}} \\ \vdots \\ \frac{\partial y_{j,v_j}}{\partial x_{i,q}} \end{bmatrix}.$$

So the product of the row and column becomes

$$\sum_{r=1}^{v_j} \frac{\partial z_{\ell,p}}{\partial y_{j,r}} \, \frac{\partial y_{j,r}}{\partial x_{i,q}}.$$

As we sum these terms over $j$ on the right side of Equation (8), we get the expression (10), which, recall, is the left side of Equation (8). This proves that the two sides of Equation (8) are equal, which proves Theorem 3.2, and thus Theorem 3.5 as well.

We now give a second proof of Theorem 3.5, which gives another proof of Theorem 3.2 as well. In fact, the proof of Theorem 3.5 that we give below does not use the fact that the $X_i$, $Y_j$, and $Z_\ell$ are finite dimensional, and so applies to normed vector spaces (even over the complex numbers) that need not be finite dimensional. Another reason for giving a second proof of Theorem 3.5 is to note how the second proof is conceptual and illustrates where the linearity of the derivative is used, while the first proof is computational and hides some of the ideas of the proof.

The following definitions will be convenient in the proof.

Definition 4.1. If $h \in X_i$, then let $h^i$ denote the vector in $X_1 \times \cdots \times X_n$ whose components are all $0$ except the $i$-th component, which is $h$; so $h^i$ is the image of $h$ under the natural inclusion $X_i \to X_1 \times \cdots \times X_n$, and every $h \in X_1 \times \cdots \times X_n$ satisfies $h = \sum_i (h_i)^i$, where $h_i$ denotes the $i$-th component of $h$. Let $a \in X_1 \times \cdots \times X_n$. If $W$ is a vector space and $\psi: X_1 \times \cdots \times X_n \to W$ is a differentiable map, then define $D_i \psi(a): X_i \to W$ by

$$D_i \psi(a)(h) = D\psi(a)(h^i); \qquad (11)$$

it is a linear map.

Note that the definition of $D_i \psi(a)$ above is consistent with Definition 3.4: when $W = Y_j$ and $\psi = f_j$, we get $D_i \psi(a) = D_i f_j(a)$.

Let $h \in X_i$. Then

$$\begin{aligned} D_i (g \circ f)_\ell (a)(h) &= D (g \circ f)_\ell (a)(h^i) && \text{(by Equation (11))} \\ &= \big( D g_\ell (f(a)) \circ D f(a) \big)(h^i) && \text{(by the usual chain rule (7))} \\ &= D g_\ell (f(a)) \big( D f(a)(h^i) \big). \end{aligned} \qquad (12)$$

Now the $j$-th component of $Df(a)(h^i)$ is $D f_j(a)(h^i)$, so

$$Df(a)(h^i) = \sum_j \big( D f_j(a)(h^i) \big)^j \quad \text{(by the first sentence in Definition 4.1).}$$

Putting this in Equation (12) above, we get

$$\begin{aligned} D_i (g \circ f)_\ell (a)(h) &= D g_\ell (f(a)) \Big( \sum_j \big( D f_j(a)(h^i) \big)^j \Big) \\ &= \sum_j D g_\ell (f(a)) \Big( \big( D f_j(a)(h^i) \big)^j \Big) && \text{(since the derivative is linear)} \\ &= \sum_j D_j g_\ell (f(a)) \big( D f_j(a)(h^i) \big) && \text{(by Equation (11))} \\ &= \sum_j D_j g_\ell (f(a)) \big( D_i f_j(a)(h) \big) && \text{(by Equation (11) again)} \\ &= \sum_j \big( D_j g_\ell (f(a)) \circ D_i f_j(a) \big)(h) \\ &= \Big( \sum_j D_j g_\ell (f(a)) \circ D_i f_j(a) \Big)(h). \end{aligned}$$

Since $h$ was arbitrary, this shows that $D_i (g \circ f)_\ell (a) = \sum_j D_j g_\ell (f(a)) \circ D_i f_j(a)$, as needed.

Acknowledgements

We are grateful to R. Vijay Krishna for several useful suggestions on this article.

NOTES

¹As defined on page 42 in ([2], Section 5), for example.

²We are not using $X_i$, etc., to denote the manifolds, since the $X_i$'s correspond to local charts on the manifolds.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Spivak, M. (1965) Calculus on Manifolds. A Modern Approach to Classical Theorems of Advanced Calculus. W. A. Benjamin, Inc.
[2] Munkres, J.R. (1991) Analysis on Manifolds. Addison-Wesley Publishing Company, Advanced Book Program.
[3] Coleman, R. (2012) Calculus on Normed Vector Spaces. Universitext, Springer.
[4] Dieudonné, J. (1969) Foundations of Modern Analysis, Vol. 10-I of Pure and Applied Mathematics. Academic Press.

Copyright © 2025 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.