Multiplying by \(a\) is a linear map in one dimension: \(h \mapsto ah\text{.}\) Namely, we think of \(a \in L(\R^1,\R^1)\text{,}\) which is the best linear approximation of how \(f\) changes near \(x\text{.}\) We use this interpretation to extend differentiation to more variables.
Definition 8.3.1.
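Let \(U \subset \R^n\) be open and \(f \colon U \to \R^m\) a function. We say \(f\) is differentiable at \(x \in U\) if there exists an \(A \in L(\R^n,\R^m)\) such that
\begin{equation*}
\lim_{\substack{h \to 0\\h\in \R^n}} \frac{\snorm{f(x+h)-f(x) - Ah}}{\snorm{h}} = 0 .
\end{equation*}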
We will show momentarily that \(A\text{,}\) if it exists, is unique. We write \(Df(x) \coloneqq A\text{,}\) or \(f'(x) \coloneqq A\text{,}\) and we say \(A\) is the derivative of \(f\) at \(x\text{.}\) When \(f\) is differentiable at every \(x \in U\text{,}\) we say simply that \(f\) is differentiable. See Figure 8.4 for an illustration.
For a differentiable function, the derivative of \(f\) is a function from \(U\) to \(L(\R^n,\R^m)\text{.}\) Compare to the one-dimensional case, where the derivative is a function from \(U\) to \(\R\text{,}\) but we really want to think of \(\R\) here as \(L(\R^1,\R^1)\text{.}\) As in one dimension, the idea is that a differentiable mapping is “infinitesimally close” to a linear mapping, and this linear mapping is the derivative.
Notice the norms in the definition. The norm in the numerator is on \(\R^m\text{,}\) and the norm in the denominator is on \(\R^n\) where \(h\) lives. Normally it is understood that \(h \in \R^n\) from context (the formula makes no sense otherwise). We will not explicitly say so from now on. Let us prove, as promised, that the derivative is unique.
Proposition 8.3.2.
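Let \(U \subset \R^n\) be an open subset and \(f \colon U \to \R^m\) a function. Suppose \(x \in U\) and there exist \(A,B \in L(\R^n,\R^m)\) such that
\begin{equation*}
\lim_{h \to 0} \frac{\snorm{f(x+h)-f(x) - Ah}}{\snorm{h}} = 0
\qquad \text{and} \qquad
\lim_{h \to 0} \frac{\snorm{f(x+h)-f(x) - Bh}}{\snorm{h}} = 0 .
\end{equation*}
Then \(A=B\text{.}\)
Proof.
Suppose \(h \in \R^n\text{,}\) \(h \not= 0\text{.}\) Compute
\begin{equation*}
\begin{split}
\frac{\snorm{(A-B)h}}{\snorm{h}}
& =
\frac{\snorm{\bigl(f(x+h)-f(x) - Bh\bigr) - \bigl(f(x+h)-f(x) - Ah\bigr)}}{\snorm{h}} \\
& \leq
\frac{\snorm{f(x+h)-f(x) - Ah}}{\snorm{h}} + \frac{\snorm{f(x+h)-f(x) - Bh}}{\snorm{h}} .
\end{split}
\end{equation*}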
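So \(\frac{\snorm{(A-B)h}}{\snorm{h}} \to 0\) as \(h \to 0\text{.}\) Given \(\epsilon > 0\text{,}\) for all nonzero \(h\) in some \(\delta\)-ball around the origin we have
\begin{equation*}
\epsilon > \frac{\snorm{(A-B)h}}{\snorm{h}} = \snorm{(A-B) \frac{h}{\snorm{h}}} .
\end{equation*}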
For any given \(v \in \R^n\) with \(\snorm{v}=1\text{,}\) if \(h = (\nicefrac{\delta}{2}) \, v\text{,}\) then \(\snorm{h} < \delta\) and \(\frac{h}{\snorm{h}} = v\text{.}\) So \(\snorm{(A-B)v} < \epsilon\text{.}\) Taking the supremum over all \(v\) with \(\snorm{v} = 1\text{,}\) we get the operator norm \(\snorm{A-B} \leq \epsilon\text{.}\) As \(\epsilon > 0\) was arbitrary, \(\snorm{A-B} = 0\text{,}\) or in other words \(A = B\text{.}\)
Example 8.3.3.
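If \(f(x) = Ax\) for a linear mapping \(A\text{,}\) then \(f'(x) = A\text{:}\)
\begin{equation*}
\frac{\snorm{f(x+h)-f(x) - Ah}}{\snorm{h}}
=
\frac{\snorm{A(x+h)-Ax - Ah}}{\snorm{h}}
=
\frac{0}{\snorm{h}} = 0 .
\end{equation*}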
Let us show that \(f\) is differentiable at the origin and compute the derivative directly using the definition. If the derivative exists, it is in \(L(\R^2,\R^2)\text{,}\) so it can be represented by a \(2\)-by-\(2\) matrix \(\left[\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right]\text{.}\) Suppose \(h =
(h_1,h_2)\text{.}\) We need the following expression to go to zero.
This expression does indeed go to zero as \(h \to 0\text{.}\) The function \(f\) is differentiable at the origin and the derivative \(f'(0)\) is represented by the matrix \(\left[\begin{smallmatrix}1&2\\2&3\end{smallmatrix}\right]\text{.}\)
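The limit in the definition can also be explored numerically. The formula for \(f\) in this example is not reproduced above, so the sketch below assumes, purely for illustration, the function \(f(x,y) = (1+x+2y+x^2,\, 2x+3y+xy)\text{,}\) which is one function whose derivative at the origin is the matrix just found. The names `f`, `A`, and `quotient` are hypothetical helpers, not part of the text.

```python
import math

def f(x, y):
    # Hypothetical example function (an assumption, chosen so that the
    # derivative at the origin is the matrix [[1, 2], [2, 3]]).
    return (1 + x + 2 * y + x ** 2, 2 * x + 3 * y + x * y)

# Candidate derivative at the origin, stored as rows of a 2-by-2 matrix.
A = ((1.0, 2.0), (2.0, 3.0))

def quotient(h1, h2):
    """Return ||f(h) - f(0) - A h|| / ||h|| for a nonzero h = (h1, h2)."""
    f0 = f(0.0, 0.0)
    fh = f(h1, h2)
    r1 = fh[0] - f0[0] - (A[0][0] * h1 + A[0][1] * h2)
    r2 = fh[1] - f0[1] - (A[1][0] * h1 + A[1][1] * h2)
    return math.hypot(r1, r2) / math.hypot(h1, h2)

# Shrink h along the diagonal and watch the quotient shrink with it.
quotients = [quotient(10.0 ** (-k), 10.0 ** (-k)) for k in range(1, 6)]
for k, q in enumerate(quotients, start=1):
    print(f"h = (1e-{k}, 1e-{k}): quotient = {q:.3e}")
```

A numerical table like this is only evidence along one path of approach, not a proof; the proof is the estimate in the example itself.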
Proposition 8.3.5.
Let \(U \subset \R^n\) be open and \(f \colon U \to \R^m\) be differentiable at \(p \in U\text{.}\) Then \(f\) is continuous at \(p\text{.}\)
Proof.
Another way to write the differentiability of \(f\) at \(p\) is to consider
\begin{equation*}
r(h) \coloneqq f(p+h)-f(p) - f'(p) h .
\end{equation*}
The function \(f\) is differentiable at \(p\) if \(\frac{\snorm{r(h)}}{\snorm{h}}\) goes to zero as \(h \to 0\text{,}\) so \(r(h)\) itself goes to zero. The mapping \(h \mapsto f'(p) h\) is a linear mapping between finite-dimensional spaces, hence continuous and \(f'(p) h \to 0\) as \(h \to 0\text{.}\) Thus, \(f(p+h)\) must go to \(f(p)\) as \(h \to 0\text{.}\) That is, \(f\) is continuous at \(p\text{.}\)
Differentiation is a linear operator on the space of differentiable functions.
Proposition 8.3.6.
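Suppose \(U \subset \R^n\) is open, \(f \colon U \to \R^m\) and \(g \colon U \to \R^m\) are differentiable at \(p \in U\text{,}\) and \(\alpha \in \R\text{.}\) Then the functions \(f+g\) and \(\alpha f\) are differentiable at \(p\text{,}\)
\begin{equation*}
(f+g)'(p) = f'(p) + g'(p) ,
\qquad \text{and} \qquad
(\alpha f)'(p) = \alpha f'(p) .
\end{equation*}
Proof.
Let \(h \in \R^n\text{,}\) \(h \not= 0\text{.}\) Then
\begin{equation*}
\begin{split}
& \frac{\snorm{f(p+h)+g(p+h)-\bigl(f(p)+g(p)\bigr) - \bigl(f'(p) + g'(p)\bigr)h}}{\snorm{h}} \\
& \qquad \leq
\frac{\snorm{f(p+h)-f(p) - f'(p)h}}{\snorm{h}}
+
\frac{\snorm{g(p+h)-g(p) - g'(p)h}}{\snorm{h}} ,
\end{split}
\end{equation*}
and
\begin{equation*}
\frac{\snorm{\alpha f(p+h) - \alpha f(p) - \alpha f'(p)h}}{\snorm{h}}
=
\sabs{\alpha} \frac{\snorm{f(p+h)-f(p) - f'(p)h}}{\snorm{h}} .
\end{equation*}
The limits as \(h\) goes to zero of the right-hand sides are zero by hypothesis. The result follows.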
If \(A \in L(\R^n,\R^m)\) and \(B \in L(\R^m,\R^k)\) are linear maps, then they are their own derivative. The composition \(BA \in L(\R^n,\R^k)\) is also its own derivative, and so the derivative of the composition is the composition of the derivatives. As differentiable maps are “infinitesimally close” to linear maps, they have the same property:
Theorem 8.3.7. Chain rule.
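Let \(U \subset \R^n\) and \(V \subset \R^m\) be open sets, \(f \colon U \to \R^m\) be differentiable at \(p \in U\text{,}\) \(f(U) \subset V\text{,}\) and let \(g \colon V \to \R^\ell\) be differentiable at \(f(p)\text{.}\) Then \(F \colon U \to \R^{\ell}\) defined by
\begin{equation*}
F(x) \coloneqq g\bigl(f(x)\bigr)
\end{equation*}
is differentiable at \(p\) and
\begin{equation*}
F'(p) = g'\bigl(f(p)\bigr) f'(p) .
\end{equation*}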
Without the points where things are evaluated, we write \(F' = {(g \circ f)}' = g' f'\text{.}\) The derivative of the composition \(g \circ f\) is the composition of the derivatives of \(g\) and \(f\text{:}\) If \(f'(p) = A\) and \(g'\bigl(f(p)\bigr) = B\text{,}\) then \(F'(p) = BA\text{,}\) just as for linear maps.
Proof.
Let \(A \coloneqq f'(p)\) and \(B \coloneqq g'\bigl(f(p)\bigr)\text{.}\) Take a nonzero \(h \in \R^n\) and write \(q \coloneqq f(p)\text{,}\) \(k \coloneqq f(p+h)-f(p)\text{.}\) Let
\begin{equation*}
r(h) \coloneqq f(p+h)-f(p) - A h .
\end{equation*}
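Then \(r(h) = k-Ah\) or \(Ah = k-r(h)\text{,}\) and \(f(p+h) = q+k\text{.}\) We look at the quantity we need to go to zero:
\begin{equation*}
\begin{split}
\frac{\snorm{F(p+h)-F(p) - BAh}}{\snorm{h}}
& =
\frac{\snorm{g(q+k)-g(q) - B\bigl(k-r(h)\bigr)}}{\snorm{h}} \\
& \leq
\frac{\snorm{g(q+k)-g(q) - Bk}}{\snorm{h}}
+
\snorm{B} \frac{\snorm{r(h)}}{\snorm{h}} \\
& =
\frac{\snorm{g(q+k)-g(q) - Bk}}{\snorm{k}}
\,
\frac{\snorm{f(p+h)-f(p)}}{\snorm{h}}
+
\snorm{B} \frac{\snorm{r(h)}}{\snorm{h}} .
\end{split}
\end{equation*}
If \(k = 0\text{,}\) the first term in the second line is already zero, so the last line is needed only when \(k \not= 0\text{.}\)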
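First, \(\snorm{B}\) is a constant and \(f\) is differentiable at \(p\text{,}\) so the term \(\snorm{B}\frac{\snorm{r(h)}}{\snorm{h}}\) goes to 0. Next, because \(f\) is continuous at \(p\text{,}\) \(k\) goes to 0 as \(h\) goes to 0. Thus \(\frac{\snorm{g(q+k)-g(q) - Bk}}{\snorm{k}}\) goes to 0, because \(g\) is differentiable at \(q\text{.}\) Finally,
\begin{equation*}
\frac{\snorm{f(p+h)-f(p)}}{\snorm{h}}
\leq
\frac{\snorm{f(p+h)-f(p)-Ah}}{\snorm{h}}
+
\frac{\snorm{Ah}}{\snorm{h}}
\leq
\frac{\snorm{f(p+h)-f(p)-Ah}}{\snorm{h}}
+
\snorm{A} .
\end{equation*}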
As \(f\) is differentiable at \(p\text{,}\) for small enough \(h\text{,}\) the quantity \(\frac{\snorm{f(p+h)-f(p)-Ah}}{\snorm{h}}\) is bounded. Hence, the term \(\frac
{\snorm{f(p+h)-f(p)}}
{\snorm{h}}\) stays bounded as \(h\) goes to 0. Therefore, \(\frac{\snorm{F(p+h)-F(p) - BAh}}{\snorm{h}}\) goes to zero, and \(F'(p) = BA\text{,}\) which is what was claimed.
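As a sanity check on the chain rule, one can compare a numerically approximated matrix of \(F'(p)\) with the product of the approximated matrices of \(g'\bigl(f(p)\bigr)\) and \(f'(p)\text{.}\) The following is a minimal sketch under stated assumptions: the maps `f` and `g`, the point `p`, and the helpers `jacobian` and `matmul` are all hypothetical, and central differences only approximate the derivatives.

```python
import math

def jacobian(func, x, eps=1e-6):
    """Approximate the matrix of func'(x) by central differences.

    func maps a list of n numbers to a list of m numbers; the result is
    a list of m rows, each with n entries.
    """
    n = len(x)
    columns = []
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += eps
        xm[j] -= eps
        fp = func(xp)
        fm = func(xm)
        columns.append([(a - b) / (2 * eps) for a, b in zip(fp, fm)])
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def matmul(B, A):
    """Multiply matrices given as lists of rows."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def f(x):
    # Hypothetical map from R^2 to R^3.
    return [x[0] * x[1], x[0] + x[1] ** 2, math.sin(x[0])]

def g(y):
    # Hypothetical map from R^3 to R^2.
    return [y[0] + y[1] * y[2], math.exp(y[0]) - y[2]]

def F(x):
    return g(f(x))

p = [0.5, -1.0]
lhs = jacobian(F, p)                               # matrix of F'(p)
rhs = matmul(jacobian(g, f(p)), jacobian(f, p))    # g'(f(p)) f'(p)
max_diff = max(abs(lhs[i][j] - rhs[i][j])
               for i in range(2) for j in range(2))
print("largest entrywise difference:", max_diff)
```

The two matrices agree only up to the error of the difference quotients, so the comparison uses a tolerance rather than exact equality.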
Subsection 8.3.2 Partial derivatives
There is another way to generalize the derivative from one dimension. We hold all but one variable constant and take the regular one-variable derivative.
Definition 8.3.8.
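Let \(f \colon U \to \R\) be a function on an open set \(U \subset \R^n\text{.}\) If the following limit exists, we write
\begin{equation*}
\frac{\partial f}{\partial x_j} (x) \coloneqq
\lim_{h\to 0}\frac{f(x_1,\ldots,x_{j-1},x_j+h,x_{j+1},\ldots,x_n)-f(x)}{h}
=
\lim_{h\to 0}\frac{f(x+h e_j)-f(x)}{h} ,
\end{equation*}
where \(e_j\) is the \(j\)th standard basis vector of \(\R^n\text{.}\)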
We call \(\frac{\partial f}{\partial x_j} (x)\) the partial derivative of \(f\) with respect to \(x_j\text{.}\) See Figure 8.5. Here \(h\) is a number, not a vector.
For a mapping \(f \colon U \to \R^m\text{,}\) we write \(f = (f_1,f_2,\ldots,f_m)\text{,}\) where \(f_k\) are real-valued functions. We then take partial derivatives of the components, \(\frac{\partial f_k}{\partial x_j}\text{.}\)
Partial derivatives are easier to compute with all the machinery of calculus, and they provide a way to compute the derivative of a function.
Proposition 8.3.9.
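Let \(U \subset \R^n\) be open and let \(f \colon U \to \R^m\) be differentiable at \(p \in U\text{.}\) Then all the partial derivatives at \(p\) exist and, in terms of the standard bases of \(\R^n\) and \(\R^m\text{,}\) \(f'(p)\) is represented by the matrix
\begin{equation*}
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(p) & \frac{\partial f_1}{\partial x_2}(p) & \ldots & \frac{\partial f_1}{\partial x_n}(p) \\
\frac{\partial f_2}{\partial x_1}(p) & \frac{\partial f_2}{\partial x_2}(p) & \ldots & \frac{\partial f_2}{\partial x_n}(p) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(p) & \frac{\partial f_m}{\partial x_2}(p) & \ldots & \frac{\partial f_m}{\partial x_n}(p)
\end{bmatrix} .
\end{equation*}
Proof.
Fix a \(j\) and note that for nonzero \(h \in \R\text{,}\)
\begin{equation*}
\snorm{\frac{f(p+h e_j)-f(p)}{h} - f'(p) e_j}
=
\frac{\snorm{f(p+h e_j)-f(p) - f'(p) h e_j}}{\snorm{h e_j}} .
\end{equation*}
As \(h\) goes to 0, the right-hand side goes to zero by differentiability of \(f\text{,}\) and hence
\begin{equation*}
\lim_{h \to 0} \frac{f(p+h e_j)-f(p)}{h} = f'(p) e_j .
\end{equation*}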
The limit is in \(\R^m\text{.}\) Represent \(f\) in components \(f = (f_1,f_2,\ldots,f_m)\text{.}\) Taking a limit in \(\R^m\) is the same as taking the limit in each component separately. So for every \(k\text{,}\) the partial derivative
\begin{equation*}
\frac{\partial f_k}{\partial x_j} (p)
=
\lim_{h \to 0} \frac{f_k(p+h e_j)-f_k(p)}{h}
\end{equation*}
exists and is equal to the \(k\)th component of \(f'(p)\, e_j\text{,}\) which is the \(j\)th column of \(f'(p)\text{,}\) and we are done.
The converse of the proposition is not true. The mere existence of the partial derivatives does not imply that the function is differentiable. See the exercises. However, when the partial derivatives are continuous, we will prove that the converse holds. One of the consequences of the proposition above is that if \(f\) is differentiable on \(U\text{,}\) then \(f' \colon U \to L(\R^n,\R^m)\) is a continuous function if and only if all the \(\frac{\partial f_k}{\partial x_j}\) are continuous functions.
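To see concretely how the converse can fail, consider a standard example, assumed here for illustration and not taken from the text above: \(f(x,y) = \frac{xy}{x^2+y^2}\) for \((x,y) \neq (0,0)\) and \(f(0,0) = 0\text{.}\) The sketch below evaluates the difference quotients defining the two partial derivatives at the origin, and also evaluates \(f\) along the diagonal. The helper names are hypothetical.

```python
def f(x, y):
    # Assumed standard counterexample: both partial derivatives exist at
    # the origin, yet f is not continuous there.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_x_quotient(h):
    """Difference quotient for the partial derivative in x at the origin."""
    return (f(h, 0.0) - f(0.0, 0.0)) / h

def partial_y_quotient(h):
    """Difference quotient for the partial derivative in y at the origin."""
    return (f(0.0, h) - f(0.0, 0.0)) / h

steps = [10.0 ** (-k) for k in range(1, 8)]
px = [partial_x_quotient(h) for h in steps]   # quotients along the x-axis
py = [partial_y_quotient(h) for h in steps]   # quotients along the y-axis
diag = [f(h, h) for h in steps]               # values of f on the diagonal

print("x quotients:", px)
print("y quotients:", py)
print("f(h, h):    ", diag)
```

The quotients along the axes vanish, so both partial derivatives at the origin are zero, while the values along the diagonal do not approach \(f(0,0) = 0\text{.}\) So this \(f\) is not continuous at the origin and hence, by Proposition 8.3.5, not differentiable there.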
Subsection 8.3.3 Gradients, curves, and directional derivatives
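Let \(U \subset \R^n\) be open and \(f \colon U \to \R\) a differentiable function. We define the gradient as
\begin{equation*}
\nabla f (x) \coloneqq \sum_{j=1}^n \frac{\partial f}{\partial x_j} (x)\, e_j
=
\left( \frac{\partial f}{\partial x_1} (x), \frac{\partial f}{\partial x_2} (x), \ldots, \frac{\partial f}{\partial x_n} (x) \right) .
\end{equation*}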
The gradient gives a way to represent the action of the derivative as a dot product: \(f'(x)\,v = \nabla f(x) \cdot v\text{.}\)
Suppose \(\gamma \colon (a,b) \subset \R \to \R^n\) is differentiable. Such a function and its image is sometimes called a curve, or a differentiable curve. Write \(\gamma =
(\gamma_1,\gamma_2,\ldots,\gamma_n)\text{.}\) For the purposes of computation, we identify \(L(\R^1)\) and \(\R\) as we did when we defined the derivative in one variable. We also identify \(L(\R^1,\R^n)\) with \(\R^n\text{.}\) We treat \(\gamma^{\:\prime}(t)\) both as an operator in \(L(\R^1,\R^n)\) and the vector \(\bigl(\gamma_1^{\:\prime}(t),
\gamma_2^{\:\prime}(t),\ldots,\gamma_n^{\:\prime}(t)\bigr)\) in \(\R^n\text{.}\) Using Proposition 8.3.9, if \(v\in \R^n\) is \(\gamma^{\:\prime}(t)\) acting as a vector, then \(h \mapsto h \, v\) (for \(h \in \R^1 = \R\)) is \(\gamma^{\:\prime}(t)\) acting as an operator in \(L(\R^1,\R^n)\text{.}\) We often use this slight abuse of notation when dealing with curves. The vector \(\gamma^{\:\prime}(t)\) is called a tangent vector. See Figure 8.6.
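Suppose \(\gamma\bigl((a,b)\bigr) \subset U\) and let
\begin{equation*}
g(t) \coloneqq f\bigl(\gamma(t)\bigr) .
\end{equation*}
The function \(g\) is differentiable, and by the chain rule,
\begin{equation*}
g'(t) = f'\bigl(\gamma(t)\bigr) \gamma^{\:\prime}(t)
=
\sum_{j=1}^n \frac{\partial f}{\partial x_j} \bigl(\gamma(t)\bigr) \frac{d\gamma_j}{dt} (t)
=
\sum_{j=1}^n \frac{\partial f}{\partial x_j} \frac{d\gamma_j}{dt} .
\end{equation*}
For convenience, we often leave out the points where we are evaluating, such as above on the far right-hand side. With the notation of the gradient and the dot product the equation becomes
\begin{equation*}
g'(t) = (\nabla f) \bigl(\gamma(t)\bigr) \cdot \gamma^{\:\prime}(t) = \nabla f \cdot \gamma^{\:\prime} .
\end{equation*}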
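The statement that the derivative of \(t \mapsto f\bigl(\gamma(t)\bigr)\) equals \((\nabla f)\bigl(\gamma(t)\bigr) \cdot \gamma^{\:\prime}(t)\) can be checked numerically. The sketch below assumes, for illustration only, the function \(f(x,y) = x^2 y\) and the unit circle \(\gamma(t) = (\cos t, \sin t)\text{;}\) neither appears in the text, and all helper names are hypothetical.

```python
import math

def f(x, y):
    # Hypothetical differentiable function on R^2 (illustration only).
    return x ** 2 * y

def grad_f(x, y):
    # Gradient of f, computed by hand from the formula above.
    return (2 * x * y, x ** 2)

def gamma(t):
    # Hypothetical curve: the unit circle.
    return (math.cos(t), math.sin(t))

def gamma_prime(t):
    # Tangent vector of gamma, treated as a vector in R^2.
    return (-math.sin(t), math.cos(t))

def g(t):
    return f(*gamma(t))

def g_prime_numeric(t, eps=1e-6):
    """Central difference approximation of g'(t)."""
    return (g(t + eps) - g(t - eps)) / (2 * eps)

def g_prime_chain(t):
    """Gradient of f at gamma(t) dotted with the tangent vector."""
    gx, gy = grad_f(*gamma(t))
    vx, vy = gamma_prime(t)
    return gx * vx + gy * vy

t0 = 0.7
numeric = g_prime_numeric(t0)
chain = g_prime_chain(t0)
print("difference quotient:", numeric)
print("gradient formula:   ", chain)
```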
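We use this idea to define derivatives in a specific direction. A direction is simply a vector pointing in that direction. Pick a vector \(u \in \R^n\) such that \(\snorm{u} = 1\text{,}\) and fix \(x \in U\text{.}\) We define the directional derivative as
\begin{equation*}
D_u f (x) \coloneqq
\frac{d}{dt}\Big|_{t=0} \Bigl[ f(x+tu) \Bigr]
=
\lim_{h\to 0} \frac{f(x+hu)-f(x)}{h} ,
\end{equation*}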
where the notation \(\frac{d}{dt}\big|_{t=0}\) represents the derivative evaluated at \(t=0\text{.}\) When \(u=e_j\) is a standard basis vector, we find \(\frac{\partial f}{\partial x_j} = D_{e_j} f\text{.}\) For this reason, sometimes the notation \(\frac{\partial f}{\partial u}\) is used instead of \(D_u f\text{.}\)
Define \(\gamma\) by
\begin{equation*}
\gamma(t) \coloneqq x + tu .
\end{equation*}
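Then \(\gamma^{\:\prime}(t) = u\) for all \(t\text{.}\) Let us see what happens to \(f\) when we travel along \(\gamma\text{:}\)
\begin{equation*}
D_u f (x) =
\frac{d}{dt}\Big|_{t=0} \Bigl[ f\bigl(\gamma(t)\bigr) \Bigr]
=
(\nabla f) \bigl(\gamma(0)\bigr) \cdot \gamma^{\:\prime}(0)
=
(\nabla f) (x) \cdot u .
\end{equation*}
By the Cauchy–Schwarz inequality and \(\snorm{u} = 1\text{,}\)
\begin{equation*}
\lvert D_u f(x) \rvert \leq \snorm{(\nabla f)(x)} .
\end{equation*}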
Equality is achieved when \(u\) is a scalar multiple of \((\nabla f)(x)\text{.}\) That is, when
\begin{equation*}
u =
\frac{(\nabla f)(x)}{\snorm{(\nabla f)(x)}} ,
\end{equation*}
we get \(D_u f(x) = \snorm{(\nabla f)(x)}\text{.}\) The gradient points in the direction in which the function grows fastest, in other words, in the direction in which \(D_u f(x)\) is maximal.
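The claim that the gradient direction maximizes the directional derivative can be illustrated by sampling unit vectors. This is a rough sketch, assuming the hypothetical function \(f(x,y) = x^2 + 3xy\) and the point \((1,2)\text{;}\) it uses the identity \(D_u f(x) = (\nabla f)(x) \cdot u\) for the sampled directions and a symmetric difference quotient for the gradient direction itself.

```python
import math

def f(x, y):
    # Hypothetical function used only for this illustration.
    return x ** 2 + 3 * x * y

def grad_f(x, y):
    # Gradient of f, computed by hand.
    return (2 * x + 3 * y, 3 * x)

def directional_derivative(x, y, u, h=1e-6):
    """Symmetric difference quotient approximating d/dt f((x,y) + t u) at t = 0."""
    return (f(x + h * u[0], y + h * u[1])
            - f(x - h * u[0], y - h * u[1])) / (2 * h)

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)
grad_norm = math.hypot(gx, gy)

# Sample 360 unit vectors u = (cos a, sin a) and evaluate grad f . u.
angles = [2 * math.pi * k / 360 for k in range(360)]
values = [gx * math.cos(a) + gy * math.sin(a) for a in angles]
best_value = max(values)

# Unit vector in the gradient direction and the derivative along it.
u_star = (gx / grad_norm, gy / grad_norm)
d_star = directional_derivative(x0, y0, u_star)

print("norm of gradient:        ", grad_norm)
print("best sampled D_u f:      ", best_value)
print("D_u f along the gradient:", d_star)
```

No sampled direction exceeds the norm of the gradient, and the derivative along the normalized gradient matches that norm up to rounding error.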
Subsection 8.3.4 The Jacobian
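Definition 8.3.10.
Let \(U \subset \R^n\) be open and \(f \colon U \to \R^n\) be a differentiable mapping. Define the Jacobian determinant of \(f\) at \(x\) as
\begin{equation*}
J_f(x) \coloneqq \det\bigl( f'(x) \bigr) .
\end{equation*}
Sometimes \(J_f\) is written as
\begin{equation*}
\frac{\partial(f_1,f_2,\ldots,f_n)}{\partial(x_1,x_2,\ldots,x_n)} .
\end{equation*}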
This last piece of notation may seem somewhat confusing, but it is quite useful when we need to specify the exact variables and function components used, as we will do, for example, in the implicit function theorem.
The Jacobian determinant \(J_f\) is a real-valued function, and when \(n=1\) it is simply the derivative. From the chain rule and the fact that \(\det(AB) = \det(A)\det(B)\text{,}\) it follows that:
\begin{equation*}
J_{f \circ g} (x) = J_f\bigl(g(x)\bigr) \, J_g(x) .
\end{equation*}
The determinant of a linear mapping tells us what happens to area/volume under the mapping. Similarly, the Jacobian determinant measures how much a differentiable mapping stretches things locally, and whether it flips orientation. In particular, if the Jacobian determinant is nonzero, then we would expect the mapping to be locally invertible (and we would be correct, as we will see later).
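The multiplicativity of the Jacobian determinant under composition can be spot-checked numerically. The sketch below assumes two hypothetical maps of the plane, \(f(x,y) = (x^2-y^2, 2xy)\) and \(g(x,y) = (x+y, xy)\text{,}\) and approximates each derivative by central differences; the helper `jacobian_det` is an illustration, not a library function.

```python
def f(p):
    # Hypothetical map of the plane; by hand, J_f(x, y) = 4 (x^2 + y^2).
    x, y = p
    return (x * x - y * y, 2 * x * y)

def g(p):
    # Hypothetical map of the plane; by hand, J_g(x, y) = x - y.
    x, y = p
    return (x + y, x * y)

def jacobian_det(func, p, eps=1e-6):
    """Approximate det(func'(p)) for func: R^2 -> R^2 by central differences."""
    x, y = p
    fxp = func((x + eps, y))
    fxm = func((x - eps, y))
    fyp = func((x, y + eps))
    fym = func((x, y - eps))
    a = (fxp[0] - fxm[0]) / (2 * eps)   # d func_1 / dx
    b = (fyp[0] - fym[0]) / (2 * eps)   # d func_1 / dy
    c = (fxp[1] - fxm[1]) / (2 * eps)   # d func_2 / dx
    d = (fyp[1] - fym[1]) / (2 * eps)   # d func_2 / dy
    return a * d - b * c

def compose(p):
    return f(g(p))

p0 = (2.0, 0.5)
lhs = jacobian_det(compose, p0)                      # J of the composition at p0
rhs = jacobian_det(f, g(p0)) * jacobian_det(g, p0)   # J_f(g(p0)) * J_g(p0)
print("Jacobian of the composition:", lhs)
print("product of the Jacobians:   ", rhs)
```

With the hand-computed formulas in the comments, the product at this point is \(29 \cdot 1.5 = 43.5\text{,}\) which both numerical values should approximate.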
8.3.5 Exercises
8.3.1.
Suppose \(\gamma \colon (-1,1) \to \R^n\) and \(\alpha \colon (-1,1) \to \R^n\) are two differentiable curves such that \(\gamma(0) = \alpha(0)\) and \(\gamma^{\:\prime}(0) = \alpha'(0)\text{.}\) Suppose \(F \colon \R^n \to \R\) is a differentiable function. Show that
\begin{equation*}
\frac{d}{dt}\Big|_{t=0} F\bigl(\gamma(t)\bigr)
=
\frac{d}{dt}\Big|_{t=0} F\bigl(\alpha(t)\bigr) .
\end{equation*}
8.3.2.
Let \(f \colon \R^2 \to \R\) be given by \(f(x,y) \coloneqq \sqrt{x^2+y^2}\) (see Figure 8.7). Show that \(f\) is not differentiable at the origin.
8.3.3.
Using only the definition of the derivative, show that the following \(f \colon \R^2 \to \R^2\) are differentiable at the origin and find their derivative.
8.3.4.
Suppose \(f \colon \R \to \R\) and \(g \colon \R \to \R\) are differentiable functions. Using only the definition of the derivative, show that \(h \colon \R^2 \to \R^2\) defined by \(h(x,y) \coloneqq \bigl(f(x),g(y)\bigr)\) is differentiable at all points \((x,y)\text{,}\) and find the derivative.
8.3.5.
Define a function \(f \colon \R^2 \to \R\) by (see Figure 8.8)
Show that the partial derivatives \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) exist at all points.
Show that for all \(u \in \R^2\) with \(\snorm{u}=1\text{,}\) the directional derivative \(D_u f\) exists at all points.
Show that \(f\) is continuous at the origin.
Show that \(f\) is not differentiable at the origin.
8.3.7.
Suppose \(f \colon \R^n \to \R^n\) is one-to-one, onto, differentiable at all points, and such that \(f^{-1}\) is also differentiable at all points.
Show that \(f'(p)\) is invertible at all points \(p\) and compute \({(f^{-1})}'\bigl(f(p)\bigr)\text{.}\) Hint: Consider \(x = f^{-1}\bigl(f(x)\bigr)\text{.}\)
Let \(g \colon \R^n \to \R^n\) be a function differentiable at \(q \in \R^n\) and such that \(g(q)=q\text{.}\) Suppose \(f(p) = q\) for some \(p \in \R^n\text{.}\) Show \(J_g(q) = J_{f^{-1} \circ g \circ f}(p)\) where \(J_g\) is the Jacobian determinant.
8.3.8.
Suppose \(f \colon \R^2 \to \R\) is differentiable and such that \(f(x,y) = 0\) if and only if \(y=0\) and such that \(\nabla f(0,0) = (0,1)\text{.}\) Prove that \(f(x,y) > 0\) whenever \(y > 0\text{,}\) and \(f(x,y) < 0\) whenever \(y < 0\text{.}\)
As for functions of one variable, \(f \colon U \to \R\) has a relative maximum at \(p \in U\) if there exists a \(\delta >0\) such that \(f(q) \leq f(p)\) for all \(q \in B(p,\delta) \cap U\text{.}\) Similarly for relative minimum.
8.3.9.
Suppose \(U \subset \R^n\) is open and \(f \colon U \to \R\) is differentiable. Suppose \(f\) has a relative maximum at \(p \in U\text{.}\) Show that \(f'(p) = 0\text{,}\) that is, the zero mapping in \(L(\R^n,\R)\text{.}\) Namely, \(p\) is a critical point of \(f\text{.}\)
8.3.10.
Suppose \(f \colon \R^2 \to \R\) is differentiable and \(f(x,y) = 0\) whenever \(x^2+y^2 = 1\text{.}\) Prove that there exists at least one point \((x_0,y_0)\) such that \(\frac{\partial f}{\partial x}(x_0,y_0) = \frac{\partial f}{\partial
y}(x_0,y_0) = 0\text{.}\)
8.3.11.
Define \(f(x,y) \coloneqq ( x-y^2 ) ( 2 y^2 - x)\text{.}\) The graph of \(f\) is called the Peano surface, named after the Italian mathematician Giuseppe Peano (1858–1932).
Show that \((0,0)\) is a critical point, that is, \(f'(0,0) = 0\text{,}\) the zero linear map in \(L(\R^2,\R)\text{.}\)
Show that for every direction, the restriction of \(f\) to a line through the origin in that direction has a relative maximum at the origin. In other words, for every \((x,y)\) such that \(x^2+y^2=1\text{,}\) the function \(g(t) \coloneqq f(tx,ty)\) has a relative maximum at \(t=0\text{.}\) Hint: While not necessary, Section 4.3 makes this part easier.
Show that \(f\) does not have a relative maximum at \((0,0)\text{.}\)
8.3.12.
Suppose \(f \colon \R \to \R^n\) is differentiable and \(\snorm{f(t)} = 1\) for all \(t\) (that is, we have a curve in the unit sphere). Show that \(f'(t) \cdot f(t) = 0\) (treating \(f'(t)\) as a vector) for all \(t\text{.}\)
8.3.13.
Define \(f \colon \R^2 \to \R^2\) by \(f(x,y) \coloneqq
\bigl(x,y+\varphi(x)\bigr)\) for some differentiable function \(\varphi\) of one variable. Show \(f\) is differentiable and find \(f'\text{.}\)
8.3.14.
Suppose \(U \subset \R^n\) is open, \(p \in U\text{,}\) and \(f \colon U \to \R\text{,}\) \(g \colon U \to \R\text{,}\) \(h \colon U \to \R\) are functions such that \(f(p) = g(p) = h(p)\text{,}\) \(f\) and \(h\) are differentiable at \(p\text{,}\) \(f'(p) = h'(p)\text{,}\) and
\begin{equation*}
f(x) \leq g(x) \leq h(x) \qquad \text{for all } x \in U .
\end{equation*}
Show that \(g\) is differentiable at \(p\) and \(g'(p) = f'(p) = h'(p)\text{.}\)
8.3.15.
Prove a version of the mean value theorem for functions of several variables. That is, suppose \(U \subset \R^n\) is open, \(f \colon U \to \R\) is differentiable, \(p,q \in U\text{,}\) and the segment \([p,q] \subset U\text{.}\) Prove that there exists an \(x \in [p,q]\) such that \(\nabla f (x) \cdot (q-p) = f(q)-f(p)\text{.}\)