
Section 8.5 Inverse and implicit function theorems

Note: 2–3 lectures
Intuitively, if a function is continuously differentiable, then it locally “behaves like” the derivative (which is a linear function). The idea of the inverse function theorem is that if a function is continuously differentiable and the derivative is invertible, the function is (locally) invertible.

Figure 8.11. Setup of the inverse function theorem in $\mathbb{R}^n$.

To prove the theorem, we use the contraction mapping principle from Chapter 7, where we used it to prove Picard’s theorem. Recall that a mapping $f \colon X \to Y$ between metric spaces $(X,d_X)$ and $(Y,d_Y)$ is a contraction if there exists a $k < 1$ such that
\[ d_Y\bigl(f(p),f(q)\bigr) \leq k \, d_X(p,q) \qquad \text{for all } p,q \in X. \]
The contraction mapping principle says that if $f \colon X \to X$ is a contraction and $X$ is a nonempty complete metric space, then there exists a unique fixed point, that is, there exists a unique $x \in X$ such that $f(x)=x$.
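As a small numerical illustration of the principle, the following sketch iterates a contraction until the iterates stop moving. The choice of $\cos$ on $[0,1]$ and the helper name `fixed_point` are assumptions made for this example only; they do not come from the text.

```python
import math

def fixed_point(f, x0, tol=1e-14, max_iter=1000):
    """Iterate x_{n+1} = f(x_n) until successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

# cos maps [0, 1] into itself and |cos'(x)| = |sin(x)| <= sin(1) < 1 there,
# so cos is a contraction on the complete metric space [0, 1].
x_star = fixed_point(math.cos, 0.5)
```

The returned `x_star` approximates the unique solution of $\cos(x) = x$ in $[0,1]$. Starting from any other point of $[0,1]$ leads to the same limit, which is the uniqueness part of the principle.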

Proof.

Write $A = f'(p)$. As $f'$ is continuous, there is an open ball $V$ centered at $p$ such that
\[ \lVert A - f'(x) \rVert < \frac{1}{2 \lVert A^{-1} \rVert} \qquad \text{for all } x \in V. \]
Consequently, the derivative $f'(x)$ is invertible for all $x \in V$ by Proposition 8.2.6.
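Given $y \in \mathbb{R}^n$, define $\varphi_y \colon V \to \mathbb{R}^n$ by
\[ \varphi_y(x) := x + A^{-1}\bigl(y - f(x)\bigr). \]
As $A^{-1}$ is one-to-one, $\varphi_y(x) = x$ ($x$ is a fixed point) if and only if $y - f(x) = 0$, or in other words $f(x) = y$. Using the chain rule we obtain
\[ \varphi_y'(x) = I - A^{-1} f'(x) = A^{-1}\bigl(A - f'(x)\bigr). \]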
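So for $x \in V$, we have
\[ \lVert \varphi_y'(x) \rVert \leq \lVert A^{-1} \rVert \, \lVert A - f'(x) \rVert < \frac{1}{2}. \]
As $V$ is a ball, it is convex. Hence
\[ \lVert \varphi_y(x_1) - \varphi_y(x_2) \rVert \leq \frac{1}{2} \lVert x_1 - x_2 \rVert \qquad \text{for all } x_1, x_2 \in V. \]
In other words, $\varphi_y$ is a contraction defined on $V$, though so far we do not know the range of $\varphi_y$. We cannot yet apply the fixed point theorem, but we can say that $\varphi_y$ has at most one fixed point in $V$: If $\varphi_y(x_1) = x_1$ and $\varphi_y(x_2) = x_2$, then $\lVert x_1 - x_2 \rVert = \lVert \varphi_y(x_1) - \varphi_y(x_2) \rVert \leq \frac{1}{2} \lVert x_1 - x_2 \rVert$, so $x_1 = x_2$. That is, there exists at most one $x \in V$ such that $f(x) = y$, and so $f|_V$ is one-to-one.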
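Let $W := f(V)$ and let $g \colon W \to V$ be the inverse of $f|_V$. We need to show that $W$ is open. Take a $y_0 \in W$. There is a unique $x_0 \in V$ such that $f(x_0) = y_0$. Let $r > 0$ be small enough such that the closed ball $C(x_0,r) \subset V$ (such $r > 0$ exists as $V$ is open).
Suppose $y$ is such that
\[ \lVert y - y_0 \rVert < \frac{r}{2 \lVert A^{-1} \rVert}. \]
If we show that $y \in W$, then we have shown that $W$ is open. If $x_1 \in C(x_0,r)$, then
\[
\begin{aligned}
\lVert \varphi_y(x_1) - x_0 \rVert
& \leq \lVert \varphi_y(x_1) - \varphi_y(x_0) \rVert + \lVert \varphi_y(x_0) - x_0 \rVert \\
& \leq \frac{1}{2} \lVert x_1 - x_0 \rVert + \lVert A^{-1}(y - y_0) \rVert \\
& \leq \frac{1}{2} r + \lVert A^{-1} \rVert \, \lVert y - y_0 \rVert \\
& < \frac{1}{2} r + \lVert A^{-1} \rVert \frac{r}{2 \lVert A^{-1} \rVert} = r.
\end{aligned}
\]
So $\varphi_y$ takes $C(x_0,r)$ into $B(x_0,r) \subset C(x_0,r)$. It is a contraction on $C(x_0,r)$ and $C(x_0,r)$ is complete (a closed subset of $\mathbb{R}^n$ is complete). Apply the contraction mapping principle to obtain a fixed point $x$, i.e. $\varphi_y(x) = x$. That is, $f(x) = y$, and $y \in f\bigl(C(x_0,r)\bigr) \subset f(V) = W$. Therefore, $W$ is open.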
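The fixed point argument above is constructive, so it can be tried numerically. The following sketch iterates the map $x \mapsto x + A^{-1}(y - f(x))$ for a hypothetical map $f \colon \mathbb{R}^2 \to \mathbb{R}^2$ near $p = (0,0)$. The specific $f$, the target $y$, and the names `phi` and `local_inverse` are assumptions chosen for illustration, not part of the proof.

```python
def f(x):
    """Hypothetical C^1 map R^2 -> R^2 with f(0, 0) = (0, 0)."""
    return (2.0 * x[0] + x[1] ** 2, x[0] + 3.0 * x[1] + x[0] ** 2)

# A = f'(0, 0) = [[2, 0], [1, 3]], so A^{-1} = (1/6) [[3, 0], [-1, 2]].
A_INV = ((0.5, 0.0), (-1.0 / 6.0, 1.0 / 3.0))

def phi(y, x):
    """One step of the contraction: x + A^{-1} (y - f(x))."""
    fx = f(x)
    r = (y[0] - fx[0], y[1] - fx[1])
    return (x[0] + A_INV[0][0] * r[0] + A_INV[0][1] * r[1],
            x[1] + A_INV[1][0] * r[0] + A_INV[1][1] * r[1])

def local_inverse(y, x0=(0.0, 0.0), steps=60):
    """Approximate the fixed point of phi(y, .), that is, a solution of f(x) = y."""
    x = x0
    for _ in range(steps):
        x = phi(y, x)
    return x

y = (0.1, 0.05)           # a point close to f(0, 0)
x = local_inverse(y)      # approximate value of g(y)
```

Only the derivative at the single point $p$ is used, exactly as in the proof. For this particular $f$ the iterates stay close to the origin, where the map is a contraction, so the loop converges quickly; for a target $y$ far from $f(p)$ no such guarantee holds.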
Next we need to show that $g$ is continuously differentiable and compute its derivative. First, let us show that it is differentiable. Let $y \in W$ and $k \in \mathbb{R}^n$, $k \neq 0$, such that $y + k \in W$. Because $f|_V$ is a one-to-one mapping of $V$ onto $W$, there are unique $x \in V$ and $h \in \mathbb{R}^n$, $h \neq 0$ and $x + h \in V$, such that $f(x) = y$ and $f(x+h) = y + k$. In other words, $g(y) = x$ and $g(y+k) = x + h$. See Figure 8.12.

Figure 8.12. Proving that g is differentiable.

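We can still squeeze some information from the fact that $\varphi_y$ is a contraction. First,
\[ \varphi_y(x+h) - \varphi_y(x) = h + A^{-1}\bigl(f(x) - f(x+h)\bigr) = h - A^{-1} k. \]
So
\[ \lVert h - A^{-1} k \rVert = \lVert \varphi_y(x+h) - \varphi_y(x) \rVert \leq \frac{1}{2} \lVert x + h - x \rVert = \frac{\lVert h \rVert}{2}. \]
By the reverse triangle inequality, $\lVert h \rVert - \lVert A^{-1} k \rVert \leq \frac{1}{2} \lVert h \rVert$. So
\[ \lVert h \rVert \leq 2 \lVert A^{-1} k \rVert \leq 2 \lVert A^{-1} \rVert \, \lVert k \rVert. \]
In particular, as $k$ goes to 0, so does $h$.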
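As $x \in V$, the derivative $f'(x)$ is invertible. Let $B := \bigl(f'(x)\bigr)^{-1}$, which is what we think the derivative of $g$ at $y$ is. Then
\[
\begin{aligned}
\frac{\lVert g(y+k) - g(y) - Bk \rVert}{\lVert k \rVert}
& = \frac{\lVert h - Bk \rVert}{\lVert k \rVert} \\
& = \frac{\lVert h - B\bigl(f(x+h) - f(x)\bigr) \rVert}{\lVert k \rVert} \\
& = \frac{\lVert B\bigl(f(x+h) - f(x) - f'(x)h\bigr) \rVert}{\lVert k \rVert} \\
& \leq \lVert B \rVert \frac{\lVert h \rVert}{\lVert k \rVert} \, \frac{\lVert f(x+h) - f(x) - f'(x)h \rVert}{\lVert h \rVert} \\
& \leq 2 \lVert B \rVert \, \lVert A^{-1} \rVert \frac{\lVert f(x+h) - f(x) - f'(x)h \rVert}{\lVert h \rVert}.
\end{aligned}
\]
As $k$ goes to 0, so does $h$. So the right-hand side goes to 0 as $f$ is differentiable, and hence the left-hand side also goes to 0. And $B$ is precisely what we wanted $g'(y)$ to be.
We have shown that $g$ is differentiable; let us show that it is $C^1(W)$. The function $g \colon W \to V$ is continuous (it is differentiable), $f'$ is a continuous function from $V$ to $L(\mathbb{R}^n)$, and $X \mapsto X^{-1}$ is a continuous function on the set of invertible operators. As $g'(y) = \bigl(f'(g(y))\bigr)^{-1}$ is the composition of these three continuous functions, it is continuous.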
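A one-dimensional sanity check of the derivative formula, with the exponential chosen here purely as an illustration (it does not appear in the text):

```latex
f(x) = e^{x}, \qquad g(y) = \ln y \quad (y > 0), \qquad
g'(y) = \frac{1}{y} = \frac{1}{e^{x}} = \bigl( f'(x) \bigr)^{-1}
\quad \text{where } y = f(x).
```

In one dimension, inverting the linear map $f'(x)$ is simply taking the reciprocal of a nonzero number.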

Proof.

Without loss of generality, suppose $U = V$. For each $y \in f(V)$, pick $x \in f^{-1}(y)$ (there could be more than one such point), then by the inverse function theorem there is a neighborhood of $x$ in $V$ that maps onto a neighborhood of $y$. Hence $f(V)$ is open.

Example 8.5.3.

Neither the theorem nor the corollary holds if $f'(x)$ is not invertible for some $x$. For example, the map $f(x,y) := (x, xy)$ maps $\mathbb{R}^2$ onto the set $\mathbb{R}^2 \setminus \{ (0,y) : y \neq 0 \}$, which is neither open nor closed. In fact, $f^{-1}(0,0) = \{ (0,y) : y \in \mathbb{R} \}$. This bad behavior only occurs on the $y$-axis; everywhere else the function is locally invertible. If we avoid the $y$-axis, $f$ is even one-to-one.
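The derivative of this map can be computed directly, which shows where invertibility fails:

```latex
f'(x,y) = \begin{bmatrix} 1 & 0 \\ y & x \end{bmatrix},
\qquad
\det f'(x,y) = x .
```

So $f'(x,y)$ fails to be invertible exactly when $x = 0$, that is, on the $y$-axis, in agreement with the behavior described in the example.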

Example 8.5.4.

Just because $f'(x)$ is invertible everywhere does not mean that $f$ is one-to-one. It is “locally” one-to-one, but perhaps not “globally.” Consider $f \colon \mathbb{R}^2 \setminus \{ (0,0) \} \to \mathbb{R}^2 \setminus \{ (0,0) \}$ defined by $f(x,y) := (x^2 - y^2, 2xy)$. It is left to the reader to verify the following statements. The map $f$ is differentiable and the derivative is invertible. On the other hand, $f$ is 2-to-1 globally: For every $(a,b)$ that is not the origin, there are exactly two solutions to $x^2 - y^2 = a$ and $2xy = b$ ($f$ is also onto). Notice that once you show that there is at least one solution, replacing $x$ and $y$ with $-x$ and $-y$ we obtain another solution.
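As a starting point for the verification, the derivative and its determinant can be written out as follows (a sketch of one step only, not the full argument):

```latex
f'(x,y) = \begin{bmatrix} 2x & -2y \\ 2y & 2x \end{bmatrix},
\qquad
\det f'(x,y) = 4 (x^2 + y^2) > 0 \quad \text{for } (x,y) \neq (0,0),
\qquad
f(-x,-y) = f(x,y) .
```

The last identity is the reason the map cannot be globally one-to-one even though the determinant never vanishes on the domain.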
The invertibility of the derivative is not a necessary condition, just sufficient, for having a continuous inverse and for being an open mapping. For example, the function $f(x) := x^3$ is an open mapping from $\mathbb{R}$ to $\mathbb{R}$ and is globally one-to-one with a continuous inverse, although the inverse is not differentiable at $x = 0$.
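To see concretely why the inverse of the cube function is not differentiable at the origin, write the inverse as $g(y) = \sqrt[3]{y}$ (the name $g$ is used here only for this illustration) and look at the difference quotient at $0$:

```latex
\frac{g(k) - g(0)}{k} = \frac{\sqrt[3]{k}}{k} = \frac{1}{\sqrt[3]{k^{2}}}
\longrightarrow \infty \quad \text{as } k \to 0 .
```

The quotient is unbounded, so no derivative exists at $0$, which matches the fact that $f'(0) = 0$ has no inverse.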
As a side note, there is a related famous, and as yet unsolved, problem called the Jacobian conjecture. If $F \colon \mathbb{R}^n \to \mathbb{R}^n$ is polynomial (each component is a polynomial) and $J_F$ (the Jacobian determinant) is a nonzero constant, does $F$ have a polynomial inverse? The inverse function theorem gives a local $C^1$ inverse; the question is whether one can always find a global polynomial inverse.

Subsection 8.5.1 Implicit function theorem

The inverse function theorem is a special case of the implicit function theorem, which we prove next. Somewhat ironically, we prove the implicit function theorem using the inverse function theorem. In the inverse function theorem we showed that the equation $x - f(y) = 0$ is solvable for $y$ in terms of $x$ if the derivative with respect to $y$ is invertible, that is, if $f'(y)$ is invertible. Then there is (locally) a function $g$ such that $x - f\bigl(g(x)\bigr) = 0$.
In general, the equation $f(x,y) = 0$ is not solvable for $y$ in terms of $x$ in every case. For instance, there is generally no solution when $f(x,y)$ does not actually depend on $y$. For a more interesting example, notice that $x^2 + y^2 - 1 = 0$ defines the unit circle, and we can locally solve for $y$ in terms of $x$ when 1) we are near a point on the unit circle and 2) we are not at a point where the circle has a vertical tangency, that is, where $\frac{\partial f}{\partial y} = 0$.
We fix some notation. Let $(x,y) \in \mathbb{R}^{n+m}$ denote the coordinates $(x_1,\ldots,x_n,y_1,\ldots,y_m)$. We can then write a linear map $A \in L(\mathbb{R}^{n+m},\mathbb{R}^m)$ as $A = [A_x ~ A_y]$ so that $A(x,y) = A_x x + A_y y$, where $A_x \in L(\mathbb{R}^n,\mathbb{R}^m)$ and $A_y \in L(\mathbb{R}^m)$. First, the linear version of the theorem.
The proof is immediate: We solve and obtain $y = Bx$. Another way to solve is to “complete the basis,” that is, add rows to the matrix until we have an invertible matrix: The operator in $L(\mathbb{R}^{n+m})$ given by $(x,y) \mapsto (x, A_x x + A_y y)$ is invertible, and the map $B$ can be read off from the inverse. Let us show that the same can be done for $C^1$ functions.
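The “complete the basis” remark can be made explicit. The sketch below assumes that $A_y$ is invertible and that $B = -A_y^{-1} A_x$, which is what solving $A_x x + A_y y = 0$ for $y$ gives:

```latex
\begin{bmatrix} I & 0 \\ A_x & A_y \end{bmatrix}^{-1}
=
\begin{bmatrix} I & 0 \\ -A_y^{-1} A_x & A_y^{-1} \end{bmatrix},
\qquad
\begin{bmatrix} I & 0 \\ -A_y^{-1} A_x & A_y^{-1} \end{bmatrix}
\begin{bmatrix} x \\ 0 \end{bmatrix}
=
\begin{bmatrix} x \\ -A_y^{-1} A_x \, x \end{bmatrix}
=
\begin{bmatrix} x \\ Bx \end{bmatrix} .
```

Multiplying the two block matrices on the left confirms the inverse, and the lower-left block is the map $B$. Applying the inverse to $(x,0)$ recovers the point $(x,Bx)$ at which $A$ vanishes.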
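The condition $\frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)}(p,q) = \det(A_y) \neq 0$ simply means that $A_y$ is invertible. If $n = m = 1$, the condition is $\frac{\partial f}{\partial y}(p,q) \neq 0$, and $W$ and $W'$ are open intervals. See Figure 8.13.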

Figure 8.13. Implicit function theorem for $f(x,y) = x^2 + y^2 - 1$ in $U = \mathbb{R}^2$ and $(p,q)$ in the first quadrant.

Proof.

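Define $F \colon U \to \mathbb{R}^{n+m}$ by $F(x,y) := \bigl(x, f(x,y)\bigr)$. It is clear that $F$ is $C^1$, and we want to show that its derivative at $(p,q)$ is invertible. Let us compute the derivative. The quotient
\[ \frac{\lVert f(p+h,q+k) - f(p,q) - A_x h - A_y k \rVert}{\lVert (h,k) \rVert} \]
goes to zero as $\lVert (h,k) \rVert = \sqrt{\lVert h \rVert^2 + \lVert k \rVert^2}$ goes to zero. But then so does
\[
\begin{aligned}
\frac{\lVert F(p+h,q+k) - F(p,q) - (h, A_x h + A_y k) \rVert}{\lVert (h,k) \rVert}
& = \frac{\lVert \bigl(h, f(p+h,q+k) - f(p,q)\bigr) - (h, A_x h + A_y k) \rVert}{\lVert (h,k) \rVert} \\
& = \frac{\lVert f(p+h,q+k) - f(p,q) - A_x h - A_y k \rVert}{\lVert (h,k) \rVert}.
\end{aligned}
\]
So the derivative of $F$ at $(p,q)$ takes $(h,k)$ to $(h, A_x h + A_y k)$. In block matrix form, it is $\begin{bmatrix} I & 0 \\ A_x & A_y \end{bmatrix}$. If $(h, A_x h + A_y k) = (0,0)$, then $h = 0$, and so $A_y k = 0$. As $A_y$ is one-to-one, $k = 0$. Thus $F'(p,q)$ is one-to-one, and hence invertible. We apply the inverse function theorem.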
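That is, there exists an open set $V \subset \mathbb{R}^{n+m}$ with $F(p,q) = (p,0) \in V$, and a $C^1$ mapping $G \colon V \to \mathbb{R}^{n+m}$, such that $F\bigl(G(x,s)\bigr) = (x,s)$ for all $(x,s) \in V$, $G$ is one-to-one, and $G(V)$ is open. Write $G = (G_1,G_2)$ (the first $n$ and the next $m$ components of $G$). Then
\[ F\bigl(G_1(x,s),G_2(x,s)\bigr) = \Bigl(G_1(x,s), f\bigl(G_1(x,s),G_2(x,s)\bigr)\Bigr) = (x,s). \]
So $x = G_1(x,s)$ and $f\bigl(G_1(x,s),G_2(x,s)\bigr) = f\bigl(x,G_2(x,s)\bigr) = s$. Plugging in $s = 0$, we obtain
\[ f\bigl(x,G_2(x,0)\bigr) = 0. \]
As the set $G(V)$ is open and $(p,q) \in G(V)$, there exist some open sets $\widetilde{W}$ and $W'$ such that $\widetilde{W} \times W' \subset G(V)$ with $p \in \widetilde{W}$ and $q \in W'$. Take $W := \{ x \in \widetilde{W} : G_2(x,0) \in W' \}$. The function that takes $x$ to $G_2(x,0)$ is continuous and therefore $W$ is open. Define $g \colon W \to \mathbb{R}^m$ by $g(x) := G_2(x,0)$, which is the $g$ in the theorem. The fact that $g(x)$ is the unique point in $W'$ follows because $W \times W' \subset G(V)$ and $G$ is one-to-one.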
Next, differentiate
\[ x \mapsto f\bigl(x,g(x)\bigr) \]
at $p$, which is the zero map, so its derivative is zero. Using the chain rule,
\[ 0 = A\bigl(h, g'(p)h\bigr) = A_x h + A_y g'(p) h \]
for all $h \in \mathbb{R}^n$, and we obtain the desired derivative for $g$.
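For the unit circle, the chain rule identity can be checked against the explicit solution. The sketch below assumes a point $(p,q)$ on the circle with $q > 0$, so that $g(x) = \sqrt{1 - x^2}$ near $p$:

```latex
f(x,y) = x^2 + y^2 - 1, \qquad A_x = 2p, \qquad A_y = 2q, \qquad
0 = 2p \, h + 2q \, g'(p) \, h
\;\Longrightarrow\;
g'(p) = -\frac{p}{q},
\qquad
\frac{d}{dx} \sqrt{1 - x^2} \, \Big|_{x = p} = \frac{-p}{\sqrt{1 - p^2}} = -\frac{p}{q} .
```

Both computations agree, and the formula breaks down exactly when $q = 0$, that is, at the points of vertical tangency.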
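In other words, in the context of the theorem, we have $m$ equations in $n+m$ unknowns:
\[
\begin{aligned}
f_1(x_1,\ldots,x_n,y_1,\ldots,y_m) & = 0, \\
f_2(x_1,\ldots,x_n,y_1,\ldots,y_m) & = 0, \\
& \;\;\vdots \\
f_m(x_1,\ldots,x_n,y_1,\ldots,y_m) & = 0.
\end{aligned}
\]
The theorem guarantees a solution if $f = (f_1,f_2,\ldots,f_m)$ is a $C^1$ map (the components are $C^1$: partial derivatives in all variables exist and are continuous) and the matrix
\[
\begin{bmatrix}
\frac{\partial f_1}{\partial y_1} & \frac{\partial f_1}{\partial y_2} & \cdots & \frac{\partial f_1}{\partial y_m} \\
\frac{\partial f_2}{\partial y_1} & \frac{\partial f_2}{\partial y_2} & \cdots & \frac{\partial f_2}{\partial y_m} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial y_1} & \frac{\partial f_m}{\partial y_2} & \cdots & \frac{\partial f_m}{\partial y_m}
\end{bmatrix}
\]
is invertible at $(p,q)$.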

Example 8.5.7.

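Consider the set given by $x^2 + y^2 - (z+1)^3 = -1$ and $e^x + e^y + e^z = 3$ near the point $(0,0,0)$. It is the zero set of the mapping
\[ f(x,y,z) = \bigl( x^2 + y^2 - (z+1)^3 + 1, \; e^x + e^y + e^z - 3 \bigr), \]
whose derivative is
\[ f' = \begin{bmatrix} 2x & 2y & -3(z+1)^2 \\ e^x & e^y & e^z \end{bmatrix}. \]
The matrix
\[ \begin{bmatrix} 2(0) & -3(0+1)^2 \\ e^0 & e^0 \end{bmatrix} = \begin{bmatrix} 0 & -3 \\ 1 & 1 \end{bmatrix} \]
is invertible. Hence near $(0,0,0)$, we can solve for $y$ and $z$ as $C^1$ functions of $x$ such that for $x$ near 0,
\[ x^2 + y(x)^2 - \bigl(z(x)+1\bigr)^3 = -1, \qquad e^x + e^{y(x)} + e^{z(x)} = 3. \]
In other words, near the origin the set of solutions is a smooth curve in $\mathbb{R}^3$ that goes through the origin. The theorem does not tell us how to find $y(x)$ and $z(x)$ explicitly; it only tells us that they exist.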
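Although the theorem gives no formula, values of $y(x)$ and $z(x)$ can be approximated numerically. The sketch below uses Newton's method in the variables $(y,z)$ for a fixed $x$; this is a technique swapped in for illustration and is not the method of the proof. The function names `residual` and `solve_yz`, the step count, and the starting point $(0,0)$ are assumptions of this example.

```python
import math

def residual(x, y, z):
    """The map f from the example; its zero set is the solution set."""
    return (x**2 + y**2 - (z + 1.0)**3 + 1.0,
            math.exp(x) + math.exp(y) + math.exp(z) - 3.0)

def solve_yz(x, steps=25):
    """Newton's method in (y, z) for fixed x, started at (y, z) = (0, 0)."""
    y, z = 0.0, 0.0
    for _ in range(steps):
        f1, f2 = residual(x, y, z)
        # Partial derivatives with respect to (y, z): the 2x2 block of f'.
        a, b = 2.0 * y, -3.0 * (z + 1.0)**2
        c, d = math.exp(y), math.exp(z)
        det = a * d - b * c
        # Solve J * (dy, dz) = -(f1, f2) by Cramer's rule.
        dy = (-f1 * d + b * f2) / det
        dz = (-a * f2 + c * f1) / det
        y, z = y + dy, z + dz
    return y, z

y_val, z_val = solve_yz(0.1)   # approximate (y(0.1), z(0.1))
```

The invertibility of the $2 \times 2$ block at the origin is what makes the Newton step well defined near $(0,0,0)$. For $x$ far from 0 the iteration may fail or converge to a point on a different branch.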
An interesting, and sometimes useful, observation from the proof is that we solved the equation $f\bigl(x,g(x)\bigr) = s$ for all $s$ in some neighborhood of 0, not just $s = 0$.

Remark 8.5.8.

There are versions of the theorem for arbitrarily many derivatives: If $f$ has $k$ continuous derivatives (see the next section), then the solution has $k$ continuous derivatives as well.

Exercises 8.5.2 Exercises

8.5.1.

Let $C := \{ (x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1 \}$.
  1. Solve for $y$ in terms of $x$ near $(0,1)$ (that is, find the function $g$ from the implicit function theorem for a neighborhood of the point $(p,q) = (0,1)$).
  2. Solve for $y$ in terms of $x$ near $(0,-1)$.
  3. Solve for $x$ in terms of $y$ near $(-1,0)$.

8.5.2.

Define $f \colon \mathbb{R}^2 \to \mathbb{R}^2$ by $f(x,y) := \bigl(x, y + h(x)\bigr)$ for some continuously differentiable function $h$ of one variable.
  1. Show that $f$ is one-to-one and onto.
  2. Compute $f'$. (Make sure to argue why $f'$ exists.)
  3. Show that $f'$ is invertible at all points, and compute its inverse.

8.5.3.

Define $f \colon \mathbb{R}^2 \to \mathbb{R}^2 \setminus \{ (0,0) \}$ by $f(x,y) := \bigl(e^x \cos(y), e^x \sin(y)\bigr)$.
  1. Show that $f$ is onto.
  2. Show that $f'$ is invertible at all points.
  3. Show that $f$ is not one-to-one, in fact for every $(a,b) \in \mathbb{R}^2 \setminus \{ (0,0) \}$, there exist infinitely many different points $(x,y) \in \mathbb{R}^2$ such that $f(x,y) = (a,b)$.
Therefore, invertible derivative at every point does not mean that f is invertible globally.
Note: Feel free to use what you know about sine and cosine from calculus.

8.5.4.

Find a map $f \colon \mathbb{R}^n \to \mathbb{R}^n$ that is one-to-one, onto, continuously differentiable, but $f'(0) = 0$. Hint: Generalize $f(x) = x^3$ from one to $n$ dimensions.

8.5.5.

Consider $z^2 + xz + y = 0$ in $\mathbb{R}^3$. Find an equation $D(x,y) = 0$, such that if $D(x_0,y_0) \neq 0$ and $z^2 + x_0 z + y_0 = 0$ for some $z \in \mathbb{R}$, then for points near $(x_0,y_0)$ there exist exactly two distinct continuously differentiable functions $r_1(x,y)$ and $r_2(x,y)$ such that $z = r_1(x,y)$ and $z = r_2(x,y)$ solve $z^2 + xz + y = 0$. Do you recognize the expression $D$ from algebra?

8.5.6.

Suppose $f \colon (a,b) \to \mathbb{R}^2$ is continuously differentiable and the first component (the $x$ component) of $f'(t)$ is nonzero for all $t \in (a,b)$. Prove that there exists an open interval $I \subset \mathbb{R}$ and a continuously differentiable function $g \colon I \to \mathbb{R}$ such that $(x,y) \in f\bigl((a,b)\bigr)$ if and only if $x \in I$ and $y = g(x)$. In other words, the set $f\bigl((a,b)\bigr)$ is a graph of $g$.

8.5.7.

Define $f \colon \mathbb{R}^2 \to \mathbb{R}^2$ by
\[
f(x,y) :=
\begin{cases}
\bigl(x^2 \sin(1/x) + x/2, \; y\bigr) & \text{if } x \neq 0, \\
(0,y) & \text{if } x = 0.
\end{cases}
\]
  1. Show that $f$ is differentiable everywhere.
  2. Show that $f'(0,0)$ is invertible.
  3. Show that $f$ is not one-to-one in every neighborhood of the origin (it is not locally invertible, that is, the inverse function theorem does not work).
  4. Show that $f$ is not continuously differentiable.
Note: Feel free to use what you know about sine and cosine from calculus.

8.5.8.

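(Polar coordinates)   Define a mapping $F(r,\theta) := \bigl(r \cos(\theta), r \sin(\theta)\bigr)$.
  1. Show that $F$ is continuously differentiable (for all $(r,\theta) \in \mathbb{R}^2$).
  2. Compute $F'(0,\theta)$ for all $\theta$.
  3. Show that if $r \neq 0$, then $F'(r,\theta)$ is invertible, therefore an inverse of $F$ exists locally as long as $r \neq 0$.
  4. Show that $F \colon \mathbb{R}^2 \to \mathbb{R}^2$ is onto, and for each point $(x,y) \in \mathbb{R}^2$, the set $F^{-1}(x,y)$ is infinite.
  5. Show that $F \colon \mathbb{R}^2 \to \mathbb{R}^2$ is not an open mapping. Note that $F|_{(0,\infty) \times \mathbb{R}}$ is an open mapping via Corollary 8.5.2. Hint: Where does a small open rectangle such as $(-\epsilon,\epsilon) \times (-\epsilon,\epsilon)$ go?
  6. Show that $F|_{(0,\infty) \times [0,2\pi)}$ is one-to-one and onto $\mathbb{R}^2 \setminus \{ (0,0) \}$.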
Note: Feel free to use what you know about sine and cosine from calculus.

8.5.9.

Let $H := \{ (x,y) \in \mathbb{R}^2 : y > 0 \}$, and for $(x,y) \in H$ define
\[ F(x,y) := \left( \frac{x^2 + y^2 - 1}{x^2 + 2y + y^2 + 1}, \; \frac{-2x}{x^2 + 2y + y^2 + 1} \right). \]
Prove that $F$ is a bijective mapping from $H$ to $B(0,1)$, it is continuously differentiable on $H$, and its inverse is also continuously differentiable.

8.5.10.

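Suppose $U \subset \mathbb{R}^2$ is open and $f \colon U \to \mathbb{R}$ is a $C^1$ function such that $\nabla f(x,y) \neq 0$ for all $(x,y) \in U$. Show that every level set is a $C^1$ smooth curve. That is, for every $(x,y) \in U$, there exists a $C^1$ function $\gamma \colon (-\delta,\delta) \to \mathbb{R}^2$ with $\gamma'(0) \neq 0$ such that $f\bigl(\gamma(t)\bigr)$ is constant for all $t \in (-\delta,\delta)$.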

8.5.11.

Suppose $U \subset \mathbb{R}^2$ is open and $f \colon U \to \mathbb{R}$ is a $C^1$ function such that $\nabla f(x,y) \neq 0$ for all $(x,y) \in U$. Show that for every $(x,y)$ there exist a neighborhood $V$ of $(x,y)$, an open set $W \subset \mathbb{R}^2$, and a bijective $C^1$ function with a $C^1$ inverse $g \colon W \to V$ such that the level sets of $f \circ g$ are horizontal lines in $W$, that is, the set given by $(f \circ g)(s,t) = c$ for a constant $c$ is a set of the form $\{ (s,t_0) \in \mathbb{R}^2 : s \in \mathbb{R}, (s,t_0) \in W \}$, where $t_0$ is fixed. That is, the level curves can be locally “straightened.”
For a higher quality printout use the PDF versions: https://www.jirka.org/ra/realanal.pdf or https://www.jirka.org/ra/realanal2.pdf