
Section 8.1 Vector spaces, linear mappings, and convexity

Note: 3 lectures

Subsection 8.1.1 Vector spaces

The euclidean space \(\R^n\) has already made an appearance in the metric space chapter. In this chapter, we extend the differential calculus we created for one variable to several variables. The key idea in differential calculus is to approximate differentiable functions by linear functions (approximating the graph by a straight line). In several variables, we must introduce a little bit of linear algebra before we can move on. We start with vector spaces and linear mappings of vector spaces.
While it is common to use \(\vec{v}\) or the bold \(\mathbf{v}\) for elements of \(\R^n\text{,}\) especially in the applied sciences, we use just plain old \(v\text{,}\) which is common in mathematics. That is, \(v \in \R^n\) is a vector, which means \(v = (v_1,v_2,\ldots,v_n)\) is an \(n\)-tuple of real numbers.
(Subscripts are used for many purposes, so sometimes we may have several vectors that may also be identified by subscript, such as a finite or infinite sequence of vectors \(y_1,y_2,\ldots\text{.}\))
It is common to write and treat vectors as column vectors, that is, \(n\)-by-\(1\) matrices:
\begin{equation*} v = (v_1,v_2,\ldots,v_n) = { \left[ \begin{smallmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{smallmatrix} \right] } \end{equation*}
We do so when convenient. We call real numbers scalars to distinguish them from vectors.
We often think of vectors as a direction and a magnitude and draw the vector as an arrow. The vector \((v_1,v_2,\ldots,v_n)\) is represented by an arrow from the origin to the point \((v_1,v_2,\ldots,v_n)\text{.}\) When we think of vectors as arrows, they are not based at the origin necessarily; a vector is simply the direction and the magnitude, and it does not know where it starts. There is a natural algebraic structure when thinking of vectors as arrows. We can add vectors as arrows by following one vector and then the other. And we can take scalar multiples of vectors as arrows by rescaling the magnitude. See Figure 8.1.

Figure 8.1. Vector as an arrow in \(\R^2\text{,}\) and the meaning of addition and scalar multiplication.

Each vector also represents a point in \(\R^n\text{.}\) Usually, we think of \(v \in \R^n\) as a point if we are thinking of \(\R^n\) as a metric space, and we think of it as an arrow if we think of the so-called vector space structure on \(\R^n\) (addition and scalar multiplication). Let us define the abstract notion of a vector space, as there are many other vector spaces than just \(\R^n\text{.}\)

Definition 8.1.1.

Let \(X\) be a set together with the operations of addition, \(+ \colon X \times X \to X\text{,}\) and multiplication, \(\cdot \colon \R \times X \to X\text{,}\) (we usually write \(ax\) instead of \(a \cdot x\)). \(X\) is called a vector space (or a real vector space) if the following conditions are satisfied:
  1. (Addition is associative)     If \(u, v, w \in X\text{,}\) then \(u+(v+w) = (u+v)+w\text{.}\)
  2. (Addition is commutative)     If \(u, v \in X\text{,}\) then \(u+v = v+u\text{.}\)
  3. (Additive identity)     There is a \(0 \in X\) such that \(v+0=v\) for all \(v \in X\text{.}\)
  4. (Additive inverse)     For each \(v \in X\text{,}\) there is a \(-v \in X\text{,}\) such that \(v+(-v)=0\text{.}\)
  5. (Distributive law)     If \(a \in \R\text{,}\) \(u,v \in X\text{,}\) then \(a(u+v) = au+av\text{.}\)
  6. (Distributive law)     If \(a,b \in \R\text{,}\) \(v \in X\text{,}\) then \((a+b)v = av+bv\text{.}\)
  7. (Multiplication is associative)     If \(a,b \in \R\text{,}\) \(v \in X\text{,}\) then \((ab)v = a(bv)\text{.}\)
  8. (Multiplicative identity)     \(1v = v\) for all \(v \in X\text{.}\)
Elements of a vector space are usually called vectors, even if they are not elements of \(\R^n\) (vectors in the “traditional” sense). If \(Y \subset X\) is a subset that is a vector space itself using the same operations, then \(Y\) is called a subspace or a vector subspace of \(X\text{.}\)
Multiplication by scalars works as one would expect. For example, \(2v = (1+1)v = 1v + 1v = v+v\text{,}\) similarly \(3v = v+v+v\text{,}\) and so on. One particular fact we often use is that \(0 v = 0\text{,}\) where the zero on the left is \(0 \in \R\) and the zero on the right is \(0 \in X\text{.}\) To see this, start with \(0v = (0+0)v = 0v + 0v\text{,}\) and add \(-(0v)\) to both sides to obtain \(0 = 0v\text{.}\) Similarly, \(-v = (-1)v\text{,}\) which follows by \((-1)v+v = (-1)v + 1v = (-1+1)v = 0v = 0\text{.}\) Such algebraic facts which follow quickly from the definition will be taken for granted from now on.

Example 8.1.2.

The set \(\R^n\) is a vector space, where addition and multiplication by a scalar are done componentwise: If \(a \in \R\text{,}\) \(v = (v_1,v_2,\ldots,v_n) \in \R^n\text{,}\) and \(w = (w_1,w_2,\ldots,w_n) \in \R^n\text{,}\) then
\begin{equation*} \begin{aligned} & v+w \coloneqq (v_1,v_2,\ldots,v_n) + (w_1,w_2,\ldots,w_n) = (v_1+w_1,v_2+w_2,\ldots,v_n+w_n) , \\ & a v \coloneqq a (v_1,v_2,\ldots,v_n) = (a v_1, a v_2,\ldots, a v_n) . \end{aligned} \end{equation*}
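For instance, in \(\R^3\text{,}\)
\begin{equation*} (1,2,3) + (4,5,6) = (5,7,9) \qquad \text{and} \qquad 2\,(1,2,3) = (2,4,6) . \end{equation*}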
We will mostly deal with “finite-dimensional” vector spaces that can be regarded as subsets of \(\R^n\text{,}\) but other vector spaces are useful in analysis. It is better to think of even such simpler vector spaces abstractly rather than as \(\R^n\text{.}\)

Example 8.1.3.

A trivial example of a vector space is \(X \coloneqq \{ 0 \}\text{.}\) The operations are defined in the obvious way: \(0 + 0 \coloneqq 0\) and \(a0 \coloneqq 0\text{.}\) A zero vector must always exist, so all vector spaces are nonempty sets, and this \(X\) is the smallest possible vector space.

Example 8.1.4.

The space \(C([0,1],\R)\) of continuous functions on the interval \([0,1]\) is a vector space. For two functions \(f\) and \(g\) in \(C([0,1],\R)\) and \(a \in \R\text{,}\) we make the obvious definitions of \(f+g\) and \(af\text{:}\)
\begin{equation*} (f+g)(x) \coloneqq f(x) + g(x), \qquad (af) (x) \coloneqq a\bigl(f(x)\bigr) . \end{equation*}
The 0 is the function that is identically zero. We leave it as an exercise to check that all the vector space conditions are satisfied. The space \(C^1([0,1],\R)\) of continuously differentiable functions is a subspace of \(C([0,1],\R)\text{.}\)
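For instance, if \(f(x) \coloneqq x\) and \(g(x) \coloneqq 1-x\text{,}\) then
\begin{equation*} (f+g)(x) = x + (1-x) = 1 \qquad \text{and} \qquad (2f)(x) = 2x , \end{equation*}
and both \(f+g\) and \(2f\) are again continuous on \([0,1]\text{.}\)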

Example 8.1.5.

The space of polynomials \(c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m\) (of arbitrary degree \(m\)) is a vector space, denoted by \(\R[t]\) (coefficients are real and the variable is \(t\)). The operations are defined in the same way as for functions above. Suppose there are two polynomials, one of degree \(m\) and one of degree \(n\text{.}\) Assume \(n \geq m\) for simplicity. Then
\begin{multline*} (c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m) + (d_0 + d_1 t + d_2 t^2 + \cdots + d_n t^n) = \\ (c_0+d_0) + (c_1+d_1) t + (c_2 + d_2) t^2 + \cdots + (c_m+d_m) t^m + d_{m+1} t^{m+1} + \cdots + d_n t^n \end{multline*}
and
\begin{equation*} a(c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m) = (ac_0) + (ac_1) t + (ac_2) t^2 + \cdots + (ac_m) t^m . \end{equation*}
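For instance,
\begin{equation*} (1 + 2t) + (3 + t + t^2) = 4 + 3t + t^2 \qquad \text{and} \qquad 2(1+2t) = 2 + 4t . \end{equation*}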
Despite what it looks like, \(\R[t]\) is not equivalent to \(\R^n\) for any \(n\text{.}\) In particular, it is not “finite-dimensional.” We will make this notion precise in just a little bit. One can make a finite-dimensional vector subspace by restricting the degree. For example, if \(\sP_n\) is the set of polynomials of degree \(n\) or less, then \(\sP_n\) is a finite-dimensional vector space, and we could identify it with \(\R^{n+1}\text{.}\)
Above, the variable \(t\) is really just a formal placeholder. By setting \(t\) equal to a real number, we obtain a function. So the space \(\R[t]\) can be thought of as a subspace of \(C(\R,\R)\text{.}\) If we restrict the range of \(t\) to \([0,1]\text{,}\) \(\R[t]\) can be identified with a subspace of \(C([0,1],\R)\text{.}\)
Proposition 8.1.6.

Let \(X\) be a vector space. A subset \(S \subset X\) is a vector subspace if:

  1. \(0 \in S\text{.}\)
  2. If \(v, w \in S\text{,}\) then \(v+w \in S\text{.}\)
  3. If \(a \in \R\) and \(v \in S\text{,}\) then \(av \in S\text{.}\)

Items 2) and 3) ensure that addition and scalar multiplication are indeed defined on \(S\text{.}\) Item 1) is required to fulfill the additive identity axiom from the definition of vector space. Existence of the additive inverse \(-v\) (the additive inverse axiom) follows because \(-v = (-1)v\) and item 3) says that \(-v \in S\) if \(v \in S\text{.}\) All other properties are certain equalities that are already satisfied in \(X\) and thus must be satisfied in a subset.
It is possible to use other fields than \(\R\) in the definition (for example, it is common to use the complex numbers \(\C\)), but let us stick with the real numbers. (If you want a very funky vector space over a different field, \(\R\) itself is a vector space over the field \(\Q\text{.}\))

Subsection 8.1.2 Linear combinations and dimension

Definition 8.1.7.

Suppose \(X\) is a vector space, \(x_1, x_2, \ldots, x_m \in X\) are vectors, and \(a_1, a_2, \ldots, a_m \in \R\) are scalars. Then
\begin{equation*} a_1 x_1 + a_2 x_2 + \cdots + a_m x_m \end{equation*}
is called a linear combination of the vectors \(x_1, x_2, \ldots, x_m\text{.}\)
For a subset \(Y \subset X\text{,}\) let \(\spn(Y)\text{,}\) or the span of \(Y\text{,}\) be the set of all linear combinations of all finite subsets of \(Y\text{.}\) We say \(Y\) spans \(\spn(Y)\text{.}\) By convention, define \(\spn(\emptyset) \coloneqq \{ 0 \}\text{.}\)

Example 8.1.8.

Let \(Y \coloneqq \bigl\{ (1,1) \bigr\} \subset \R^2\text{.}\) Then
\begin{equation*} \spn(Y) = \bigl\{ (x,x) \in \R^2 : x \in \R \bigr\} . \end{equation*}
That is, \(\spn(Y)\) is the line through the origin and the point \((1,1)\text{.}\)

Example 8.1.9.

Let \(Y \coloneqq \bigl\{ (1,1), (0,1) \bigr\} \subset \R^2\text{.}\) Then
\begin{equation*} \spn(Y) = \R^2 , \end{equation*}
as every point \((x,y) \in \R^2\) can be written as a linear combination
\begin{equation*} (x,y) = x (1,1) + (y-x) (0,1) . \end{equation*}
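For instance,
\begin{equation*} (2,5) = 2\,(1,1) + 3\,(0,1) . \end{equation*}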

Example 8.1.10.

Let \(Y \coloneqq \{ 1, t, t^2, t^3, \ldots \} \subset \R[t]\text{,}\) and \(E \coloneqq \{ 1, t^2, t^4, t^6, \ldots \} \subset \R[t]\text{.}\) The span of \(Y\) is all polynomials,
\begin{equation*} \spn(Y) = \R[t] . \end{equation*}
The span of \(E\) is the set of polynomials with even powers of \(t\) only.
Suppose we have two linear combinations of vectors from \(Y\text{.}\) One linear combination uses the vectors \(\{ x_1,x_2,\ldots,x_m \}\text{,}\) and the other uses \(\{ \widetilde{x}_1,\widetilde{x}_2,\ldots,\widetilde{x}_n \}\text{.}\) We can write their sum using vectors from the union \(\{ x_1,x_2,\ldots,x_m \} \cup \{ \widetilde{x}_1,\widetilde{x}_2,\ldots,\widetilde{x}_n \}\text{:}\)
\begin{multline*} (a_1 x_1 + a_2 x_2 + \cdots + a_m x_m) + (b_1 \widetilde{x}_1 + b_2 \widetilde{x}_2 + \cdots + b_n \widetilde{x}_n) \\ = a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + b_1 \widetilde{x}_1 + b_2 \widetilde{x}_2 + \cdots + b_n \widetilde{x}_n . \end{multline*}
So the sum is also a linear combination of vectors from \(Y\text{.}\) Similarly, a scalar multiple of a linear combination of vectors from \(Y\) is a linear combination of vectors from \(Y\text{:}\)
\begin{equation*} b (a_1 x_1 + a_2 x_2 + \cdots + a_m x_m) = b a_1 x_1 + b a_2 x_2 + \cdots + b a_m x_m . \end{equation*}
Finally, \(0 \in \spn(Y)\text{;}\) if \(Y\) is nonempty, \(0 = 0 v\) for some \(v \in Y\text{.}\) We have proved the following proposition.

Proposition 8.1.11.

Let \(X\) be a vector space. For every \(Y \subset X\text{,}\) the set \(\spn(Y)\) is a vector subspace of \(X\text{.}\)
Every linear combination of elements in a subspace is an element of that subspace. So \(\spn(Y)\) is the smallest subspace that contains \(Y\text{.}\) In particular, if \(Y\) is already a vector subspace, then \(\spn(Y) = Y\text{.}\)

Definition 8.1.12.

A set of vectors \(\{ x_1, x_2, \ldots, x_m \} \subset X\) is linearly independent if the equation
\begin{equation} a_1 x_1 + a_2 x_2 + \cdots + a_m x_m = 0\tag{8.1} \end{equation}
has only the trivial solution \(a_1 = a_2 = \cdots = a_m = 0\text{.}\) (For an infinite set \(Y \subset X\text{,}\) we say \(Y\) is linearly independent if every finite subset of \(Y\) is linearly independent in this sense; however, this situation only comes up in infinitely many dimensions.) By convention, \(\emptyset\) is linearly independent. A set that is not linearly independent is linearly dependent. A linearly independent set of vectors \(B \subset X\) such that \(\spn(B) = X\) is called a basis of \(X\text{.}\) We generally consider a basis not just as a set, but as an ordered \(m\)-tuple: \(x_1,x_2,\ldots,x_m\text{.}\)
Suppose \(d\) is the largest integer for which \(X\) contains a set of \(d\) linearly independent vectors. We then say \(d\) is the dimension of \(X\text{,}\) and we write \(\dim \, X \coloneqq d\text{.}\) If \(X\) contains a set of \(d\) linearly independent vectors for arbitrarily large \(d\text{,}\) we say \(X\) is infinite-dimensional and write \(\dim \, X \coloneqq \infty\text{.}\) For the trivial vector space \(\{ 0 \}\text{,}\) we define \(\dim \, \{ 0 \} \coloneqq 0\text{.}\)
A subset of a linearly independent set is clearly linearly independent. So if a set contains \(d\) linearly independent vectors, it also contains a set of \(m\) linearly independent vectors for every \(m \leq d\text{.}\) Moreover, if no set of \(d+1\) vectors is linearly independent, then no set of more than \(d+1\) vectors is linearly independent either. So \(X\) has dimension \(d\) if there is a set of \(d\) linearly independent vectors, but no set of \(d+1\) vectors is linearly independent.
No element of a linearly independent set can be zero, and a set with one nonzero element is always linearly independent. In particular, \(\{ 0 \}\) is the only vector space of dimension \(0\text{.}\) Every other vector space has a positive dimension or is infinite-dimensional. As the empty set is linearly independent, it is a basis of \(\{ 0 \}\text{.}\)
As an example, the set \(Y\) of the two vectors in Example 8.1.9 is a basis of \(\R^2\text{,}\) and so \(\dim \, \R^2 \geq 2\text{.}\) We will see in a moment that every vector subspace of \(\R^n\) has a finite dimension, and that dimension is less than or equal to \(n\text{.}\) So every set of 3 vectors in \(\R^2\) is linearly dependent, and \(\dim \, \R^2 = 2\text{.}\)
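For instance, to check directly that the set \(Y = \bigl\{ (1,1), (0,1) \bigr\}\) is linearly independent, suppose
\begin{equation*} a_1 (1,1) + a_2 (0,1) = (a_1, a_1 + a_2) = (0,0) . \end{equation*}
The first component gives \(a_1 = 0\text{,}\) and then the second component gives \(a_2 = 0\text{,}\) so only the trivial solution exists.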
If a set is linearly dependent, then one of the vectors is a linear combination of the others. In (8.1), if \(a_k \not= 0\text{,}\) then we solve for \(x_k\text{:}\)
\begin{equation*} x_k = \frac{-a_1}{a_k} x_1 + \cdots + \frac{-a_{k-1}}{a_k} x_{k-1} + \frac{-a_{k+1}}{a_k} x_{k+1} + \cdots + \frac{-a_m}{a_k} x_m . \end{equation*}
The vector \(x_k\) has at least two different representations as linear combinations of the vectors \(\{ x_1,x_2,\ldots,x_m \}\text{:}\) the one above, and \(x_k\) itself. For instance, the set \(\bigl\{ (0,1), (2,3), (1,0) \bigr\}\) in \(\R^2\) is linearly dependent:
\begin{equation*} 3(0,1) - (2,3) + 2(1,0) = 0 , \qquad \text{so} \qquad (2,3) = 3(0,1) + 2(1,0) . \end{equation*}

Proposition 8.1.13.

If \(B = \{ x_1, x_2, \ldots, x_n \}\) is a basis of a vector space \(X\text{,}\) then every \(y \in X\) has a unique representation of the form
\begin{equation*} y = \sum_{k=1}^n a_k \, x_k \end{equation*}
for some scalars \(a_1, a_2, \ldots, a_n\text{.}\)

Proof.

As \(X\) is the span of \(B\text{,}\) every \(y \in X\) is a linear combination of elements of \(B\text{.}\) Suppose
\begin{equation*} y = \sum_{k=1}^n a_k \, x_k = \sum_{k=1}^n b_k \, x_k . \end{equation*}
Then
\begin{equation*} \sum_{k=1}^n (a_k-b_k) x_k = 0 . \end{equation*}
By linear independence of the basis, \(a_k = b_k\) for all \(k\text{,}\) and so the representation is unique.
For \(\R^n\text{,}\) we define the standard basis of \(\R^n\text{:}\)
\begin{equation*} e_1 \coloneqq (1,0,0,\ldots,0) , \quad e_2 \coloneqq (0,1,0,\ldots,0) , \quad \ldots, \quad e_n \coloneqq (0,0,0,\ldots,1) . \end{equation*}
We use the same letters \(e_k\) for any \(\R^n\text{,}\) and which space \(\R^n\) we are working in is understood from context. A direct computation shows that \(\{ e_1, e_2, \ldots, e_n \}\) really is a basis of \(\R^n\text{;}\) it spans \(\R^n\) and is linearly independent. In fact,
\begin{equation*} a = (a_1,a_2,\ldots,a_n) = \sum_{k=1}^n a_k \, e_k . \end{equation*}
Proposition 8.1.14.

Let \(X\) be a vector space and \(d\) a nonnegative integer.

  i. If a set of \(d\) vectors spans \(X\text{,}\) then every linearly independent set in \(X\) contains at most \(d\) vectors. In particular, \(\dim \, X \leq d\text{.}\)
  ii. If \(T \subset X\) is a linearly independent set and \(v \in X \setminus \spn(T)\text{,}\) then \(T \cup \{ v \}\) is linearly independent.
  iii. \(\dim \, X = d\) if and only if \(X\) has a basis of \(d\) vectors. In particular, \(\dim \, \R^n = n\text{.}\)
  iv. If \(Y \subset X\) is a vector subspace and \(\dim \, X = d\text{,}\) then \(\dim \, Y \leq d\text{.}\)
  v. If \(\dim \, X = d\) and a set \(T\) of \(m\) vectors is linearly dependent and spans \(X\text{,}\) then \(m > d\text{.}\)
  vi. If \(\dim \, X = d\) and \(T \subset X\) is a linearly independent set of \(m \leq d\) vectors, then \(T\) is contained in a linearly independent set of \(d\) vectors that spans \(X\text{.}\)

In particular, the last item says that if \(\dim X = d\) and \(T\) is a set of \(d\) linearly independent vectors, then \(T\) spans \(X\text{.}\) Another thing to note is that item iii implies that every basis of a finite-dimensional vector space has the same number of elements.

Proof.

All statements hold trivially when \(d=0\text{,}\) so assume \(d \geq 1\text{.}\)
We start with i. Suppose \(S \coloneqq \{ x_1 , x_2, \ldots, x_d \}\) spans \(X\text{,}\) and \(T \coloneqq \{ y_1, y_2, \ldots, y_m \}\) is a linearly independent subset of \(X\text{.}\) We wish to show that \(m \leq d\text{.}\) As \(S\) spans \(X\text{,}\) write
\begin{equation*} y_1 = \sum_{k=1}^d a_{k,1} x_k , \end{equation*}
for some numbers \(a_{1,1},a_{2,1},\ldots,a_{d,1}\text{.}\) One of the \(a_{k,1}\) is nonzero, otherwise \(y_1\) would be zero. Without loss of generality, suppose \(a_{1,1} \not= 0\text{.}\) Solve
\begin{equation*} x_1 = \frac{1}{a_{1,1}} y_1 - \sum_{k=2}^d \frac{a_{k,1}}{a_{1,1}} x_k . \end{equation*}
In particular, \(\{ y_1 , x_2, \ldots, x_d \}\) spans \(X\text{,}\) since \(x_1\) can be obtained from \(\{ y_1 , x_2, \ldots, x_d \}\text{.}\) Therefore, there are some numbers \(a_{1,2},a_{2,2},\ldots,a_{d,2}\text{,}\) such that
\begin{equation*} y_2 = a_{1,2} y_1 + \sum_{k=2}^d a_{k,2} x_k . \end{equation*}
As \(T\) is linearly independent—and so \(\{ y_1, y_2 \}\) is linearly independent—one of the \(a_{k,2}\) for \(k \geq 2\) must be nonzero. Without loss of generality suppose \(a_{2,2} \not= 0\text{.}\) Solve
\begin{equation*} x_2 = \frac{1}{a_{2,2}} y_2 - \frac{a_{1,2}}{a_{2,2}} y_1 - \sum_{k=3}^d \frac{a_{k,2}}{a_{2,2}} x_k . \end{equation*}
In particular, \(\{ y_1 , y_2, x_3, \ldots, x_d \}\) spans \(X\text{.}\)
We continue this procedure. If \(m < d\text{,}\) we are done. Suppose \(m \geq d\text{.}\) After \(d\) steps, we obtain that \(\{ y_1 , y_2, \ldots, y_d \}\) spans \(X\text{.}\) Any vector of \(T\) other than \(y_1, y_2, \ldots, y_d\) would then be a linear combination of \(\{ y_1 , y_2, \ldots, y_d \}\text{,}\) contradicting the linear independence of \(T\text{.}\) So \(m = d\text{.}\)
We continue with ii. Suppose \(T = \{x_1,x_2,\ldots,x_m\}\) is linearly independent, does not span \(X\text{,}\) and \(v \in X \setminus \spn (T)\text{.}\) Suppose \(a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + a_{m+1} v = 0\) for some scalars \(a_1,a_2,\ldots,a_{m+1}\text{.}\) If \(a_{m+1} \not=0\text{,}\) then \(v\) would be a linear combination of \(T\text{,}\) so \(a_{m+1} = 0\text{.}\) Then, as \(T\) is linearly independent, \(a_1=a_2=\cdots=a_m = 0\text{.}\) So \(T \cup \{ v \}\) is linearly independent.
We move to iii. If \(\dim \, X = d\text{,}\) then there must exist some linearly independent set \(T\) of \(d\) vectors, and \(T\) must span \(X\text{,}\) otherwise we could choose a larger set of linearly independent vectors via ii. So we have a basis of \(d\) vectors. On the other hand, if we have a basis of \(d\) vectors, the dimension is at least \(d\text{,}\) as a basis is linearly independent. A basis also spans \(X\text{,}\) and so by i the dimension is at most \(d\text{.}\) Hence the dimension of \(X\) must equal \(d\text{.}\) The “in particular” follows by noting that \(\{ e_1, e_2, \ldots, e_n \}\) is a basis of \(\R^n\text{.}\)
To see iv, suppose \(Y \subset X\) is a vector subspace, where \(\dim \, X = d\text{.}\) As \(X\) cannot contain \(d+1\) linearly independent vectors, neither can \(Y\text{.}\)
For v, suppose \(T\) is a set of \(m\) vectors that is linearly dependent and spans \(X\text{.}\) We will show that \(m > d\text{.}\) One of the vectors is a linear combination of the others. If we remove it from \(T\text{,}\) we obtain a set of \(m-1\) vectors that still spans \(X\text{.}\) Hence \(d = \dim \, X \leq m-1\) by i.
For vi, suppose \(T = \{ x_1, x_2, \ldots, x_m \}\) is a linearly independent set. First, \(m \leq d\) by definition of dimension. If \(m=d\text{,}\) the set \(T\) must span \(X\) as in the proof of iii, otherwise we could add another vector to \(T\text{.}\) If \(m < d\text{,}\) \(T\) cannot span \(X\) by iii. So find \(v\) not in the span of \(T\text{.}\) Via ii, the set \(T \cup \{ v \}\) is a linearly independent set of \(m+1\) elements. We repeat this procedure \(d-m\) times to find a set of \(d\) linearly independent vectors. Again, they must span \(X\text{,}\) otherwise we could add yet another vector.

Subsection 8.1.3 Linear mappings

When \(Y \not= \R\text{,}\) a function \(f \colon X \to Y\) is often called a mapping or a map rather than a function.

Definition 8.1.15.

A map \(A \colon X \to Y\) of vector spaces \(X\) and \(Y\) is linear (we also say \(A\) is a linear transformation or a linear operator) if for all \(a \in \R\) and all \(x,y \in X\text{,}\)
\begin{equation*} A(a x) = a A(x) \qquad \text{and} \qquad A(x+y) = A(x)+A(y) . \end{equation*}
We usually write \(Ax\) instead of \(A(x)\) if \(A\) is linear. If \(A\) is one-to-one and onto, then we say \(A\) is invertible, and we denote the inverse by \(A^{-1}\text{.}\) If \(A \colon X \to X\) is linear, then we say \(A\) is a linear operator on \(X\).
We write \(L(X,Y)\) for the set of linear maps from \(X\) to \(Y\text{,}\) and \(L(X)\) for the set of linear operators on \(X\text{.}\) If \(a \in \R\) and \(A,B \in L(X,Y)\text{,}\) define the maps \(aA\) and \(A+B\) by
\begin{equation*} (aA)(x) \coloneqq aAx , \qquad (A+B)(x) \coloneqq Ax + Bx . \end{equation*}
If \(A \in L(Y,Z)\) and \(B \in L(X,Y)\text{,}\) define the map \(AB \colon X \to Z\) as the composition \(A \circ B\text{,}\)
\begin{equation*} ABx \coloneqq A(Bx) . \end{equation*}
Finally, denote by \(I \in L(X)\) the identity: the linear operator such that \(Ix = x\) for all \(x\text{.}\)
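For example, the map \(A \colon \R^2 \to \R^2\) defined by \(A(x_1,x_2) \coloneqq (x_1 + 2x_2, 3x_1)\) is linear: for \(a \in \R\) and \(x, y \in \R^2\text{,}\)
\begin{equation*} A(ax) = (a x_1 + 2 a x_2, 3 a x_1) = a A(x) , \qquad A(x+y) = \bigl( (x_1+y_1) + 2(x_2+y_2), 3(x_1+y_1) \bigr) = A(x) + A(y) . \end{equation*}
On the other hand, the map \(f \colon \R \to \R\) given by \(f(x) \coloneqq x+1\) is not linear, as \(f(0) = 1 \not= 0\text{.}\)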
Proposition 8.1.16.

Suppose \(X\text{,}\) \(Y\text{,}\) and \(Z\) are vector spaces.

  i. If \(a \in \R\) and \(A \in L(X,Y)\text{,}\) then \(aA \in L(X,Y)\text{.}\)
  ii. If \(A, B \in L(X,Y)\text{,}\) then \(A+B \in L(X,Y)\text{.}\)
  iii. If \(A \in L(Y,Z)\) and \(B \in L(X,Y)\text{,}\) then \(AB \in L(X,Z)\text{.}\)
  iv. If \(A \in L(X,Y)\text{,}\) then \(A0 = 0\text{.}\)
  v. If \(A \in L(X,Y)\) is invertible, then \(A^{-1} \in L(Y,X)\text{.}\)

In particular, \(L(X,Y)\) is a vector space, where \(0 \in L(X,Y)\) is the linear map that takes everything to \(0\text{.}\) As \(L(X)\) is not only a vector space, but also admits a product (composition), it is called an algebra.

Proof.

We leave the first four items as a quick exercise, Exercise 8.1.20. Let us prove the last item. Let \(a \in \R\) and \(y \in Y\text{.}\) As \(A\) is onto, there is an \(x \in X\) such that \(y = Ax\text{.}\) As \(A\) is also one-to-one, \(A^{-1}(Az) = z\) for all \(z \in X\text{.}\) So
\begin{equation*} A^{-1}(ay) = A^{-1}(aAx) = A^{-1}\bigl(A(ax)\bigr) = ax = aA^{-1}(y). \end{equation*}
Similarly, let \(y_1,y_2 \in Y\) and \(x_1, x_2 \in X\) be such that \(Ax_1 = y_1\) and \(Ax_2 = y_2\text{,}\) then
\begin{equation*} A^{-1}(y_1+y_2) = A^{-1}(Ax_1+Ax_2) = A^{-1}\bigl(A(x_1+x_2)\bigr) = x_1+x_2 = A^{-1}(y_1) + A^{-1}(y_2). \qedhere \end{equation*}
Proposition 8.1.17.

Suppose \(X\) and \(Y\) are vector spaces and \(B \subset X\) is a basis of \(X\text{.}\) Then a linear map \(A \colon X \to Y\) is completely determined by its values on \(B\text{.}\) Furthermore, every function \(\widetilde{A} \colon B \to Y\) extends to a linear function \(A \colon X \to Y\text{.}\)

We only prove this proposition for finite-dimensional spaces, as we do not need infinite-dimensional spaces. (For infinite-dimensional spaces, the proof is essentially the same, but a little trickier to write. Moreover, we haven’t even defined what a basis is for infinite-dimensional spaces.)

Proof.

Let \(\{ x_1, x_2, \ldots, x_n \}\) be a basis of \(X\text{,}\) and let \(y_k \coloneqq A x_k\text{.}\) Every \(x \in X\) has a unique representation
\begin{equation*} x = \sum_{k=1}^n b_k \, x_k \end{equation*}
for some numbers \(b_1,b_2,\ldots,b_n\text{.}\) By linearity,
\begin{equation*} Ax = A\sum_{k=1}^n b_k x_k = \sum_{k=1}^n b_k \, Ax_k = \sum_{k=1}^n b_k \, y_k . \end{equation*}
The “furthermore” follows by setting \(y_k \coloneqq \widetilde{A}(x_k)\text{,}\) and then for \(x = \sum_{k=1}^n b_k \, x_k\text{,}\) defining the extension as \(A(x) \coloneqq \sum_{k=1}^n b_k \, y_k\text{.}\) The function is well-defined by uniqueness of the representation of \(x\text{.}\) We leave it to the reader to check that \(A\) is linear.
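For instance, take the standard basis \(\{ e_1, e_2 \}\) of \(\R^2\) and prescribe \(\widetilde{A}(e_1) \coloneqq (1,2)\) and \(\widetilde{A}(e_2) \coloneqq (0,1)\text{.}\) The linear extension is then
\begin{equation*} A(b_1, b_2) = b_1 \widetilde{A}(e_1) + b_2 \widetilde{A}(e_2) = (b_1, 2 b_1 + b_2) . \end{equation*}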
For a linear map, it is sufficient to check injectivity at the origin. That is, if the only \(x\) such that \(Ax =0\) is \(x=0\text{,}\) then \(A\) is one-to-one, because if \(Ay=Az\text{,}\) then \(A(y-z) = 0\text{.}\) For this reason, one often studies the nullspace of \(A\text{,}\) that is, \(\{ x \in X : Ax = 0 \}\text{.}\) For finite-dimensional vector spaces (and only in finitely many dimensions) we have the following special case of the so-called rank-nullity theorem from linear algebra.

Proposition 8.1.18.

Suppose \(X\) is a finite-dimensional vector space and \(A \in L(X)\text{.}\) Then \(A\) is one-to-one if and only if it is onto.

Proof.

Let \(\{ x_1,x_2,\ldots,x_n \}\) be a basis for \(X\text{.}\) First suppose \(A\) is one-to-one. Let \(c_1,c_2,\ldots,c_n\) be such that
\begin{equation*} 0 = \sum_{k=1}^n c_k \, Ax_k = A\sum_{k=1}^n c_k \, x_k . \end{equation*}
As \(A\) is one-to-one, the only vector that is taken to 0 is 0 itself. Hence,
\begin{equation*} 0 = \sum_{k=1}^n c_k \, x_k, \end{equation*}
and so \(c_k = 0\) for all \(k\) as \(\{ x_1,x_2,\ldots,x_n \}\) is a basis. So \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) is linearly independent. By Proposition 8.1.14 and the fact that the dimension is \(n\text{,}\) we conclude \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) spans \(X\text{.}\) Consequently, \(A\) is onto, as any \(y \in X\) can be written as
\begin{equation*} y = \sum_{k=1}^n a_k \, Ax_k = A\sum_{k=1}^n a_k \, x_k . \end{equation*}
For the other direction, suppose \(A\) is onto. Suppose that for some \(c_1,c_2,\ldots,c_n\text{,}\)
\begin{equation*} 0 = A\sum_{k=1}^n c_k \, x_k = \sum_{k=1}^n c_k \, Ax_k . \end{equation*}
As \(A\) is onto, \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) spans \(X\text{:}\) every \(y \in X\) equals \(Ax\) for some \(x = \sum_{k=1}^n b_k \, x_k\text{,}\) and then \(y = \sum_{k=1}^n b_k \, Ax_k\text{.}\) So by Proposition 8.1.14, the set is linearly independent, and hence \(c_k = 0\) for all \(k\text{.}\) In other words, if \(Ax = 0\text{,}\) then \(x=0\text{.}\) Thus, \(A\) is one-to-one.
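For example, the linear operator \(A \colon \R^2 \to \R^2\) given by \(A(x,y) \coloneqq (x-y, 0)\) is neither one-to-one nor onto: its nullspace is \(\spn\bigl( \{ (1,1) \} \bigr)\text{,}\) and its range is the \(x\)-axis. On the other hand, \(A(x,y) \coloneqq (x+y, y)\) has nullspace \(\{ 0 \}\text{,}\) so it is one-to-one, and by the proposition it is automatically onto.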
We leave the proof of the next proposition as an exercise.

Proposition 8.1.19.

If \(X\) and \(Y\) are finite-dimensional vector spaces, then \(L(X,Y)\) is also finite-dimensional.
We can identify a finite-dimensional vector space \(X\) of dimension \(n\) with \(\R^n\text{,}\) provided we fix a basis \(\{ x_1, x_2, \ldots, x_n \}\) in \(X\text{.}\) That is, we define a bijective linear map \(A \in L(X,\R^n)\) by \(Ax_k \coloneqq e_k\text{,}\) where \(\{ e_1, e_2, \ldots, e_n \}\) is the standard basis in \(\R^n\text{.}\) We have the correspondence
\begin{equation*} \sum_{k=1}^n c_k \, x_k \, \in X \quad \overset{A}{\mapsto} \quad (c_1,c_2,\ldots,c_n) \, \in \R^n . \end{equation*}
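For instance, fixing the basis \(\{ 1, t, t^2 \}\) of \(\sP_2\text{,}\) the space of polynomials of degree \(2\) or less, we obtain the correspondence
\begin{equation*} c_0 + c_1 t + c_2 t^2 \, \in \sP_2 \quad \mapsto \quad (c_0, c_1, c_2) \, \in \R^3 . \end{equation*}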

Subsection 8.1.4 Convexity

A subset \(U\) of a vector space is convex if whenever \(x,y \in U\text{,}\) the line segment from \(x\) to \(y\) lies in \(U\text{.}\) That is, if the convex combination \((1-t)x+ty\) is in \(U\) for all \(t \in [0,1]\text{.}\) We write \([x,y]\) for this line segment. See Figure 8.2.

Figure 8.2. Convexity.
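For instance, in \(\R^2\) the segment between \(x = (0,0)\) and \(y = (2,4)\) consists of the points
\begin{equation*} (1-t)\,(0,0) + t\,(2,4) = (2t, 4t) , \qquad t \in [0,1] . \end{equation*}
A set \(U\) containing \((0,0)\) and \((2,4)\) is convex only if it contains all of these points.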

In \(\R\text{,}\) convex sets are precisely the intervals, which are also precisely the connected sets. In two or more dimensions there are lots of nonconvex connected sets. For example, the set \(\R^2 \setminus \{0\}\) is connected, but not convex: For \(x \in \R^2 \setminus \{0\}\) and \(y \coloneqq -x\text{,}\) we find \((\nicefrac{1}{2})x + (\nicefrac{1}{2})y = 0\text{,}\) which is not in the set. Balls (in the standard metric) in \(\R^n\) are convex. It is a useful enough result to state as a proposition, but we leave its proof as an exercise.

Proposition 8.1.20.

Let \(x \in \R^n\) and \(r > 0\text{.}\) The ball \(B(x,r) \subset \R^n\) (using the standard metric) is convex.

Example 8.1.21.

A convex combination is, in particular, a linear combination. So every vector subspace \(V\) of a vector space \(X\) is convex.

Example 8.1.22.

Let \(C([0,1],\R)\) be the vector space of continuous real-valued functions on \([0,1]\text{.}\) Let \(V \subset C([0,1],\R)\) be the set of those \(f\) such that
\begin{equation*} \int_0^1 f(x)\,dx \leq 1 \qquad \text{and} \qquad f(x) \geq 0 \enspace \text{for all } x \in [0,1] . \end{equation*}
Then \(V\) is convex. Take \(t \in [0,1]\text{,}\) and note that if \(f,g \in V\text{,}\) then \((1-t) f(x) + t g(x) \geq 0\) for all \(x\text{.}\) Furthermore,
\begin{equation*} \int_0^1 \bigl((1-t)f(x) + t g(x)\bigr) \,dx = (1-t) \int_0^1 f(x) \,dx + t \int_0^1 g(x) \,dx \leq 1 . \end{equation*}
Note that \(V\) is not a vector subspace of \(C([0,1],\R)\text{.}\) The function \(f(x) \coloneqq 1\) is in \(V\text{,}\) but \(2f\) and \(-f\) are not.

Proposition 8.1.23.

The intersection of convex sets is convex: If \(\{ C_\lambda \}_{\lambda \in I}\) is an arbitrary collection of convex sets in a vector space, then \(C \coloneqq \bigcap_{\lambda \in I} C_\lambda\) is convex.

Proof.

If \(x, y \in C\text{,}\) then \(x,y \in C_\lambda\) for all \(\lambda \in I\text{,}\) and hence if \(t \in [0,1]\text{,}\) then \((1-t)x + ty \in C_\lambda\) for all \(\lambda \in I\text{.}\) Therefore, \((1-t)x + ty \in C\) and \(C\) is convex.
A useful construction using intersections of convex sets is the convex hull. Given a subset \(S\) of a vector space \(X\text{,}\) define the convex hull of \(S\) as the intersection of all convex sets containing \(S\text{:}\)
\begin{equation*} \operatorname{co}(S) \coloneqq \bigcap \{ C \subset X : S \subset C, \text{ and } C \text{ is convex} \} . \end{equation*}
That is, the convex hull is the smallest convex set containing \(S\text{.}\) By Proposition 8.1.23, the intersection of convex sets is convex. Hence the convex hull is convex.

Example 8.1.24.

The convex hull of \(\{ 0, 1 \}\) in \(\R\) is \([0,1]\text{.}\) Proof: A convex set containing 0 and 1 must contain \([0,1]\text{,}\) so \([0,1] \subset \operatorname{co}(\{0,1\})\text{.}\) The set \([0,1]\) is convex and contains \(\{0,1\}\text{,}\) so \(\operatorname{co}(\{0,1\}) \subset [0,1]\text{.}\)
Linear mappings preserve convex sets. So in some sense, convex sets are the right sort of sets when considering linear mappings or changes of coordinates.

Proposition 8.1.25.

Suppose \(X\) and \(Y\) are vector spaces, \(A \in L(X,Y)\text{,}\) and \(C \subset X\) is convex. Then \(A(C) \coloneqq \bigl\{ Ax \in Y : x \in C \bigr\}\) is convex.

Proof.

Take two points \(p,q \in A(C)\text{.}\) Pick \(u,v \in C\) such that \(Au = p\) and \(Av=q\text{.}\) As \(C\) is convex, \((1-t)u+t v \in C\) for all \(t \in [0,1]\text{,}\) so
\begin{equation*} (1-t)p+t q = (1-t)Au+tAv = A\bigl((1-t)u+tv\bigr) \in A(C) . \qedhere \end{equation*}

Exercises 8.1.5 Exercises

8.1.1.

Show that in \(\R^n\) (with the standard euclidean metric), for every \(x \in \R^n\) and every \(r > 0\text{,}\) the ball \(B(x,r)\) is convex.

8.1.2.

Verify that \(\R^n\) is a vector space.

8.1.3.

Let \(X\) be a vector space. Prove that a finite set of vectors \(\{ x_1,x_2,\ldots,x_n \} \subset X\) is linearly independent if and only if for every \(k=1,2,\ldots,n\)
\begin{equation*} \spn\bigl( \{ x_1,\ldots,x_{k-1},x_{k+1},\ldots,x_n \}\bigr) \subsetneq \spn\bigl( \{ x_1,x_2,\ldots,x_n \}\bigr) . \end{equation*}
That is, the span of the set with one vector removed is strictly smaller.

8.1.4.

Show that the set \(X \subset C([0,1],\R)\) of those functions such that \(\int_0^1 f = 0\) is a vector subspace. Compare Exercise 8.1.16.

8.1.5.

(Challenging)   Prove \(C([0,1],\R)\) is an infinite-dimensional vector space where the operations are defined in the obvious way: \(s=f+g\) and \(m=af\) are defined as \(s(x) \coloneqq f(x)+g(x)\) and \(m(x) \coloneqq a f(x)\text{.}\) Hint: For the dimension, think of functions that are only nonzero on the interval \((\nicefrac{1}{n+1},\nicefrac{1}{n})\text{.}\)

8.1.6.

Let \(k \colon [0,1]^2 \to \R\) be continuous. Show that \(L \colon C([0,1],\R) \to C([0,1],\R)\) defined by
\begin{equation*} Lf(y) \coloneqq \int_0^1 k(x,y)f(x)\,dx \end{equation*}
is a linear operator. That is, first show that \(L\) is well-defined by showing that \(Lf\) is continuous whenever \(f\) is, and then show that \(L\) is linear.

8.1.7.

Let \(\sP_n\) be the vector space of polynomials in one variable of degree \(n\) or less. Show that \(\sP_n\) is a vector space of dimension \(n+1\text{.}\)

8.1.8.

Let \(\R[t]\) be the vector space of polynomials in one variable \(t\text{.}\) Let \(D \colon \R[t] \to \R[t]\) be the derivative operator (derivative in \(t\)). Show that \(D\) is a linear operator.

8.1.9.

Let us show that Proposition 8.1.18 only works in finite dimensions. Take the space of polynomials \(\R[t]\) and define the operator \(A \colon \R[t] \to \R[t]\) by \(A\bigl(P(t)\bigr) \coloneqq tP(t)\text{.}\) Show that \(A\) is linear and one-to-one, but show that it is not onto.

8.1.10.

Finish the proof of Proposition 8.1.17 in the finite-dimensional case. That is, suppose \(\{ x_1, x_2,\ldots, x_n \}\) is a basis of \(X\text{,}\) \(\{ y_1, y_2,\ldots, y_n \} \subset Y\text{,}\) and define a function
\begin{equation*} A(x) \coloneqq \sum_{k=1}^n b_k y_k, \qquad \text{if} \quad x=\sum_{k=1}^n b_k x_k . \end{equation*}
Prove that \(A \colon X \to Y\) is linear.

8.1.11.

Prove Proposition 8.1.19. Hint: A linear transformation is determined by its action on a basis. So given two bases \(\{ x_1,\ldots,x_n \}\) and \(\{ y_1,\ldots,y_m \}\) for \(X\) and \(Y\) respectively, consider the linear operators \(A_{jk}\) with \(A_{jk} x_j = y_k\) and \(A_{jk} x_\ell = 0\) if \(\ell \not= j\text{.}\)

8.1.12.

(Easy)   Suppose \(X\) and \(Y\) are vector spaces and \(A \in L(X,Y)\) is a linear operator.
  1. Show that the nullspace \(N \coloneqq \{ x \in X : Ax = 0 \}\) is a vector space.
  2. Show that the range \(R \coloneqq \{ y \in Y : Ax = y \text{ for some } x \in X \}\) is a vector space.

8.1.13.

(Easy)   Show by example that a union of convex sets need not be convex.

8.1.14.

Compute the convex hull of the set of 3 points \(\bigl\{ (0,0), (0,1), (1,1) \bigr\}\) in \(\R^2\text{.}\)

8.1.15.

Show that the set \(\bigl\{ (x,y) \in \R^2 : y > x^2 \bigr\}\) is a convex set.

8.1.16.

Show that the set \(X \subset C([0,1],\R)\) of those functions such that \(\int_0^1 f = 1\) is a convex set, but not a vector subspace. Compare Exercise 8.1.4.

8.1.17.

Show that every convex set in \(\R^n\) is connected using the standard topology on \(\R^n\text{.}\)

8.1.18.

Suppose \(K \subset \R^2\) is a convex set such that the only point of the form \((x,0)\) in \(K\) is the point \((0,0)\text{.}\) Further suppose that \((0,1) \in K\) and \((1,1) \in K\text{.}\) Show that if \((x,y) \in K\) and \(x \not= 0\text{,}\) then \(y > 0\text{.}\)

8.1.19.

Prove that an arbitrary intersection of vector subspaces is a vector subspace. That is, if \(X\) is a vector space and \(\{ V_\lambda \}_{\lambda \in I}\) is an arbitrary collection of vector subspaces of \(X\text{,}\) then \(\bigcap_{\lambda \in I} V_\lambda\) is a vector subspace of \(X\text{.}\)

8.1.20.

(Easy)   Finish the proof of Proposition 8.1.16, that is, prove the first four items of the proposition.