Introduction to Graph Theory - Part 5

In Part 3, we briefly introduced trees as connected graphs without cycles and observed that they have exactly $n-1$ edges. We promised a rigorous proof — and now it's time to deliver. But trees deserve much more than just that one result. They are among the most fundamental structures in graph theory and computer science, appearing everywhere from network design to data structures to phylogenetics. In this part, we'll explore their properties in depth.

Leaves

Let's start with a question: in a tree, can every vertex have many neighbors, or are some vertices forced to sit at the "boundary"? Think of a real tree — it has branches that eventually end in leaves. The same is true for trees in graph theory.

Definition 21. A leaf of a tree is a vertex with degree $1$ .

In the star graph $G = (\{1,2,3,4\}, \{(1,2), (1,3), (1,4)\})$ from Part 3, the vertices $2$ , $3$ , and $4$ are all leaves — each connected to vertex $1$ by a single edge. In the path graph $P_n$ , the two endpoints are leaves.

It turns out that leaves are not just common in trees — they are guaranteed to exist.

Lemma 2. Every tree with at least two vertices has at least two leaves.

Proof. Let $T$ be a tree with $n \geq 2$ vertices. Consider a longest path in $T$ , say $v_0, v_1, \ldots, v_k$ . Since $T$ has at least two vertices and is connected, this path has at least one edge, so $k \geq 1$ .

We claim that $v_0$ is a leaf. Suppose for contradiction that $d(v_0) \geq 2$ . Then $v_0$ has a neighbor $u$ different from $v_1$ . If $u$ is already on the path, say $u = v_j$ for some $j \geq 2$ , then the vertices $v_0, v_1, \ldots, v_j, v_0$ would form a cycle — contradicting the fact that $T$ is a tree. If $u$ is not on the path, then $u, v_0, v_1, \ldots, v_k$ is a path longer than $v_0, v_1, \ldots, v_k$ — contradicting our choice of a longest path. In both cases, we reach a contradiction, so $d(v_0) = 1$ and $v_0$ is a leaf. By the same argument applied to $v_k$ , we get a second leaf.

This result is more powerful than it might first appear. It gives us a way to "peel off" a tree layer by layer: remove a leaf, and what remains is still a tree (with one fewer vertex). This idea is the key to proving the next theorem.

The Edge Count

In Part 3, we gave an informal argument for why a tree on $n$ vertices has $n-1$ edges. Now we can make this precise using the leaf removal technique.

Theorem 2. Every tree on $n$ vertices has exactly $n-1$ edges.

Proof. We prove this by induction on $n$ .

Base case: A tree on $n = 1$ vertex has no edges, and $1 - 1 = 0$ . The statement holds.

Inductive step: Assume the statement holds for all trees with fewer than $n$ vertices, where $n \geq 2$ . Let $T$ be a tree on $n$ vertices. By Lemma 2, $T$ has a leaf $v$ . Let $T'$ be the graph obtained by removing $v$ and its single edge from $T$ .

We need to check that $T'$ is still a tree. It is connected: since $v$ was a leaf, every path in $T$ that didn't use $v$ is still present in $T'$ , and no path between two vertices in $T'$ needed to go through $v$ (because $v$ is a dead end). It is acyclic: removing a vertex from a graph without cycles cannot create a cycle. So $T'$ is a tree on $n-1$ vertices.

By the induction hypothesis, $T'$ has $n-2$ edges. Since we removed exactly one edge to get from $T$ to $T'$ , the tree $T$ has $(n-2) + 1 = n-1$ edges.

Equivalent Characterizations

One of the beautiful things about trees is that there are many different ways to define them, and they all turn out to be equivalent. We've been working with the definition "connected and acyclic," but here are several other characterizations.

Theorem 3. For a graph $G$ on $n$ vertices, the following statements are equivalent:

(i) $G$ is connected and acyclic (i.e., $G$ is a tree).

(ii) $G$ is connected and has exactly $n-1$ edges.

(iii) $G$ is acyclic and has exactly $n-1$ edges.

(iv) Between any two vertices in $G$ , there is exactly one path.

(v) $G$ is connected, but removing any edge disconnects it.

(vi) $G$ is acyclic, but adding any edge creates exactly one cycle.

We won't prove all implications here, but let's verify the most instructive ones to build intuition.

(i) $\Rightarrow$ (iv): Let $G$ be a tree, and let $u, v$ be two vertices. Since $G$ is connected, at least one path from $u$ to $v$ exists. Suppose there are two distinct paths $P$ and $Q$ from $u$ to $v$ . Since $P$ and $Q$ are different, they must diverge at some vertex and rejoin at a later vertex — but this creates a cycle, contradicting the fact that $G$ is acyclic. So the path is unique.

(iv) $\Rightarrow$ (v): If there is exactly one path between any two vertices, then the graph is connected. Now take any edge $(u, w)$ . This edge is the unique path between $u$ and $w$ . Removing it means there is no longer any path from $u$ to $w$ , so the graph becomes disconnected.

(v) $\Rightarrow$ (i): The graph is connected by assumption. If it contained a cycle, we could remove any edge on that cycle without disconnecting the graph — contradicting the assumption that removing any edge disconnects it. So the graph is acyclic, and therefore a tree.

The remaining implications follow similar reasoning. The key takeaway is that trees live at a kind of "sweet spot" — they have exactly enough edges to be connected, but not a single one more¹.

Forests

What happens if we relax the requirement that a tree be connected? We get a graph that might consist of several separate trees — like a forest made up of individual trees.

Definition 22. A forest is a graph that contains no cycle as a subgraph.

In other words, a forest is an acyclic graph. Each connected component of a forest is a tree. A tree is simply a forest with exactly one connected component.

Observation 5. A forest on $n$ vertices with $c$ connected components has exactly $n - c$ edges.

This follows directly from Theorem 2: each connected component is a tree, and a tree on $n_i$ vertices has $n_i - 1$ edges. Summing over all $c$ components, the total number of edges is $\sum_{i=1}^{c} (n_i - 1) = n - c$ .

Rooted Trees

So far, our trees have been "unrooted" — no vertex plays a special role. But in many applications, especially in computer science, we want to single out one vertex as the starting point. This gives the tree a natural hierarchy.

Definition 23. A rooted tree is a tree with one designated vertex called the root.

Once we fix a root $r$ , the tree gets a natural top-down structure. Every other vertex has a unique path to the root (by Theorem 3), and this path defines a hierarchy.

Definition 24. In a rooted tree with root $r$ , let $v$ be a vertex different from $r$ . The parent of $v$ is the first vertex after $v$ on the unique path from $v$ to $r$ . The children of a vertex $v$ are all vertices whose parent is $v$ . A vertex with no children is called a leaf².

The depth of a vertex $v$ is the length of the unique path from $r$ to $v$ . The root has depth $0$ . The height of the tree is the maximum depth among all its vertices.

Think of a company's organizational chart: the CEO is the root, department heads are children of the CEO, team leads are children of department heads, and so on. The depth tells you how many levels of management sit above a person, and the height of the tree tells you how many levels the organization has.

Spanning Trees

Let's end with a concept that connects trees back to general graphs. Given a connected graph $G$ , we can always find a tree "hiding" inside it that still connects all the vertices.

Definition 25. A spanning tree of a connected graph $G = (V, E)$ is a subgraph $T = (V, E_T)$ that is a tree and contains all vertices of $G$ .

In other words, a spanning tree keeps every vertex but uses only $n-1$ of the edges — just enough to stay connected without any cycles.

Theorem 4. Every connected graph has a spanning tree.

Proof. Let $G$ be a connected graph. If $G$ has no cycle, it is already a tree, and we are done. Otherwise, $G$ contains a cycle. Pick any edge $e$ on that cycle and remove it. The resulting graph is still connected: any path that used $e$ can be rerouted along the rest of the cycle. Now repeat: if the graph still has a cycle, remove an edge from it. Each removal decreases the number of edges by one while preserving connectivity. Since the graph is finite, this process must terminate — and it terminates when no cycle remains, giving us a connected, acyclic subgraph that contains all vertices. That is a spanning tree.

Spanning trees are incredibly useful in practice. When designing a network — say, connecting cities with roads or computers with cables — a spanning tree gives you the cheapest way to keep everything connected without redundant links. Finding spanning trees with minimum total edge weight is the minimum spanning tree problem, one of the most classical problems in combinatorial optimization³.

Looking Ahead

Trees gave us our first taste of deeper structural results and rigorous proofs. In the next part, we will introduce directed graphs — graphs where edges have a direction, like one-way streets. This opens the door to modeling asymmetric relationships such as web links, task dependencies, and flows through networks.

Footnotes

This is why trees are sometimes called minimally connected graphs: they are connected, but removing any edge breaks the connection. ↩
This is consistent with our earlier definition: in a rooted tree, a leaf is a vertex with no children, which means its only neighbor is its parent, so it has degree $1$ — unless it is the root, which has degree $0$ only when the tree consists of a single vertex. ↩
Efficient algorithms for finding minimum spanning trees include Kruskal's algorithm and Prim's algorithm, both of which run in time polynomial in the size of the graph. ↩