Logo
Published on

Introduction to Graph Theory - Part 5

Authors
  • avatar
    Name
    Till Heller
    Twitter

In Part 3, we briefly introduced trees as connected graphs without cycles and observed that they have exactly n1n-1 edges. We promised a rigorous proof — and now it's time to deliver. But trees deserve much more than just that one result. They are among the most fundamental structures in graph theory and computer science, appearing everywhere from network design to data structures to phylogenetics. In this part, we'll explore their properties in depth.

Leaves

Let's start with a question: in a tree, can every vertex have many neighbors, or are some vertices forced to sit at the "boundary"? Think of a real tree — it has branches that eventually end in leaves. The same is true for trees in graph theory.

Definition 21. A leaf of a tree is a vertex with degree 11.

In the star graph G=({1,2,3,4},{(1,2),(1,3),(1,4)})G = (\{1,2,3,4\}, \{(1,2), (1,3), (1,4)\}) from Part 3, the vertices 22, 33, and 44 are all leaves — each connected to vertex 11 by a single edge. In the path graph PnP_n, the two endpoints are leaves.

It turns out that leaves are not just common in trees — they are guaranteed to exist.

tree-leaves

Lemma 2. Every tree with at least two vertices has at least two leaves.

Proof. Let TT be a tree with n2n \geq 2 vertices. Consider a longest path in TT, say v0,v1,,vkv_0, v_1, \ldots, v_k. Since TT has at least two vertices and is connected, this path has at least one edge, so k1k \geq 1.

We claim that v0v_0 is a leaf. Suppose for contradiction that d(v0)2d(v_0) \geq 2. Then v0v_0 has a neighbor uu different from v1v_1. If uu is already on the path, say u=vju = v_j for some j2j \geq 2, then the vertices v0,v1,,vj,v0v_0, v_1, \ldots, v_j, v_0 would form a cycle — contradicting the fact that TT is a tree. If uu is not on the path, then u,v0,v1,,vku, v_0, v_1, \ldots, v_k is a path longer than v0,v1,,vkv_0, v_1, \ldots, v_k — contradicting our choice of a longest path. In both cases, we reach a contradiction, so d(v0)=1d(v_0) = 1 and v0v_0 is a leaf. By the same argument applied to vkv_k, we get a second leaf.

This result is more powerful than it might first appear. It gives us a way to "peel off" a tree layer by layer: remove a leaf, and what remains is still a tree (with one fewer vertex). This idea is the key to proving the next theorem.

The Edge Count

In Part 3, we gave an informal argument for why a tree on nn vertices has n1n-1 edges. Now we can make this precise using the leaf removal technique.

Theorem 2. Every tree on nn vertices has exactly n1n-1 edges.

Proof. We prove this by induction on nn.

Base case: A tree on n=1n = 1 vertex has no edges, and 11=01 - 1 = 0. The statement holds.

Inductive step: Assume the statement holds for all trees with fewer than nn vertices, where n2n \geq 2. Let TT be a tree on nn vertices. By Lemma 2, TT has a leaf vv. Let TT' be the graph obtained by removing vv and its single edge from TT.

We need to check that TT' is still a tree. It is connected: since vv was a leaf, every path in TT that didn't use vv is still present in TT', and no path between two vertices in TT' needed to go through vv (because vv is a dead end). It is acyclic: removing a vertex from a graph without cycles cannot create a cycle. So TT' is a tree on n1n-1 vertices.

By the induction hypothesis, TT' has n2n-2 edges. Since we removed exactly one edge to get from TT to TT', the tree TT has (n2)+1=n1(n-2) + 1 = n-1 edges.

Equivalent Characterizations

One of the beautiful things about trees is that there are many different ways to define them, and they all turn out to be equivalent. We've been working with the definition "connected and acyclic," but here are several other characterizations.

Theorem 3. For a graph GG on nn vertices, the following statements are equivalent:

(i) GG is connected and acyclic (i.e., GG is a tree).

(ii) GG is connected and has exactly n1n-1 edges.

(iii) GG is acyclic and has exactly n1n-1 edges.

(iv) Between any two vertices in GG, there is exactly one path.

(v) GG is connected, but removing any edge disconnects it.

(vi) GG is acyclic, but adding any edge creates exactly one cycle.

We won't prove all implications here, but let's verify the most instructive ones to build intuition.

(i) \Rightarrow (iv): Let GG be a tree, and let u,vu, v be two vertices. Since GG is connected, at least one path from uu to vv exists. Suppose there are two distinct paths PP and QQ from uu to vv. Since PP and QQ are different, they must diverge at some vertex and rejoin at a later vertex — but this creates a cycle, contradicting the fact that GG is acyclic. So the path is unique.

(iv) \Rightarrow (v): If there is exactly one path between any two vertices, then the graph is connected. Now take any edge (u,w)(u, w). This edge is the unique path between uu and ww. Removing it means there is no longer any path from uu to ww, so the graph becomes disconnected.

(v) \Rightarrow (i): The graph is connected by assumption. If it contained a cycle, we could remove any edge on that cycle without disconnecting the graph — contradicting the assumption that removing any edge disconnects it. So the graph is acyclic, and therefore a tree.

The remaining implications follow similar reasoning. The key takeaway is that trees live at a kind of "sweet spot" — they have exactly enough edges to be connected, but not a single one more1.

Forests

What happens if we relax the requirement that a tree be connected? We get a graph that might consist of several separate trees — like a forest made up of individual trees.

Definition 22. A forest is a graph that contains no cycle as a subgraph.

In other words, a forest is an acyclic graph. Each connected component of a forest is a tree. A tree is simply a forest with exactly one connected component.

Observation 5. A forest on nn vertices with cc connected components has exactly ncn - c edges.

This follows directly from Theorem 2: each connected component is a tree, and a tree on nin_i vertices has ni1n_i - 1 edges. Summing over all cc components, the total number of edges is i=1c(ni1)=nc\sum_{i=1}^{c} (n_i - 1) = n - c.

Rooted Trees

So far, our trees have been "unrooted" — no vertex plays a special role. But in many applications, especially in computer science, we want to single out one vertex as the starting point. This gives the tree a natural hierarchy.

Definition 23. A rooted tree is a tree with one designated vertex called the root.

Once we fix a root rr, the tree gets a natural top-down structure. Every other vertex has a unique path to the root (by Theorem 3), and this path defines a hierarchy.

Definition 24. In a rooted tree with root rr, let vv be a vertex different from rr. The parent of vv is the first vertex after vv on the unique path from vv to rr. The children of a vertex vv are all vertices whose parent is vv. A vertex with no children is called a leaf2.

The depth of a vertex vv is the length of the unique path from rr to vv. The root has depth 00. The height of the tree is the maximum depth among all its vertices.

rooted-tree

Think of a company's organizational chart: the CEO is the root, department heads are children of the CEO, team leads are children of department heads, and so on. The depth tells you how many levels of management sit above a person, and the height of the tree tells you how many levels the organization has.

Spanning Trees

Let's end with a concept that connects trees back to general graphs. Given a connected graph GG, we can always find a tree "hiding" inside it that still connects all the vertices.

Definition 25. A spanning tree of a connected graph G=(V,E)G = (V, E) is a subgraph T=(V,ET)T = (V, E_T) that is a tree and contains all vertices of GG.

In other words, a spanning tree keeps every vertex but uses only n1n-1 of the edges — just enough to stay connected without any cycles.

spanning-tree

Theorem 4. Every connected graph has a spanning tree.

Proof. Let GG be a connected graph. If GG has no cycle, it is already a tree, and we are done. Otherwise, GG contains a cycle. Pick any edge ee on that cycle and remove it. The resulting graph is still connected: any path that used ee can be rerouted along the rest of the cycle. Now repeat: if the graph still has a cycle, remove an edge from it. Each removal decreases the number of edges by one while preserving connectivity. Since the graph is finite, this process must terminate — and it terminates when no cycle remains, giving us a connected, acyclic subgraph that contains all vertices. That is a spanning tree.

Spanning trees are incredibly useful in practice. When designing a network — say, connecting cities with roads or computers with cables — a spanning tree gives you the cheapest way to keep everything connected without redundant links. Finding spanning trees with minimum total edge weight is the minimum spanning tree problem, one of the most classical problems in combinatorial optimization3.

Looking Ahead

Trees gave us our first taste of deeper structural results and rigorous proofs. In the next part, we will introduce directed graphs — graphs where edges have a direction, like one-way streets. This opens the door to modeling asymmetric relationships such as web links, task dependencies, and flows through networks.

Footnotes

  1. This is why trees are sometimes called minimally connected graphs: they are connected, but removing any edge breaks the connection.

  2. This is consistent with our earlier definition: in a rooted tree, a leaf is a vertex with no children, which means its only neighbor is its parent, so it has degree 11 — unless it is the root, which has degree 00 only when the tree consists of a single vertex.

  3. Efficient algorithms for finding minimum spanning trees include Kruskal's algorithm and Prim's algorithm, both of which run in time polynomial in the size of the graph.