Pigeonhole Principle (Elementary Applications)
Why This Matters
The pigeonhole principle is the simplest non-trivial counting argument:
If objects are placed in boxes, some box contains at least two objects.
A trivially obvious statement that is, nonetheless, the key to a remarkable number of olympiad and Putnam problems. The craft is not in the principle itself but in what to count and what the boxes are. Once those are chosen well, the problem often dissolves.
This page focuses on the technique-application side: how to recognize a pigeonhole problem and how to choose the counting structure. For the theorem statement and the generalized form, see pigeonhole-principle on TheoremPath.
Recognition cues for pigeonhole:
- The problem mentions a finite set (often forced by a constraint like "from these numbers...").
- The conclusion is "two of them are related" or "some configuration must occur" (existence statement).
- The "boxes" are not given; you have to find them.
- The problem feels like there should be more structure than there is: pigeonhole exploits exactly this lack of structure.
The Generalized Statement
The general form: if objects are placed in boxes, some box contains at least objects.
For , this gives the elementary version. For larger , it gives a more useful counting bound: 100 objects in 9 boxes forces some box to have .
The proof is by contradiction: if every box has at most objects, the total is less than , a contradiction.
Worked Example: IMO 1972 Problem 1
IMO 1972 Problem 1
Problem. Prove that from any 10 distinct two-digit positive integers, you can always select two disjoint subsets such that the elements of each subset have the same sum.
Solution. Consider all subsets of the 10 numbers. There are subsets. Each non-empty subset has a sum between 10 (smallest possible: just the number 10) and (sum of the 10 largest two-digit numbers, generously bounded). Tighten: the maximum possible sum is . So sums lie in , giving at most 945 distinct values.
By pigeonhole on subsets and possible sums, two distinct subsets have the same sum. They might overlap. Take and ; these are disjoint, both non-empty (else or vice versa, but they have the same sum so equal), and have equal sums (subtract the common part).
Where the cleverness is. The boxes are possible sums (); the objects are subsets (). Without this matching, the problem is opaque.
The IMO 1972 #1 solution is a paradigm: when the problem says "there exist two with property X," count "objects with property X" carefully, and pigeonhole forces a coincidence.
Worked Example: IMO 1985 Problem 4
IMO 1985 Problem 4
Problem. Given a set of 1985 distinct positive integers, none of which has a prime divisor greater than 23, prove that contains a subset of 4 elements whose product is the 4th power of an integer.
Solution sketch. The primes are , nine primes. Each element of factors as over these nine primes. Define the parity vector .
There are possible parity vectors. With 1985 elements in , by pigeonhole at least elements share the same parity vector. Take any two such elements; their product is a perfect square (since their exponents are equal mod 2). So we get many "perfect-square" pairs.
Now repeat: among the perfect-square pairs, the product of any two of them is a 4th power. With 4 elements sharing the same parity vector, we form pairs, each with square product; among these pairs, find two with square root of equal "shape": formally, repeat the parity argument on the square-roots of the products. This yields the 4-element subset with product a 4th power.
The structure. Two layers of pigeonhole: first to find square-product pairs, then to find a 4th-power subset. The box count is what makes 1985 the right size of in the problem.
This is the iterated-pigeonhole pattern: when one application gets you to "perfect square," a second application: on the right reformulated structure: gets you to "4th power." Recognizing iterated pigeonhole is a skill that comes with practice.
Worked Example: Geometry Pigeonhole
Five points in the unit square
Problem. Show that among any five points in a unit square, two of them are within distance of each other.
Solution. Divide the unit square into four sub-squares, each of side . By pigeonhole on 5 points and 4 sub-squares, some sub-square contains at least 2 of the points. The diameter of each sub-square is . So those two points are within .
Generalization. Among any points in a unit square, two are within . Divide into sub-squares of side ; pigeonhole; bound the diameter.
This pattern: pigeonhole on a geometric partition: is ubiquitous. Choose the partition cells so the in-cell diameter matches what the problem wants.
The Recognition Process
When facing a problem, ask:
- What is finite? Identify the finite set (often the "objects").
- What property is forced? Existence statement: two things related, three things in a configuration, etc.
- What are the boxes? This is the hard part. Common
choices:
- Residues mod (for number-theoretic problems).
- Parity vectors (for prime-factorization problems).
- Partition cells (for geometric problems).
- Sums modulo (for additive problems).
- Does the count match? gives elementary pigeonhole; otherwise use the generalized form with .
- What does the coincidence imply? The shared box must give exactly the property the problem asks for.
If steps 3 and 4 line up but step 5 doesn't, the boxes are wrong: re-choose.
Common Mistakes
Mismatching objects and boxes
The most common error: the boxes don't capture the property the problem asks about. If the conclusion is "two integers sum to a multiple of 7," the boxes should be "residues mod 7," not "even or odd."
Forgetting the generalized form
For "three of them are related," elementary pigeonhole isn't enough. You need to force a box with at least 3 objects. The general formula is what to remember.
Overcounting boxes
A common slip: putting an object in a box that doesn't really exist. For example, "residues mod 7" gives 7 boxes (0 through 6), not 6 (just the non-zero residues). Off-by-one errors at the box-count step are the most frequent computational mistake.
Pigeonhole when the structure is more
If the problem has linear-algebraic, group-theoretic, or graph-theoretic structure, pigeonhole alone may be too weak. The combinatorial nullstellensatz, the probabilistic method, or extremal graph theory may give sharper bounds. Pigeonhole is the entry tool, not the only tool.
Exercises
Problem
Show that among any 12 integers, two of them have the same remainder modulo 11.
Problem
Show that any sequence of distinct real numbers contains a monotone subsequence of length (Erdős-Szekeres 1935).
ML and Connections
-
VC-dimension and learning theory. The Sauer-Shelah lemma uses a pigeonhole-style argument to bound the number of shatterable subsets in a hypothesis class. The proof sketch is exactly "double counting plus pigeonhole."
-
Hashing and birthday paradox. Pigeonhole gives the collision bound for buckets. Used in cryptographic collision attacks and probabilistic data structures.
-
Counting arguments in deep learning. Lower bounds on network depth or width often use pigeonhole on the number of distinct functions a network can compute, vs. the number it must compute to fit the data.
Cross-Network Links
- ProofsPath: pigeonhole-principle-advanced treats Ramsey-style applications; extremal-principle is the natural complement; double-counting is the related combinatorial technique; proof-by-contradiction is the underlying logical move.
- TheoremPath: pigeonhole-principle is the theorem-statement canonical; counting-and-combinatorics is the broader combinatorial framing.
References
See structured references block. Primary entry points:
Engel Ch 1 for the technique treatment; Larson "Problem-Solving
Through Problems" Ch 1 for many worked examples; Andreescu-Feng
"102 Combinatorial Problems" for olympiad-flavored practice;
Aigner-Ziegler "Proofs from THE BOOK" for elegant proofs
including the Erdős-Szekeres theorem.