Skip to main content

ProofsPath · 11 min

Pigeonhole Principle (Elementary Applications)

Important
For:GeneralResearch

Pigeonhole Principle (Elementary Applications)

Why This Matters

The pigeonhole principle is the simplest non-trivial counting argument:

If n+1n + 1 objects are placed in nn boxes, some box contains at least two objects.

A trivially obvious statement that is, nonetheless, the key to a remarkable number of olympiad and Putnam problems. The craft is not in the principle itself but in what to count and what the boxes are. Once those are chosen well, the problem often dissolves.

This page focuses on the technique-application side: how to recognize a pigeonhole problem and how to choose the counting structure. For the theorem statement and the generalized form, see pigeonhole-principle on TheoremPath.

Recognition cues for pigeonhole:

  • The problem mentions a finite set (often forced by a constraint like "from these nn numbers...").
  • The conclusion is "two of them are related" or "some configuration must occur" (existence statement).
  • The "boxes" are not given; you have to find them.
  • The problem feels like there should be more structure than there is: pigeonhole exploits exactly this lack of structure.

The Generalized Statement

The general form: if nn objects are placed in kk boxes, some box contains at least n/k\lceil n/k \rceil objects.

For n>kn > k, this gives the elementary version. For larger n/kn/k, it gives a more useful counting bound: 100 objects in 9 boxes forces some box to have 100/9=12\lceil 100/9 \rceil = 12.

The proof is by contradiction: if every box has at most n/k1<n/k\lceil n/k \rceil - 1 < n/k objects, the total is less than kn/k=nk \cdot n/k = n, a contradiction.

Worked Example: IMO 1972 Problem 1

Example

IMO 1972 Problem 1

Problem. Prove that from any 10 distinct two-digit positive integers, you can always select two disjoint subsets such that the elements of each subset have the same sum.

Solution. Consider all subsets of the 10 numbers. There are 210=10242^{10} = 1024 subsets. Each non-empty subset has a sum between 10 (smallest possible: just the number 10) and 91+92++99=85591 + 92 + \ldots + 99 = 855 (sum of the 10 largest two-digit numbers, generously bounded). Tighten: the maximum possible sum is 90+91++99=94590 + 91 + \cdots + 99 = 945. So sums lie in {1,2,,945}\{1, 2, \ldots, 945\}, giving at most 945 distinct values.

By pigeonhole on 10241024 subsets and 945945 possible sums, two distinct subsets A,BA, B have the same sum. They might overlap. Take A=ABA' = A \setminus B and B=BAB' = B \setminus A; these are disjoint, both non-empty (else ABA \subseteq B or vice versa, but they have the same sum so equal), and have equal sums (subtract the common part).

Where the cleverness is. The boxes are possible sums (945\le 945); the objects are subsets (2102^{10}). Without this matching, the problem is opaque.

The IMO 1972 #1 solution is a paradigm: when the problem says "there exist two with property X," count "objects with property X" carefully, and pigeonhole forces a coincidence.

Worked Example: IMO 1985 Problem 4

Example

IMO 1985 Problem 4

Problem. Given a set MM of 1985 distinct positive integers, none of which has a prime divisor greater than 23, prove that MM contains a subset of 4 elements whose product is the 4th power of an integer.

Solution sketch. The primes 23\le 23 are {2,3,5,7,11,13,17,19,23}\{2, 3, 5, 7, 11, 13, 17, 19, 23\}, nine primes. Each element of MM factors as piai\prod p_i^{a_i} over these nine primes. Define the parity vector (a1mod2,,a9mod2){0,1}9(a_1 \bmod 2, \ldots, a_9 \bmod 2) \in \{0, 1\}^9.

There are 29=5122^9 = 512 possible parity vectors. With 1985 elements in MM, by pigeonhole at least 1985/512=4\lceil 1985/512 \rceil = 4 elements share the same parity vector. Take any two such elements; their product is a perfect square (since their exponents are equal mod 2). So we get many "perfect-square" pairs.

Now repeat: among the perfect-square pairs, the product of any two of them is a 4th power. With 4 elements sharing the same parity vector, we form (42)=6\binom{4}{2} = 6 pairs, each with square product; among these pairs, find two with square root of equal "shape": formally, repeat the parity argument on the square-roots of the products. This yields the 4-element subset with product a 4th power.

The structure. Two layers of pigeonhole: first to find square-product pairs, then to find a 4th-power subset. The 29=5122^9 = 512 box count is what makes 1985 the right size of MM in the problem.

This is the iterated-pigeonhole pattern: when one application gets you to "perfect square," a second application: on the right reformulated structure: gets you to "4th power." Recognizing iterated pigeonhole is a skill that comes with practice.

Worked Example: Geometry Pigeonhole

Example

Five points in the unit square

Problem. Show that among any five points in a unit square, two of them are within distance 2/2\sqrt{2}/2 of each other.

Solution. Divide the unit square into four sub-squares, each of side 1/21/2. By pigeonhole on 5 points and 4 sub-squares, some sub-square contains at least 2 of the points. The diameter of each sub-square is (1/2)2+(1/2)2=2/2\sqrt{(1/2)^2 + (1/2)^2} = \sqrt{2}/2. So those two points are within 2/2\sqrt{2}/2.

Generalization. Among any n2+1n^2 + 1 points in a unit square, two are within 2/n\sqrt{2}/n. Divide into n2n^2 sub-squares of side 1/n1/n; pigeonhole; bound the diameter.

This pattern: pigeonhole on a geometric partition: is ubiquitous. Choose the partition cells so the in-cell diameter matches what the problem wants.

The Recognition Process

When facing a problem, ask:

  1. What is finite? Identify the finite set (often the "objects").
  2. What property is forced? Existence statement: two things related, three things in a configuration, etc.
  3. What are the boxes? This is the hard part. Common choices:
    • Residues mod kk (for number-theoretic problems).
    • Parity vectors (for prime-factorization problems).
    • Partition cells (for geometric problems).
    • Sums modulo kk (for additive problems).
  4. Does the count match? #objects>#boxes\#\text{objects} > \#\text{boxes} gives elementary pigeonhole; otherwise use the generalized form with n/k\lceil n/k \rceil.
  5. What does the coincidence imply? The shared box must give exactly the property the problem asks for.

If steps 3 and 4 line up but step 5 doesn't, the boxes are wrong: re-choose.

Common Mistakes

Watch Out

Mismatching objects and boxes

The most common error: the boxes don't capture the property the problem asks about. If the conclusion is "two integers sum to a multiple of 7," the boxes should be "residues mod 7," not "even or odd."

Watch Out

Forgetting the generalized form

For "three of them are related," elementary pigeonhole isn't enough. You need n>2kn > 2k to force a box with at least 3 objects. The general formula n/k\lceil n/k \rceil is what to remember.

Watch Out

Overcounting boxes

A common slip: putting an object in a box that doesn't really exist. For example, "residues mod 7" gives 7 boxes (0 through 6), not 6 (just the non-zero residues). Off-by-one errors at the box-count step are the most frequent computational mistake.

Watch Out

Pigeonhole when the structure is more

If the problem has linear-algebraic, group-theoretic, or graph-theoretic structure, pigeonhole alone may be too weak. The combinatorial nullstellensatz, the probabilistic method, or extremal graph theory may give sharper bounds. Pigeonhole is the entry tool, not the only tool.

Exercises

ExerciseCore

Problem

Show that among any 12 integers, two of them have the same remainder modulo 11.

ExerciseAdvanced

Problem

Show that any sequence of n2+1n^2 + 1 distinct real numbers contains a monotone subsequence of length n+1n + 1 (Erdős-Szekeres 1935).

ML and Connections

  • VC-dimension and learning theory. The Sauer-Shelah lemma uses a pigeonhole-style argument to bound the number of shatterable subsets in a hypothesis class. The proof sketch is exactly "double counting plus pigeonhole."

  • Hashing and birthday paradox. Pigeonhole gives the Θ(n)\Theta(\sqrt{n}) collision bound for nn buckets. Used in cryptographic collision attacks and probabilistic data structures.

  • Counting arguments in deep learning. Lower bounds on network depth or width often use pigeonhole on the number of distinct functions a network can compute, vs. the number it must compute to fit the data.

Cross-Network Links

References

See structured references block. Primary entry points: Engel Ch 1 for the technique treatment; Larson "Problem-Solving Through Problems" Ch 1 for many worked examples; Andreescu-Feng "102 Combinatorial Problems" for olympiad-flavored practice; Aigner-Ziegler "Proofs from THE BOOK" for elegant proofs including the Erdős-Szekeres theorem.