Integer factorization
Fundamentals
Definition and Motivation
Integer factorization refers to the decomposition of a composite positive integer $ n $ into its prime factors, expressing $ n $ as a product of prime numbers.[9] This process involves identifying primes $ p_1, p_2, \dots, p_k $ such that $ n = p_1 p_2 \cdots p_k $, where each $ p_i $ is prime.[2] The problem appears deceptively simple for small values of $ n $; for instance, factoring 15 yields $ 15 = 3 \times 5 $, which can be verified by basic division.[10] However, as $ n $ grows large—particularly when $ n $ is a product of two large primes—the computational effort required escalates dramatically, making factorization infeasible with current classical methods for sufficiently large instances.[11] An illustrative example is 91, which factors as $ 91 = 7 \times 13 $, but scaling this to hundreds of digits reveals the inherent challenge. Historically, interest in factorization arose in number theory from ancient efforts to understand prime numbers and their properties, as explored by Euclid in his Elements around 300 BCE, where he laid foundational work on divisibility and primes.[2] Euclid's propositions, such as those demonstrating the infinitude of primes and the structure of integers, motivated early studies in factorization as a means to explore the building blocks of natural numbers.[12] In modern computing, factorization's significance stems from its role as a computationally hard problem underpinning public-key cryptography, notably the RSA cryptosystem introduced in 1978, whose security depends on the difficulty of factoring large semiprimes.[11] This hardness ensures that encrypted messages remain secure against adversaries lacking efficient factoring capabilities, driving ongoing research in both theoretical mathematics and practical algorithms. The uniqueness of such decompositions is affirmed by the fundamental theorem of arithmetic.[10]
Prime Factorization Theorem
The Fundamental Theorem of Arithmetic, also known as the unique factorization theorem, asserts that every integer greater than 1 can be expressed as a product of one or more prime numbers, and this representation is unique up to the order of the factors.[13] Formally, for any integer $ n > 1 $, there exist prime numbers $ p_1, p_2, \dots, p_k $ (not necessarily distinct) such that $ n = p_1 p_2 \cdots p_k $, and if $ n = q_1 q_2 \cdots q_m $ is another such factorization with primes $ q_j $, then $ k = m $ and the multisets $ \{p_1, \dots, p_k\} $ and $ \{q_1, \dots, q_m\} $ are identical.[13]

The proof proceeds in two parts: establishing existence and proving uniqueness. For existence, the well-ordering principle of the natural numbers is invoked. Suppose there exists some integer greater than 1 without a prime factorization; among all such integers, let $ n $ be the smallest. Then $ n $ cannot be prime (as primes factorize trivially as themselves), so $ n = ab $ with $ 1 < a, b < n $. By minimality, $ a $ and $ b $ have prime factorizations, hence so does $ n $, a contradiction. Thus, every integer greater than 1 has at least one prime factorization.[13]

Uniqueness relies on Euclid's lemma, which states that if a prime $ p $ divides a product $ ab $, then $ p $ divides $ a $ or $ p $ divides $ b $.[14] To prove Euclid's lemma, assume $ p \mid ab $ but $ p \nmid a $; then $ \gcd(p, a) = 1 $, so there exist integers $ x, y $ such that $ px + ay = 1 $ by Bézout's identity. Multiplying by $ b $ yields $ p(xb) + a(by) = b $, and since $ p \mid ab $, it follows that $ p \mid b $.
With Euclid's lemma, uniqueness follows by contradiction: suppose two distinct factorizations $ n = p_1^{e_1} \cdots p_r^{e_r} = q_1^{f_1} \cdots q_s^{f_s} $; then some prime, say $ p_1 $, must divide the right side, hence divide some $ q_j $, implying $ p_1 = q_j $, and inducting on the exponents shows all match.[13][14] For example, the integer 12 factors as $ 12 = 2^2 \times 3 $, and this is the unique prime factorization, as any other decomposition (e.g., $ 12 = 4 \times 3 = 2^2 \times 3 $) reduces to the same primes with identical multiplicities.[13] This theorem underscores the multiplicative structure of the integers, ensuring that arithmetic operations like greatest common divisors and least common multiples can be computed via prime exponents, and it forms the basis for much of elementary number theory, such as the distribution of primes and properties of integers under multiplication.[13] While the theorem holds specifically for the ring of integers $ \mathbb{Z} $, where primes are irreducible and lead to unique factorization, other integral domains like the Gaussian integers $ \mathbb{Z}[i] $ also exhibit unique factorization into irreducibles (Gaussian primes), though the notion of primality differs; for instance, 2 factors as $ (1+i)(1-i) $ up to units.[15] However, not all rings share this property; the focus here remains on $ \mathbb{Z} $, where the theorem guarantees irreducibility aligns perfectly with primality.[15]
Classical Algorithms
Trial Division and Basic Methods
Trial division is one of the simplest and most straightforward methods for factoring an integer $ n $. It involves systematically testing $ n $ for divisibility by all integers starting from 2 up to $ \lfloor \sqrt{n} \rfloor $, as any factor larger than $ \sqrt{n} $ would pair with a smaller factor already checked.[16] If $ n $ is divisible by a candidate divisor $ d $, then $ n $ is repeatedly divided by $ d $ until it is no longer divisible, removing all powers of that factor; the process continues with the quotient until the remaining value is prime or 1.[16] To implement trial division efficiently, first handle the special case of even numbers by dividing by 2 as many times as possible. Then, test odd divisors from 3 up to $ \sqrt{n} $, incrementing by 2 to skip evens. This optimization halves the number of checks after the initial step for odd $ n $.[16] A basic implementation (valid Python) is as follows:

    def trial_division(n):
        factors = []
        # Remove all factors of 2 first
        while n % 2 == 0:
            factors.append(2)
            n //= 2
        # Check odd divisors up to sqrt(n)
        d = 3
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 2
        # Whatever remains above 1 is itself prime
        if n > 1:
            factors.append(n)
        return factors
This method guarantees complete factorization by the fundamental theorem of arithmetic, but its worst-case time complexity is $ O(\sqrt{n}) $, making it practical only for small $ n $ or when small factors are present.[16]
Fermat's factorization method exploits the identity $ n = a^2 - b^2 = (a - b)(a + b) $ for odd $ n $, where $ a $ and $ b $ are nonnegative integers with $ a > b $. The algorithm starts with $ a = \lceil \sqrt{n} \rceil $ and increments $ a $, checking whether $ a^2 - n $ is a perfect square $ b^2 $; if so, the factors are $ a - b $ and $ a + b $.[16] This approach is particularly efficient when the two prime factors of $ n $ are close in value (i.e., their difference is small), as the required number of increments to $ a $ is then minimal.[16]
For example, to factor $ n = 5959 $: $ \lceil \sqrt{5959} \rceil = 78 $, and neither $ 78^2 - 5959 = 125 $ nor $ 79^2 - 5959 = 282 $ is a square, but $ 80^2 - 5959 = 441 = 21^2 $, yielding factors $ 80 - 21 = 59 $ and $ 80 + 21 = 101 $.[16] Another illustration is $ n = 10403 $: $ \lceil \sqrt{10403} \rceil = 102 $, and immediately $ 102^2 - 10403 = 1 = 1^2 $, giving factors $ 102 - 1 = 101 $ and $ 102 + 1 = 103 $.[16]
Despite its simplicity, Fermat's method shares trial division's limitations for large $ n $, requiring on the order of $ n $ iterations in the worst case when the factors are unbalanced (e.g., one very small and one near $ n $). It is thus best suited for educational purposes or numbers with nearby factors, often serving as a preprocessing step before more advanced techniques.[16]
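The Fermat search just described can be sketched as a short Python routine; this is a minimal illustration (using math.isqrt for integer square roots), not an optimized implementation:

```python
import math

def fermat_factor(n):
    """Factor odd composite n by searching for n = a^2 - b^2 = (a-b)(a+b)."""
    assert n % 2 == 1, "n must be odd"
    a = math.isqrt(n)
    if a * a < n:
        a += 1  # start at ceil(sqrt(n))
    while True:
        b2 = a * a - n
        b = math.isqrt(b2)
        if b * b == b2:  # a^2 - n is a perfect square
            return a - b, a + b
        a += 1

# The factors 59 and 101 are close, so only a few increments of a are needed.
print(fermat_factor(5959))   # (59, 101)
```

Because the loop terminates as soon as $ a^2 - n $ is a square, runtime is driven entirely by how far $ (p + q)/2 $ lies above $ \sqrt{n} $.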
Special-Purpose Algorithms
Special-purpose algorithms for integer factorization are designed to exploit specific structures in the number to be factored, such as small prime factors or cases where a prime factor $ p $ has a smooth value of $ p - 1 $. These methods are particularly effective when the integer is not a balanced semiprime but instead has weaknesses that general-purpose techniques might overlook.[2]

Pollard's $ p - 1 $ method targets prime factors $ p $ where $ p - 1 $ is smooth, meaning it is composed of small prime powers. The algorithm selects a base $ a $ (often 2) and computes $ a^M \bmod n $, where $ M $ is the least common multiple of the integers up to a bound $ B $. If $ p - 1 $ divides $ M $, then $ a^M \equiv 1 \pmod{p} $ by Fermat's little theorem, so $ p $ divides $ a^M - 1 $. Computing $ \gcd(a^M - 1, n) $ then yields $ p $ as a factor if this condition holds. The method proceeds in stages, increasing $ B $ to cover larger smooth parts of $ p - 1 $, and is efficient for factors up to around 50-60 digits when $ p - 1 $ is sufficiently smooth. This approach was introduced by John M. Pollard in 1974.[17][2]

Pollard's rho algorithm addresses the discovery of small to medium-sized prime factors without relying on smoothness assumptions. It generates a pseudorandom sequence using an iterating function, typically $ x_{i+1} = x_i^2 + c \bmod n $ with a random starting point $ x_0 $ and constant $ c $ (often 1 or -2), and detects cycles in the sequence modulo a factor $ p $ using Floyd's cycle-finding technique or Brent's variant for improved efficiency. When a collision $ x_i \equiv x_j \pmod{p} $ with $ x_i \not\equiv x_j \pmod{n} $ occurs, computing $ \gcd(|x_i - x_j|, n) $ reveals $ p $. The expected time to find the smallest prime factor $ p $ is proportional to $ \sqrt{p} $, making it suitable for factors up to about 20-30 digits. Brent's modification optimizes the cycle detection by using powers of 2 for iteration steps, reducing constant factors. This method was developed by John M. Pollard in 1975.[18][2]

The elliptic curve method (ECM) extends these ideas by using elliptic curves over the ring $ \mathbb{Z}/n\mathbb{Z} $ to probabilistically find factors.
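The rho iteration described above admits a compact sketch. The following minimal Python version uses Floyd's tortoise-and-hare cycle detection and the fixed polynomial $ x^2 + 1 $; production implementations randomize the start and constant and typically use Brent's variant:

```python
import math

def pollard_rho(n, x0=2, c=1):
    """Find a nontrivial factor of composite n via Pollard's rho
    with Floyd's tortoise-and-hare cycle detection."""
    f = lambda x: (x * x + c) % n
    tortoise, hare, d = x0, x0, 1
    while d == 1:
        tortoise = f(tortoise)      # tortoise advances one step
        hare = f(f(hare))           # hare advances two steps
        d = math.gcd(abs(tortoise - hare), n)
    return d if d != n else None    # d == n means retry with new x0 or c

print(pollard_rho(8051))   # 97 (8051 = 83 * 97)
```

The gcd is taken at every step here for clarity; real implementations batch many differences into one gcd to amortize its cost.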
Starting with a random elliptic curve and a point $ P $ on it, the algorithm performs scalar multiplication $ kP $ for a large, highly smooth $ k $, which may cause the group operation modulo a prime factor $ p $ to break down (a noninvertible denominator appears) and thus produce a nontrivial gcd with $ n $. The success probability depends on the size of the factor $ p $, with optimal performance for factors around 40-60 digits, and decreases for larger or smaller ones. Multiple curves are tried with randomized parameters to increase the chances of success. ECM was invented by Hendrik W. Lenstra Jr. in 1987 and has been widely implemented, such as in the GMP-ECM library, for screening factors before applying general methods.[19] These algorithms are best employed when $ n $ has small or smooth factors, serving as a preprocessing step or standalone for moderately sized integers; for instance, Pollard's rho quickly factors $ n = 8051 = 83 \times 97 $ by detecting a cycle in the sequence $ x_{i+1} = x_i^2 + 1 \bmod n $ starting from $ x_0 = 2 $, yielding the factor 97 after a few iterations. Trial division remains a simple fallback for very small factors below $ 10^6 $. In practice, implementations use random starting points for rho to avoid worst-case behaviors and stage optimizations for the $ p - 1 $ method, while ECM benefits from parameter tuning based on expected factor sizes, often running hundreds of curves in parallel on modern hardware.[18][19]
General-Purpose Algorithms
General-purpose algorithms for integer factorization are designed to handle large composite numbers without exploiting specific structural properties of $ n $, making them suitable for arbitrary inputs. These methods, primarily the Quadratic Sieve (QS) and its variants, along with the Number Field Sieve (NFS), represent the state-of-the-art classical approaches for factoring numbers up to hundreds of digits. They rely on probabilistic sieving techniques to identify smooth values and linear algebra to extract non-trivial factorizations from relations among those values.[20][21] The Quadratic Sieve, introduced by Carl Pomerance in 1984, operates by generating quadratic residues of the form $ Q(x) = x^2 - n $ for integers $ x $ near $ \sqrt{n} $, and sieving to find those that are smooth over a carefully chosen factor base of small primes.[20] Once a sufficient number of smooth relations are collected—exceeding the size of the factor base by a small margin—the algorithm constructs a matrix over the finite field $ \mathbb{F}_2 $ whose rows correspond to the exponent vectors of the prime factors in the relations.[20] Finding a non-trivial dependency in this matrix (via Gaussian elimination) yields a congruence of squares modulo $ n $, from which factors can be extracted using the difference of squares.
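The extraction step at the end can be illustrated in isolation: given a congruence of squares $ x^2 \equiv y^2 \pmod{n} $ with $ x \not\equiv \pm y \pmod{n} $, the factors fall out of two gcds. A toy instance, skipping the sieving and linear algebra that produce such congruences in practice:

```python
import math

def factors_from_congruence(n, x, y):
    """Given x^2 ≡ y^2 (mod n) with x not ≡ ±y (mod n), split n,
    since n divides (x - y)(x + y) but neither factor alone."""
    assert (x * x - y * y) % n == 0
    p = math.gcd(x - y, n)
    q = math.gcd(x + y, n)
    return p, q

# 10^2 = 100 ≡ 9 = 3^2 (mod 91), and 10 is not ±3 (mod 91)
print(factors_from_congruence(91, 10, 3))   # (7, 13)
```

All of QS, MPQS, SIQS, and NFS reduce factoring to manufacturing such a congruence; they differ only in how the relations are collected.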
The factor base size, and thus the matrix dimension, is asymptotically on the order of $ L_n[1/2, 1/2] $, balancing the probability of smoothness against computational cost.[20] To improve efficiency for larger $ n $, variants like the Multiple Polynomial Quadratic Sieve (MPQS) and the Self-Initializing Quadratic Sieve (SIQS) employ multiple quadratic polynomials instead of a single one, allowing for better coverage of the residue space and reduced sieving overhead.[22] MPQS, developed as an extension of QS, selects a family of polynomials of the form $ (ax + b)^2 - n $ with varying leading coefficients $ a $, switching polynomials periodically to optimize smoothness detection.[22] SIQS further automates this process by self-initializing the polynomial set based on the prime factorization of $ a $, making it more practical for implementation without manual tuning.[23] The Number Field Sieve (NFS), which generalizes the QS to algebraic number fields, is the most powerful general-purpose method and has superseded QS for numbers beyond about 100 digits. Introduced in its modern form by Arjen K.
Lenstra and colleagues in 1990, the General Number Field Sieve (GNFS) works for arbitrary $ n $, while the Special Number Field Sieve (SNFS) applies to $ n $ of special forms, such as Fermat numbers.[21] GNFS proceeds in four main phases: polynomial selection, where irreducible polynomials $ f $ and $ g $ of low degree are chosen such that $ f(m) \equiv g(m) \equiv 0 \pmod{n} $ for some integer $ m $; sieving to collect relations; linear algebra over $ \mathbb{F}_2 $; and finally, a square root phase that converts dependencies into actual factors.[24][21] Central to NFS sieving is identifying coprime integers $ a $ and $ b $ (with $ |a| $ and $ b $ small) such that both the rational norm $ a - bm $ and the algebraic norm $ b^{\deg f} f(a/b) $ (where the number field $ \mathbb{Q}(\alpha) $ is defined by $ f $ and its root $ \alpha $) factor smoothly over the respective factor bases.[24] The linear algebra phase mirrors QS, solving for the kernel of a large sparse matrix to produce congruences of squares, leading to factorizations via gcd computations.[24] Notable achievements include the factorization of the 768-bit RSA-768 modulus in December 2009 using GNFS, which required approximately 2 years of distributed computation across hundreds of machines.[25] As of February 2020, the record for a general-purpose classical factorization is RSA-250, a 250-digit (829-bit) number factored using GNFS, equivalent to about 2700 core-years of effort.[8]
Quantum and Emerging Methods
Shor's Algorithm
Shor's algorithm, introduced by Peter Shor in 1994, is a polynomial-time quantum algorithm for integer factorization that runs on a quantum computer. It solves the problem by reducing it to the task of finding the period of the modular exponential function $ f(x) = a^x \mod n $, where $ n $ is the integer to be factored and $ a $ is a randomly chosen integer coprime to $ n $. This approach leverages quantum superposition and interference to exponentially speed up period detection compared to classical methods.[26] The algorithm proceeds in several key steps. First, a random base $ a $ is selected such that $ \gcd(a, n) = 1 $; if not coprime, a trivial factor is found immediately. The period $ r $ of $ f(x) $ is then determined, where $ r $ is the smallest positive integer satisfying $ a^r \equiv 1 \pmod{n} $.
This period-finding subroutine employs quantum phase estimation on a superposition of states to evaluate $ f(x) $ in parallel across many values of $ x $. The quantum Fourier transform (QFT) plays a central role here, enabling the efficient computation of the discrete Fourier transform to extract the frequency corresponding to the period; the QFT can be realized via a quantum circuit with depth $ O((\log n)^2) $. Once $ r $ is obtained, if $ r $ is even and $ a^{r/2} \not\equiv -1 \pmod{n} $, non-trivial factors of $ n $ are computed classically as $ \gcd(a^{r/2} - 1, n) $ and $ \gcd(a^{r/2} + 1, n) $. If these conditions fail, the process repeats with a new $ a $.[26][6]
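The classical reduction wrapped around the quantum subroutine can be sketched as follows; here the period is found by brute-force classical search (exponential time), standing in for quantum period finding purely to illustrate the logic:

```python
import math

def find_period(a, n):
    """Classical stand-in for the quantum period-finding step:
    smallest r > 0 with a^r ≡ 1 (mod n). Exponential time in general."""
    r, v = 1, a % n
    while v != 1:
        v = (v * a) % n
        r += 1
    return r

def shor_classical_part(n, a):
    """Given coprime a, reduce factoring n to period finding."""
    assert math.gcd(a, n) == 1
    r = find_period(a, n)
    if r % 2 == 1 or pow(a, r // 2, n) == n - 1:
        return None  # bad choice of a; retry with a different base
    x = pow(a, r // 2, n)
    return math.gcd(x - 1, n), math.gcd(x + 1, n)

# a = 7, n = 15: the period of 7^x mod 15 is r = 4, so
# gcd(7^2 - 1, 15) and gcd(7^2 + 1, 15) give the factors.
print(shor_classical_part(15, 7))   # (3, 5)
```

Everything except find_period is efficient classically; the quantum speedup lies entirely in replacing that search with phase estimation and the QFT.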
In terms of complexity, Shor's algorithm requires $ O((\log n)^3) $ quantum gates to factor an integer $ n $, making it efficient on a sufficiently powerful quantum device. For a 2048-bit integer, such as those used in modern RSA keys, the algorithm demands approximately 4000 logical qubits. The QFT and modular exponentiation subroutines dominate the gate count, but optimizations like windowed arithmetic can reduce resource needs further.[6]
Despite its theoretical efficiency, Shor's algorithm necessitates a fault-tolerant quantum computer with high-fidelity gates and error correction to handle the required circuit depth. No large-scale demonstrations have been achieved to date; the earliest experimental success was factoring the small integer 15 = 3 × 5 in 2001 using a 7-qubit nuclear magnetic resonance system. Subsequent implementations remain limited to numbers with few digits due to current hardware constraints.
Recent Quantum and Hybrid Advances
Since the introduction of Shor's algorithm in 1994, which provides a polynomial-time quantum method for integer factorization, recent efforts have focused on practical implementations and hybrid approaches to address hardware limitations. In 2024, researchers demonstrated a quantum annealing-based approach using D-Wave's hardware to factor a specially constructed 2048-bit integer resembling an RSA modulus, where the factors differ by only two bits in their binary representation, via a combinatorial optimization mapping that reduces the problem to a quadratic unconstrained binary optimization (QUBO) formulation.[27] This method succeeded on small instances but highlights the potential of annealing for structured factorization problems, though it does not apply to general semiprimes without such close factors.[28] Advancing trapped-ion quantum computing, a 2025 experiment factored the integer 21 using a Schnorr-inspired algorithm adapted to a fixed-point quantum approximate optimization algorithm (QAOA) on a five-qubit ion processor, achieving high fidelity despite noise by encoding the difference of squares in a variational framework.[29] This marks a step toward scalable implementations of class-group-based factoring on near-term devices, demonstrating error rates below 1% for the circuit depth involved.[29] Theoretically, a 2025 advancement from Google Quantum AI reduced the resource requirements for factoring 2048-bit RSA integers using an optimized version of Shor's algorithm, estimating that fewer than one million noisy qubits could suffice with runtime under one week, achieved through approximate residue arithmetic, windowed modular exponentiation, and improved quantum Fourier transform (QFT) implementations alongside surface-code error correction.[30] This represents a roughly 20-fold decrease in physical qubit needs compared to prior estimates, emphasizing trade-offs in Toffoli gate counts and magic state distillation.[31] A novel 2025 quantum algorithm 
extends Fermat's method by encoding the search for factors near the square root into a quantum period-finding subroutine, enabling factorization of odd composites with reduced ancillary qubits compared to standard Shor variants, though experimental validation remains limited to small numbers like 15 and 21.[32] In April 2025, a team of Chinese researchers factored a 48-bit RSA-style integer using a 10-qubit gate-based quantum computer, setting a new experimental record for quantum factorization beyond pure Shor's implementations.[33] In June 2025, researchers proposed a theoretical quantum algorithm capable of factoring integers of any size using just one qubit and three auxiliary oscillators, representing a significant departure from traditional qubit-heavy approaches like Shor's.[34] Despite these advances, key challenges persist in quantum factorization, including qubit decoherence and gate infidelity, which limit circuit depths to under 100 operations on current hardware, and scalability issues requiring millions of physical qubits for cryptographically relevant sizes. As of November 2025, the largest experimentally verified quantum factorization using pure Shor's algorithm on gate-based systems remains 21, with other methods achieving up to 48 bits.[29][33]
Performance Analysis
Heuristic Running Times
Heuristic running times for integer factorization algorithms provide average-case or probabilistic estimates based on empirical models and assumptions about the distribution of prime factors and smooth numbers. These estimates often rely on the density of smooth integers and random mapping behaviors, offering practical predictions for performance on typical inputs, such as semiprimes used in cryptography. Unlike worst-case analyses, they assume favorable conditions like random-like behavior in iterations or sieving yields, enabling optimizations in real-world implementations.

For trial division, the worst-case time complexity is $ O(\sqrt{n}) $, as it requires checking divisors up to $ \sqrt{n} $ when $ n $ is prime or has only large factors. However, for a random integer $ n $, the average case is much faster, as the smallest prime factor is typically small, so the expected number of trials stays low when divisors are drawn from a precomputed list of primes, leading to practical efficiency for numbers with modest factors. This makes trial division suitable as a preprocessing step before more advanced methods.

Pollard's rho algorithm operates under the heuristic assumption that its pseudorandom mapping behaves like a random function modulo the smallest prime factor $ p $ of $ n $, producing a collision and thus a factor in expected time $ O(\sqrt{p}) $, which is $ O(n^{1/4}) $ in the worst case. This reliance on the birthday paradox in a cycle of length approximately $ \sqrt{p} $ makes the method particularly effective for finding small to medium-sized factors, with constant space usage independent of $ n $. The heuristic has been rigorously analyzed under random mapping assumptions, confirming its expected performance for typical composite $ n $.

The quadratic sieve (QS) achieves its heuristic running time of $ L_n[1/2, 1] = \exp\left((1 + o(1)) \sqrt{\ln n \ln \ln n}\right) $ through two main phases: relation collection via sieving for smooth quadratic residues modulo $ n $, and solving a sparse linear system over $ \mathbb{F}_2 $ for dependencies among relations.
The sieving phase dominates, relying on the density of $ B $-smooth numbers near $ \sqrt{n} $ to yield sufficient relations, with the optimal smoothness bound $ B $ balancing sieve interval size against matrix density. The matrix step adds a subdominant cost of roughly $ O(B^2) $ for a factor base of size $ B \approx L_n[1/2, 1/2] $, but heuristics confirm the sieving exponent as the leading term. This analysis, developed by Pomerance, assumes the independence of Legendre symbols and smoothness probabilities.

For the general number field sieve (GNFS), the heuristic time complexity is $ L_n[1/3, (64/9)^{1/3}] \approx L_n[1/3, 1.923] $, derived from the densities of smooth elements in the rational and algebraic number fields. The constant $ 1.923 \approx (64/9)^{1/3} $ arises from optimizing the smoothness probabilities for ideals in the sieve, with the relation collection phase (line sieving over polynomial pairs) and the linear algebra both falling within the dominant term, under generalized-Riemann-hypothesis-style assumptions on smoothness distributions. This subexponential form outperforms QS for large $ n $, as validated in implementations.

Empirical validations using the CADO-NFS software package confirm these heuristics for challenging instances like RSA-1024, a 1024-bit semiprime. Parameter optimizations in CADO-NFS, such as polynomial selection and sieving schedules, yield factorization times aligning closely with the GNFS heuristic, estimated to require approximately 500,000 core-years on 2.2 GHz CPUs for the full process, with sieving comprising over 80% of the effort.[35] Records for smaller RSA challenges (e.g., RSA-768 in 2009, RSA-250 in 2020) further match predicted yields, demonstrating the reliability of smooth-number density assumptions in practice. Several factors influence the practical realization of these heuristic times, including memory usage for storing relations and sieve arrays, which can bottleneck large-scale runs if it exceeds available RAM.
Parallelism via distributed sieving across clusters accelerates the dominant sieving phase, with multi-core or GPU implementations scaling near-linearly up to thousands of nodes, though communication overhead limits efficiency. For large $ n $, the sieving phase overwhelmingly dominates, often accounting for 90% or more of total compute time, as small-prime sieving becomes computationally intensive while large-prime cofactoring remains subdominant.
Rigorous Running Times and Complexity
Integer factorization can be solved in subexponential time on classical computers, but no polynomial-time algorithm has been established despite extensive research. The best upper bound for general-purpose classical algorithms is provided by variants of the number field sieve (NFS), achieving expected running time $ L_n[1/3, c] $ with $ c = (64/9)^{1/3} \approx 1.923 $, under the generalized Riemann hypothesis (GRH) for some analyses, though unconditional bounds are slightly worse.[36] A seminal rigorous result is the Schnorr–Seysen–Lenstra (SSL) algorithm from 1982, which uses class group relations in quadratic number fields to achieve an expected running time of $ L_n[1/2, 1] = \exp\left((1 + o(1)) \sqrt{\ln n \ln \ln n}\right) $. This bound was rigorously proven without GRH assumptions by Lenstra and Pomerance in 1992, marking a key advancement in provable subexponential factorization.[37][38] For randomized algorithms like Pollard's rho method, a rigorous expected running time of $ O(n^{1/4 + \varepsilon}) $ for any $ \varepsilon > 0 $ holds under the extended Riemann hypothesis (ERH), as established by Miller in 1976. This analysis relies on bounds on the least quadratic non-residue, confirming the method's efficiency for finding small factors when the hypothesis is assumed.[39] Regarding lower bounds, the decision version of integer factorization lies in $ \mathsf{NP} \cap \mathsf{coNP} $, implying it is not $ \mathsf{NP} $-complete unless $ \mathsf{NP} = \mathsf{coNP} $. No proof of $ \mathsf{NP} $-hardness exists, positioning it as a candidate $ \mathsf{NP} $-intermediate problem, though it is widely believed to be computationally hard, serving as a foundational one-way function in cryptography.[40][41] In the quantum setting, Shor's algorithm provides a rigorous polynomial-time solution, factoring $ n $ in $ O((\log n)^3) $ time on a quantum computer with sufficient qubits, fundamentally altering the complexity landscape.[6] Subexponential running times in factorization are often expressed using the $ L $-notation, defined as
$ L_n[\alpha, c] = \exp\left((c + o(1)) (\ln n)^{\alpha} (\ln \ln n)^{1 - \alpha}\right), $
which interpolates between polynomial growth ($ \alpha = 0 $) and exponential growth ($ \alpha = 1 $) for $ 0 < \alpha < 1 $. This formalism precisely captures the asymptotic behavior of classical algorithms like the NFS.[42]
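As a rough numeric illustration of this notation, one can compare the QS and GNFS growth rates at a given size; the sketch below drops the $ o(1) $ term, so the figures are ballpark only:

```python
import math

def L(n_bits, alpha, c):
    """Evaluate exp(c * (ln n)^alpha * (ln ln n)^(1 - alpha)) for an
    n of the given bit length, ignoring the o(1) term entirely."""
    ln_n = n_bits * math.log(2)
    return math.exp(c * ln_n ** alpha * math.log(ln_n) ** (1 - alpha))

# Crude operation-count comparison for a 1024-bit modulus
qs = L(1024, 1/2, 1.0)                 # quadratic sieve: L_n[1/2, 1]
gnfs = L(1024, 1/3, (64/9) ** (1/3))   # GNFS: L_n[1/3, 1.923...]
print(f"QS   ~ 2^{math.log2(qs):.0f} operations")
print(f"GNFS ~ 2^{math.log2(gnfs):.0f} operations")
```

The GNFS figure comes out dozens of bits below the QS figure at this size, which is why NFS dominates all modern factoring records.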