Integer factorization
Fundamentals
Definition and Motivation
Integer factorization refers to the decomposition of a composite positive integer $ n $ into its prime factors, expressing $ n $ as a product of prime numbers.[9] This process involves identifying primes $ p_1, p_2, \dots, p_k $ such that $ n = p_1 p_2 \cdots p_k $, where each $ p_i $ is prime.[2] The problem appears deceptively simple for small values of $ n $; for instance, factoring 15 yields $ 15 = 3 \times 5 $, which can be verified by basic division.[10] However, as $ n $ grows large—particularly when $ n $ is a product of two large primes—the computational effort required escalates dramatically, making factorization infeasible with current classical methods for sufficiently large instances.[11] An illustrative example is 91, which factors as $ 91 = 7 \times 13 $, but scaling this to hundreds of digits reveals the inherent challenge. Historically, interest in factorization arose in number theory from ancient efforts to understand prime numbers and their properties, as explored by Euclid in his Elements around 300 BCE, where he laid foundational work on divisibility and primes.[2] Euclid's propositions, such as those demonstrating the infinitude of primes and the structure of integers, motivated early studies in factorization as a means to explore the building blocks of natural numbers.[12] In modern computing, factorization's significance stems from its role as a computationally hard problem underpinning public-key cryptography, notably the RSA cryptosystem introduced in 1978, whose security depends on the difficulty of factoring large semiprimes.[11] This hardness ensures that encrypted messages remain secure against adversaries lacking efficient factoring capabilities, driving ongoing research in both theoretical mathematics and practical algorithms. The uniqueness of such decompositions is affirmed by the fundamental theorem of arithmetic.[10]
Prime Factorization Theorem
The Fundamental Theorem of Arithmetic, also known as the unique factorization theorem, asserts that every integer greater than 1 can be expressed as a product of one or more prime numbers, and this representation is unique up to the order of the factors.[13] Formally, for any integer $ n > 1 $, there exist prime numbers $ p_1, p_2, \dots, p_k $ (not necessarily distinct) such that $ n = p_1 p_2 \cdots p_k $, and if $ n = q_1 q_2 \cdots q_m $ is another such factorization with primes $ q_j $, then $ k = m $ and the multisets $ \{p_1, \dots, p_k\} $ and $ \{q_1, \dots, q_m\} $ are identical.[13]

The proof proceeds in two parts: establishing existence and proving uniqueness. For existence, the well-ordering principle of the natural numbers is invoked. Suppose there exists some integer greater than 1 without a prime factorization; among all such integers, let $ n $ be the smallest. Then $ n $ cannot be prime (as primes factorize trivially as themselves), so $ n = ab $ with $ 1 < a, b < n $. By minimality, $ a $ and $ b $ have prime factorizations, hence so does $ n $, a contradiction. Thus, every integer greater than 1 has at least one prime factorization.[13]

Uniqueness relies on Euclid's lemma, which states that if a prime $ p $ divides a product $ ab $, then $ p $ divides $ a $ or $ p $ divides $ b $.[14] To prove Euclid's lemma, assume $ p \mid ab $ but $ p \nmid a $; then $ \gcd(p, a) = 1 $, so there exist integers $ x, y $ such that $ px + ay = 1 $ by Bézout's identity. Multiplying by $ b $ yields $ p(xb) + a(by) = b $, and since $ p \mid ab $, it follows that $ p \mid b $.
With Euclid's lemma, uniqueness follows by contradiction: suppose two distinct factorizations $ n = p_1^{e_1} \cdots p_r^{e_r} = q_1^{f_1} \cdots q_s^{f_s} $; then some prime, say $ p_1 $, must divide the right side, hence divide some $ q_j $, implying $ p_1 = q_j $, and inducting on the exponents shows all match.[13][14] For example, the integer 12 factors as $ 12 = 2^2 \times 3 $, and this is the unique prime factorization, as any other decomposition (e.g., $ 12 = 4 \times 3 = 2^2 \times 3 $) reduces to the same primes with identical multiplicities.[13] This theorem underscores the multiplicative structure of the integers, ensuring that arithmetic operations like greatest common divisors and least common multiples can be computed via prime exponents, and it forms the basis for much of elementary number theory, such as the distribution of primes and properties of integers under multiplication.[13] While the theorem holds specifically for the ring of integers $ \mathbb{Z} $, where primes are irreducible and lead to unique factorization, other integral domains like the Gaussian integers $ \mathbb{Z}[i] $ also exhibit unique factorization into irreducibles (Gaussian primes), though the notion of primality differs; for instance, 2 factors as $ (1+i)(1-i) $ up to units.[15] However, not all rings share this property; the focus here remains on $ \mathbb{Z} $, where the theorem guarantees irreducibility aligns perfectly with primality.[15]
Classical Algorithms
Trial Division and Basic Methods
Trial division is one of the simplest and most straightforward methods for factoring an integer $ n $. It involves systematically testing $ n $ for divisibility by all integers starting from 2 up to $ \lfloor \sqrt{n} \rfloor $, as any factor larger than $ \sqrt{n} $ would pair with a smaller factor already checked.[16] If $ n $ is divisible by a candidate divisor $ d $, then $ n $ is repeatedly divided by $ d $ until it is no longer divisible, removing all powers of that factor; the process continues with the quotient until the remaining value is prime or 1.[16] To implement trial division efficiently, first handle the special case of even numbers by dividing by 2 as many times as possible. Then, test odd divisors from 3 up to $ \sqrt{n} $, incrementing by 2 to skip evens. This optimization halves the number of checks after the initial step for odd $ n $.[16] A basic implementation (valid Python) is as follows:

    def trial_division(n):
        factors = []
        # Remove all factors of 2 first
        while n % 2 == 0:
            factors.append(2)
            n //= 2
        # Check odd divisors up to sqrt(n)
        d = 3
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 2
        # Whatever remains above 1 is itself prime
        if n > 1:
            factors.append(n)
        return factors
This method guarantees complete factorization by the fundamental theorem of arithmetic, but its worst-case time complexity is $ O(\sqrt{n}) $, making it practical only for small $ n $ or when small factors are present.[16]
Fermat's factorization method exploits the identity $ n = a^2 - b^2 = (a - b)(a + b) $ for odd $ n $, where $ a $ and $ b $ are nonnegative integers with $ a > b $. The algorithm starts with $ a = \lceil \sqrt{n} \rceil $ and increments $ a $, checking whether $ a^2 - n $ is a perfect square $ b^2 $; if so, the factors are $ a - b $ and $ a + b $.[16] This approach is particularly efficient when the two prime factors of $ n $ are close in value (i.e., their difference is small), as the required number of increments to $ a $ is then minimal.[16]
For example, to factor $ n = 5959 $: $ \lceil \sqrt{5959} \rceil = 78 $, and neither $ 78^2 - 5959 = 125 $ nor $ 79^2 - 5959 = 282 $ is a square, but $ 80^2 - 5959 = 441 = 21^2 $, yielding factors $ 80 - 21 = 59 $ and $ 80 + 21 = 101 $.[16] Another illustration is $ n = 10403 $: $ \lceil \sqrt{10403} \rceil = 102 $, and immediately $ 102^2 - 10403 = 1 = 1^2 $, giving factors $ 102 - 1 = 101 $ and $ 102 + 1 = 103 $.[16]
Despite its simplicity, Fermat's method shares trial division's limitations for large $ n $, requiring on the order of $ n $ iterations in the worst case when the factors are unbalanced (e.g., one very small and one near $ n $). It is thus best suited for educational purposes or numbers with nearby factors, often serving as a preprocessing step before more advanced techniques.[16]
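The Fermat search just described can be sketched as a short Python routine; this is a minimal illustration (using math.isqrt for integer square roots), not an optimized implementation:

```python
import math

def fermat_factor(n):
    """Factor odd composite n by searching for n = a^2 - b^2 = (a-b)(a+b)."""
    assert n % 2 == 1, "n must be odd"
    a = math.isqrt(n)
    if a * a < n:
        a += 1  # start at ceil(sqrt(n))
    while True:
        b2 = a * a - n
        b = math.isqrt(b2)
        if b * b == b2:  # a^2 - n is a perfect square
            return a - b, a + b
        a += 1

# The factors 59 and 101 are close, so only a few increments of a are needed.
print(fermat_factor(5959))   # (59, 101)
```

Because the loop terminates as soon as $ a^2 - n $ is a square, runtime is driven entirely by how far $ (p + q)/2 $ lies above $ \sqrt{n} $.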
Special-Purpose Algorithms
Special-purpose algorithms for integer factorization are designed to exploit specific structures in the number to be factored, such as small prime factors or cases where a prime factor $ p $ has a smooth value of $ p - 1 $. These methods are particularly effective when the integer is not a balanced semiprime but instead has weaknesses that general-purpose techniques might overlook.[2]

Pollard's $ p - 1 $ method targets prime factors $ p $ where $ p - 1 $ is smooth, meaning it is composed of small prime powers. The algorithm selects a base $ a $ (often 2) and computes $ a^M \bmod n $, where $ M $ is the least common multiple of the integers up to a bound $ B $. If $ p - 1 $ divides $ M $, then $ a^M \equiv 1 \pmod{p} $ by Fermat's little theorem, so $ p $ divides $ a^M - 1 $. Computing $ \gcd(a^M - 1, n) $ then yields $ p $ as a factor if this condition holds. The method proceeds in stages, increasing $ B $ to cover larger smooth parts of $ p - 1 $, and is efficient for factors up to around 50-60 digits when $ p - 1 $ is sufficiently smooth. This approach was introduced by John M. Pollard in 1974.[17][2]

Pollard's rho algorithm addresses the discovery of small to medium-sized prime factors without relying on smoothness assumptions. It generates a pseudorandom sequence using an iterating function, typically $ x_{i+1} = x_i^2 + c \bmod n $ with a random starting point $ x_0 $ and constant $ c $ (often 1 or -2), and detects cycles in the sequence modulo a factor $ p $ using Floyd's cycle-finding technique or Brent's variant for improved efficiency. When a collision $ x_i \equiv x_j \pmod{p} $ with $ x_i \not\equiv x_j \pmod{n} $ occurs, computing $ \gcd(|x_i - x_j|, n) $ reveals $ p $. The expected time to find the smallest prime factor $ p $ is proportional to $ \sqrt{p} $, making it suitable for factors up to about 20-30 digits. Brent's modification optimizes the cycle detection by using powers of 2 for iteration steps, reducing constant factors. This method was developed by John M. Pollard in 1975.[18][2]

The elliptic curve method (ECM) extends these ideas by using elliptic curves over the ring $ \mathbb{Z}/n\mathbb{Z} $ to probabilistically find factors.
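The rho iteration described above admits a compact sketch. The following minimal Python version uses Floyd's tortoise-and-hare cycle detection and the fixed polynomial $ x^2 + 1 $; production implementations randomize the start and constant and typically use Brent's variant:

```python
import math

def pollard_rho(n, x0=2, c=1):
    """Find a nontrivial factor of composite n via Pollard's rho
    with Floyd's tortoise-and-hare cycle detection."""
    f = lambda x: (x * x + c) % n
    tortoise, hare, d = x0, x0, 1
    while d == 1:
        tortoise = f(tortoise)      # tortoise advances one step
        hare = f(f(hare))           # hare advances two steps
        d = math.gcd(abs(tortoise - hare), n)
    return d if d != n else None    # d == n means retry with new x0 or c

print(pollard_rho(8051))   # 97 (8051 = 83 * 97)
```

The gcd is taken at every step here for clarity; real implementations batch many differences into one gcd to amortize its cost.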
Starting with a random elliptic curve and a point $ P $ on it, the algorithm performs scalar multiplication $ kP $ for a large, highly smooth $ k $, which may cause the group operation modulo a prime factor $ p $ to break down (a noninvertible denominator appears) and thus produce a nontrivial gcd with $ n $. The success probability depends on the size of the factor $ p $, with optimal performance for factors around 40-60 digits, and decreases for larger or smaller ones. Multiple curves are tried with randomized parameters to increase the chances of success. ECM was invented by Hendrik W. Lenstra Jr. in 1987 and has been widely implemented, such as in the GMP-ECM library, for screening factors before applying general methods.[19] These algorithms are best employed when $ n $ has small or smooth factors, serving as a preprocessing step or standalone for moderately sized integers; for instance, Pollard's rho quickly factors $ n = 8051 = 83 \times 97 $ by detecting a cycle in the sequence $ x_{i+1} = x_i^2 + 1 \bmod n $ starting from $ x_0 = 2 $, yielding the factor 97 after a few iterations. Trial division remains a simple fallback for very small factors below $ 10^6 $. In practice, implementations use random starting points for rho to avoid worst-case behaviors and stage optimizations for the $ p - 1 $ method, while ECM benefits from parameter tuning based on expected factor sizes, often running hundreds of curves in parallel on modern hardware.[18][19]
General-Purpose Algorithms
General-purpose algorithms for integer factorization are designed to handle large composite numbers without exploiting specific structural properties of $ n $, making them suitable for arbitrary inputs. These methods, primarily the Quadratic Sieve (QS) and its variants, along with the Number Field Sieve (NFS), represent the state-of-the-art classical approaches for factoring numbers up to hundreds of digits. They rely on probabilistic sieving techniques to identify smooth values and linear algebra to extract non-trivial factorizations from relations among those values.[20][21] The Quadratic Sieve, introduced by Carl Pomerance in 1984, operates by generating quadratic residues of the form $ Q(x) = x^2 - n $ for integers $ x $ near $ \sqrt{n} $, and sieving to find those that are smooth over a carefully chosen factor base of small primes.[20] Once a sufficient number of smooth relations are collected—exceeding the size of the factor base by a small margin—the algorithm constructs a matrix over the finite field $ \mathbb{F}_2 $ whose rows correspond to the exponent vectors of the prime factors in the relations.[20] Finding a non-trivial dependency in this matrix (via Gaussian elimination) yields a congruence of squares modulo $ n $, from which factors can be extracted using the difference of squares.
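The extraction step at the end can be illustrated in isolation: given a congruence of squares $ x^2 \equiv y^2 \pmod{n} $ with $ x \not\equiv \pm y \pmod{n} $, the factors fall out of two gcds. A toy instance, skipping the sieving and linear algebra that produce such congruences in practice:

```python
import math

def factors_from_congruence(n, x, y):
    """Given x^2 ≡ y^2 (mod n) with x not ≡ ±y (mod n), split n,
    since n divides (x - y)(x + y) but neither factor alone."""
    assert (x * x - y * y) % n == 0
    p = math.gcd(x - y, n)
    q = math.gcd(x + y, n)
    return p, q

# 10^2 = 100 ≡ 9 = 3^2 (mod 91), and 10 is not ±3 (mod 91)
print(factors_from_congruence(91, 10, 3))   # (7, 13)
```

All of QS, MPQS, SIQS, and NFS reduce factoring to manufacturing such a congruence; they differ only in how the relations are collected.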
The factor base size, and thus the matrix dimension, is asymptotically on the order of $ L_n[1/2, 1/2] $, balancing the probability of smoothness against computational cost.[20] To improve efficiency for larger $ n $, variants like the Multiple Polynomial Quadratic Sieve (MPQS) and the Self-Initializing Quadratic Sieve (SIQS) employ multiple quadratic polynomials instead of a single one, allowing for better coverage of the residue space and reduced sieving overhead.[22] MPQS, developed as an extension of QS, selects a family of polynomials of the form $ (ax + b)^2 - n $ with varying leading coefficients $ a $, switching polynomials periodically to optimize smoothness detection.[22] SIQS further automates this process by self-initializing the polynomial set based on the prime factorization of $ a $, making it more practical for implementation without manual tuning.[23] The Number Field Sieve (NFS), which generalizes the QS to algebraic number fields, is the most powerful general-purpose method and has superseded QS for numbers beyond about 100 digits. Introduced in its modern form by Arjen K.
Lenstra and colleagues in 1990, the General Number Field Sieve (GNFS) works for arbitrary $ n $, while the Special Number Field Sieve (SNFS) applies to $ n $ of special forms, such as Fermat numbers.[21] GNFS proceeds in four main phases: polynomial selection, where irreducible polynomials $ f $ and $ g $ of low degree are chosen such that $ f(m) \equiv g(m) \equiv 0 \pmod{n} $ for some integer $ m $; sieving to collect relations; linear algebra over $ \mathbb{F}_2 $; and finally, a square root phase that converts dependencies into actual factors.[24][21] Central to NFS sieving is identifying coprime integers $ a $ and $ b $ (with $ |a| $ and $ b $ small) such that both the rational norm $ a - bm $ and the algebraic norm $ b^{\deg f} f(a/b) $ (where the number field $ \mathbb{Q}(\alpha) $ is defined by $ f $ and its root $ \alpha $) factor smoothly over the respective factor bases.[24] The linear algebra phase mirrors QS, solving for the kernel of a large sparse matrix to produce congruences of squares, leading to factorizations via gcd computations.[24] Notable achievements include the factorization of the 768-bit RSA-768 modulus in December 2009 using GNFS, which required approximately 2 years of distributed computation across hundreds of machines.[25] As of February 2020, the record for a general-purpose classical factorization is RSA-250, a 250-digit (829-bit) number factored using GNFS, equivalent to about 2700 core-years of effort.[8]
Quantum and Emerging Methods
Shor's Algorithm
Shor's algorithm, introduced by Peter Shor in 1994, is a polynomial-time quantum algorithm for integer factorization that runs on a quantum computer. It solves the problem by reducing it to the task of finding the period of the modular exponential function $ f(x) = a^x \mod n $, where $ n $ is the integer to be factored and $ a $ is a randomly chosen integer coprime to $ n $. This approach leverages quantum superposition and interference to exponentially speed up period detection compared to classical methods.[26] The algorithm proceeds in several key steps. First, a random base $ a $ is selected such that $ \gcd(a, n) = 1 $; if not coprime, a trivial factor is found immediately. The period $ r $ of $ f(x) $ is then determined, where $ r $ is the smallest positive integer satisfying $ a^r \equiv 1 \pmod{n} $.
This period-finding subroutine employs quantum phase estimation on a superposition of states to evaluate $ f(x) $ in parallel across many values of $ x $. The quantum Fourier transform (QFT) plays a central role here, enabling the efficient computation of the discrete Fourier transform to extract the frequency corresponding to the period; the QFT can be realized via a quantum circuit with depth $ O((\log n)^2) $. Once $ r $ is obtained, if $ r $ is even and $ a^{r/2} \not\equiv -1 \pmod{n} $, non-trivial factors of $ n $ are computed classically as $ \gcd(a^{r/2} - 1, n) $ and $ \gcd(a^{r/2} + 1, n) $. If these conditions fail, the process repeats with a new $ a $.[26][6]
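The classical reduction wrapped around the quantum subroutine can be sketched as follows; here the period is found by brute-force classical search (exponential time), standing in for quantum period finding purely to illustrate the logic:

```python
import math

def find_period(a, n):
    """Classical stand-in for the quantum period-finding step:
    smallest r > 0 with a^r ≡ 1 (mod n). Exponential time in general."""
    r, v = 1, a % n
    while v != 1:
        v = (v * a) % n
        r += 1
    return r

def shor_classical_part(n, a):
    """Given coprime a, reduce factoring n to period finding."""
    assert math.gcd(a, n) == 1
    r = find_period(a, n)
    if r % 2 == 1 or pow(a, r // 2, n) == n - 1:
        return None  # bad choice of a; retry with a different base
    x = pow(a, r // 2, n)
    return math.gcd(x - 1, n), math.gcd(x + 1, n)

# a = 7, n = 15: the period of 7^x mod 15 is r = 4, so
# gcd(7^2 - 1, 15) and gcd(7^2 + 1, 15) give the factors.
print(shor_classical_part(15, 7))   # (3, 5)
```

Everything except find_period is efficient classically; the quantum speedup lies entirely in replacing that search with phase estimation and the QFT.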
In terms of complexity, Shor's algorithm requires $ O((\log n)^3) $ quantum gates to factor an integer $ n $, making it efficient on a sufficiently powerful quantum device. For a 2048-bit integer, such as those used in modern RSA keys, the algorithm demands approximately 4000 logical qubits. The QFT and modular exponentiation subroutines dominate the gate count, but optimizations like windowed arithmetic can reduce resource needs further.[6]
Despite its theoretical efficiency, Shor's algorithm necessitates a fault-tolerant quantum computer with high-fidelity gates and error correction to handle the required circuit depth. No large-scale demonstrations have been achieved to date; the earliest experimental success was factoring the small integer 15 = 3 × 5 in 2001 using a 7-qubit nuclear magnetic resonance system. Subsequent implementations remain limited to numbers with few digits due to current hardware constraints.
Recent Quantum and Hybrid Advances
Since the introduction of Shor's algorithm in 1994, which provides a polynomial-time quantum method for integer factorization, recent efforts have focused on practical implementations and hybrid approaches to address hardware limitations. In 2024, researchers demonstrated a quantum annealing-based approach using D-Wave's hardware to factor a specially constructed 2048-bit integer resembling an RSA modulus, where the factors differ by only two bits in their binary representation, via a combinatorial optimization mapping that reduces the problem to a quadratic unconstrained binary optimization (QUBO) formulation.[27] This method succeeded on small instances but highlights the potential of annealing for structured factorization problems, though it does not apply to general semiprimes without such close factors.[28] Advancing trapped-ion quantum computing, a 2025 experiment factored the integer 21 using a Schnorr-inspired algorithm adapted to a fixed-point quantum approximate optimization algorithm (QAOA) on a five-qubit ion processor, achieving high fidelity despite noise by encoding the difference of squares in a variational framework.[29] This marks a step toward scalable implementations of class-group-based factoring on near-term devices, demonstrating error rates below 1% for the circuit depth involved.[29] Theoretically, a 2025 advancement from Google Quantum AI reduced the resource requirements for factoring 2048-bit RSA integers using an optimized version of Shor's algorithm, estimating that fewer than one million noisy qubits could suffice with runtime under one week, achieved through approximate residue arithmetic, windowed modular exponentiation, and improved quantum Fourier transform (QFT) implementations alongside surface-code error correction.[30] This represents a roughly 20-fold decrease in physical qubit needs compared to prior estimates, emphasizing trade-offs in Toffoli gate counts and magic state distillation.[31] A novel 2025 quantum algorithm 
extends Fermat's method by encoding the search for factors near the square root into a quantum period-finding subroutine, enabling factorization of odd composites with reduced ancillary qubits compared to standard Shor variants, though experimental validation remains limited to small numbers like 15 and 21.[32] In April 2025, a team of Chinese researchers factored a 48-bit RSA-style integer using a 10-qubit gate-based quantum computer, setting a new experimental record for quantum factorization beyond pure Shor's implementations.[33] In June 2025, researchers proposed a theoretical quantum algorithm capable of factoring integers of any size using just one qubit and three auxiliary oscillators, representing a significant departure from traditional qubit-heavy approaches like Shor's.[34] Despite these advances, key challenges persist in quantum factorization, including qubit decoherence and gate infidelity, which limit circuit depths to under 100 operations on current hardware, and scalability issues requiring millions of physical qubits for cryptographically relevant sizes. As of November 2025, the largest experimentally verified quantum factorization using pure Shor's algorithm on gate-based systems remains 21, with other methods achieving up to 48 bits.[29][33]
Performance Analysis
Heuristic Running Times
Heuristic running times for integer factorization algorithms provide average-case or probabilistic estimates based on empirical models and assumptions about the distribution of prime factors and smooth numbers. These estimates often rely on the density of smooth integers and random mapping behaviors, offering practical predictions for performance on typical inputs, such as semiprimes used in cryptography. Unlike worst-case analyses, they assume favorable conditions like random-like behavior in iterations or sieving yields, enabling optimizations in real-world implementations.

For trial division, the worst-case time complexity is $ O(\sqrt{n}) $, as it requires checking divisors up to $ \sqrt{n} $ when $ n $ is prime or has only large factors. However, for a random integer $ n $, the average case is much faster, as the smallest prime factor is typically small, so the expected number of trials stays low when divisors are drawn from a precomputed list of primes, leading to practical efficiency for numbers with modest factors. This makes trial division suitable as a preprocessing step before more advanced methods.

Pollard's rho algorithm operates under the heuristic assumption that its pseudorandom mapping behaves like a random function modulo the smallest prime factor $ p $ of $ n $, producing a collision and thus a factor in expected time $ O(\sqrt{p}) $, which is $ O(n^{1/4}) $ in the worst case. This reliance on the birthday paradox in a cycle of length approximately $ \sqrt{p} $ makes the method particularly effective for finding small to medium-sized factors, with constant space usage independent of $ n $. The heuristic has been rigorously analyzed under random mapping assumptions, confirming its expected performance for typical composite $ n $.

The quadratic sieve (QS) achieves its heuristic running time of $ L_n[1/2, 1] = \exp\left((1 + o(1)) \sqrt{\ln n \ln \ln n}\right) $ through two main phases: relation collection via sieving for smooth quadratic residues modulo $ n $, and solving a sparse linear system over $ \mathbb{F}_2 $ for dependencies among relations.
The sieving phase dominates, relying on the density of $ B $-smooth numbers near $ \sqrt{n} $ to yield sufficient relations, with the optimal smoothness bound $ B $ balancing sieve interval size against matrix density. The matrix step adds a subdominant cost of roughly $ O(B^2) $ for a factor base of size $ B \approx L_n[1/2, 1/2] $, but heuristics confirm the sieving exponent as the leading term. This analysis, developed by Pomerance, assumes the independence of Legendre symbols and smoothness probabilities.

For the general number field sieve (GNFS), the heuristic time complexity is $ L_n[1/3, (64/9)^{1/3}] \approx L_n[1/3, 1.923] $, derived from the densities of smooth elements in the rational and algebraic number fields. The constant $ 1.923 \approx (64/9)^{1/3} $ arises from optimizing the smoothness probabilities for ideals in the sieve, with the relation collection phase (line sieving over polynomial pairs) and the linear algebra both falling within the dominant term, under generalized-Riemann-hypothesis-style assumptions on smoothness distributions. This subexponential form outperforms QS for large $ n $, as validated in implementations.

Empirical validations using the CADO-NFS software package confirm these heuristics for challenging instances like RSA-1024, a 1024-bit semiprime. Parameter optimizations in CADO-NFS, such as polynomial selection and sieving schedules, yield factorization times aligning closely with the GNFS heuristic, estimated to require approximately 500,000 core-years on 2.2 GHz CPUs for the full process, with sieving comprising over 80% of the effort.[35] Records for smaller RSA challenges (e.g., RSA-768 in 2009, RSA-250 in 2020) further match predicted yields, demonstrating the reliability of smooth-number density assumptions in practice. Several factors influence the practical realization of these heuristic times, including memory usage for storing relations and sieve arrays, which can bottleneck large-scale runs if it exceeds available RAM.
Parallelism via distributed sieving across clusters accelerates the dominant sieving phase, with multi-core or GPU implementations scaling near-linearly up to thousands of nodes, though communication overhead limits efficiency. For large $ n $, the sieving phase overwhelmingly dominates, often accounting for 90% or more of total compute time, as small-prime sieving becomes computationally intensive while large-prime cofactoring remains subdominant.
Rigorous Running Times and Complexity
Integer factorization can be solved in subexponential time on classical computers, but no polynomial-time algorithm has been established despite extensive research. The best upper bound for general-purpose classical algorithms is provided by variants of the number field sieve (NFS), achieving expected running time $ L_n[1/3, c] $ with $ c = (64/9)^{1/3} \approx 1.923 $, under the generalized Riemann hypothesis (GRH) for some analyses, though unconditional bounds are slightly worse.[36] A seminal rigorous result is the Schnorr–Seysen–Lenstra (SSL) algorithm from 1982, which uses class group relations in quadratic number fields to achieve an expected running time of $ L_n[1/2, 1] = \exp\left((1 + o(1)) \sqrt{\ln n \ln \ln n}\right) $. This bound was rigorously proven without GRH assumptions by Lenstra and Pomerance in 1992, marking a key advancement in provable subexponential factorization.[37][38] For randomized algorithms like Pollard's rho method, a rigorous expected running time of $ O(n^{1/4 + \varepsilon}) $ for any $ \varepsilon > 0 $ holds under the extended Riemann hypothesis (ERH), as established by Miller in 1976. This analysis relies on bounds on the least quadratic non-residue, confirming the method's efficiency for finding small factors when the hypothesis is assumed.[39] Regarding lower bounds, the decision version of integer factorization lies in $ \mathsf{NP} \cap \mathsf{coNP} $, implying it is not $ \mathsf{NP} $-complete unless $ \mathsf{NP} = \mathsf{coNP} $. No proof of $ \mathsf{NP} $-hardness exists, positioning it as a candidate $ \mathsf{NP} $-intermediate problem, though it is widely believed to be computationally hard, serving as a foundational one-way function in cryptography.[40][41] In the quantum setting, Shor's algorithm provides a rigorous polynomial-time solution, factoring $ n $ in $ O((\log n)^3) $ time on a quantum computer with sufficient qubits, fundamentally altering the complexity landscape.[6] Subexponential running times in factorization are often expressed using the $ L $-notation, defined as
$ L_n[\alpha, c] = \exp\left((c + o(1)) (\ln n)^{\alpha} (\ln \ln n)^{1 - \alpha}\right), $
which interpolates between polynomial growth ($ \alpha = 0 $) and exponential growth ($ \alpha = 1 $) for $ 0 < \alpha < 1 $. This formalism precisely captures the asymptotic behavior of classical algorithms like the NFS.[42]
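As a rough numeric illustration of this notation, one can compare the QS and GNFS growth rates at a given size; the sketch below drops the $ o(1) $ term, so the figures are ballpark only:

```python
import math

def L(n_bits, alpha, c):
    """Evaluate exp(c * (ln n)^alpha * (ln ln n)^(1 - alpha)) for an
    n of the given bit length, ignoring the o(1) term entirely."""
    ln_n = n_bits * math.log(2)
    return math.exp(c * ln_n ** alpha * math.log(ln_n) ** (1 - alpha))

# Crude operation-count comparison for a 1024-bit modulus
qs = L(1024, 1/2, 1.0)                 # quadratic sieve: L_n[1/2, 1]
gnfs = L(1024, 1/3, (64/9) ** (1/3))   # GNFS: L_n[1/3, 1.923...]
print(f"QS   ~ 2^{math.log2(qs):.0f} operations")
print(f"GNFS ~ 2^{math.log2(gnfs):.0f} operations")
```

The GNFS figure comes out dozens of bits below the QS figure at this size, which is why NFS dominates all modern factoring records.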