close
Dual View Random Solved Random Open
OPEN This is open, and cannot be resolved with a finite computation.
Is there a sequence $1\leq d_1<d_2<\cdots$ with density $1$ such that all products $\prod_{u\leq i\leq v}d_i$ are distinct?
The open status of this problem reflects the current belief of the owner of this website. There may be literature on this problem that I am unaware of, which may partially or completely solve the stated problem. Please do your own literature search before expending significant effort on solving this problem. If you find any relevant literature not mentioned here, please add this in a comment.
Comment activity that has not yet been incorporated into the remarks
There are no solutions, partial or complete, claimed in the comments.
A construction of Selfridge (see [786]) shows that there exists such a sequence of density $>1/e-\epsilon$ for any $\epsilon>0$.

See also [786].

View the LaTeX source

View history

External data from the database - you can help update this
Formalised statement? Yes
Related OEIS sequences: A389544 A390848
Likes this problem Alfaiz
Interested in collaborating None
Currently working on this problem None
This problem looks difficult None
This problem looks tractable BorisAlexeev, TFBloom, Przemek, johnathonweare
The results on this problem could be formalisable None
I am working on formalising the results on this problem None

Additional thanks to: Desmond Weisenberg

When referring to this problem, please use the original sources of Erdős. If you wish to acknowledge this website, the recommended citation format is:

T. F. Bloom, Erdős Problem #421, https://www.erdosproblems.com/421, accessed 2026-05-21
Order by oldest first or newest first. (The most recent comments are highlighted in a red border.)
  • Here's the complete rewrite of the paper, following suggestion by Terence Tao to make the proof 'modular' so that various parts are as independent as possible. I think it greatly streamlined proof and divided it into 4 parts. All feedback welcome: critique, simplifications, suggestions.

    We build a density-one set $A\subset\mathbb N$ whose increasing enumeration has distinct products of consecutive blocks, by first building a density-one set $A_0$ with certain special properties. Then the construction of $A$ is dyadic: for each large dyadic $X$ we delete a small set $D_X\subset[X,2X)$ and set\[
    A:=A_0\setminus\bigcup_{X\ \mathrm{dyadic}}D_X,
    \]so the key is to ensure simultaneously: (i) $\sum_{X\le N}|D_X|=o(N)$ (density), and (ii) every collision with maximal term $m\in[X,2X)$ is hit by $D_X$ (hitting).

    Module I (sprinkling)
    Independently delete each $n\ge3$ with probability $\delta(n)=1/(\log n)^2$ and let $A_0$ be the complement. Almost surely $A_0$ has density $1$ and, by a union bound plus Borel-Cantelli, all "long" collisions are eliminated: for all sufficiently large maximal terms $m$ there is no collision whose degree satisfies\[
    \max(\ell,k+1)\ \ge\ L(m):=10(\log m)^3.
    \]Thus, from this point on we only need to kill the remaining logarithmic-regime collisions with $d=\max(\ell,k+1)\le L(X)$ at each dyadic scale $X$. We also make sure that $A_0$ is locally regular.

    Module II + III (encoding + Diophantine bounds)
    Module II shows that under a capacity constraint at scale $X$ (at most $T(X)$ deletions in every length-$L(X)$ interval), the predecessor lists of length $\le L(X)$ are determined by bounded local windows, so every logarithmic collision is encoded by an integer point $(x,y)\in[X,2X)^2$ on a split-product curve\[
    \prod_{i=0}^{\ell-1}(x-a_i)\;=\;\prod_{j=0}^{k}(y+b_j),
    \qquad
    d=\max(\ell,k+1)\le L(X),
    \qquad
    a_i,b_j\ll L(X)+T(X)+U(X).
    \]Module III supplies uniform point bounds for these curves: for short degrees $d<d_0(X)$, Bombieri--Pila plus a parameter count implies the set of possible endpoints is $o(X)$; for long degrees $d_0(X)\le d\le L(X)$, uniform determinant-method bounds (CCDN) give $X^{o(1)}$ points per absolutely irreducible component, and Bilu--Tichy with Hajdu--Tijdeman shows any remaining degenerate families are Pell/Dickson-type and contribute only $O(\log X)$ points in a dyadic box.

    Module IV (dyadic deletion via Moser--Tardos)
    At each dyadic $X$ we choose $D_X\subset[X,2X)\cap A_0$ by an algorithmic Lovasz local lemma: introduce Bernoulli variables $(\xi_n)$ (randomness exposed on a buffered universe $[X/2,2X)\cap A_0$ to avoid edge effects), define capacity bad events $\mathsf B_I^{+}$ (overflow) on length-$L(X)$ intervals and collision cylinder events $\mathsf A_\tau$ (patterns on local windows encoding a specific collision). A one-way coverage lemma shows that if a logarithmic-length collision occurs in $B_X$ with maximal term lying in $[X,2X)$, then at least one collision cylinder event occurs. Moreover, the maximum dependency degree of the bad-event graph is $X^{o(1)}$, coming from the window size and the admissible-pattern count. The asymmetric LLL (implemented by Moser-Tardos) yields an assignment with no bad events, hence capacity holds and no collision with maximal term in $[X,2X)$ survives. Since $|D_X|=o(X)$ for each dyadic $X$, the final assembly preserves density $1$ and eliminates all collisions.

  • I think I have the final proof. You can check the preprint here. However, I'm not really happy with it because it's pretty tedious. I plan on trying to formalize it now in Lean with Aristotle and simplify. If you have any ideas on how to make it simpler, I'd be grateful!

    Here's why it's tedious: when you try to construct a sequence in question, you note that greedy sequence looks promising but leaves no control, so the next step is to go dyadic and try something like stepwise-greedy dyadic construction. This runs into a wall too (but maybe manageable by some combinatorics - this is my hope for making it simpler). So eventually I settled for a dyadic Moser-Tardos construction (Lovasz local lemma), but that comes with a lot of bookkeeping.

    The final construction is:
    - reduce preliminary cases (sprinkling)
    - use geometry to treat logarithmic regime (Castryck-Cluckers-Dittmann-Nguyen, Bilu-Tichy and Hajdu-Tijdeman results)
    - let Moser-Tardos framework deal with the rest of the cases

    Any feedback welcome!

    • This looks promising, but does need careful checking, particularly with regards to whether the bounds coming from the algebraic geometry or Diophantine geometry inputs are strong enough (and uniform enough). A lot of dependencies of constants in that literature (on, say, degree or height) are poor, non-uniform or even ineffective, and this to me is the most dangerous source of errors. (The combinatorial inputs are much more quantitative, and often contain exponential gains (e.g., due to concentration of measure), so I am less concerned about this side of the argument, although it is still important to do the quantitative bookkeeping carefully.) Having a human-generated version of the argument that carefully discusses the nature of these bounds, and their uniformity, will be more reassuring than an AI-generated proof that does not focus on this issue.

      Additionally, it is unlikely that many of these inputs will be easily formalizable in Lean with current tools. Probably the best one can hope for is a conditional formalization in which key theorems from past literature are carefully formalized as precise axioms. However, this would still add a lot of credibility to the argument (once the axiom statements are scrutinized carefully by humans, particularly with regards to the order in which quantifiers are placed). Ideally, these axioms should be formalized by humans first, before running the automated tools to try to derive the main result from the given axioms; letting the AI tool state the axioms it would like to use is a particularly risky idea and should be avoided if at all possible.

      With regards to cleaning up the writing, one can try to make the proof more modular: for instance, create a theorem describing the existence of a set $A_0$ obeying certain explicit properties that will be used in later parts of the argument, but hide the implementation details of how the $A_0$ is constructed within the proof of that theorem, so that $A_0$ can be used as a "black box". This type of modularity will help both with human readability of the argument and with formalization, and could also increase the robustness of the proof in case some local errors are found.

      • Thank you Terence Tao for your feedback. I'll push Aristotle to see how far it can go out of curiosity (right now, not really far). Nevertheless making the proof modular is probably the best way to clean it up. I'll try to separate: construction of $A_0$ (great idea with a black box), geometric results, and then cleanly state and deal with all regimes for $k,l$.

        BTW, I too prefer human-generated proofs than formalized AI-generated Lean proofs, but I think you can pretty easily go from AI-generated Lean to 'human' with GPT-5.2 acting as a 'translator' of Lean to natural language.

        • Part of the point of having a human-generated version (as opposed to a 'human'-looking version) of the argument is that the process of writing the argument by hand will naturally and organically suggest ways to improve and streamline the presentation, make it more robust against error, and tie it more tightly to existing literature and ideas; also, if one is unable to formalize the original argument, the separate writing of the argument by the human expert serves as an important independent check of correctness. In short, I think the AI tools have done their job, and it is time for the human expert to take over.

    • Good luck! On the one hand, this is a pretty sophisticated result obtained in collaboration with AI, which is exciting. On the other hand, given the complexity, convincing the community (or oneself) of the correctness seems more difficult than usual. But I think doing this has great value - it at least is an experiment on how mathematics may work going forward, whether or not we finally achieve a proof you’re “happy” with!

    • Congrats on the print! I read it and it looks promising I agree but theres a few things I saw that might be iffy. If I'm wrong about these you can let me know.

      The first is that the split product (6) doesn't match the collision equation (2). Collision is defined as
      $$
      d_{M-\ell+1}\cdots d_M ;=; d_N\cdots d_{N+k}.
      $$
      Then the paper sets $x=d_M$ and $y=d_{N+k}$, defines $b_j := d_{N+j}-d_N$ and claims
      $$
      x(x-a_1)\cdots ;=; y(y+b_1)\cdots(y+b_k).
      $$
      But with $y=d_{N+k}$, the right-side factors in (2) are $(d_N,\dots,d_{N+k})$ so they are $\le y$ and should look like $y-\text{something}$, not $y+b_j$ with $b_j\ge 0$. As written, the factors $y+b_j$ do not correspond to the block $(d_N,\dots,d_{N+k})$.

      2. Lemma 4.1 uses forward on the $y$ side, which matches the same endpoint/sign confusion. After bounding shifts on the $x$ side, Lemma 4.1 states that the same argument applies forward on the $y$ side. But earlier in the same section it declared $y=d_{N+k}$, the right endpoint of the right block. If $y$ is the right endpoint, the relevant block elements lie to the left of $y$, so the analogous argument should run backward.

      3. Lemma 6.1 claims every window contains at least $L$ kept points but the proof ignores holes coming from $A_0$. At scale $X$, the eligible variables are
      $$
      V_X := [X,2X)\cap A_0,
      $$
      and deletions satisfy $D_X=B_X\subseteq V_X$. Lemma 6.1 defines windows $W_y\subseteq V_X$ and asserts that under (7), every interval of length $L+2T$ contains at least $L$ kept points and the proof deduces this from $|D_X\cap I|\le T$. That deduction implicitly treats $V_X$ as if it contained all integers in the interval but eligibility is restricted to $A_0$ and $A_0$ comes from a sprinkling step so it can miss many integers. The conclusion that the local predecessor lies inside $W_y$ isn't really justified from the stated assumptions.

      4. Lemma 6.2 subtracts from $L$ instead of from $|A_0\cap J|$. Lemma 6.2 claims that every length $L$ interval $J$ contains at least $L-T$ integers of
      $$
      ([X/2,2X)\cap A_0)\setminus(D^{<X}\cup B_X).
      $$
      The proof uses $\neg B_J$ to get $|(D^{<X}\cup B_X)\cap J|\le T$ and subtracts from $L$. But the quantity being lower bounded is
      $$
      |(A_0\cap J)| ;-; |(D^{<X}\cup B_X)\cap J|,
      $$
      so the bound is $|A_0\cap J|-T$, not $L-T$. Since $|A_0\cap J|$ need not equal $L$.

      • Unfortunately, I ran the preprint through ChatGPT and it did flag some (nontrivial) issues, which may relate to what uona is pointing out.

        For Chojecki: you may want to at least go thorough those here.

        • Thank you both uona and natso26 for your remarks. Indeed some of that are real issues but nothing unfixable. In the new write-up all above should be fine: 1. is basically an error, should be $y=d_N$, 3. is related to regularity of $A_0$ (made explicit) 4. combinatorics is now completely rewritten and simplified.

  • I'm very close to finishing it off, but some minor technicalities still persist. Any help would be appreciated here with the write-up. Generally, the strategy follows what Terence Tao outlined in a comment, and then applies Bilu-Tichy and Hajdu-Tijdeman for degeneracy classification, and determinant-method bounds via Castryck–Cluckers–Dittmann–Nguye for uniform counting for non-degenerate cases.

    Here is a route that I think makes the "main trouble case'' $k,\ell\asymp \log N$ fully tractable with known tools. Run a prefix-stable greedy construction: scan $n=2,3,\dots$ and include $n$ unless it creates a new relation\[
    d_{M-\ell+1}\cdots d_M \;=\; d_N\cdots d_{N+k},
    \qquad M<N,
    \]among consecutive blocks in the current sequence. Prefix-stability ensures later decisions cannot create new collisions among earlier terms, so it suffices to show that the set of rejected $n\le N$ is $o(N)$. For $n\in[X,2X]$, any rejection produces an integer point $(x,y)\in[X,2X]^2$ on a separated-variable curve\[
    \prod_{i=0}^{\ell-1}(x-a_i)=\prod_{j=0}^{k}(y+b_j),
    \qquad d:=\max(\ell,k+1)\asymp \log X,
    \]with shifts $a_i,b_j$ coming from local consecutive gaps.

    For the hard regime $d\asymp\log X$, the determinant method with polynomial dependence on $d$ gives a uniform polylogarithmic bound on integral points per curve once line components are excluded. Concretely, applying Castryck--Cluckers--Dittmann--Nguyen (polynomial-in-$d$ bounds for integral points on affine curves) to\[
    F_{\mathbf a,\mathbf b}(x,y):=\prod_{i=0}^{\ell-1}(x-a_i)-\prod_{j=0}^{k}(y+b_j)=0
    \]yields\[
    \#\bigl\{(x,y)\in[X,2X]^2\cap\mathbb Z ^2:\ F_{\mathbf a,\mathbf b}(x,y)=0\bigr\}\ \ll\ (\log X)^{O(1)}
    \]uniformly in the shifts, provided $F_{\mathbf a,\mathbf b}$ has no linear factor over $\mathbb Q$ (i.e. no affine line component). This directly attacks the $k,\ell\asymp\log N$ bottleneck.

    It remains to control the exceptional ''low-genus'' situations. By Bilu--Tichy (and the Hajdu--Tijdeman refinement in the simple-rational-roots setting relevant to split products), any bounded-denominator degeneracy forces either an affine/translate relation (line component: $f(x)\equiv g(x+t)$) or an explicit $m=2$ PTE/Pell-type family. Line components correspond to affine equivalence of the root multisets and can be removed at zero density (or shown to be only trivial overlap configurations). The remaining PTE/Pell-type components contribute only $O(\log X)$ points per dyadic box. Summing over dyadic $X$ shows only $o(X)$ rejections in $[X,2X]$, hence $o(N)$ rejections up to $N$, producing a density-$1$ sequence with all consecutive-block products distinct.

    P.S. I should add that this is done via a long and complex conversation with GPT-5.2, trying to 'unstuck it' over and over to find the path forward.

    • Quick tightening on the line component / affine-translate relation exception. In the actual collision setting (monotone $d_i$, hence $M<N$ so $x=d_M<y=d_N$), an affine line component cannot contribute any genuine collision points.

      Write
      $$
      f(x)=\prod_{i=0}^{\ell-1}(x-a_i),\qquad g(y)=\prod_{j=0}^{k}(y+b_j),
      $$
      with
      $$
      a_i=d_M-d_{M-i}\ge 0,\quad b_j=d_{N+j}-d_N\ge 0,\quad a_0=b_0=0.
      $$
      If $F_{\mathbf a,\mathbf b}(x,y)=f(x)-g(y)$ has an affine line component, then (as in the line-component lemma) it forces an identity
      $$
      f(x)\equiv g(x+t)\quad\text{or}\quad f(x)\equiv g(-x+t),
      $$
      i.e. the line is $y=x+t$ or $y=-x+t$.

      (i) $f(x)\equiv g(x+t)$ (line $y=x+t$).
      Roots of $f$ are the multiset ${a_i}\subset[0,\infty)$. Roots of $g(x+t)$ are the multiset ${-t-b_j}$. Since $b_0=0$, we get $-t\in{a_i}$, hence $t\le 0$. Therefore $y=x+t\le x$, so this line contains no points with $x<y$ (hence cannot encode a collision with $M<N$).

      (ii) $f(x)\equiv g(-x+t)$ (line $y=-x+t$).
      Roots of $g(-x+t)$ are the multiset ${t+b_j}$ (since $-x+t+b_j=0\iff x=t+b_j$), so in particular $t$ is a root (from $b_0=0$) and all roots are $\ge t$. But $f$ has the root $0$ (since $a_0=0$), so $0$ must also be a root of $g(-x+t)$, i.e. $t+b_j=0$ for some $j$. With $t\ge0$ and $b_j\ge0$ this forces $t=0$. Then the line is $y=-x$, which has no points in $[X,2X]^2$ for $X>0$.

      So the affine line component exception is actually harmless here. It yields zero relevant collision points in the regime $x,y>0$ with $x<y$. This lets you treat the determinant method bound as applying to all collision-producing curves, without needing an extra remove line components at zero density maneuver.

      Note that this just removes a case and simplifies dependencies. This was half done with ChatGPT in a string of questions and then reviewed.

      • If you don't mind, I'm usually quite fascinated by how interaction between human and AI can arrive at results that may be further than what AI (or human) can do alone. So if you're willing to share, I'd like to see the shared chat with ChatGPT as well.

      • Thank you, this line removal is indeed simplifying a bit!

  • There is discussion over at the community database GitHub repo that suggests that the greedy sequence is likely to satisfy the required property.

    In particular, there's a definition of "interesting numbers" for which it appears there are only 3 interesting numbers: 16, 352, and 715. Assuming there are no other interesting numbers, the greedy sequence has very clear growth and in particular has density 1.

    • Greedy sequence start from 2 admits the 3 small interesting numbers. Greedy sequence starting from $N < 20$ admits some small interesting numbers. There is however no small interesting number when starting the greedy sequence with 20.

  • I think there is a decent chance to solve this problem with known techniques. We are trying to find a dense sequence with no non-trivial solutions to the equation
    $$ d_{M-l+1} \dots d_M = d_N \dots d_{N+k} \quad (*).$$
    We can remove any common factors and assume that $M<N$. This also forces $l > k$ from the monotonicity of the $d_i$, so in particular $k < N$. We want $d_N \sim N$, so all the factors on the RHS need to be $\asymp N$. Taking logarithms, we get $l \log M \asymp k \log N$.

    By the Hardy-Ramanujan law, it is reasonable to demand that each $d_i$ has $\approx \log\log i$ prime factors (counting multiplicity). By counting the total number of primes on both sides of (*), this already handles the case when $k$ is bounded (the LHS and RHS have noticeably distinct prime factors) or even $k = O(\sqrt{\log\log N})$, and in general it implies that $l \log\log M \asymp k \log\log N$, which in conjunction with the previous constraint yields $\log M \asymp \log N$ and $l \asymp k$.

    For medium $k$, specifically $\sqrt{\log\log N} \ll k \leq \varepsilon \log N$ for a small $\varepsilon$, one can view (*) as a degree $l$ polynomial equation
    $$ (x-a_{l-1}) \dots (x-a_1) x = y (y+b_1) \dots (y+b_k)$$
    where $x = d_M$, $y = d_N$, $a_i = d_M - d_{M-i}$, and $b_i = d_{N+i} - d_N$. There are (morally) $N^{O(\varepsilon)}$ choices for the $a_1,\dots,a_{l-1},b_1,\dots,b_k$, and by Bombieri-Pila type bounds there should be only $N^{1-c}$ many integer solutions to this equation with $x,y = O(N)$ for some absolute constant $c>0$ (one has to check that the locus of this equation contains no straight lines, but I think this should be routine). So the set of solutions here is density zero and should be removable somehow.

    For large $k$, specifically in the regime where $k/\log N$ goes to infinity, I think performing a random sampling, deleting $o(1)$ of the elements of the sequence, should destroy all solutions to (*) with positive probability (here we may first want to prepare the sequence so that each $d_i$ has a relatively large prime factor, and use the fact that nearby elements of the sequence cannot share that large prime factor, somewhat in the spirit of Selfridge's construction).

    So the main case that causes trouble is when $k,l \asymp \log N$, but perhaps one of the above techniques can be pushed a little further to cover this case also.

    • Here’s how we can push the Bombieri--Pila/determinant-method approach into the remaining regime $k,\ell\asymp \log N$. Full write-up is here. It still falls short of settling the problem but pushes the above a bit further.


      Any collision\[
      d_{M-\ell+1}\cdots d_M \;=\; d_N\cdots d_{N+k}\qquad (M<N)
      \]can be rewritten with $x=d_M$, $y=d_N$, $a_i:=d_M-d_{M-i}$ ($1\le i\le \ell-1$), $b_j:=d_{N+j}-d_N$ ($1\le j\le k$) as an integer point on\[
      F_{\mathbf a,\mathbf b}(x,y):=\prod_{i=0}^{\ell-1}(x-a_i)-\prod_{j=0}^{k}(y+b_j)=0,\qquad (a_0=b_0=0),
      \]a plane curve of degree $d=\max(\ell,k+1)\asymp \log X$ when $x,y\asymp X$ (the bottleneck case).

      The key observation is that large degree actually helps in the exponent: for each non-degenerate pattern $(\mathbf a,\mathbf b)$ (i.e. $F_{\mathbf a,\mathbf b}$ has no line/rational component), determinant-method bounds give\[
      \#\{(x,y)\in[X,2X]^2\cap\Bbb Z^2:\ F_{\mathbf a,\mathbf b}(x,y)=0\}\ \ll_{d,\varepsilon}\ X^{1/d+\varepsilon},
      \]hence $\ll X^{\varepsilon}$ since $X^{1/d}\asymp e^{O(1)}$ for $d\asymp\log X$ (and the $d$-dependence in the implied constant is expected to be only $X^{o(1)}$ once $d\ll \log X$). Degenerate patterns can be isolated using Bilu--Tichy for $f(x)=g(y)$ with $f=\prod(x-a_i)$, $g=\prod(y+b_j)$: infinite ''special'' solution sets force $f,g$ into the standard-pair/composition classes, and for products of distinct linear factors this should collapse to an explicit finite/rigid exceptional family (essentially affine-equivalence of the root multisets, corresponding to overlap/trivial configurations), which can be removed at zero-density cost.

      What remains is a pattern-counting input: in a density-$1$ construction one can enforce a local deficiency invariant (e.g. at most $T(X)=(\log X)^{2/3}$ deletions in any interval of length $C\log X$), which implies the number of realized gap-patterns of length $\asymp\log X$ up to scale $2X$ is only\[
      \sum_{j\le T(X)} \binom{C\log X+j}{j} \;=\; X^{o(1)}.
      \]Therefore the total number of bottleneck collisions with $(x,y)\in[X,2X]^2$ is $X^{\varepsilon+o(1)}=o(X)$. One can then choose a hitting set $B_X\subset[X,2X]$ (e.g. pick one endpoint from each collision, with a union-bound/LLL nibble to preserve the local deficiency invariant) and delete dyadically; the total deletions are $o(N)$, giving density $1$ and killing all collisions, including the $k,\ell\asymp\log N$ regime, modulo the two technical verifications (degenerate-pattern classification and tracking $d$-dependence for $d\asymp\log X$).

All comments are the responsibility of the user. Comments appearing on this page are not verified for correctness. Please keep posts mathematical and on topic.

Log in to add a comment.

Back to the forum