number theory

On generating primes with periodic continued fractions

Pardon the mess. I wrote this with a different markup tool and it looks like it doesn't want to play nice with Ghost.

I started writing this a few days ago, and have since ironed out some things which I hope to come back and flesh out. I leave the rest of the post as-is for now, half for my Dear Reader, and half as a note to self.

TL;DR: I'm virtually positive I'm right about most of this, and have come up with several different ways to theoretically exploit continued fractions to generate primes of arbitrary size. Unfortunately, and naturally, PCFs are just as deeply nuanced as most prime-connected stuff, meaning same ol' story, different tune.

If it were possible to cheaply generate large PCFs, it would be a very different story, and I haven't completely ruled out the possibility yet. In fact, it would be sufficient merely to develop an algorithm which could spit out the length of the period of the PCF for a given radical, assuming it ran in reasonable time. What I did find in the literature about it suggests that it's an open problem, but there also wasn't very much literature compared to some of your more mainstream areas of number theory.

If for whatever reason this doesn't load or looks messed up, try reading here (or here!)

Dark integers

(or, "Yet another way to learn more about the underlying structure of the universe.")

Let us assume.

I was reading my article on the distribution of computation space and going back over my results, but this time around, a new potential implication took root in my imagination.

Let's assume for the moment that Turing-completeness is necessary and sufficient for computation of any general sort, and furthermore, that there are no such mechanisms in existence that can compute on a fundamentally higher level. In other words, let us assume that every computing device or mechanism is isomorphic to any other such device or mechanism.

Let's also assume that my experimental analysis of computation space was more or less correct, at least in the broad strokes. If someone else conducted a similar experiment using a substantially different instruction set or methodology, some of the specifics might turn out differently, but I suspect the more fundamental properties I found would still show up.

These aren't trivial assumptions, but they're at the very least plausible. So we proceed.


Near the end of my article, I speculated that primes would probably be under-represented in a list of numbers generated by random code and sorted by frequency. This was partly an observation, but mostly a logical argument; by their nature, primes eschew association with simple patterns, and simple patterns are the best you're going to get out of a random program generator. I would not expect primes to start showing up in bulk until we reached the level of complexity where a prime sieve might reasonably form, and that's way past the scope of what we were dealing with so far. (Side note: it would be interesting to figure out the shortest possible prime generating program using that sort of basic instruction set.)

What the article doesn't include is my follow-up analysis on the subject, which indeed showed that primes seem to be relatively less frequent than other integers in their neighborhood. Here's a plot relating the set of integers (absolute values were taken) to the number of times each one appeared in the cumulative output of a billion random micro-programs.

Integer Frequency through N=600

There is a lot in this plot that bears discussion, but let's stick to the highlights:

  • It is roughly logarithmic.
  • There are conspicuous spikes (and patterns of spikes), most notably around each %2^n%.
  • The red dots—you may have to click on it and zoom in to see them—are primes.
Conspicuous spikes.

If you haven't figured it out yourself, the spikes on the powers of two are a consequence of the binary assembly code in which these micro-programs are written. It's operationally trivial for computers to multiply or divide by two, so you'll see it a lot in a random distribution like this. And getting to %64% is not much more work than getting to %32% (after all, the infrastructure is already in place.) The same goes for %128%, %256%, and so on, which is why you see these spikes and why they're barely even tapering off.

Red dots.

Closer examination of where these dots fall show that they're almost always in a crevice, or at least on a slope, which is to say that the number in question has a notably smaller chance of being generated than one or both of its neighbors.

However, primes are way too slippery a customer to allow this to always happen. Take, for example, %n=31%. This is the first prime that's sitting up on a mesa. I argued before that primes are hard to generate, so what gives?

Well, for many purposes, the distribution of primes is effectively random. Because of this, some of them are doomed to land right next to a really popular neighbor, as with our poor %31%. As discussed, %32% is especially easy to generate, and as it happens, adding or subtracting %1% is also generally a piece of cake. It's just bad luck for %31% that his neighbor attracts all this traffic for him.

Degrees of introversion.

The example of %31% is particularly egregious, since those %2^n% spikes are the biggest bullies on the playground. But all primes are affected by this phenomenon to some extent. Since a prime by its nature can't be directly conjured via a loop (read: multiplication, exponentiation, hyper-operations), the only way to get to these numbers (short of e.g. prime sieves) is to land on some composite number close to it, and then happen to add or subtract the right constant. The frequency of any prime is ultimately a reflection of its neighborhood.

There are lots of consequences to this arrangement. Turning it around, one good example is the case of twin primes, which are primes that differ only by %2%. You only get twin primes when the number between them is divisible by %6%. Any multiple of %6%, being divisible by %2%, %3%, and %6%, is right up there as an easy target for random generation. The upshot here is that when you have a twin prime pair, you will almost always see a single-value spike between them, since they're surrounding such a juicy target.

But enough fun with plots.

Dark integers.

This is a term I pulled out of my ass which I like so much that I refuse to Google it and find out it's already taken.

We've shown how certain integers are "popular" in terms of computational space frequency, like the powers of two; we have also shown that some tend in the other direction, such as the primes. A dark integer, according to me, is loosely defined as an integer that's especially hard to generate through an algorithm of any kind. I am hampered by the fact that I don't have a working theory as to how to strictly identify these numbers other than by brute force, but I have some idea of their properties.

They are not necessarily prime—in fact, they seemingly prefer to be complicated composites with many disparate factors, often including one or more large primes as a factor. It does make some sense that this would be the ideal setup for a dark integer. The erratic and informationally-dense set of factors it comprises makes it an unpopular number already, and not being prime allows it to be bordered by or nearby to one or more primes, which will not attract any spillover traffic.

'Darkness' in this sense is a relative term, at least for the moment. Perhaps it will make sense to define a dark number as any integer %N% that has a strictly larger Kolmogorov complexity than any %n < N%, although that's still difficult to prove. At any rate, some numbers are darker than others, and while it should roughly correspond to which numbers are lowest on a plot such as the one above, we have to remember that this is an experimentally-derived data set and prone to noise, especially as frequency decreases.

That said, I would like to tip my hat to %3465%, which was the lowest number that did not appear a single time in my billion programs. Wolfram Alpha has this to say about %3465%:

  • %3465 = 3^2 \times 5 \times 7 \times 11%.
  • %3465% has the representation %3465=2^7 3^3+9%.
  • %3465% divides %34^6-1%.

Whether any of that is significant is too soon to say.

Time to get crazy.

So, do you remember what we're assuming about computers? Basically, that a computer is a computer by any other name? It comes into play now.

  • Assumption: The universe is not magic. It plays by the same Turing-equivalent rules as everything in it. This means physics is ultimately deterministic, and if we had infinite time and infinite memory, we could simulate our universe on a Turing machine.
  • Assumption: The Big Bang happened. The initial configuration of energy at its inception was effectively random.

The universe seemingly operates by following physical laws, with all matter and energy having been kicked off by that hot mess of a Big Bang. This strikes me as loosely analogous to my Skynet micro-program generator, what with random initial conditions carrying out clearly defined step by step processes. More to the point, our earlier assumption posits that they are mathematically interchangeable for our purposes.

Hypothesis: For any given countable set of anything in the universe, the quantity of that thing is relatively unlikely to be a dark integer, all things being equal.

To test the hypothesis:

  1. Identify dark integers. The more, the larger, the merrier.
  2. Identify anything that can be quantified.
  3. Quantify the anything.
  4. Repeat to within error bounds for confirmation or disproof.

There may, unfortunately, be some bounds on what the anything could actually be. Ideally, it would be as simple as catching a bunch of atoms of something over and over and tracking the result, but more likely it would have to be something more directly related to the Big Bang, such as star counts in galaxies.

If all of this were sound, conclusions could be drawn.

If there are fewer dark numbers after all, it may be further evidence of the truly random and arbitrary nature of creation, and more importantly, it may be strong evidence of a deterministic universe and all that that implies.

If there is no discrepancy when it comes to dark numbers, it may imply that there is a true stochastic or otherwise bizarre layer to physics, or it may imply that the Big Bang was ordered more deliberately than one would think, or that there is some agency at work that is disrupting the "natural" order of things in one way or another.

Either way, we learn.


The dark integers on the graph above are specific to the instruction set used to generate it. While there is reason to expect that most sets of operations will be similar inasmuch as a binary system seems to be the simplest approach, it is not a guarantee. One can imagine pathological instruction sets that will yield an entirely different contour, such as one that uses the primes as its base units.

Long story short, if the binary analysis doesn't turn up results, it might be worth investigating the possibility of reverse engineering an instruction set by identifying its specific dark integers and trying to tease out what sort of atomic operations would give rise to those holes.

While applying any of this to cosmology is wildly speculative, biology and the mechanism of evolution seem like a perfect candidate for testing the viability of dark integers in general. Evolution is undeniably a finite computational force that lends itself well to the whole idea. Multi-cellular organisms are literally built from code, and it stands to reason that on balance, evolutionary pressures will tend to restrict most quantifiable biological features to easier-to-compute amounts. More on this as I investigate.

Cappallo-Goldbach Conjecture

Three new recent discoveries.


it turns out that you can generate a complete list of primes in the following fashion.

Using %n% where %p + q = 2n%, let %n = 2% and find any Goldbach partitions %p, q%; if %max(q in bb Q)% is not in set %bb P%, add that %q% to %bb P%. This will generate a set %bb P% such that %bb P% contains every prime up through %2n-3%, in order. Furthermore, the corresponding values of %p% will be small. I haven't determined exactly, but I am confident of %p <= sqrt(2n)% as an upper bound.

Sketch of proof

Let %q% be some prime not yet in %bb P%. The first possible inclusion of %q% in a Goldbach pair will be %3 + q = 2n%, and as soon as %n = (q + 3)/2%, you will have that %q%, and it will be the largest such %q% for that %n%.

There are repeats (corresponding to prime gaps) and even backtracks of the largest %q% (deserving further investigation), but they necessarily consist of primes already in %bb P%.


the Cappallo-Goldbach Conjecture, refining the domain of %p% and %q% to make a stronger assertion:

%sf"For any even " N > 2,sf" there exist primes " p, q sf" such that "% %p + q = N sf" and " sqrt(N) <= p <= N//2 <= q <= N - sqrt(N).%

If true, this bound is an improvement on the original %2 <= p <= N//2 <= q <= N-2%.

Or, more simply: there will always be a valid %p% such that %p >= sqrt(N)%.

Note also that there need not be a %p < sqrt(N)%; the first such counter-example is found at %N = 98%, yielding %19 + 79% as the smallest of three partitions.

I have not proven why this should be, but confirmed it empirically through %N = 5000%. I find it suggestive that %N - sqrt(N) = sqrt(N)(sqrt(N) - 1)%, which is reminiscent of the binomial expressions which can also be used to identify primes.

For instance, it appears likely (see Lehmer's totient problem) that for any prime %p%, it must be possible to factor %p - 1% into an arbitrary number of terms of the form %(k^n - k^(n-1))%, where %k% is a unique prime and %n >= 1, n in NN%.

This also likely connects to results such as the %i%th row of Pascal's Triangle consisting only of multiples of %i " iff " i% is prime: or equivalently, that %p in bbb P iff {p:" " p | ((p),(i)) :}, 0 < i < p}%.


the set of all Goldbach numbers have a bijective mapping to the set of all semiprimes.

Goldbach diagonals

The numbers going across the top represent our %n%. Visually, if one drops a vertical line to meet the diagonal, and then travels on a down/left diagonal, any intersections crossed represent Goldbach pairs.

For example, the line dropped from %8% hits the intersection of the horizontal %11% line and the vertical %5% line, meaning that %5 + 11 = 2(8)% is a solution; the same applies to %3 + 13 = 2(8)%. Since every semiprime can be represented as intersections on this sort of diagram, and every Goldbach pair will likewise be generated once, when intersecting a semiprime, the mapping becomes apparent.

Alternatively, consider that any semiprime is the product of two specific primes, and conversely, any two primes form a unique semiprime. Likewise, any Goldbach pair consists of two specific primes, and while their sum may not be unique among Goldbach pairs, that pair will still only be used a single time in the set of all Goldbach pairs, providing our one-to-one map. As addition and multiplication are both commutative, ordering is irrelevant; it suffices to either always lead by the smaller/larger of the two operands, or allow both orderings when taking the sum or product.


(Under construction.)

For starters, it is known that there are infinite primes, thus infinite semiprimes, which in turn provides another proof of infinite Goldbach pairs. I also reason that since any given even number has a finite number of Goldbach pairs, there must be an infinite number of Goldbach sums—that is, even numbers with at least one Goldbach pair. If there were only a finite number of these sums, each with a finite number of Goldbach pairs, we would lose our bijection. Unfortunately, this is necessary but not sufficient to prove Goldbach's Conjecture, as the infinitude of sums says nothing about their frequency within %NN%.

There should be other low-hanging fruit here. For example, since it is ...

It is interesting that while the product of two primes is a unique integer, the sum of a Goldbach pair is typically not (e.g. %3+13=5+11=16%). However, because of our one-to-one mapping, we know that the number of semiprimes must equal the number of Goldbach sums.

Let's look at bounding %p% and %q%. If we choose some %C% such that %p <= q <= C%, we yield %(pi^2(C)+pi(C))/2% semiprimes / Goldbach pairs.

(Ignoring %2%s for the moment...)

Let %C=7%.

  • %3 * 3 = 9%
  • %3 * 5 = 15%
  • %3 * 7 = 21%
  • %5 * 5 = 25%
  • %5 * 7 = 35%
  • %7 * 7 = 49%
  • %3 + 3 = 2(3) = 6%
  • %3 + 5 = 2(4) = 8%
  • %3 + 7 = 2(5) = 10%
  • %5 + 5 = 2(5) = 10%
  • %5 + 7 = 2(6) = 12%
  • %7 + 7 = 2(7) = 14%

Let %C = 17%. We obtain the following semiprimes:

  • %3 * 3 = 9%
  • %3 * 5 = 15%
  • %3 * 7 = 21%
  • %3 * 11 = 33%
  • %3 * 13 = 39%
  • %3 * 17 = 51%
  • %5 * 5 = 25%
  • %5 * 7 = 35%
  • %5 * 11 = 55%
  • %5 * 13 = 65%
  • %5 * 17 = 85%
  • %7 * 7 = 49%
  • %7 * 11 = 77%
  • %7 * 13 = 91%
  • %7 * 17 = 119%
  • %11 * 11 = 121%
  • %11 * 13 = 143%
  • %11 * 17 = 187%
  • %13 * 13 = 169%
  • %13 * 17 = 221%
  • %17 * 17 = 289%

All semiprimes will be unique, and they will be bounded above by %C^2% and below by %3^2%.

We obtain the following Goldbach partitions:

  • %3 + 3 = 2(3) = 6%
  • %3 + 5 = 2(4) = 8%
  • %3 + 7 = 2(5) = 10%
  • %3 + 11 = 2(7) = 14%
  • %3 + 13 = 2(8) = 16%
  • %3 + 17 = 2(10) = 20%
  • %5 + 5 = 2(5) = 10%
  • %5 + 7 = 2(6) = 12%
  • %5 + 11 = 2(8) = 16%
  • %5 + 13 = 2(9) = 18%
  • %5 + 17 = 2(11) = 22%
  • %7 + 7 = 2(7) = 14%
  • %7 + 11 = 2(9) = 18%
  • %7 + 13 = 2(10) = 20%
  • %7 + 17 = 2(12) = 24%
  • %11 + 11 = 2(11) = 22%
  • %11 + 13 = 2(12) = 24%
  • %11 + 17 = 2(14) = 28%
  • %13 + 13 = 2(13) = 26%
  • %13 + 17 = 2(15) = 30%
  • %17 + 17 = 2(17) = 34%

All Goldbach pairs will be unique, and their sums bounded above by %2C% and below by %2*3%.

I have been working through this without knowing where I was going, but now that I re-read this last part, it strikes me that the Goldbach bounds are the derivatives of the semiprime bounds with respect to the bound.


I believe the first observation is essentially trivial number manipulation and contains no new useful truth. I have not yet formed an opinion about the utility of the second.

Goldbach as bitfield

I've been trying to visualize the problem in many different ways, tables and graphs and geometry and anything else that seems plausible. Here's one I nailed down this morning that's concise and readily graspable for any coder.

First, ignore the %2%. It just gets in the way.

And then we get going. Starting with %3%, we'll build up a list of primes and I'll show how you can easily check for Goldbach numbers while you do. As you build a string of bits, on each new number, you reverse the bits you have so far and slide it over one.

Much easier understood with a demonstration, starting with %n=6 => n/2=3% and incrementing %n=n+2%, as it must be even.

n=6   3
prm?  1
rev   1

n=8   3 5
prm?  1 1
rev   1 1

n=10  3 5 7
prm?  1 1 1
rev   1 1 1

n=12  3 5 7 9
prm?  1 1 1 0
rev   0 1 1 1

n=14  3 5 7 9 11
prm?  1 1 1 0 1
rev   1 0 1 1 1

n=16  3 5 7 9 11 13
prm?  1 1 1 0  1  1
rev   1 1 0 1  1  1

n=18  3 5 7 9 11 13 15
prm?  1 1 1 0  1  1  0
rev   0 1 1 0  1  1  1

n=20  3 5 7 9 11 13 15 17
prm?  1 1 1 0  1  1  0  1
rev   1 0 1 1  0  1  1  1

All we're doing with rev is reversing the order of the "prime?" bits. If you watch from one to the next, you can see that this is actually resulting in a kind of ongoing shift each time. Observe:

prm?   1  1  1  0  1  1  0  1  1  0  1  0  0  1
n      3  5  7  9 11 13 15 17 19 21 23 25 27 29
6      1
8      1  1
10     1  1  1
12     0  1  1  1
14     1  0  1  1  1
16     1  1  0  1  1  1
18     0  1  1  0  1  1  1
20     1  0  1  1  0  1  1  1
22     1  1  0  1  1  0  1  1  1
24     0  1  1  0  1  1  0  1  1  1
26     1  0  1  1  0  1  1  0  1  1  1
28     0  1  0  1  1  0  1  1  0  1  1  1
30     0  0  1  0  1  1  0  1  1  0  1  1  1
32     1  0  0  1  0  1  1  0  1  1  0  1  1  1

Notice all we are doing, for each new line, is taking %prm?(n)%, sticking it at the beginning of the new line, and shoving over all the rest. Same as above, but easier to see here. Note also the prime bits read down vertically in the regular order.


The thing is, you can check for any and all Goldbach numbers by simply bitwise ANDing two corresponding strings together. (This may actually be an efficient way to code this, should such a need arise.) Take %n=28%:

       3  5  7  9 11 13 15 17 19 21 23 25
prm?   1  1  1  0  1  1  0  1  1  0  1  0
rev    0  1  0  1  1  0  1  1  0  1  1  1
  &    0  1  0  0  1  0  0  1  0  0  0  1

The resulting bits identify which numbers are "Goldbach numbers," primes which will sum to %n%. In this case we get two pairs:

%5 + 23 = 28%
%11 + 17 = 28%

This process will give you all valid Goldbach numbers for a given %n%.

  prm? 1  1  1  0  1  1  0  1  1  0  1  0  0  1
n      3  5  7  9 11 13 15 17 19 21 23 25 27 29
6     (1)                                            3+3=6
8     (1  1)                                         3+5=8
10    (1 (1) 1)                                      3+7=10   5+5=10
12     0 (1  1) 1                                    5+7=12
14    (1  0 (1) 1  1)                                3+11=14  7+7=14
16    (1 (1  0  1  1) 1)                             3+13=16  5+11=16
18     0 (1 (1  0  1) 1) 1                           5+13=18  7+11=18

...and so on. I added the parentheses to emphasize the pairs, but it's just matching up by working from the outsides to the center. It did make me notice something I should have seen before, which is that if you don't want to bother with all the reversing and stuff, you accomplish the same thing by taking the prime bit string, counting in from both sides, and watching for matches that are both 1. Same exact thing.

The astute observer may point out that I'm not actually doing anything in this write-up, which is true. Especially viewed this most recent way, all we're doing is literally taking the Goldbach Conjecture by its definition, and looking manually for pairs of primes that add up. If there is something to be gained from this approach, I suspect it lies in study of the "sliding" nature of the reversal version, and seeing how it conspires not to leave holes.

This is almost certainly biased on my hunch that Goldbach's undoing lies in a thorough examination of the mechanism driving its avoidance of symmetry; to violate Goldbach, there would have to be an instance of a certain symmetry, which would in turn be representative of a periodic pattern to a degree verboten to primes.

But hey, that's just me.

The Subset Sum Problem

I have spent time working on almost every impossible problem there is.

This is a lie, but it is rooted in a kernel of truth. I remember my very first impossible problem. When I was eight or nine, my dad gave me a word problem about firemen and houses or some bullshit, but what it boiled down to was:

Draw three points wherever you want on a paper. Label them %A, B, C%.

Draw three other points, also wherever you want. Label them %1, 2, 3%.

Try to draw lines so that %A, B,% and %C% each have three lines going out to %1, 2,% and %3%—and have none of them cross.

sad diagram

And I spent days drawing nonsense like that (thanks Google Images), even after he tried to convey with a straightforward topological proof why it was impossible. But I wasn't convinced until I tried and tried and tried. It was especially painful because as it turns out, you can get all but one last little line to connect.

I'm older now, and have some modest understanding of math, so one might think I no longer fall for traps like this. Not so. I still have to try. But at least now I know on a rational level that it's a lost cause.

The most recent example is a couple of nights ago when I was reading about the subset sum problem. It's about as easy to explain as the problem above; in layman's terms, it's "Figure out how to quickly look through a bunch of positive and negative numbers and see if you can pick some out that add up to 0."

In a slightly more rigorous sense, it's a problem known to be NP-complete (you would think that might tip me off). Given an arbitrary set of integers, determine whether or not there is any subset thereof that sums to %N%, where %N% is some given integer (but classically %0%.) The trick, of course, is that you have to do it in polynomial time; the most efficient algorithm offered on Wikipedia requires %O(2^(N//2))%.

After thinking on it, my essentially random approach was to mod everything. The logic went like this.

You have %N% numbers. A naive approach would be to check the sum of every possible combination of your numbers, which is a combinatoric nightmare on the order of %O(2^N * N)%. No good.

Well, if we're shooting for %0%, then we can divide our set into positives and negatives. Surely this is an improvement; now instead of dealing with the power set of %N%—that is, the number of subsets in a set of size %N%, which is %2^N%—we've made some progress, since we won't waste time trying sets that are all positive numbers, for example. But it turns out this by itself buys us very little.

Then I go, hey, what if you take every integer modulo %2%? Or, in other words, go through and mark down which numbers are even and which are odd.

Why? Because we know that our potential sum is going to be even (i.e. %0%). We already knew that we'd have to take one or more numbers from each the positives and the negatives. Put those truths together, and it means we know that any group that might work will have a certain property: if the sum of our positive numbers is odd, then the sum of our negatives will have to be odd too, or else the sum of both together would be odd, which would mean not zero. Likewise, if our positives sum is even, the negs have to be even too. Or,

% \forall_P {P \subseteq S, P_i \geq 0}, \sum\P \equiv x \ ("mod"\ 2) \leftrightarrow \forall_Q {Q \subseteq S, Q_i < 0}, \sum\Q \equiv x \ ("mod"\ 2) .%

This is surely good, since on average, half of all possible combinations won't meet that constraint and thus can be discarded out of hand.

But wait, it gets better. Why not take everything modulo %3%? The same principle applies, which is that our overall sum must equal %0 \ ("mod"\ 3)%. So if the sum of the positives %("mod"\ 3)% is %1%, we know we only have to check those groups of negatives which sum to %2 \ ("mod"\ 3)%. And now it seems like we're getting a threefold rather than twofold improvement on our deal.

But wait, it gets better. Why not throw everything so far into a big pot and see what happens? It's not too hard to show that if you take %mod 2%, note which possible combinations remain, and then do the same thing with %mod 3%, you'll generally multiplicatively reduce your potential viable combinations. Then continue this with a handful of ascending integers (maybe primes?) and with almost no work at all, you'll all but perfectly sieve out any invalid combinations.

And so far as I could tell, this is all true. The insidious problem, and this is where I stopped working on this, is that the long cold years have taught me that things are never easy, particularly extremely difficult mathematical things. I have absolutely no doubt that if I sat down and did the math carefully, I would find that I was using up at least as many operations in logistical overhead as I was saving in the sieving process.

That's how these deceptive problems work. You feel like you're making progress, but all you've really done is paint a horse like a zebra; the nature of the thing itself remains unchanged.

Primes are especially brutal about that. Watch out for primes.