1. Probabilistic approach to prime counting
Chris De Corte
chrisdecorte@yahoo.com
KAIZY BVBA
Beekveldstraat 22
9300 Aalst, Belgium
February 2, 2014
1
2. The goal of this document is to share with the mathematical community the finding of a new
mathematical relationship with regard to prime counting that originated out of our probabilistic approach to the occurrence of primes.
One of the side results will be a new formula for the natural logarithmic function as well.
2
4. 3
1
DERIVATION OF FORMULA’S
Key-Words
Prime counting, Logarithmic Integral, Riemann, Gauss, Euler-Mascheroni.
2
Introduction
The following document originated out of our interest for primes.
We remain surprised by the fact that the young Riemann was at the origin of such a complex
formula as a proposition for the prime-counting function [1]
π(x) =
x
ln(x) − 1
(1)
or others [1] using the logarithmic integral:
x
li(x) =
2
dt
ln(t)
(2)
Nowhere did we ever see a theoretical explanation where this first formula came from. For
this reason, we started our quest to derive a (better) formula based on theoretical grounds.
The question was: how?
3
3.1
Derivation of formula’s
Lemma: Probability of finding a prime
The probability (Pr) to find a next prime immediately after an existing prime pi can be
represented by:
x=pi
(1 − 1/pi )
P r(pi+1 ) =
(3)
i=2
Proof:
Suppose that we have a classical x-axis which starts at 0 and which has nodes that count up
according to the natural numbers. We will see every natural number as a node on this x-axis.
4
5. 3.1
Lemma: Probability of finding a prime
3
DERIVATION OF FORMULA’S
Suppose our task is to touch every natural number with a graph. The easiest way to do this
would be to create a sinus like function with a period T=2. This graph would immediately go
through 0, 1, 2, ... hence through all the natural numbers. We call it the 1-graph. There is no
way to find a vacant spot any more on this x-axis now. So, if a vacant spot would represent a
chance of a possible prime number, there would be no chance of finding a prime number any
longer.
The above is the same as saying that every natural number is divisible by 1.
We will now delete our 1-graph and start with the new challenge to touch all the nodes from
2 up on the x-axis using a combination of different graphs than the 1-graph. Remember: our
challenge is to touch all the nodes on the x-axis without using the 1-graph. So, we will try to
fulfill the fundamental theorem of arithmetic [1] using pi − graph s only.
So, immediately after 1 but before 2: what is our chance of finding a prime? This will be
100% since all the nodes are still free and the next number that we will find will only be
divisible by 1 or itself. This is because there are no natural numbers in between. The next
number we find is 2 and 2 is a prime.
The most logic first graph in this new challenge will be the 2-graph and it will go through:
0, 2, 4, ... The probability of finding vacant spots on our x-axis (from 2 onward) will now be
represented by the formula (1 − 1/2). So, the number of vacant nodes along the x-axis immediately diminishes with 1/2. These vacant spots can temporarily be interpreted as possible
prime candidates over a short distance along the x-axis. And it seems to be true as the next
vacant spot (prime) is node 3.
The next logical graph that we will add will be the 3-graph and will go through 0, 3, 6, ... One
would at first think that the number of possible vacancies along the x-axis would immediately
diminish with an extra 1/3 but that is not completely correct since certain points (like 6, 12,
18, ...) would be touched by the 2-graph as well as by the 3-graph and we should not double
count them. The correct number of vacant spots on our x-axis (from 3 onward) will now be
represented by the formula (1 − 1/2).(1 − 1/3).
From 3 onward the chance of finding a next prime will hence be 33%.
If we continue with the same reasoning, the next vacant spot (prime) we find is node 5. So, the
next logical graph will have to go through 0, 5, 10, ... Now, we must watch out not to overdo
with avoiding the double counting (1/2x3, 1/2x5 and 1/3x5) hence we will have to correct
for a triple counting with adding a factor for 1/(2x3x5). The correct number of vacant spots
(from 5 onward) on our x-axis will however simplify to the formula (1−1/2).(1−1/3).(1−1/5).
Our new formula will be valid till we find the next vacancy: prime 7.
We are now ready to believe that the probability for finding a vacancy (read a prime) immediately after x = pi will be represented by:
x=pi
(1 − 1/pi )
P r(pi+1 ) =
(4)
i=2
5
6. 3.2
Theorem: Probabilistic prime counting formula
3
DERIVATION OF FORMULA’S
In the tables, we call this probability the total probability. The individual contribution to this
probability by each different prime pi , we will call the individual probability and is simply
calculated as (1 − 1/pi ).
We can mention already that the contribution of this total probability over a distance ∆x to
the prime counting will then be represented by:
x=pi
(1 − 1/pi )
∆x.
(5)
i=2
3.2
Theorem: Probabilistic prime counting formula
The probabilistic prime counting formula can be represented as follows:
x x=pi
(1 − 1/pi ).dx with α ≈ 1.7810292
π(x = pi ) = α.
2
(6)
i=2
α can be very closely approximated as:
α = eγ where γ ≈ 0.57721 is the Euler − M ascheroniconstant
(7)
Proof (including an heuristic argument)
In order to derive the probabilistic prime counting formula, we have to multiply each new
probability of finding a prime with the distance over which this probability is valid (until the
next prime) and sum up all these values. This can be approximated as:
x=pi
π(x = pi ) =
∆(x)
x=pi
{[
{P r(pi+1 ).∆(x)} =
(1 − 1/pi )].(pi − pi−1 )}
(8)
i=2
i=2
This is like calculating a surface over an irregular step function and can be represented with
an integral symbol. This method is an integration by parts.
However after initial tests on the first 50 million primes (see Method and Techniques),
We came to the conclusion that the above probabilistic prime counting function should
be corrected with a constant α. Therefore, if we refer to table 1, we can check that α =
50847534/28549683. As a conclusion, we come to formula 6.
6
7. 3.3
3.3
Corollary: new formula for natural logarithm
4
METHOD AND TECHNIQUES
Corollary: new formula for natural logarithm
We can now compare formula 2 with formula 6 and conclude the following new formula for
the natural logarithm:
ln(x = pi ) =
β
x=pi
i=2 (1
− 1/pi )
with β = 0.5614575
(9)
β can be very closely approximated as:
β = e−γ where γ ≈ 0.57721 is the Euler − M ascheroniconstant
(10)
Proof
The proof is trivial if we explain that we introduce β = 1/α. We must mention however that
we have optimized β in this case to come as close to li(x) as possible (using a slightly different
α. Therefore, if we refer to table 1, we can check that β = 28549683/50849229.
4
Method and techniques
We have developed a C++ program to verify the above logic for the first 50.7 million primes.
The result is stored in 1000 separate files of each around 7 Megabytes. A summary of this
information can be found in tables 1 and 2.
4.1
Probabilistic Prime Counting formula
We will now explain the calculation steps done and refer to the respective column in both
tables where this information can be found.
In the first column of both file types we have put a counter that basically will count the
primes.
In the second column, we have listed the first 50.7 million primes.
In a third calculation we have calculated the individual probability of the prime of that row
using the formula:
Ind P rob(pi ) = (1 − 1/pi )
(11)
Because of space reasons, we have not listed this calculation in table 1 but it can be found in
the third column of the separate files. The calculation is however trivial.
7
8. 4.2
New formula for natural logarithm
4
METHOD AND TECHNIQUES
In a next calculation, we calculated what we call the total probability of all the primes until
that prime according to the formula:
x=pi
(1 − 1/pi )
T ot P rob(pi ) =
(12)
i=2
The result of this calculation can be found in column 3 of table 1 and in column 4 of the
separate files.
In the next column (My Pure), we calculated formula 6 but with α = 1. So, it comes to
calculating formula 7. The reason for this is that we are then able to calculate the results for
all the primes first and then estimate α afterwards based on the last prime calculated. We
multiplied the previous probabilistic value with the difference of this prime with the previous
prime (∆x) and added this value to all the previous values (integration by parts).
In the next column (My Integr), we did exactly the same as in the previous column but we
corrected α = 1.7810292 from prime 2 onward.
In the next column, we simulated the logarithmic integral (formula 2) in a similar discrete
fashion:
li(pi ) = li(pi−1 ) +
pi − pi−1
ln(pi )
(13)
As mentioned before, for mostly unknown reasons (see later), we came to the conclusion that
the difference between my pure probabilistic integral and the real prime count seems to be
constant and comes to a factor 1.7810292 if we want to correct for primes upto 1 billion.
To illustrate the difference between my pure probabilistic integral and the logarithmic integral, we came to a factor 1.7810785 or 1/0.5614575.
We will use the first constant to optimize our prime count formula and the second to demonstrate our formula for the natural logarithm.
The results are demonstrated in the table 1.
4.2
New formula for natural logarithm
We were also eager to check our new formula for the natural logarithm and therefore we have
added 2 more columns to our separate tables. One in which we approximate the logarithm
based on our formula: we dividend our β with the total probability of all the primes until
that prime. Another column in which we calculated the natural logarithm the standard way.
A summary of this can be seen in table 2.
8
9. 4.2
New formula for natural logarithm
4
METHOD AND TECHNIQUES
Table 1: In this table, we calculate the probability for a new prime immediately after another one
(”My Integr”) and compare it with the logarithmic integral (”li(x)”) and with the perfect
prime count (”Count”). My integr has been corrected with a factor α = 50847534/28549683
versus My pure. We can also deduct β = 28549683/50849229.
Count
P rime
T ot prob
M y pure
M y integr
li(x)
1
2
0.5
1
1
1
10
29
0.1579472231019522
6.82
11.37
11.77
100
541
0.0887498837753298
59
105
106
1000
7919
0.06246659294666621
569
1012
1015
10000
104729
0.04855897117165969
5632
10030
10037
100000
1299709
0.03988065693618553
56197
100088
100107
1000000
15485863
0.03391365652866189
561668
1000346
1000408
10000000
179424673
0.02954214659629333
5614822
10000162
10000496
20000000
373587883
0.02844462187012779
11230033
20001015
20001621
30000000
573259391
0.02784067067191051
16844593
30000712
30001582
40000000
776531401
0.02742791846692972
22459541
40001097
40002229
50000000
982451653
0.02711633109429194
28074042
50000689
50002077
50847534
999999937
0.02709315486988111
28549683
50847819
50849229
Table 2: In this table we calculate the natural logarithm of the respective prime using our new formula
(”My ln(x)”) and compare it with the values using the standard formula (”ln(x)”).
Count
P rime
M y ln(x)
ln(x)
1
2
1.122915132600837
0.6931471805599453
10
29
3.554716286072388
3.367295829986474
100
541
6.326290721932071
6.293419278846481
1000
7919
8.988125329322656
8.977020214210413
10000
104729
11.56238595574076
11.55913134035915
100000
1299709
14.078443271354
14.07765095122063
1000000
15485863
16.55550075604224
16.55543810118943
10000000
179424673
19.00530702704133
19.00526602879025
20000000
373587883
19.73861944320852
19.73866383070949
30000000
573259391
20.16681181703334
20.16684886160037
40000000
776531401
20.47029441834519
20.47034763888541
50000000
982451653
20.70551374918884
20.70556169235444
50847534
999999937
20.72322581098074
20.72326577394641
9
10. 6
5
ACKNOWLEDGMENTS
Discussions
The question whether the lower part of the integral in formula 6 needs to start at 2 or maybe
at 1 or 0 as well as the fact that the first value ”1” in table 1 for ”MyIntegr” is the correct
one can be answered as follows: In the range between 1 and before 2, the chance of finding
a prime according to our logic is 100 percent. And our challenge only started at 1. So, the
value for ”MyIntegr” could then be represented by the formula: π(x) = (x − 1). There is
no need to have a correction factor here since this formula suits us fine as the starting point
π(1) = 0 and end point π(2) = 1 are correct according to our logic and according to the facts.
The question of why we had to correct our probabilities with a factor α from 2 onward remains
open. And to be honest, we can’t give a good mathematical answer. By using probabilities,
we assumed that the probability of finding a prime is flat over some range ∆(x). However,
this flat probability might not be the case over some longer distance. Immediately after a
prime, we know that the next number will not be a prime since it will be even. However,
there is a sort of higher chance that the number after this even number might be a prime. We
could explain this by assuming that a bigger part of our periodic waves are still in the phase
of going up or turning to come down but just not yet in the phase of touching the x-axis
again. So we are actually always moving before the bigger waterfall of prime multiples that
come down on us.
It is logic from our side to try to optimize α according to our needs. In the first part, we
wanted to optimize it for getting a better prime counting formula, in the second part, we
wanted to optimize it to come as close as possible to the natural logarithm.
6
Acknowledgments
It was only after the writing of this article that we found out that earlier work of Harald
Cram´r had similarities with the content of this article. However, what the Cram´r model
e
e
didn’t have was our ”infinitesimal” intake of using different probabilities for each distance
∆(x) = pi − pi−1 as we are doing in formula 8 and 6. Instead, it seems, Cram´r used the same
e
√
probability for all p ≤ x.
I would like to thank this publisher, his professional staff and his volunteers for all the effort
they take in reading all the papers coming to them and especially I would like to thank this
reader for reading my paper till the end.
I would like to thank my wife for supporting me during the countless hours I spend behind
my desk.
10