Fibonacci numbers appear in so many unexpected places that I am sure you have already seen them. They are elements of the Fibonacci sequence $F_n$ defined by $$ \begin{aligned} F_0 &= 0,\qquad F_1 =1, \\ F_n & = F_{n-1} + F_{n-2}, \qquad \text{ for } n >1. \end{aligned} $$ Obviously, this recursive formula gives infinitely many Fibonacci numbers. We also know that there are infinitely many prime numbers: the ancient Greeks knew it (actually proved it) in 300 BC!
But, to this day, we still do not know if there are infinitely many prime numbers in the Fibonacci sequence. These numbers form the set of Fibonacci primes. Its (in)finiteness is one of the still unsolved problems in mathematics.
In this activity, you will compute a few initial Fibonacci primes, while reviewing some python features along the way, such as
generator expressions, yield
, next
, all
, line magics, modules, and test functions. Packages we shall come across include memory_profiler
, primesieve
, and pytest
.
Representing sequences is one of the elementary tasks any programming language should be able to do well. Python lists can certainly be used for this. For example, the following list comprehension gives elements of the sequence $$ n^i, \qquad n = 0, 1, 2, \ldots, N-1 $$ succinctly:
i=2; N=10
L = [n**i for n in range(1, N)]
If you change the brackets to parentheses, then instead of a list comprehension, you get a different object called generator expression.
G = (n**i for n in range(1, N))
Both L
and G
are examples of iterators, an abstraction of a sequence of things with the ability to tell, given an element, what is the next element of the sequence. Since both L
and G
are iterators, you will generally not see a difference in the results if you run a loop to print their values, or if you use them within a list comprehension.
[l for l in L]
[g for g in G]
However, if you run the last statement again, what happens?
[g for g in G]
The difference between the generator expression G
and the list L
is that a generator expression does not actually compute the values until they are needed. Once an element of the sequence is computed, the next time, the generator can only compute the next element in the sequence. If the end of a finite sequence was already reached in a previous use of the generator, then there are no more elements of the sequence to compute. This is why we got the empty output above.
Just as list comprehensions can be viewed as abbreviations of loops, generator expressions can also be viewed so using the yield
statement. The statement
G = (n**i for n in range(1, N))
is an abbreviation of the following function with a loop where you find yield
in the location where you might have expected a return
statement.
def GG():
for n in range(1, N):
yield n**i
G2 = GG()
print(*G2) # see that you get the same values as before
The yield
statement tells python that this function does not just return a value, but rather a value that is an element of a sequence, or an iterator. Internally, in order for something to be an iterator in python, it must have a well-defined __next__()
method: even though you did not explicitly define anything called __next__
when you defined GG
, python seeing yield
defines one for you behind the scenes.
Recall that you have seen another method whose name also began with two underscores, the special __init__
method, which allows you to construct a object using the name of the class followed by parentheses. The __next__
method is also a "special" method in that it allows you to call next
on the iterator to get its next value, like so:
G2 = GG()
# get the first 3 values of the sequence using next:
next(G2), next(G2), next(G2)
print(*G2) # print the remaining values of the sequence
As you can see, a generator "remembers" where it left off in a prior iteration.
It might seem that generators are dangerous disposable objects that are somehow inferior to resuable lists which have all the same properties. Here is an example that checks that thinking:
i = -20
N = 10**8
To compute the sum $$ \sum_{n=1}^{10^8} \frac{1}{n^{20}}, $$ would you use the following list comprehension?
sum([n**i for n in range(1, N)])
If you do, you would need to store the created list in memory. If you install the memory_profiler
and use it as described in the prerequisite reading material from [JV-H], then you can see memory usage easily. If you don't have a GB of RAM free, be warned that running this list comprehension (mentioned above, and in the cell after next) might crash your computer.
%load_ext memory_profiler
%memit sum([n**i for n in range(1, N)])
Per official standards, memory should be reported in mebibytes (MiB), a power of two that is close to $10^3$ ("mebi" is made up of words "mega" and "binary"), although the commerical world continues to use 10-based MB, GB, etc. The "increment" reported in the above output in MiB is the difference between the peak memory and the memory used just before memit
was called: that gives the memory used by the statement.
Clearly we should not need so much memory for such a simple task. A better solution is offered by the generator expression. It has no memory issues since it doesn't store all the elements of the sequence at once. Moreover, we can decide to stop iterating when the numbers in the sequence get below machine precision, thus getting to the sum faster.
G3 = (n**i for n in range(1, N))
s = 0
for g in G3:
s += g
if g < 1e-15:
break
print(s)
By now you are wondering, if we can work with a sequence of $10^8$ entries, then why can we not work with an infinite sequence. Yes, python makes it easy for you to make an infinite sequence construct:
def natural_numbers():
n = 0
while True:
yield n
n += 1
for n in natural_numbers():
print(n)
if n >= 5: break # don't go into infinite loop!
In fact the function count
in module itertools
does just this. Python does assume that you are smart enough to use these without sending yourself into infinite loops. If you want to stay safe, then avoid using while True
, replacing it with while n < max
where max
is some maximum number, a sentinel, that you never plan to exceed.
To generate $F_n$ satisfying $$ F_0= 0,\; F_1 =1, \quad \forall n>1: F_n = F_{n-1} + F_{n-2}, $$ we use a generator that keeps in memory two prior elements of the sequence, as follows.
def fibonacci(max):
f, fnext = 0, 1
while f < max:
yield f
f, fnext = fnext, f + fnext
Fn = fibonacci(10000)
print(*Fn)
Note that we have used python's tuple swap idiom in the definition of fibonacci
above. To understand it, note the evaluation order of expressions
expr3, expr4 = expr1, expr2
per the official documentation. The tuple swap idiom is an example (yet another) of how python achieves brevity without compromising elegance or readability.
Let's make a generator for the infinite prime number sequence. This classic example is beautifully discussed in [JV-H], which I suggest you read, if you have not already. Here is a standard method to generate the set $P$ of all primes less than some $N$. Suppose at any stage of the generator algorithm, a subset $P = \{ 2, 3, \ldots, q\}$ of primes up to and including $q$ have been found. The prime number generator should find the next prime number by checking if any element of $P$ divides $n$ for a number $n$ greater than $q$: if the remainder in the division of $n$ by $p$ is nonzero for all $p \in P$, then $n$ is the next prime.
For example, at some stage, suppose $P$ is this:
P = [2, 3]
Then, the next number $n=4$ has remainders $4\%p$ given by
[4 % p for p in P]
Clearly not all
of the remainders are nonzero:
all([4 % p for p in P])
Hence the generator would conclude that the number $4$ is not a prime, and proceed to the next case $n=5$, which it would conclude is a prime because:
all([5 % p for p in P])
This is implemented below.
def prime_numbers(N):
primes = []
q = 1
for n in range(q+1, N):
if all(n % p > 0 for p in primes):
primes.append(n)
q = n
yield n
list(prime_numbers(70))
Now we can generate all primes less than any number $N$ and all Fibonacci numbers less than $N$. Listing Fibonacci primes less than $N$ then becomes possible by simply intersecting the two sets. Python does have a set
data structure which comes with a handy intersection method, so the code is trivial:
def fibonacci_primes(N):
F = set(fibonacci(N))
P = set(prime_numbers(N))
print('Intersecting', len(P), 'primes with', len(F), 'fibonaccis.')
return P.intersection(F)
fibonacci_primes(100000)
Verification refers to the process of cross checking that a program behaves as expected in a few chosen cases. It is the developer's responsibility to ensure that a code is correct. Part of this responsibility involves designing and adding test functions that verify that the code produces the correct output in a few cases where correct output is known. Complex codebases with multiple files often have a suite of test functions. After a development team member changes one file, if the test suite does not pass all tests, it is a sign that the change has broken the code functionality due to some unanticipated repercussions of the change in other parts of the code.
How do you know that the result of your fibonacci_primes
is correct? We could design checks, say by verifying that our prime number routine is correct for the first few primes, plus a similar check for Fibonacci numbers. Or, we could look up the known Fibonacci prime numbers and see if we have got the first few correctly. Design of such tests is the process of verification. While there no standard method for it, one often used principle is to code with all cases in mind and test using known cases.
Let us use the Online Encylopedia of Integer Sequences (OEIS) which reports the currently known values of $n$ for which $F_n$ is prime. It is listed there as sequence A001605. Here are the first 10 elements of this sequence:
nFP = [3, 4, 5, 7, 11, 13, 17, 23, 29, 43]
Based on this we can write a test function. A test function has a name that begins with test
and does not take any argument. In the test function below you see a statement of the form assert Proposition, Error
, which will raise an AssertionError
and print Error
if Proposition
evaluates to False
(and if True
, then assert lets execution continues to the next line). The test checks if our list of initial Fibonacci primes coincides with the one implied by nFP
above.
def test_fibonacci_prime():
N = 10000
F = list(fibonacci(N))
nFP = [3, 4, 5, 7, 11, 13, 17, 23, 29, 43]
our_list = fibonacci_primes(N)
known_list = set([F[n] for n in nFP if n < len(F)])
assert len(known_list.difference(our_list))==0, 'We have a bug!'
print('Passed test!')
test_fibonacci_prime()
One of the python modules that facilitates automated testing in python codes
is the pytest
module. If you run pytest
within a folder (directory), it will run all test functions it finds in all files of the form test_*.py
or *_test.py
in the current directory and its subdirectories. Please install pytest
in the usual way you install any other python package.
To illustrate its use, let us make up a simple project structure as follows. This also serves as your introduction to modules in python. Please create a folder fibonacci_primes
and files my_simple_primes.py
, fibonacci.py
and test_fibonacci_primes.py
within the folder as shown below:
fibonacci_primes <- create this folder within ../pyfiles
├── fibonacci.py <- define functions fibonacci & fibonacci_primes
├── my_simple_primes.py <- copy prime_numbers function definition here
└── test_fibonacci_primes.py <- copy test_fibonacci_prime function here
Individual files may be thought of as python modules and you can import from them the way you have been importing from external packages. Since fibonacci.py
uses prime_numbers
which is now in a different file my_simple_primes.py
, you should add
the line
from my_simple_primes import prime_numbers
at the top of fibonacci.py
. Additionally, since test_fibonacci_primes.py
uses the functions fibonacci
and fibonacci_primes
, in order for the test function to find these function definitions, you should include the line
from fibonacci import fibonacci, fibonacci_primes
at the top of test_fibonacci_primes.py
.
Now you have a project called fibonacci_primes
on which you can apply automated testing programs like pytest
, as shown in the next cell. Note that a test function will run silently if all tests pass (without printing any outputs).
!pytest ../pyfiles/fibonacci_primes
To see how the output would be like if a test fails, you might want to run this again after deliberately creating a bug: for example, set the initializer for Fibonacci recurrence to 2 instead of 1 in the file fibonacci.py
and then return to run the above pytest to see what happens.
While coding up the prime number generator, did you get that nagging question in your mind, the one that we all get when coding up a basic algorithmic task in python? May be there is already a module for this?
Yes, indeed, the few lines we implemented above to get the prime numbers actually form an ancient algorithm, called the Sieve of Eratosthenes, which is implemented in many places. An example is a python binding for a C library called primesieve
. (You might need to install python-primesieve and primesieve depending on your system.) After you install it, the following two lines will give you the same prime number list.
from primesieve import primes # do after you have installed primesieve
list(primes(70))
This package is many times faster than our simple code with a python loop above, as you can verify using another %magic
. (Recall that iPython/Jupyter facilties like %timeit
and %memit
are called line magics. You can read more about line and cell magics in your [JV-H].)
%timeit primes(1000)
%timeit list(prime_numbers(1000))
Finding bigger and bigger primes is kind of like finding rare bit coins. Indeed, the difficulty of factoring the product of two large prime numbers is the basis of several encryption techniques. There is a world-wide effort to find more and more primes. For example, GIMPS, the Great Internet Mersenne Prime Search, discovered the largest (currently) known prime number, $2^{82,589,933}-1$, having 24,862,048 digits (on December 7, 2018, using a computer in Ocala, Florida, whose owner decided to download the free GIMPS software to stress test his computer).
Our simple function fibonacci_primes
was not designed to go above a finite maximal N
, so of course, it can make no contribution to answering the unsolved question on finiteness of the set of Fibonacci primes. To write a code to find larger and larger Fibonacci primes, one might consider two options:
OR
The few Fibonacci numbers we saw above looked quite sparse so Option 1 might look good, but it would require us to test whether a number is prime or not, which as we saw involves quite a bit of effort as the numbers get larger.
Option 2 could work as a good strategy, especially when more and more primes are discovered, provided we know how to test if a number is in the Fibonacci sequence. Using some completely elementary mathematics (and without having to use any fancy theorems you haven't yet studied) you can prove the following. (Do be warned that proving this is not a 5-minute exercise; if you can do it in 5 minutes, I'd love to hear!)
$\quad$
Theorem 1. A number $F$ is a Fibonacci number if and only if $5F^2+4$ or $5F^2 - 4$ is a perfect square.
$\quad$
Let's close by implementing this check for a number to be a perfect square using math.sqrt
and let's use it together with primesieve
to get a Fibonacci prime.
import primesieve, math
def is_square(n):
s = int(math.sqrt(n))
return s*s == n
it = primesieve.Iterator()
it.skipto(2**28-1)
p = it.next_prime()
while p < 2**30-1:
if is_square(5*p*p+4) or is_square(5*p*p-4):
print('¡¡ Got one !! ', p, 'is a Fibonacci prime!')
p = it.next_prime()
Do feel free to experiment increasing the 2**30
limit above. But may be it is now time to manage your expectations a bit. A few decades ago, $2^{31}-1$ was the largest integer representable on most computers. Now that 64-bit computing is common, we can go up to $2^{63} -1$ (and a bit more with unsigned integers). To go beyond, not only will we need much faster programs, but also specialized software to represent larger integers and do arithmetic with them.
Author: Jay Gopalakrishnan
License: ©2020. CC-BY-SA
$\ll$Table of Contents