Fun with Excel #17 (ft. Python!) – The Beauty of Convergence

Part I: Everything is Four

A friend recently told me that “everything is four.” No, he wasn’t talking about the Jason Derulo album…

Suppose you start with any number. As an example, I’ll pick 17, which yields the following sequence: 17 -> 9 -> 4. Here’s a slightly more convoluted one: 23 -> 11 -> 6 -> 3 -> 5 -> 4. Figured out the pattern yet?

The Answer: for any number you choose, count the number of characters in its written representation. In our first example, the word “seventeen” has 9 characters, and “nine” has 4 characters. In our second example, “twenty three” has 11 characters (not counting spaces), “eleven” has 6 characters, “six” has 3 characters, “three” has 5 characters, and “five” has 4 characters. Try this with any number (or any word, in fact) and you’ll always end up at 4 eventually. This is because “four” is the only number in English whose name has exactly as many characters as its value.
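If you want to try the rule yourself, here is a minimal Python sketch. It is not necessarily the code behind the charts in this post: the helper names (`number_to_words`, `letter_count`, `chain`) are made up for illustration, and it assumes numbers are spelled without “and” or hyphens, which is what the 11 characters for “twenty three” imply.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out 1 <= n <= 999,999 in English (assumed style: no 'and', no hyphens)."""
    if not 1 <= n <= 999_999:
        raise ValueError("this sketch only handles 1 through 999,999")
    parts = []
    thousands, rest = divmod(n, 1000)
    if thousands:
        parts += [number_to_words(thousands), "thousand"]
    hundreds, rest = divmod(rest, 100)
    if hundreds:
        parts += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        parts.append(TENS[tens])
        if ones:
            parts.append(ONES[ones])
    elif rest >= 10:
        parts.append(TEENS[rest - 10])
    elif rest:
        parts.append(ONES[rest])
    return " ".join(parts)


def letter_count(n: int) -> int:
    """Count the letters in the name of n, ignoring spaces."""
    return sum(ch.isalpha() for ch in number_to_words(n))


def chain(n: int) -> list[int]:
    """Apply the letter-count rule repeatedly until the series hits 4."""
    seq = [n]
    while seq[-1] != 4:
        seq.append(letter_count(seq[-1]))
    return seq


print(" -> ".join(map(str, chain(17))))  # 17 -> 9 -> 4
print(" -> ".join(map(str, chain(23))))  # 23 -> 11 -> 6 -> 3 -> 5 -> 4
```

The two printed chains are the same ones from the examples at the top of the post.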

This occurrence is not unique to the English language: Dutch, Estonian, German, and Hebrew also “converge” to 4, while Croatian, Czech, and Italian converge to 3. Other languages may end up converging to more than one number, or end up in an infinite loop involving two or more numbers.

I decided to explore this phenomenon in more detail by examining how the series converges over a large set of natural numbers (1 through 10,000). Since Excel is unable to graph large data sets efficiently, I needed to relearn some basic Python to help me with this particular project. This was going to be fun…

Anyway, the important takeaway from all of this is that I eventually succeeded. Look at this beautiful wedge:

In this chart, the x-axis represents the number of iterations or steps in a particular series (you can think of it as “time”), while the y-axis represents the value itself. Every point from (0, 1) to (0, 10000) is on the chart, and they are the starting points for the first 10,000 natural numbers. For example, going from the point (0, 17) to the point (1, 9) is one iteration. The points (0, 17), (1, 9), and (2, 4) represent a series (for the starting integer 17), and every series terminates once 4 is reached (in this case, 2 iterations/steps are required to reach 4).

With that explanation out of the way, there are a few observations we can make about the above chart:

Convergence occurs in a fairly uniform manner. On average, the convergence is relatively well-behaved and progresses in a decreasing fashion. Note that I say “on average,” because we know that 1, 2, and 3 are the only natural numbers whose first iterations lead to a larger number (1 -> 3, 2 -> 3, and 3 -> 5). However, every natural number greater than 4 will lead to a smaller number, and given how the English language works…
Convergence occurs very quickly. English is pretty efficient when it comes to representing numbers in written form. Larger numbers generally require more characters, but not that many more. For example, among the first 10,000 natural numbers, the “longest” ones require only 37 characters (8,878 is one of them). This leads us to our second chart…

This shows the sum of all 10,000 series over time. At Time 0, the sum is 50,005,000, but that drops to 292,407 after just one iteration. After Time 6, each of the first 10,000 series will have converged to 4 and terminated. If we define “stopping time” as the number of iterations/steps it takes for a series to reach 4, then the stopping times of the first 10,000 natural numbers are shown below (along with a histogram of how often each stopping time occurs):

The average stopping time is 4.4, with a standard deviation of 1.2. Furthermore, the vast majority of stopping times are 3, 5, or 6. Note the interesting pattern formed in the first chart by the series that have a stopping time of 4.
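For anyone who wants to reproduce the sums and stopping times, here is a rough, self-contained sketch. The names and conventions are my assumptions rather than a record of the original script: it counts letters from small lookup tables of word lengths (no “and”, no hyphens), and a series stops contributing to the sum once it has reached 4.

```python
from collections import Counter

# Letter counts of the number words, indexed by digit (assumed spelling style).
ONES_LEN = [0, 3, 3, 5, 4, 4, 3, 5, 5, 4]    # "", one ... nine
TEENS_LEN = [3, 6, 6, 8, 8, 7, 7, 9, 8, 8]   # ten ... nineteen
TENS_LEN = [0, 0, 6, 6, 5, 5, 5, 7, 6, 6]    # "", "", twenty ... ninety


def letter_count(n: int) -> int:
    """Letters in the English name of 1 <= n <= 999,999."""
    thousands, rest = divmod(n, 1000)
    total = (letter_count(thousands) + 8) if thousands else 0  # + "thousand"
    hundreds, rest = divmod(rest, 100)
    if hundreds:
        total += ONES_LEN[hundreds] + 7                        # + "hundred"
    if rest >= 20:
        total += TENS_LEN[rest // 10] + ONES_LEN[rest % 10]
    elif rest >= 10:
        total += TEENS_LEN[rest - 10]
    else:
        total += ONES_LEN[rest]
    return total


def series(n: int) -> list[int]:
    """The full series for n, ending at the first 4."""
    seq = [n]
    while seq[-1] != 4:
        seq.append(letter_count(seq[-1]))
    return seq


all_series = [series(n) for n in range(1, 10_001)]
stopping_times = [len(s) - 1 for s in all_series]
longest = max(stopping_times)

# Sum of all series at each time step; terminated series contribute nothing.
sums = [sum(s[t] for s in all_series if t < len(s)) for t in range(longest + 1)]
histogram = Counter(stopping_times)

print("sum at each step:", sums)
print("stopping-time histogram:", sorted(histogram.items()))
print("average stopping time:", sum(stopping_times) / len(stopping_times))
```

Under these conventions the first two sums come out to 50,005,000 and 292,407, and the longest stopping time is 6, in line with the numbers quoted above.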

Part II: The Collatz Conjecture

Truth be told, “Everything is Four” feels a bit gimmicky for a mathematical rule (well, because it technically isn’t), so here’s a rule that is better defined: Start with any positive integer n. If n is even, then divide it by 2 (n / 2). If n is odd, then multiply it by 3 and add 1 (3n + 1). Repeat this with each new value, indefinitely.

The Collatz Conjecture states that for any given n, the sequence you get from following the above rules will eventually reach 1. Note that it’s called a conjecture, meaning that despite being proposed in 1937, it remains unproven!
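The rule is short enough to write down directly. Here is a minimal sketch in Python; the function names are just illustrative, and the loop stops at the first 1 (which is the assumption the conjecture says is always safe, but nobody has proven it).

```python
def collatz_step(n: int) -> int:
    """One application of the rule: halve if even, else 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_sequence(n: int) -> list[int]:
    """The sequence starting at n, ending at the first 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    seq = [n]
    while seq[-1] != 1:
        seq.append(collatz_step(seq[-1]))
    return seq


print(collatz_sequence(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```

Starting from 6 takes 8 steps to reach 1; small starting values can take surprisingly long (27 needs 111 steps).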

Now, let’s see how the Collatz sequence converges over the first 10,000 natural numbers. I’ve included 3 charts to show how things look at different “zoom” levels.

A few observations:

Convergence does not occur in a uniform manner. The chart almost looks like many different seismographs stacked on top of each other. While each series does eventually reach 1, how they get there appears to be somewhat random and not well-defined at all, filled with both sudden spikes and collapses over time. In any event, it’s a heck of a lot more interesting than our first sequence. Moreover, relative to our first sequence…
Convergence takes a long time. Recall that in our first sequence, no series among the first 10,000 natural numbers lasted for more than 6 iterations before converging to 4. In the Collatz sequence, however, the number 6,171 takes 261 iterations to reach 1.
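To check which starting value takes the longest, one option is to compute every stopping time and cache the results as you go, since many series merge into paths that have already been walked. This is a sketch under my own naming (`stopping_times` is hypothetical, not from the original script):

```python
def stopping_times(limit: int) -> dict[int, int]:
    """Steps needed to reach 1 for every start value from 1 to limit."""
    cache = {1: 0}
    for start in range(2, limit + 1):
        path = []
        n = start
        # Walk forward until we hit a value whose stopping time is known.
        while n not in cache:
            path.append(n)
            n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps = cache[n]
        # Walk back along the path, filling in the cache.
        for value in reversed(path):
            steps += 1
            cache[value] = steps
    return {n: cache[n] for n in range(1, limit + 1)}


times = stopping_times(10_000)
slowest = max(times, key=times.get)
print(slowest, times[slowest])  # 6171 261
```

The cache also stores intermediate values above the limit, which is why the function filters back down to 1 through `limit` before returning.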

This chart shows the sum of all 10,000 series over time. At Time 0, the sum is 50,005,000, but that spikes to 87,507,496 at Time 1 before dropping to 59,381,244 at Time 3. The beginning of the chart looks a bit like a failing cardiogram, and things quickly get weird after that. Of course, the sum eventually reaches 0, but the way it decays appears random. The stopping times of the first 10,000 natural numbers are shown below:

Wow! The first chart is really something, isn’t it? It certainly doesn’t appear to be 100% random, and the fact that there seems to be some structure to the stopping times of the Collatz sequence could be a sign that a proof for the Collatz Conjecture can eventually be found. The average stopping time is 85.0, with a standard deviation of 46.6. Moreover, with a median of 73 and a mode of 52, the distribution of the stopping times appears to be right-skewed. An examination of the same chart featuring the first 100 million natural numbers confirms this.
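The sum-over-time series and the summary statistics can be recomputed with nothing but the standard library. The sketch below is one way to do it, with a couple of assumptions on my part: a series stops contributing to the sum once it has reached 1, and the standard deviation is the population version (`statistics.pstdev`), which may or may not be the one used for the figure quoted above.

```python
import statistics


def collatz_sequence(n: int) -> list[int]:
    """The sequence starting at n, ending at the first 1."""
    seq = [n]
    while seq[-1] != 1:
        last = seq[-1]
        seq.append(last // 2 if last % 2 == 0 else 3 * last + 1)
    return seq


all_series = [collatz_sequence(n) for n in range(1, 10_001)]
stopping_times = [len(s) - 1 for s in all_series]
horizon = max(stopping_times)

# Sum of all series at each time step; terminated series contribute nothing.
sums = [0] * (horizon + 1)
for s in all_series:
    for t, value in enumerate(s):
        sums[t] += value

summary = {
    "mean": statistics.mean(stopping_times),
    "stdev": statistics.pstdev(stopping_times),  # population std dev (assumption)
    "median": statistics.median(stopping_times),
    "mode": statistics.mode(stopping_times),
}

print("sum at times 0 to 3:", sums[:4])
print(summary)
```

A mean that sits above the median, which in turn sits above the mode, is the usual signature of a right-skewed distribution.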

Anyway, there wasn’t really a point to this post other than to show that there is often a lot of beauty hidden under the surface of mathematics.

Let me know if you would like to see more posts like this!

You can find my backups for both the data and the Python code here.

Jeffrey Fan

Random Musings of an Amateur Data Scientist