Properties of a Novel Binary Representation of Integers using Base $\phi$
August 22, 2024 · View on GitHub
Introduction
This project explores a novel method for encoding integers using a representation based on the golden ratio (), known as the irrational base . This encoding method leverages the unique mathematical properties of to create a binary-like representation that inherently avoids certain binary sequences. The exploration focuses on analyzing the characteristics of this representation, particularly when reinterpreted as standard binary numbers, and the unexpected findings related to prime density in the transformed number set.
It's conjectured the binary sequence 101 never occurs in this representation. This has been experimentally verified on millions of values.
You can run the available tests on this function by cloning this repository, installing the required dependencies and running:
$ python test.py
Here's what it looks like:
Original Integer irradix Representation Interpreted as Binary isPrime Prime Comparison Binary Expansion (%)
------------------------------------------------------------------------------------------------------------------------------------
1 1 1 False 0 100.00
2 10 2 True 3 100.00
3 11 3 True 3 100.00
4 100 4 False 0 100.00
5 110 6 False 2 100.00
6 111 7 True 1 100.00
7 1000 8 False 2 133.33
8 1001 9 False 0 100.00
Mathematical Foundation
Base Representation
The golden ratio is defined as:
Encoding integers using base involves representing numbers in a non-standard manner, where the expansion factor in terms of binary bits is approximately:
This suggests that base encoding requires around 44% more bits than traditional binary encoding. Despite this apparent inefficiency, base encoding has unique properties, such as naturally excluding the sequence "101," which we use as a delimiter.
Conversion Algorithms for Base Encoding: irradix and derradix
This section provides a detailed explanation of the conversion algorithms used in the irradix and derradix functions, including their mathematical foundations. Both algorithms revolve around the idea of encoding and decoding integers using a base representation, where is the golden ratio.
irradix Algorithm
The irradix function converts a positive integer \( n \) into its base representation. The process involves iterative calculations that decompose the number into a series of coefficients that correspond to powers of .
The result, \( r \), is the string of digits that represents \( n \) in base .
derradix Algorithm
The derradix function performs the inverse operation, converting a base representation back into the original integer. It interprets the base digits and reconstructs the number through a series of multiplication and summation steps, applying the ceiling function at each step.
In this equation, \( r_i \) represents the digits of the base encoded number, and the sum reconstructs the original integer.
Iterative Radix Conversion with Ceil Function
As discussed earlier, in the derradix function, we must apply the ceiling operation iteratively at each step of the summation:
This corresponds to the code, where the ceiling is applied after each multiplication by and addition of the next digit.
Notes
-
Sign Handling: The above algorithms omit sign handling for clarity. In practice, if the original integer is negative, the algorithm first converts it to a positive number, performs the conversion, and then adds a negative sign to the final result.
-
Termination Conditions in
irradix:- \(\text{thresh}\): A small threshold value that stops the loop once the magnitude of \( w_0 \) is sufficiently small.
- \(\text{quanta}\): Represents a precision level, ensuring that the algorithm's results are accurate.
- \(\epsilon\): A small value related to the precision of floating-point arithmetic, ensuring that the loop halts when further iterations no longer contribute meaningful digits.
Exploring Base Representation
Analysis and Observations
We analyzed the first 1,000 integers encoded using the base system, converting these representations back into integers by treating them as binary numbers. The analysis revealed unexpected behavior in the distribution and density of prime numbers in the transformed set, particularly when compared to the original integer set.
Prime Density Anomaly
Interestingly, when we mapped the first 1,000 integers through the base representation, we observed an unexpected prime density in the reinterpreted binary set. Given the nature of the expansion factor, one would anticipate a decrease in prime density by a factor corresponding to the expansion ratio. However, our findings suggest that the prime density in the reinterpreted set is significantly higher than expected.
This anomaly indicates that the base mapping may lead to a disproportionately higher number of primes in the transformed set compared to a uniform distribution. This finding could imply deeper number-theoretic properties associated with base representations, potentially linked to how interacts with the distribution of primes.
Statistical Considerations
From a statistical number theory perspective, the observed increase in prime density could be an artifact of how base encoding clusters certain types of integers. Given that base avoids certain sequences, it might be preferentially preserving numbers that are prime when interpreted as binary, thus inflating the prime density in the resulting set. Further exploration into this phenomenon could involve analyzing whether similar patterns emerge with other irrational bases or if this effect is unique to .
Integer Packaging and Efficiency
Delimiter-Based Packing
One of the key applications of the base representation explored in this project is integer packaging. The exclusion of the "101" sequence in the base encoded strings makes this encoding suitable for delimiter-based packing schemes, where "101" serves as a marker for separating individual encoded integers. This approach is particularly space-efficient as it avoids the need for additional length signifiers that are common in traditional length-type-value or length-value encoding schemes.
Efficiency of Packing
The practical efficiency of this packing method lies in its ability to represent multiple integers in a compressed format, using the inherent properties of the base system to delimit sequences without explicit markers. However, this method does introduce some redundancy, particularly when sequences must be padded to prevent unintentional "101" patterns. Despite this, the space savings from avoiding explicit length markers can be significant, making this method advantageous in certain contexts.
Connection to Binary Representation of 5
It is noteworthy that the sequence "101" corresponds to the binary representation of the number 5. In the context of base encoding, the exclusion of this sequence as a delimiter creates an interesting intersection between number theory and binary encoding. This connection suggests potential links to other irrational bases derived from primes, such as \(\sqrt{7}\), which might naturally avoid other specific sequences like "111" (binary for 7).
Implications for Prime Density and Number Theory
Expected vs. Observed Prime Density
Given the expansion factor of approximately 1.44 in base encoding, one would expect the density of primes in the reinterpreted binary set to decrease proportionally. Specifically, if the prime density in the original set is around 16.8%, the expected prime density in the binary reinterpreted set should be approximately:
However, the observed prime density in the binary set is significantly higher, around 9.5%. This discrepancy suggests that the base mapping might be influencing the distribution of primes in a way that is not immediately apparent from a uniform distribution perspective.
The first and last few lines of the data table:
Original Integer irradix Representation Interpreted as Binary isPrime Prime Comparison Binary Expansion (%)
------------------------------------------------------------------------------------------------------------------------------------
1 1 1 False 0 100.00
2 10 2 True 3 100.00
3 11 3 True 3 100.00
4 100 4 False 0 100.00
5 110 6 False 2 100.00
6 111 7 True 1 100.00
7 1000 8 False 2 133.33
8 1001 9 False 0 100.00
...
988 10000000000100 8196 False 0 140.00
989 10000000000110 8198 False 0 140.00
990 10000000000111 8199 False 0 140.00
991 10000000010000 8208 False 2 140.00
992 10000000010010 8210 False 0 140.00
993 10000000010011 8211 False 0 140.00
994 10000000011000 8216 False 0 140.00
995 10000000011001 8217 False 0 140.00
996 10000000011100 8220 False 0 140.00
997 10000000011110 8222 False 2 140.00
998 10000000011111 8223 False 0 140.00
999 10000001000000 8256 False 0 140.00
1000 10000001000010 8258 False 0 140.00
Prime Density:
Original: 16.80%
Interpret base-phi rep as binary: 9.40%
Prime Comparison Key:
3: Both Original and Binary Interpretations are Prime
2: Original is Prime, Binary Interpretation is Not
1: Binary Interpretation is Prime, Original is Not
0: Neither Original nor Binary Interpretation is Prime
Statistical Number Theory Perspective
This unexpected increase in prime density raises intriguing questions about the nature of the base encoding and its impact on number theory. It is possible that the encoding method selectively preserves or amplifies the presence of primes due to the unique properties of . Further research could explore whether this phenomenon is specific to base or if similar effects are observed with other irrational bases.
Conclusion and Future Directions
The exploration of base as a novel binary encoding method has uncovered intriguing and unexpected properties, particularly regarding prime density and integer packaging efficiency. While the practical applications may be limited due to the complexity of base arithmetic, the theoretical implications are compelling and warrant further investigation.
Summary of Python API
The Python API provided here leverages the mathematical properties of the golden ratio for encoding and decoding integers using a base representation. The key functions included in this API are as follows:
-
irradix(num): Converts a given integer into its base representation. This function handles the decomposition of the number into coefficients that correspond to powers of , returning a string that represents the number in base . -
derradix(rep): Decodes a base encoded string back into the original integer. It reconstructs the number by summing the products of the coefficients and powers of , with a ceiling operation applied at each step to ensure accuracy. -
encode(nums): Packs a list of arbitrarily-sized integers using the base encoding scheme. This function concatenates the encoded integers, handles padding, and converts the final sequence into an array of bytes. -
decode(chunks): Unpacks a sequence of encoded integers from an input of bytes, by reconstructing the original sequence, and decoding it back into the list of integers.