JSON's Numeric Boundaries: The Lesser-Known Reality of Inaccurate Figures

Exploring the Complexities of IEEE 754 Floating Point Arithmetic in JSON Numbers

Introduction

JSON (JavaScript Object Notation) has become a cornerstone in the world of data exchange, particularly in web applications. Its appeal lies in its simplicity and readability, making it the preferred choice for developers worldwide. However, when it comes to dealing with numbers, JSON's approach is somewhat less straightforward than it initially appears.

Guess what happens when you run this JavaScript code?

const x = 9223372036854775807
console.log(x)

You might be in for a surprise!

JSON Specification

The JSON standard, as defined in RFC 8259, describes numbers as follows:

A number is represented in base 10 using decimal digits. It contains an integer component that may be prefixed with an optional minus sign, which may be followed by a fraction part and/or an exponent part. Leading zeros are not allowed. […] This specification allows implementations to set limits on the range and precision of numbers accepted.
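This grammar can be exercised directly with JavaScript's built-in JSON.parse, which implements it:

// 10
console.log(JSON.parse('10'));
// -1500 (fraction and exponent parts are allowed)
console.log(JSON.parse('-1.5e3'));
// throws SyntaxError: leading zeros are not allowed
JSON.parse('01');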

This simplicity can lead to inconsistencies when exchanging data across different programming languages. Each language interprets numbers slightly differently, which can affect both the size and precision of these values. It's like translating the same phrase into multiple languages - the core idea remains, but the details might vary slightly from one language to another.

For instance, a number that is perfectly valid and accurately represented in one language could be misinterpreted or result in precision errors in another. This is particularly relevant when dealing with large numbers or numbers that require high precision, such as financial calculations.

The Pitfall

A classic example is the interaction between a Go backend and JavaScript frontend using JSON as the data interchange format.

In Go, an int64 type is often used to handle large integers, reliably accommodating values up to 2^63 − 1. However, when this int64 number is passed to a JavaScript frontend via JSON, the waters get murky. JavaScript represents numbers using the IEEE 754 double-precision floating-point format. This format has limitations, especially with integers larger than Number.MAX_SAFE_INTEGER in JavaScript, which is 2^53 − 1.
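A quick check in JavaScript confirms this boundary:

// 9007199254740991
console.log(Number.MAX_SAFE_INTEGER);
// true
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1);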

To illustrate the potential pitfalls of JSON's numeric handling, let's consider a concrete example involving a large number. Imagine you have a Go backend that needs to send a large integer value to a JavaScript frontend. The JSON representation of this data might look like this:

{ "bigNumber": 9223372036854775807 } // Max int64 in Go

This number is the maximum integer that can be represented by an int64 in Go. When this JSON is processed by a JavaScript frontend, issues can arise. Let's look at what happens when it is parsed in JavaScript:

const jsonData = '{"bigNumber": 9223372036854775807}'; 
const parsedData = JSON.parse(jsonData);
// 9223372036854776000
console.log(parsedData.bigNumber);

In JavaScript, because numbers are handled as IEEE 754 double-precision floating-point values, bigNumber does not retain its original precision: the 9223372036854775807 sent by the backend is imprecisely parsed as 9223372036854776000.
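One common workaround, which the conclusion returns to, is to transmit large integers as JSON strings and convert them explicitly on the JavaScript side. Here is a minimal sketch, assuming the backend serializes the int64 field as a string:

// Assumes the backend sends the int64 as a JSON string, not a number
const rawData = '{"bigNumber": "9223372036854775807"}';
const data = JSON.parse(rawData);
// BigInt has arbitrary precision, so no digits are lost
// 9223372036854775807n
console.log(BigInt(data.bigNumber));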

Mechanics of IEEE 754

IEEE 754 is a technical standard that defines the details of floating-point arithmetic.

In the standard's double-precision (64-bit) format, a floating-point number is represented in three parts:

  • S - Sign Bit (1 bit): Indicates whether the number is positive (0) or negative (1).

  • E - Exponent (11 bits): Determines the range of the number, with a bias of 1023. This bias is subtracted from the stored exponent to obtain the actual exponent value.

  • M - Mantissa (52 bits): Represents the precision of the number; also known as the significand. There's an implied leading 1 in the mantissa for normalized numbers, which is not stored explicitly.

The formula to represent a number in this format is:

$$(−1)^S×1.M×2^{E−1023}$$
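To make these three parts concrete, here is a small sketch that extracts them from a JavaScript number using a DataView (decodeDouble is a hypothetical helper, not a built-in):

// A sketch: extract the sign, exponent, and mantissa bits of a double
function decodeDouble(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value);
  const bits = view.getBigUint64(0);
  return {
    sign: bits >> 63n,                 // S: 1 bit
    exponent: (bits >> 52n) & 0x7ffn,  // E: 11 bits, biased by 1023
    mantissa: bits & 0xfffffffffffffn, // M: 52 bits
  };
}

// { sign: 0n, exponent: 1023n, mantissa: 0n }
console.log(decodeDouble(1));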

The Exponent in a floating-point number is somewhat like a slider that enables the representation of either extremely large or incredibly small numbers. With the 11-bit exponent in the IEEE 754 double-precision format, you can cover a vast range of magnitudes, from exceptionally large to minuscule numbers.

However, there's a catch: precision. Precision is akin to the level of detail a number can possess. When the exponent is used to reach a massive number, the detail, or precision, of that number decreases. This is because, in a fixed-size format like double precision, increasing the exponent's value (to represent a larger number) means each step of the mantissa spans a wider gap, leaving less room to store the detailed portion of the number.

The same issue occurs for extremely small numbers. The IEEE 754 format can represent numbers incredibly close to zero, but once again, the precision is limited. The smaller the number, the less precision it can have in its fractional part.
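Both ends of this trade-off are easy to observe in JavaScript:

// At 1e20 the gap between adjacent doubles is 2^(66 - 52) = 16384,
// so adding 1 is simply lost to rounding
// true
console.log(1e20 + 1 === 1e20);
// Fractions suffer too: neither 0.1 nor 0.2 has an exact binary form
// false
console.log(0.1 + 0.2 === 0.3);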

For example, here is the value 1 in IEEE 754 format:

// For brevity, only the first and last 4 bits of the mantissa are shown
| sign  |  exponent   |   mantissa   |
    0     01111111111    0000...0000
= 1

The sign bit 0 indicates a positive number. The exponent bits 01111111111 represent an actual exponent of 0 (after subtracting the bias of 1023). The mantissa 0000...0000, combined with the implied leading 1, represents the value 1.0.

Let's take a look at the Number.MAX_SAFE_INTEGER value in IEEE 754 format:

| sign  |  exponent   |   mantissa   |
    0     10000110011    1111...1111
= 9007199254740991 (Number.MAX_SAFE_INTEGER)

The exponent value 10000110011 represents an actual exponent of 52 (1075 − 1023). The mantissa 1111...1111 (all bits set) represents the binary value 1.1111111111111111111111111111111111111111111111111111.

Using the formula presented earlier, we can decode the IEEE 754 representation and calculate the actual value of Number.MAX_SAFE_INTEGER:

$$1.1111111111111111111111111111111111111111111111111111×2^{52}=2^{53}−1=9007199254740991$$
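We can double-check that binary expansion from JavaScript itself:

// 53 ones: the implied leading 1 followed by the 52 mantissa bits
console.log(Number.MAX_SAFE_INTEGER.toString(2));
// 53
console.log(Number.MAX_SAFE_INTEGER.toString(2).length);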

Notice that all 52 bits of the mantissa part of the representation are set. So what happens if we want to represent the value Number.MAX_SAFE_INTEGER + 1 in IEEE 754 format?

Let's add 1 to the previous value of Number.MAX_SAFE_INTEGER in IEEE 754 format. All we do is add 1 to the mantissa and arrive at the following:

| sign  |  exponent   |   mantissa   |
    0     10000110100    0000...0000
= 9007199254740992 (Number.MAX_SAFE_INTEGER + 1)

Observe how the bit is carried forward into the exponent, which becomes 10000110100, with an actual exponent value of 53 (1076 − 1023). Calculating 2^53 is straightforward, and we arrive at the value 9007199254740992, since the mantissa (with its implied leading 1) is 1.0.
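A one-line check confirms the carry:

// true: the next integer after Number.MAX_SAFE_INTEGER is exactly 2^53
console.log(Number.MAX_SAFE_INTEGER + 1 === 2 ** 53);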

The next representable number in IEEE 754 format is 9007199254740994, with the binary format as follows:

| sign  |  exponent   |   mantissa   |
    0     10000110100    0000...0001
= 9007199254740994 (Number.MAX_SAFE_INTEGER + 3)

Now you might ask, what has changed? Look closely at how the actual exponent value has increased to 53: each step of the mantissa is now worth 2^(53−52) = 2, so adjacent representable numbers are 2 apart. As a result, we've lost the ability to represent the number Number.MAX_SAFE_INTEGER + 2 (9007199254740993).

We can verify this by running the following code in JavaScript:

// 9007199254740991
console.log(Number.MAX_SAFE_INTEGER);
// 9007199254740992
console.log(Number.MAX_SAFE_INTEGER + 1);
// 9007199254740992 (9007199254740993 is not representable)
console.log(Number.MAX_SAFE_INTEGER + 2);
// 9007199254740994
console.log(Number.MAX_SAFE_INTEGER + 3);

I've found the IEEE-754 Analysis tool by Dr. Christopher to be useful in visualising the effect of the exponent on precision.

Conclusion

While JSON simplifies data exchange with its straightforward number representation, it also brings challenges, particularly in cross-language data handling. The key issue arises from different programming languages interpreting numeric values differently, affecting their size and precision. This is evident in interactions between Go and JavaScript, where Go's int64 can lose precision due to JavaScript's IEEE 754 double-precision floating-point format.

Understanding the limitations of IEEE 754 is crucial, especially for high-precision needs like financial computations. Developers must be aware of these differences and consider alternative strategies, such as string representations for large numbers or specialized libraries, to ensure data integrity.
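As one illustration of the string-based strategy, a BigInt can be serialized as a string with a JSON.stringify replacer (a sketch; the field name is only an example):

// A sketch: serialize BigInt values as strings so no digits are lost
const payload = { bigNumber: 9223372036854775807n };
const json = JSON.stringify(payload, (key, value) =>
  typeof value === 'bigint' ? value.toString() : value
);
// {"bigNumber":"9223372036854775807"}
console.log(json);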

Ultimately, this highlights the importance of a nuanced understanding of the tools and languages we use, emphasizing precision and accuracy in software engineering.