
Talk:IEEE 754


Possible modifications to the "ranges" table


Here is a playground where I intend to suggest a slightly modified version of the table. As it stands right now, there are some repetitions that can be avoided and some info that could be added. Have a bit of patience and I will have a suggestion in a few days. (Editing tables is a pain.)

I want to add information about subnormal numbers and compact some of the information. I will try not to make too many changes here, but instead make small edits in a personal sandbox and then larger updates here when I think a discussion could be useful.

I need to work around the unfortunate wrapping in some places and the fact that some columns are unnecessarily wide.

Here is the current table:

Name | Common name | Base | Significand digits[a] | Decimal digits[b] | Exponent bits | log10 MAX | Exponent bias[1] | E min | E max | Notes
binary16 | Half precision | 2 | 11 | 3.31 | 5 | 4.51 | 2^4 − 1 = 15 | −14 | +15 | Interchange
binary32 | Single precision | 2 | 24 | 7.22 | 8 | 38.23 | 2^7 − 1 = 127 | −126 | +127 | Basic binary
binary64 | Double precision | 2 | 53 | 15.95 | 11 | 307.95 | 2^10 − 1 = 1023 | −1022 | +1023 | Basic binary
binary128 | Quadruple precision | 2 | 113 | 34.02 | 15 | 4931.77 | 2^14 − 1 = 16383 | −16382 | +16383 | Basic binary
binary256 | Octuple precision | 2 | 237 | 71.34 | 19 | 78913.2 | 2^18 − 1 = 262143 | −262142 | +262143 | Interchange
decimal32 |  | 10 | 7 | 7 | 7.58 | 97 − 2.2·10^−15 | 101 | −95 | +96 | Interchange
decimal64 |  | 10 | 16 | 16 | 9.58 | 385 − 2.2·10^−33 | 398 | −383 | +384 | Basic decimal
decimal128 |  | 10 | 34 | 34 | 13.58 | 6145 − 2.2·10^−69 | 6176 | −6143 | +6144 | Basic decimal

Note that in the table above, the minimum exponents listed are for normal numbers; the special subnormal number representation allows even smaller numbers to be represented (with some loss of precision). For example, the smallest positive number that can be represented in binary64 is 2^−1074; contributions to the −1074 figure include the E min value −1022 and all but one of the 53 significand bits (2^(−1022 − (53 − 1)) = 2^−1074).
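For readers who want to check these limits on a machine, here is a minimal sketch in Python (whose float is binary64 on essentially all current platforms); the constants below simply restate the figures above:

    import math, sys  # math.ulp needs Python 3.9+

    print(sys.float_info.min == 2.0**-1022)  # True: smallest positive normal binary64 number
    print(math.ulp(0.0) == 2.0**-1074)       # True: smallest positive subnormal binary64 number
    print(2.0**-1074 / 2)                    # 0.0: nothing between 0 and 2**-1074 is representable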

Decimal digits is the precision of the format expressed in terms of an equivalent number of decimal digits. It is computed as digits × log10(base). E.g. binary128 has approximately the same precision as a 34-digit decimal number.

log10 MAX is a measure of the range of the encoding. Its integer part is the largest exponent shown on the output of a value in scientific notation with one leading digit in the significand before the decimal point (e.g. 1.698·10^38 is near the largest value in binary32, 9.999999·10^96 is the largest value in decimal32). Nsmeds (talk) 19:00, 13 September 2023 (UTC)[reply]
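To make the two derived quantities concrete, a small Python sketch (my own illustration, not part of the proposal; the log10 figures in the tables are derived and rounded somewhat differently, so they do not all match these direct computations):

    import math

    print(24 * math.log10(2))    # ~7.22,  the "Decimal digits" entry for binary32
    print(113 * math.log10(2))   # ~34.02, the "Decimal digits" entry for binary128

    max32 = (2 - 2**-23) * 2.0**127   # largest finite binary32 value, ~3.403e38
    print(math.log10(max32))          # ~38.53, so values near 10^38 sit at the top of the binary32 range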

The binary log10 MAX values round, or maybe truncate, to two digits after the decimal point. I think the decimal values should also do that. Gah4 (talk) 00:45, 14 September 2023 (UTC)[reply]
Yes, it is strange that the "log10 MAX" values for the decimal formats are much more accurate than the ones for the binary formats, but I'm not sure how this could be presented in a good way. — Vincent Lefèvre (talk) 10:51, 14 September 2023 (UTC)[reply]
Seems that the choices are 96.99 and 97.00. Either one is fine with me. For those who understand floating point enough to ask the question, either one will be fine. For those who don't, no value will help. As above, though, I think the article still needs to explain better the position of the radix point in the different formats. I had it out for another question: the "Alpha Architecture Handbook", which has the VAX formats in it. VAX uses 0 bits before the binary point, but a bias of 128 or 1024. And the highest exponent value isn't special. Gah4 (talk) 03:46, 15 September 2023 (UTC)[reply]
It seems that some of the binary format values are rounded up, so rounding up to 97.00, etc., seems fair. Gah4 (talk) 03:54, 15 September 2023 (UTC)[reply]
When one doesn't specify, one generally rounds to the nearest. — Vincent Lefèvre (talk) 08:51, 16 September 2023 (UTC)[reply]
Yes. I am also wondering how many digits they should have. Three after the decimal point might be too many. Gah4 (talk) 09:39, 16 September 2023 (UTC)[reply]
The reason the decimal formats have higher accuracy in the table is simply that their values are easy to express exactly. I thought it better to write 97 − 2.2·10^−15 than to round it to 97.00. For the binary values, there is no way to express them in decimal other than rounding. But I will insert my suggested edited table now. Nsmeds (talk) 20:18, 20 September 2023 (UTC)[reply]

Suggestion for a revised table:

(Column groups: Significand, Exponent, Properties[c])
Name | Common name | Radix | Significand digits[d] | Decimal digits[e] | Exponent min | Exponent max | Exponent bias[1] | MAXVAL | log10 MAXVAL | MINVAL>0 (normal) | MINVAL>0 (subnorm) | Notes
binary16 | Half precision | 2 | 11 | 3.31 | −14 | +15 | 15 | 65504 | 4.8 | 6.10·10^−5 | 5.96·10^−8 | Interchange
binary32 | Single precision | 2 | 24 | 7.22 | −126 | +127 | 127 | 1.70·10^38 | 38.5 | 1.18·10^−38 | 1.40·10^−45 | Basic binary
binary64 | Double precision | 2 | 53 | 15.95 | −1022 | +1023 | 1023 | 8.99·10^307 | 308.2 | 2.23·10^−308 | 4.94·10^−324 | Basic binary
binary128 | Quadruple precision | 2 | 113 | 34.02 | −16382 | +16383 | 16383 | 5.95·10^4931 | 4932.0 | 3.36·10^−4932 | 6.48·10^−4966 | Basic binary
binary256 | Octuple precision | 2 | 237 | 71.34 | −262142 | +262143 | 262143 | 1.61·10^78913 | 78913.2 | 2.48·10^−78913 | 2.25·10^−78984 | Interchange
decimal32 |  | 10 | 7 | 7 | −95 | +96 | 101 | ≈1.0·10^97 | 97 − 2.2·10^−15 | 1·10^−95 | 1·10^−101 | Interchange
decimal64 |  | 10 | 16 | 16 | −383 | +384 | 398 | ≈1.0·10^385 | 385 − 2.2·10^−33 | 1·10^−191 | 1·10^−206 | Basic decimal
decimal128 |  | 10 | 34 | 34 | −6143 | +6144 | 6176 | ≈1.0·10^6145 | 6145 − 2.2·10^−69 | 1·10^−6143 | 1·10^−6176 | Basic decimal

Note that in the table above, the minimum exponent value listed is for normal binary numbers; the special subnormal number format allows values of smaller magnitude to be represented, but at a loss of precision. The decimal formats do not define a "subnormal" form of values as such, but numbers with a leading 0 in the significand and an exponent at the minimum value of the format can be seen as an analog of the subnormals of the binary formats.

Decimal digits is the precision of the format expressed in terms of an equivalent number of decimal digits. It is computed as digits × log10(base). E.g. binary128 has approximately the same precision as a 34-digit decimal number.

log10 MAXVAL is a measure of the range of the encoding. Its integer part is the largest exponent shown on the output of a value in scientific notation with one leading digit in the significand before the decimal point (e.g. 1.698·10^38 is near the largest value in binary32, 9.999999·10^96 is the largest value in decimal32). The value in the table is rounded towards zero.

I would remove the column "Bias" for 2 reasons: 1) the bias is useful only when the encoding is described, while the encoding is ignored here; 2) its meaning depends on the radix: for the binary formats, the bias is related to the exponent e, and for the decimal formats, it is related to the exponent q (so, without detailed information, this is confusing). Also note that MOS:ABBR#Miscellanea says that one writes "e.g." (with periods, and not italicised). — Vincent Lefèvre (talk) 22:52, 20 September 2023 (UTC)[reply]
I agree with you, Vincent. I kept it to not make too many changes from the original table, but happy to remove it. Nsmeds (talk) 09:51, 21 September 2023 (UTC)[reply]
Better would be one that has actual meaning. But the way it is, it suggests to people that they don't understand it, so they should read the article more carefully. (At least that is what I did.) I suppose we should see what the standard says, though. Gah4 (talk) 23:09, 21 September 2023 (UTC)[reply]
It could be explained, but in a specific section on the encoding of the binary and decimal formats. Having the bias in this table is misleading (in addition to being useless for most readers), because its definition is different for the binary and the decimal formats (the standard gives it in two different tables: a table for the binary formats and a table for the decimal formats). — Vincent Lefèvre (talk) 23:21, 22 September 2023 (UTC)[reply]
Seems that it is worse than that. For binary, it is fine. There is one (hidden) bit before the binary point, and the bias gives the right value for the exponent. For decimal, the min/max work if there is one digit before the decimal point. But instead, it is defined with the decimal point to the right of the significand, and a different bias. Two different definitions at the same time. People reading the table now will notice the inconsistent bias, and then read the article to find out why. (That is what I did a few days ago, even though I had read it all before.) Since the standard allows for either the densely packed decimal or pure binary significand, it probably makes sense for the bias to be defined that way. It would help a lot if the article just came out and said that. Until DFP gets more popular, though, there might not be so many interested in reading about it. Gah4 (talk) 21:58, 23 September 2023 (UTC)[reply]
The biased exponent depends on the unbiased exponent (e or q). For the decimal formats, the representation is not normalized, and for a given operation, the choice of the member of the set of the representations that give the considered value (this set is called a cohort) is done using the exponent q (because this is simpler and more natural). That's why the definition of the bias uses the exponent q for the decimal formats. — Vincent Lefèvre (talk) 23:17, 23 September 2023 (UTC)[reply]
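As a concrete illustration of the two conventions (a worked example for decimal64 with p = 16, written out here rather than quoted from the standard):

    x = (-1)^s \cdot c \cdot 10^{q}, \qquad 0 \le c \le 10^{16} - 1, \qquad -398 \le q \le 369

    E_{\mathrm{biased}} = q + 398, \qquad e = q + (p - 1) \in [-383, +384]

The bias 398 applies to the exponent q of the integer coefficient c, while the range −383 to +384 quoted in the tables above is the range of the exponent e for the "one digit before the point" view.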
For IBM S/360 and successors, HFP, prenormalization for add and subtract is done based on the exponents. Unnormalized values can be surprising. The Fortran AINT function works by adding 0 with a biased exponent of 7. At prenormalization, the other value is shifted to match the exponents, shifting the digit before the hexadecimal point into the guard digit. The post-normalization shifts back. Digits past the hexadecimal point are lost, just as AINT requires. But not all do that. Multiply and divide prenormalize, shifting out leading zeros. Gah4 (talk) 11:01, 24 September 2023 (UTC)[reply]


References

  1. ^ a b Cowlishaw, Mike. "Decimal Arithmetic Encodings" (PDF). IBM. Retrieved 6 August 2015.

sortability


There is a recent edit noting that IEEE-754 values are sortable as sign-magnitude. I believe this is true for most sign-magnitude floating-point formats, at least for normalized values in formats where values can be unnormalized. (I am not sure about denormals, though.) The PDP-10 floating-point format uses two's complement on the whole word for negative values, such that they are comparable using integer compare instructions. Not many processors supply a sign-magnitude compare operation, though. Gah4 (talk) 20:38, 10 November 2023 (UTC)[reply]

Do you mean that the PDP-10 two's complement also applied on the exponent field, i.e. changing the sign of the FP number would also change the encoding of the exponent? That's important to make the FP numbers comparable using integer compare instructions. — Vincent Lefèvre (talk) 00:33, 11 November 2023 (UTC)[reply]
Yes, the whole word, including the exponent. I suspect that the hardware uncomplements it before using it. Maybe harder for humans to read, though. I am not sure what you mean by the encoding of the exponent, but I believe that there can be carry into the exponent. Gah4 (talk) 03:14, 11 November 2023 (UTC)[reply]
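A minimal sketch of the integer-compare trick for the IEEE 754 case, in Python (my own illustration of the point above, assuming binary64 and leaving NaNs aside; the helper name is mine):

    import struct

    def total_order_key(x: float) -> int:
        # Map a binary64 bit pattern to an unsigned integer whose natural ordering
        # matches the numeric ordering of the floats (NaNs excluded).
        bits = struct.unpack("<Q", struct.pack("<d", x))[0]
        # Negative values: flip all bits; non-negative values: flip only the sign bit.
        return bits ^ 0xFFFFFFFFFFFFFFFF if bits >> 63 else bits | (1 << 63)

    vals = [-float("inf"), -2.0, -1e-310, 0.0, 5e-324, 1.5, float("inf")]  # includes subnormals
    assert [total_order_key(v) for v in vals] == sorted(total_order_key(v) for v in vals)

This also touches the question about denormals: the mapping stays monotonic through the subnormal range, since subnormal bit patterns sit directly between zero and the smallest normal numbers.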

It's really old


It lacks many details for professionals.

It lacks simplicity as well.

They can't even decide on the audience after that many years.

It is really important. Boh39083 (talk) 05:03, 19 November 2023 (UTC)[reply]

This is hard to make sense of. Can you elaborate a bit, with more context and more specific details? –jacobolus (t) 03:43, 20 November 2023 (UTC)[reply]

decimal exponent


I did a revert to a change on decimal exponent values. I believe it is right, because of the way they are defined, but I start this in case someone wants to discuss it, as I noted in the edit summary. Gah4 (talk) 17:20, 19 January 2024 (UTC)[reply]

The change was correct. The decimal exponent values went wrong in 1179557460 (but before that, there were already errors in some values for decimal64, which had been introduced in the previous change 1179553657 by Nsmeds). I've corrected another value in 1197287634. — Vincent Lefèvre (talk) 22:17, 19 January 2024 (UTC)[reply]
OK, I am confused. What I see now isn't what I remember from the differences I saw before. There is always the question of the position of the decimal point, and I thought it was just that. In any case, we are discussing them, which is what I wanted. Gah4 (talk) 03:49, 20 January 2024 (UTC)[reply]
Concerning the position of the decimal point, this can make a difference of 15 or 16 in the exponent for decimal64, but here, this was a factor 2. — Vincent Lefèvre (talk) 13:53, 20 January 2024 (UTC)[reply]
Yes, I thought it was closer to 15, and now I see that it was a factor of 2. That is why I said I was confused about it. In any case, it is good to discuss here. Gah4 (talk) 23:22, 20 January 2024 (UTC)[reply]
I am sorry for not keeping in touch on this issue. There have been some other things keeping me occupied lately. Looking at the table as it stands today (https://en.wikipedia.org/w/index.php?title=IEEE_754&oldid=1210204290), I am happy with how it presents the useful properties of the IEEE-754 floating-point formats. I went through my spreadsheet for comparing FP formats, found a bug in the computations of the estimated log10(MAXVAL), and have uploaded a public (LibreOffice) copy to my Google Drive share.
I will update the table with adjusted values of the decimal-format log10(MAXVAL) estimates. I am not sure, but I think that is the issue you discussed in this thread? Nsmeds (talk) 17:13, 25 February 2024 (UTC)[reply]
This is what I think is the best approximation to log10(MAXVAL):
Nsmeds (talk) 18:44, 25 February 2024 (UTC)[reply]
Yes, if there are p digits, then the correction term is
Vincent Lefèvre (talk) 03:21, 26 February 2024 (UTC)[reply]
Thanks for confirming. If you want to (and if you have access to LibreOffice/OpenOffice and dare to enable my macros), have a look at the spreadsheet. Some of the formats there are not officially accepted and/or necessarily correctly described, but I find it illuminating to feed in various research formats and see what comes out. :-) Nsmeds (talk) 08:39, 26 February 2024 (UTC)[reply]
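For reference, here is a sketch of that correction term (my own working, assuming MAXVAL for a decimal format with p digits and maximum exponent Emax is (10^p − 1) · 10^(Emax − p + 1)):

    \log_{10} \mathrm{MAXVAL} = (E_{\max} + 1) + \log_{10}\left(1 - 10^{-p}\right) \approx (E_{\max} + 1) - \frac{10^{-p}}{\ln 10}

For decimal32 (p = 7, Emax = 96), for example, this comes out to roughly 97 − 4.3·10^−8.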

Introduction to History


@Nsmeds: Though I agree that an introduction in IEEE 754#History would be useful, there are several issues with what has been added in 1210457800, so that I'm going to revert this change (mainly because of the first point below):

  • First, this should be an introduction to the history of the standardization of floating-point arithmetic, not a history of FP arithmetic (there is an article Floating-point arithmetic on this larger subject, with its own history). So, here, it should just be explained what came before IEEE 754 and why standardization was needed.
  • In the added text, "non-enumerable" is pointless and misleading: the issue compared to integers is that the numbers in question cannot, in general, be represented exactly. This is the case even if you restrict to the computable real numbers (which form the subset of R that really matters in computing), which are enumerable (countable), and even to just the rational numbers.
  • Binary representation is not always used.
  • The last two sentences are not clear, but anyway, they are not related to the standardization (like the whole paragraph).
  • The image is missing, but anyway, it is not related to the standardization either.
  • Note that "mantissa" and "de-normal" are not the correct terms (there are also English and typographic mistakes).

BTW, in section History of Floating-point arithmetic, there is a paragraph on the standardization, which could serve as a basis:

Initially, computers used many different representations for floating-point numbers. The lack of standardization at the mainframe level was an ongoing problem by the early 1970s for those writing and maintaining higher-level source code; these manufacturer floating-point standards differed in the word sizes, the representations, and the rounding behavior and general accuracy of operations. Floating-point compatibility across multiple computing systems was in desperate need of standardization by the early 1980s, leading to the creation of the IEEE 754 standard once the 32-bit (or 64-bit) word had become commonplace. This standard was significantly based on a proposal from Intel, which was designing the i8087 numerical coprocessor; Motorola, which was designing the 68000 around the same time, gave significant input as well.

Vincent Lefèvre (talk) 01:54, 27 February 2024 (UTC)[reply]

@Vincent Lefèvre I agree with your comments. Hopefully someone here is able to volunteer a more suitable introductory paragraph? Nsmeds (talk) 12:50, 27 February 2024 (UTC)[reply]

