
Talk:IEEE 754


Possible modifications to the "ranges" table


Here is a playground where I intend to suggest a slightly modified version of the table. As it stands right now, there are some repetitions that can be avoided and some info that could be added. Have a bit of patience and I will have a suggestion in a few days. (Editing tables is a pain.)

I want to add information about subnormal numbers and compact some of the information. I will try not to make too many changes here, but instead make small edits in a personal sandbox and then larger updates here when I think a discussion could be useful.

I need to work around the unfortunate wrapping in some places and the fact that some columns are unnecessarily wide.

Here is the current table:

Name | Common name | Base | Significand digits[a] | Decimal digits[b] | Exponent bits | log10 MAX | Exponent bias[1] | E min | E max | Notes
binary16 | Half precision | 2 | 11 | 3.31 | 5 | 4.51 | 2^4 − 1 = 15 | −14 | +15 | Interchange
binary32 | Single precision | 2 | 24 | 7.22 | 8 | 38.23 | 2^7 − 1 = 127 | −126 | +127 | Basic binary
binary64 | Double precision | 2 | 53 | 15.95 | 11 | 307.95 | 2^10 − 1 = 1023 | −1022 | +1023 | Basic binary
binary128 | Quadruple precision | 2 | 113 | 34.02 | 15 | 4931.77 | 2^14 − 1 = 16383 | −16382 | +16383 | Basic binary
binary256 | Octuple precision | 2 | 237 | 71.34 | 19 | 78913.2 | 2^18 − 1 = 262143 | −262142 | +262143 | Interchange
decimal32 |  | 10 | 7 | 7 | 7.58 | 97 − 2.2·10^−15 | 101 | −95 | +96 | Interchange
decimal64 |  | 10 | 16 | 16 | 9.58 | 385 − 2.2·10^−33 | 398 | −383 | +384 | Basic decimal
decimal128 |  | 10 | 34 | 34 | 13.58 | 6145 − 2.2·10^−69 | 6176 | −6143 | +6144 | Basic decimal

Note that in the table above, the minimum exponents listed are for normal numbers; the special subnormal number representation allows even smaller numbers to be represented (with some loss of precision). For example, the smallest positive number that can be represented in binary64 is 2^−1074; contributions to the −1074 figure include the E min value −1022 and all but one of the 53 significand bits (2^(−1022 − (53 − 1)) = 2^−1074).
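For readers who want to check these limits on a machine, here is a minimal sketch in Python (whose float is binary64 on essentially all current platforms); the constants below simply restate the figures above:

    import math, sys  # math.ulp needs Python 3.9+

    print(sys.float_info.min == 2.0**-1022)  # True: smallest positive normal binary64 number
    print(math.ulp(0.0) == 2.0**-1074)       # True: smallest positive subnormal binary64 number
    print(2.0**-1074 / 2)                    # 0.0: nothing between 0 and 2**-1074 is representable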

Decimal digits is the precision of the format expressed in terms of an equivalent number of decimal digits. It is computed as digits × log10(base). E.g. binary128 has approximately the same precision as a 34-digit decimal number.

log10 MAX is a measure of the range of the encoding. Its integer part is the largest exponent shown on the output of a value in scientific notation with one leading digit in the significand before the decimal point (e.g. 1.698·10^38 is near the largest value in binary32, 9.999999·10^96 is the largest value in decimal32). Nsmeds (talk) 19:00, 13 September 2023 (UTC)[reply]
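To make the two derived quantities concrete, a small Python sketch (my own illustration, not part of the proposal; the log10 figures in the tables are derived and rounded somewhat differently, so they do not all match these direct computations):

    import math

    print(24 * math.log10(2))    # ~7.22,  the "Decimal digits" entry for binary32
    print(113 * math.log10(2))   # ~34.02, the "Decimal digits" entry for binary128

    max32 = (2 - 2**-23) * 2.0**127   # largest finite binary32 value, ~3.403e38
    print(math.log10(max32))          # ~38.53, so values near 10^38 sit at the top of the binary32 range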

The binary log10 MAX values round, or maybe truncate, to two digits after the decimal point. I think the decimal values should also do that. Gah4 (talk) 00:45, 14 September 2023 (UTC)[reply]
Yes, it is strange that the "log10 MAX" values for the decimal formats are much more accurate than the ones for the binary formats, but I'm not sure how this could be presented in a good way. — Vincent Lefèvre (talk) 10:51, 14 September 2023 (UTC)[reply]
Seems that the choices are 96.99 and 97.00. Either one is fine with me. For those who understand floating point enough to ask the question, either one will be fine. For those who don't, no value will help. As above, though, I think the article still needs to explain better the position of the radix point in the different formats. I had it out for another question: the "Alpha Architecture Handbook", which has the VAX formats in it. VAX uses 0 bits before the binary point, but a bias of 128 or 1024. And the highest exponent value isn't special. Gah4 (talk) 03:46, 15 September 2023 (UTC)[reply]
It seems that some of the binary format values are rounded up, so rounding up to 97.00, etc., seems fair. Gah4 (talk) 03:54, 15 September 2023 (UTC)[reply]
When one doesn't specify, one generally rounds to the nearest. — Vincent Lefèvre (talk) 08:51, 16 September 2023 (UTC)[reply]
Yes. I am also wondering how many digits they should have. Three after the decimal point might be too many. Gah4 (talk) 09:39, 16 September 2023 (UTC)[reply]
The reason the decimal formats have higher accuracy in the table is simply that their values are easy to express exactly. I thought it better to write 97 − 2.2·10^−15 than to round it to 97.00. For the binary values, there is no way to express them in decimal other than rounding. But I will insert my suggested edited table now. Nsmeds (talk) 20:18, 20 September 2023 (UTC)[reply]

Suggestion for a revised table:

(Column groups: Significand, Exponent, Properties[c])
Name | Common name | Radix | Significand digits[d] | Decimal digits[e] | Exponent min | Exponent max | Exponent bias[1] | MAXVAL | log10 MAXVAL | MINVAL>0 (normal) | MINVAL>0 (subnorm) | Notes
binary16 | Half precision | 2 | 11 | 3.31 | −14 | +15 | 15 | 65504 | 4.8 | 6.10·10^−5 | 5.96·10^−8 | Interchange
binary32 | Single precision | 2 | 24 | 7.22 | −126 | +127 | 127 | 1.70·10^38 | 38.5 | 1.18·10^−38 | 1.40·10^−45 | Basic binary
binary64 | Double precision | 2 | 53 | 15.95 | −1022 | +1023 | 1023 | 8.99·10^307 | 308.2 | 2.23·10^−308 | 4.94·10^−324 | Basic binary
binary128 | Quadruple precision | 2 | 113 | 34.02 | −16382 | +16383 | 16383 | 5.95·10^4931 | 4932.0 | 3.36·10^−4932 | 6.48·10^−4966 | Basic binary
binary256 | Octuple precision | 2 | 237 | 71.34 | −262142 | +262143 | 262143 | 1.61·10^78913 | 78913.2 | 2.48·10^−78913 | 2.25·10^−78984 | Interchange
decimal32 |  | 10 | 7 | 7 | −95 | +96 | 101 | ≈1.0·10^97 | 97 − 2.2·10^−15 | 1·10^−95 | 1·10^−101 | Interchange
decimal64 |  | 10 | 16 | 16 | −383 | +384 | 398 | ≈1.0·10^385 | 385 − 2.2·10^−33 | 1·10^−191 | 1·10^−206 | Basic decimal
decimal128 |  | 10 | 34 | 34 | −6143 | +6144 | 6176 | ≈1.0·10^6145 | 6145 − 2.2·10^−69 | 1·10^−6143 | 1·10^−6176 | Basic decimal

Note that in the table above, the minimum exponent value listed is for normal binary numbers; the special subnormal number format allows values of smaller magnitude to be represented, but at a loss of precision. The decimal formats do not define a "subnormal" form of values as such, but numbers with a leading 0 in the significand and an exponent at the minimum value of the format can be seen as an analog of the subnormals of the binary formats.

Decimal digits is the precision of the format expressed in terms of an equivalent number of decimal digits. It is computed as digits × log10(base). E.g. binary128 has approximately the same precision as a 34-digit decimal number.

log10 MAXVAL is a measure of the range of the encoding. Its integer part is the largest exponent shown on the output of a value in scientific notation with one leading digit in the significand before the decimal point (e.g. 1.698·10^38 is near the largest value in binary32, 9.999999·10^96 is the largest value in decimal32). The value in the table is rounded towards zero.

I would remove the column "Bias" for 2 reasons: 1) the bias is useful only when the encoding is described, while the encoding is ignored here; 2) its meaning depends on the radix: for the binary formats, the bias is related to the exponent e, and for the decimal formats, it is related to the exponent q (so, without detailed information, this is confusing). Also note that MOS:ABBR#Miscellanea says that one writes "e.g." (with periods, and not italicised). — Vincent Lefèvre (talk) 22:52, 20 September 2023 (UTC)[reply]
I agree with you, Vincent. I kept it to not make too many changes from the original table, but happy to remove it. Nsmeds (talk) 09:51, 21 September 2023 (UTC)[reply]
Better would be one that has actual meaning. But the way it is, it suggests to people that they don't understand it, so they should read the article more carefully. (At least that is what I did.) I suppose we should see what the standard says, though. Gah4 (talk) 23:09, 21 September 2023 (UTC)[reply]
It could be explained, but in a specific section on the encoding of the binary and decimal formats. Having the bias in this table is misleading (in addition to being useless for most readers), because its definition is different for the binary and the decimal formats (the standard gives it in two different tables: a table for the binary formats and a table for the decimal formats). — Vincent Lefèvre (talk) 23:21, 22 September 2023 (UTC)[reply]
Seems that it is worse than that. For binary, it is fine. There is one (hidden) bit before the binary point, and the bias gives the right value for the exponent. For decimal, the min/max work if there is one digit before the decimal point. But instead, it is defined with the decimal point to the right of the significand, and a different bias. Two different definitions at the same time. People reading the table now will notice the inconsistent bias, and then read the article to find out why. (That is what I did a few days ago, even though I had read it all before.) Since the standard allows for either the densely packed decimal or pure binary significand, it probably makes sense for the bias to be defined that way. It would help a lot if the article just came out and said that. Until DFP gets more popular, though, there might not be so many interested in reading about it. Gah4 (talk) 21:58, 23 September 2023 (UTC)[reply]
The biased exponent depends on the unbiased exponent (e or q). For the decimal formats, the representation is not normalized, and for a given operation, the choice of the member of the set of the representations that give the considered value (this set is called a cohort) is done using the exponent q (because this is simpler and more natural). That's why the definition of the bias uses the exponent q for the decimal formats. — Vincent Lefèvre (talk) 23:17, 23 September 2023 (UTC)[reply]
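As a concrete illustration of the two conventions (a worked example for decimal64 with p = 16, written out here rather than quoted from the standard):

    x = (-1)^s \cdot c \cdot 10^{q}, \qquad 0 \le c \le 10^{16} - 1, \qquad -398 \le q \le 369

    E_{\mathrm{biased}} = q + 398, \qquad e = q + (p - 1) \in [-383, +384]

The bias 398 applies to the exponent q of the integer coefficient c, while the range −383 to +384 quoted in the tables above is the range of the exponent e for the "one digit before the point" view.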
For IBM S/360 and successors, HFP, prenormalization for add and subtract is done based on the exponents. Unnormalized values can be surprising. The Fortran AINT function works by adding 0 with a biased exponent of 7. At prenormalization, the other value is shifted to match the exponents, shifting the digit before the hexadecimal point into the guard digit. The post-normalization shifts back. Digits past the hexadecimal point are lost, just as AINT requires. But not all do that. Multiply and divide prenormalize, shifting out leading zeros. Gah4 (talk) 11:01, 24 September 2023 (UTC)[reply]


References

  1. ^ a b Cowlishaw, Mike. "Decimal Arithmetic Encodings" (PDF). IBM. Retrieved 6 August 2015.

sortability


There is a recent edit noting that IEEE-754 values are sortable as sign-magnitude. I believe this is true for most sign-magnitude floating-point formats, at least for normalized values in formats where values can be unnormalized. (I am not sure about denormals, though.) The PDP-10 floating-point format uses two's complement on the whole word for negative values, such that they are comparable using integer compare instructions. Not many processors supply a sign-magnitude compare operation, though. Gah4 (talk) 20:38, 10 November 2023 (UTC)[reply]

Do you mean that the PDP-10 two's complement also applied on the exponent field, i.e. changing the sign of the FP number would also change the encoding of the exponent? That's important to make the FP numbers comparable using integer compare instructions. — Vincent Lefèvre (talk) 00:33, 11 November 2023 (UTC)[reply]
Yes, the whole word, including the exponent. I suspect that the hardware uncomplements it before using it. Maybe harder for humans to read, though. I am not sure what you mean by the encoding of the exponent, but I believe that there can be carry into the exponent. Gah4 (talk) 03:14, 11 November 2023 (UTC)[reply]
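A minimal sketch of the integer-compare trick for the IEEE 754 case, in Python (my own illustration of the point above, assuming binary64 and leaving NaNs aside; the helper name is mine):

    import struct

    def total_order_key(x: float) -> int:
        # Map a binary64 bit pattern to an unsigned integer whose natural ordering
        # matches the numeric ordering of the floats (NaNs excluded).
        bits = struct.unpack("<Q", struct.pack("<d", x))[0]
        # Negative values: flip all bits; non-negative values: flip only the sign bit.
        return bits ^ 0xFFFFFFFFFFFFFFFF if bits >> 63 else bits | (1 << 63)

    vals = [-float("inf"), -2.0, -1e-310, 0.0, 5e-324, 1.5, float("inf")]  # includes subnormals
    assert [total_order_key(v) for v in vals] == sorted(total_order_key(v) for v in vals)

This also touches the question about denormals: the mapping stays monotonic through the subnormal range, since subnormal bit patterns sit directly between zero and the smallest normal numbers.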

It's really old


It lacks many details for professionals.

It lacks simplicity as well.

They can't even decide on the audience after that many years.

It is really important. Boh39083 (talk) 05:03, 19 November 2023 (UTC)[reply]

This is hard to make sense of. Can you elaborate a bit, with more context and more specific details? –jacobolus (t) 03:43, 20 November 2023 (UTC)[reply]

decimal exponent


I did a revert to a change on decimal exponent values. I believe it is right, because of the way they are defined, but I start this in case someone wants to discuss it, as I noted in the edit summary. Gah4 (talk) 17:20, 19 January 2024 (UTC)[reply]

The change was correct. The decimal exponent values went wrong in 1179557460 (but before that, there were already errors in some values for decimal64, which had been introduced in the previous change 1179553657 by Nsmeds). I've corrected another value in 1197287634. — Vincent Lefèvre (talk) 22:17, 19 January 2024 (UTC)[reply]
OK, I am confused. What I see now isn't what I remember from the differences I saw before. There is always the question of the position of the decimal point, and I thought it was just that. In any case, we are discussing them, which is what I wanted. Gah4 (talk) 03:49, 20 January 2024 (UTC)[reply]
Concerning the position of the decimal point, this can make a difference of 15 or 16 in the exponent for decimal64, but here, this was a factor 2. — Vincent Lefèvre (talk) 13:53, 20 January 2024 (UTC)[reply]
Yes, I thought it was closer to 15, and now I see that it was a factor of 2. That is why I said I was confused about it. In any case, it is good to discuss here. Gah4 (talk) 23:22, 20 January 2024 (UTC)[reply]
I am sorry for not keeping in touch on this issue. There have been some other things keeping me occupied lately. Looking at the table as it stands today (https://en.wikipedia.org/w/index.php?title=IEEE_754&oldid=1210204290), I am happy with how it presents the useful properties of the IEEE-754 floating-point formats. I went through my spreadsheet for comparing FP formats, found a bug in the computations of the estimated log10(MAXVAL), and have uploaded a public (LibreOffice) copy to my Google Drive share.
I will update the table with adjusted values of the decimal-format log10(MAXVAL) estimates. I am not sure, but I think that is the issue you discussed in this thread? Nsmeds (talk) 17:13, 25 February 2024 (UTC)[reply]
This is what I think is the best approximation to log10(MAXVAL):
Nsmeds (talk) 18:44, 25 February 2024 (UTC)[reply]
Yes, if there are p digits, then the correction term is
Vincent Lefèvre (talk) 03:21, 26 February 2024 (UTC)[reply]
Thanks for confirming. If you want to (and if you have access to LibreOffice/OpenOffice and dare to enable my macros), have a look at the spreadsheet. Some of the formats there are not officially accepted and/or necessarily correctly described, but I find it illuminating to feed in various research formats and see what comes out. :-) Nsmeds (talk) 08:39, 26 February 2024 (UTC)[reply]
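For reference, here is a sketch of that correction term (my own working, assuming MAXVAL for a decimal format with p digits and maximum exponent Emax is (10^p − 1) · 10^(Emax − p + 1)):

    \log_{10} \mathrm{MAXVAL} = (E_{\max} + 1) + \log_{10}\left(1 - 10^{-p}\right) \approx (E_{\max} + 1) - \frac{10^{-p}}{\ln 10}

For decimal32 (p = 7, Emax = 96), for example, this comes out to roughly 97 − 4.3·10^−8.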

Introduction to History


@Nsmeds: Though I agree that an introduction in IEEE 754#History would be useful, there are several issues with what has been added in 1210457800, so that I'm going to revert this change (mainly because of the first point below):

  • First, this should be an introduction to the history of the standardization of floating-point arithmetic, not a history of FP arithmetic (there is an article Floating-point arithmetic on this larger subject, with its own history). So, here, it should just be explained what came before IEEE 754 and why standardization was needed.
  • In the added text, "non-enumerable" is pointless and misleading: the issue compared to integers is that the numbers in question cannot, in general, be represented exactly. This is the case even if you restrict to the computable real numbers (which form the subset of R that really matters in computing), which are enumerable (countable), and even to just the rational numbers.
  • Binary representation is not always used.
  • The last two sentences are not clear, but anyway, they are not related to the standardization (like the whole paragraph).
  • The image is missing, but anyway, it is not related to the standardization either.
  • Note that "mantissa" and "de-normal" are not the correct terms (there are also English and typographic mistakes).

BTW, in section History of Floating-point arithmetic, there is a paragraph on the standardization, which could serve as a basis:

Initially, computers used many different representations for floating-point numbers. The lack of standardization at the mainframe level was an ongoing problem by the early 1970s for those writing and maintaining higher-level source code; these manufacturer floating-point standards differed in the word sizes, the representations, and the rounding behavior and general accuracy of operations. Floating-point compatibility across multiple computing systems was in desperate need of standardization by the early 1980s, leading to the creation of the IEEE 754 standard once the 32-bit (or 64-bit) word had become commonplace. This standard was significantly based on a proposal from Intel, which was designing the i8087 numerical coprocessor; Motorola, which was designing the 68000 around the same time, gave significant input as well.

Vincent Lefèvre (talk) 01:54, 27 February 2024 (UTC)[reply]

@Vincent Lefèvre I agree with your comments. Hopefully someone here is able to volunteer a more suitable introductory paragraph? Nsmeds (talk) 12:50, 27 February 2024 (UTC)[reply]

