by bishala on 8/28/19, 4:38 AM with 144 comments
by coldtea on 8/28/19, 10:58 AM
From the comment in protobuf source (which does the same thing as Python), mentioned in the Twitter thread:
(...) An arguably better strategy would be to use the algorithm described in "How to Print Floating-Point Numbers Accurately" by Steele & White, e.g. as implemented by David M. Gay's dtoa(). It turns out, however, that the following implementation is about as fast as DMG's code. Furthermore, DMG's code locks mutexes, which means it will not scale well on multi-core machines. DMG's code is slightly more accurate (in that it will never use more digits than necessary), but this is probably irrelevant for most users.
Rob Pike and Ken Thompson also have an implementation of dtoa() in third_party/fmt/fltfmt.cc. Their implementation is similar to this one in that it makes guesses and then uses strtod() to check them. (...)
https://github.com/protocolbuffers/protobuf/blob/ed4321d1cb3...
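The guess-then-verify idea quoted above can be sketched in Python (a simplified illustration, not the protobuf or fltfmt.cc code; shortest_repr is a made-up name):

```python
def shortest_repr(x: float) -> str:
    # Simplified sketch of the guess-then-verify strategy: try
    # increasing precision and keep the first string that survives a
    # strtod()-style round trip. 17 significant digits always suffice
    # to round-trip a double.
    for precision in range(1, 18):
        s = f"{x:.{precision}g}"
        if float(s) == x:
            return s
    return repr(x)

print(shortest_repr(0.1))  # 0.1
```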
by fs111 on 8/28/19, 10:21 AM
by Noe2097 on 8/28/19, 11:27 AM
by bhouston on 8/28/19, 11:34 AM
I always implemented rounding to a specific digit based on the built-in roundss/roundsd functions, which are native x86-64 assembler instructions (i.e. https://www.felixcloutier.com/x86/roundsd).
I do not understand why this would not be preferable to the string method.
float round( float x, int digits, int base ) {
    float factor = pow( base, digits );
    return roundss( x * factor ) / factor;
}
I guess this has the effect of not working for numbers near the edge of its range.
One could check this and fall back to the string method. Or alternatively use higher precision doubles internally:
float round( float x, int digits, int base ) {
    double factor = pow( base, digits );
    return (float)( roundsd( x * factor ) / factor );
}
But then what do you do if you have a double to round and want to maintain all precision? I think there is likely some way to do that by unpacking the double into a manual mantissa and exponent, each of which is itself a double, and doing this manually - or maybe by using some type of float128 library (https://www.boost.org/doc/libs/1_63_0/libs/multiprecision/do...)...
But changing this implementation now could cause slight differences, and if someone was rounding and then hashing, this type of change could be horrible if not placed behind some type of opt-in.
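A Python sketch of the same scale/round/unscale idea (round_scaled is a hypothetical name; math.floor(v + 0.5) rounds half away from zero here, whereas roundsd's default mode rounds half to even):

```python
import math

def round_scaled(x: float, digits: int) -> float:
    # Hypothetical helper mirroring the C version above: scale, round
    # to an integer, unscale.
    factor = 10.0 ** digits
    return math.copysign(math.floor(abs(x) * factor + 0.5) / factor, x)

# The intermediate x * factor is itself rounded to the nearest double,
# so results can differ from Python's correctly-rounded round():
print(round_scaled(2.5, 0))  # 3.0, while round(2.5) == 2 (half to even)
```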
by bishala on 8/28/19, 4:39 AM
by shellac on 8/28/19, 11:03 AM
by latchkey on 8/28/19, 10:10 AM
by zelly on 8/28/19, 11:58 AM
by analog31 on 8/28/19, 12:27 PM
In some cases, rounding is performed for the primary purpose of displaying a number as a string, in which case it can't be any less complicated than the string conversion function itself.
by jancsika on 8/28/19, 3:43 PM
Is there a phrase for the ratio between the frequency of an apparent archetype of a bug/feature and the real-world occurrences of said bug/feature? If not, then perhaps the "Fudderson-Hypeman ratio", in honor of its namesakes.
For example, I'm sure every C programmer on here has their favored way to quickly demo what bugs may come from C's null-terminated strings. But even though C programmers are quick to cite that deficiency, I'd bet there's a greater occurrence of C string bugs in the wild. Thus we get a relatively low Fudderson-Hypeman ratio.
On the other hand: "0.1 + 0.2 != 0.3"? I'm just thinking back through the mailing list and issue tracker for a realtime DSP environment that uses single-precision floats exclusively as the numeric data type. My first approximation is that there are significantly more didactic quotes of that example than reports of problems due to the class of bugs that archetype represents.
Does anyone have some real-world data to trump my rank speculation? (Keep in mind that simply replying with more didactic examples will raise the Fudderson-Hypeman ratio.)
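For reference, the archetype quoted above is a one-liner to reproduce:

```python
# 0.1, 0.2 and 0.3 are all binary approximations, and the accumulated
# error in the sum shows up in the repr.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```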
by d--b on 8/28/19, 12:03 PM
by ericfrederich on 8/28/19, 1:27 PM
I remember even writing a program that tested every possible floating point number (it must have only been 32-bit). I think I used ctypes and interpreted every binary combination of 32 bits as a float, turned it into a string, then back, and checked equality. A lot of them were NaN.
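That exhaustive test can be sketched with struct instead of ctypes (roundtrip_ok is a made-up name; the full loop over all 2**32 patterns is slow, so this spot-checks two):

```python
import struct

def roundtrip_ok(bits: int) -> bool:
    # Reinterpret the 32-bit pattern as a float (as ctypes would),
    # convert it to a string and back, and compare.
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    return float(repr(x)) == x  # every NaN pattern fails: nan != nan

print(roundtrip_ok(0x3F800000))  # 1.0 round-trips: True
print(roundtrip_ok(0x7FC00000))  # a quiet NaN: False
```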
by deckar01 on 8/28/19, 1:23 PM
by ChrisSD on 8/28/19, 10:15 AM
by dahart on 8/28/19, 5:08 PM
by kstenerud on 8/29/19, 5:46 AM
The only silly part of IEEE 754-2008 is the fact that it specified two decimal representations (DPD, championed by IBM, and BID, championed by Intel) with no way to tell them apart.
by science404 on 8/28/19, 1:59 PM
CPython rounds float values by converting them to a string and then back
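Conceptually, that looks like the sketch below (an illustration only, not CPython's actual C implementation, which uses David Gay's dtoa/strtod; round_via_string is a made-up name):

```python
def round_via_string(x: float, ndigits: int) -> float:
    # Format with ndigits decimal places (correctly rounded), then
    # parse back. Only valid for ndigits >= 0 in this sketch; CPython
    # handles negative ndigits and overflow separately.
    return float(f"{x:.{ndigits}f}")

print(round_via_string(2.675, 2))  # 2.67 -- 2.675 is stored as 2.67499...
print(round(2.675, 2))             # 2.67 as well
```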
by Jenz on 8/28/19, 10:07 AM
by acoye on 8/28/19, 3:59 PM
by seamyb88 on 8/28/19, 4:14 PM