An appeal for correct, capable, future-proof math in nascent programming languages

Let me start by saying that the math in most programming languages is WRONG. In most cases, there is little to be done about this, except perhaps to tack on some new classes which allow new code to be written to fix it.

However, nascent languages, like Rust, which I’ll mainly address here, have an opportunity to do better. I believe they SHOULD do better, and so I’m taking a little time, into the small hours, to attempt to make the case. Let’s get started then, bearing in mind that, in the small hours, this won’t be as well-edited, or even well-reasoned, as one might hope ;)

The wrongness of float

Floating point numbers are commonly thought of as the “normal” way to do math with real numbers using a computer.

The main thing to understand here is that floating point numbers, like register-limited integers, are a lossy optimisation, and that lossy optimisations should not be used by default, just as your compiler should not eliminate variables by default unless it is sure that they aren’t needed. Float is WRONG, and even dangerous, in a “people jumping out of windows because of financial errors” kind of way, for currency. It is WRONG for science. I’d argue (albeit a little tongue-in-cheek) that it’s morally wrong for children who pick up a programming language, hoping to use it for basic enquiries about life, like “how many planets are there in the known universe?”. Those are things that a computer can help with, and those are things that I genuinely believe a modern, practical, capable programming language like Rust should aim to support too, by default.

Really, floats are HORRIBLE. They’re a format that we’ve all come to accept, probably because most of us didn’t know a better way when we were first introduced to programming. The only benefit they really offer is speed. Granted, that’s a big one. But is it big enough to make them the default, or even to drop arbitrary precision math support in their favour? Is it really something we should optimise BY DEFAULT? Consider that most compilers do not enable any other lossy optimisations by default. In GCC, for example, -ffast-math is a separate, opt-in optimisation. So is -funsafe-math-optimizations. Why is float the default?
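To make the lossiness concrete, here is a minimal sketch in present-day Rust. The helper add_cents is a hypothetical name of my own, and holding money as whole cents is just one simple exact representation, not the arbitrary precision type argued for here.

```rust
/// Add two amounts held as whole cents. Integer addition is exact
/// (ignoring overflow, which is a separate problem).
fn add_cents(a: i64, b: i64) -> i64 {
    a + b
}

fn main() {
    // Neither 0.1 nor 0.2 has an exact binary representation, so their
    // f64 sum is not the f64 value nearest to 0.3.
    let sum = 0.1_f64 + 0.2_f64;
    println!("0.1 + 0.2 = {:.17}", sum);
    println!("equal to 0.3? {}", sum == 0.3_f64); // prints "equal to 0.3? false"

    // The same sum expressed in whole cents is exact.
    println!("10c + 20c == 30c? {}", add_cents(10, 20) == 30);
}
```

The float comparison fails silently, which is exactly the kind of default behaviour I am objecting to.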

The wrongness of machine ints

For me, a very similar problem applies to ints. I’ve seen acceptance of this idea in the Rust community too: that bigints should be used by default, or at least in overflow cases, rather than allowing machine-sized ints to overflow. I would very much agree. If I ask my programming language to raise a large number to the power of another large number, I do NOT want it to tell me that it has run out of space at 32 or 64 or 128 BITS, on a machine with many GB. We’re all guilty of accepting this status quo, but frankly, it seems ridiculous to me that this is the norm.
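As a rough illustration, the sketch below contrasts a machine-sized power with one that simply grows. The standard methods checked_pow and wrapping_pow are real. The helpers mul_small and big_pow are hypothetical and deliberately naive (repeated multiplication on base-1e9 limbs), so treat them as a stand-in for a proper bigint rather than as a suggested implementation.

```rust
/// Multiply a little-endian base-1e9 number in place by a small factor.
fn mul_small(limbs: &mut Vec<u32>, factor: u32) {
    let mut carry: u64 = 0;
    for limb in limbs.iter_mut() {
        let prod = *limb as u64 * factor as u64 + carry;
        *limb = (prod % 1_000_000_000) as u32;
        carry = prod / 1_000_000_000;
    }
    // Grow the number instead of overflowing.
    while carry > 0 {
        limbs.push((carry % 1_000_000_000) as u32);
        carry /= 1_000_000_000;
    }
}

/// Compute base^exp as a decimal string, using as many limbs as needed.
fn big_pow(base: u32, exp: u32) -> String {
    let mut limbs = vec![1u32];
    for _ in 0..exp {
        mul_small(&mut limbs, base);
    }
    // Most significant limb first, the rest zero-padded to nine digits.
    let mut out = limbs.last().unwrap().to_string();
    for limb in limbs.iter().rev().skip(1) {
        out.push_str(&format!("{:09}", limb));
    }
    out
}

fn main() {
    // 2^64 does not fit in a u64: the checked form reports failure...
    println!("{:?}", 2u64.checked_pow(64)); // prints "None"
    // ...and the wrapping form silently gives the wrong answer.
    println!("{}", 2u64.wrapping_pow(64)); // prints "0"
    // The growable representation just keeps going.
    println!("{}", big_pow(2, 64)); // prints "18446744073709551616"
}
```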

Why should anyone care?

You might wonder why I’m bringing this up; why I’m making waves about this. It’s not just a matter of principle, or first impressions of the language (though I think those matter). It’s also a matter of practicality: practicality in terms of features provided by the language, AND practicality in terms of flexibility. For those looking to reject this idea on the basis of optimisation: consider that abstracting away from 32-bit integers, 64-bit floats, or 256-bit ints can potentially lead to better optimisation, if the compiler can then figure out that it’s on a 256-bit machine (or indeed, is targeting a 256-bit GPU) and is free to optimise accordingly.

Practically speaking, I’ve personally considered implementing a datetime library in Rust that would cover all astronomical timescales to picosecond accuracy or better. This is very achievable and, I believe, quite practical when compared with the complexities of dealing with 1 BC in normal datetime libraries. The math involved in such things is not always computationally extreme, but it IS tedious and time-consuming to implement, especially when it should be a solved issue. Since it is so tedious and time-consuming, and since there are so many potential variations in implementation, I believe it should be implemented ONCE, in the standard library, saving all users from the burden of implementing it themselves, or of trying to make two incompatible libraries work together.
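A quick back-of-envelope check shows why machine ints get in the way here. This is only a sketch: the function names are hypothetical, the Julian year of 31,557,600 seconds is a convenient convention, and 13.8 billion years is an illustrative round figure for an astronomical timescale.

```rust
const PS_PER_SECOND: i128 = 1_000_000_000_000;
const SECONDS_PER_JULIAN_YEAR: i128 = 31_557_600; // 365.25 days * 86_400 s

/// Picoseconds in the given number of Julian years, or None on i128 overflow.
fn years_to_picoseconds(years: i128) -> Option<i128> {
    years
        .checked_mul(SECONDS_PER_JULIAN_YEAR)?
        .checked_mul(PS_PER_SECOND)
}

/// Whole days representable by a signed 64-bit picosecond counter.
fn i64_picosecond_range_days() -> i128 {
    (i64::MAX as i128) / PS_PER_SECOND / 86_400
}

fn main() {
    // A 64-bit picosecond count covers only about 106 days either side of zero.
    println!("i64 covers {} days", i64_picosecond_range_days());

    // 13.8 billion years needs roughly 4.35e29 picoseconds: far beyond i64,
    // although still within i128.
    let ps = years_to_picoseconds(13_800_000_000).expect("fits in i128");
    println!("{} ps", ps);
    println!("fits in i64? {}", ps <= i64::MAX as i128);
}
```

Even i128 is only a reprieve: finer resolution or intermediate products eat into the headroom quickly, which is why I would rather build this on a type with no fixed ceiling.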

If Rust did this, it could potentially have the best datetime library of any language, and one which used types compatible with many other Rust libraries, rather than custom types. That alone would potentially open the floodgates for many developers in fields like history, astronomy, geology, linguistics, etc. In most programming languages, handling dates before 1970, much less before 1 BC, is incredibly complex: there are issues like string representations, timezones, exact birthdates of historical figures, orbits around the sun, alignment with planets, leap years, leap seconds, lunar calendars, solar calendars, and even relativity to consider! However, with the right mathematical foundations in place, Rust could develop important capabilities like these on top, becoming a very welcoming language for science, finance, historical research, modern fields like big data analytics, etc., simply because it makes an effort to include these communities and their needs.

The Cobra language has implemented arbitrary precision decimal math by default, leaving float as an optimisation. I think Rust should follow this lead, and take it further, doing the same for ints.

There’s really no harm to having slower, more correct math by default; most people understand and choose between integer/float types when they care anyway. BUT, having the language be correct by default, and have very capable numeric types by default, is a big win, both in capabilities, and in the language’s “feel” of professionalism and potential.

Rust proposal:

* real becomes, not a 32-bit float (as I understand it currently is), but what it sounds like: a real number, covering all numbers in the real number range, using an arbitrary precision type.

* int is not (misleadingly) a machine word, but represents what it sounds like: the entire range of integers, using a bigint type. The BigInt type in extra would probably be fine, if it were optimised more. I noticed that there were pull requests for large performance improvements in bigint, but they were rejected for lack of comments. I suspect that, if it were given its rightful place as the only fully capable integer type in the libraries, it would rapidly gain performance through patches, too.

* Range checking is implemented in debug and optimisation compiler modes, to support the above, and for the general (Ada-like) safety benefits that range-checking of numbers would provide. If it’s REALLY such a compilation-speed turn-off, it could be enabled with specific compiler flags.

* i32, u32, u8, etc. are considered OPTIMISATIONS of int, for cases where they are manually specified.

* Rust compilers can optimise int to i32 etc. _automatically_, iff they know there are no problems doing so, because of the number ranges involved in calculations.

* Likewise, f64, etc. are considered optimisations of real, and can either be optimised by hand, or — potentially, if the compiler work is done — automatically.
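To give a feel for the range-checking and automatic-narrowing points above, here is a small sketch in present-day Rust. Everything in it (Ranged, Percent, width_for) is a hypothetical name of my own, not an existing or proposed API. It checks ranges at run time with const generics, whereas the proposal imagines the compiler doing this reasoning itself.

```rust
/// Narrowest signed machine width, in bits, that holds every value in min..=max.
/// This mimics the decision a compiler could make once it knows the range.
const fn width_for(min: i64, max: i64) -> u32 {
    if min >= i8::MIN as i64 && max <= i8::MAX as i64 {
        8
    } else if min >= i16::MIN as i64 && max <= i16::MAX as i64 {
        16
    } else if min >= i32::MIN as i64 && max <= i32::MAX as i64 {
        32
    } else {
        64
    }
}

/// An integer constrained to MIN..=MAX, loosely in the style of Ada ranges.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Ranged<const MIN: i64, const MAX: i64>(i64);

impl<const MIN: i64, const MAX: i64> Ranged<MIN, MAX> {
    /// Accept the value only if it lies inside the declared range.
    fn new(value: i64) -> Option<Self> {
        if value >= MIN && value <= MAX {
            Some(Self(value))
        } else {
            None
        }
    }

    /// Add, rejecting both machine overflow and range violations.
    fn checked_add(self, rhs: i64) -> Option<Self> {
        self.0.checked_add(rhs).and_then(Self::new)
    }

    /// Storage width that would suffice for this range.
    fn width() -> u32 {
        width_for(MIN, MAX)
    }
}

type Percent = Ranged<0, 100>;

fn main() {
    let p = Percent::new(40).expect("40 is within 0..=100");
    println!("{:?}", p.checked_add(50)); // stays in range
    println!("{:?}", p.checked_add(70)); // prints "None"
    println!("Percent fits in {} bits", Percent::width());
}
```

The point of width_for is that a declared range is precisely the information needed to pick i8, i16, i32 or i64 safely, without the programmer naming a machine type.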

Benefits:

* Rust gains number types which are SUPERSETS of all existing number types, except if performance is confused with features (see below for performance, though).

* Rust would be safer and, more importantly, CORRECT by default, because:

- overflows can no longer happen by default (ignoring memory limits)
- equality comparisons are accurate for math by default
- number range checking would be available, either by default or as an optional compiler feature / optimisation flag

These changes eliminate whole classes of bugs, and, I believe, are very much in the spirit of Rust.

* Rust would be, at least theoretically, capable of whole new classes of optimisation. New x86_64 CPUs will (may already) implement decimal types in hardware, in a way that’s compatible with C++’s new decimal types, if I recall correctly. GPUs are capable of vector math on large numbers. These capabilities will only increase over time. As far as I can tell, Rust is completely blind to such developments at present, and it is certainly not paving the way for math optimisation in any future-proof way, independent of current CPU limitations. I believe Rust should standardise on a default math framework which is forward-compatible, before the language reaches 1.0. Defaulting to arbitrary precision math for real would not IMPLEMENT specific types such as hardware decimal, but it would indirectly abstract both the current and future numeric capabilities of processors: it implements the superset of all possible math, and allows optimisations to implement the specific types, transparently or manually, as subsets within that math, if/when performant implementations become available.

* Rust would be much more appealing to the scientific / math / financial community as a result, since they would not have to do the very boring groundwork of implementing basic math types before getting to what they REALLY want to implement. For instance, I would like to implement currency libraries that handle very large values in very small currencies, and datetime libraries that handle very large timespans in very granular resolutions, but I’m not keen to implement basic math functionality just as a “yak-shaving” step on the way to real goals. I say this only to illustrate my point that not having these arbitrary precision / bigint types as a standard, integral, well-respected part of the language is limiting, in terms of language audience. Recently, for example, there has been talk of dropping the bigint type altogether due to supposed lack of interest. I believe Patrick Walton made the point that I would make: if strong math capabilities are not adequately respected as part of the language/library proper, then Rust will not have many users who respect strong math capabilities.

* There’s little to lose, and a lot to gain. I understand that the meaning of real has become (what is commonly known as) “float” only recently, so there would be no big loss in renaming it. Again, as stated, there would be no big loss in having slow, but correct math by default, with the option to specify types like i32 or f64, where needed.
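The claim above that equality comparisons become accurate can be sketched with a tiny exact rational type. Ratio and gcd are hypothetical helpers written for this illustration, and the i64 fields are only a stand-in: under the proposal the numerator and denominator would be bigints, so the multiplications in add could not overflow.

```rust
use std::ops::Add;

/// Greatest common divisor, always non-negative.
fn gcd(a: i64, b: i64) -> i64 {
    if b == 0 { a.abs() } else { gcd(b, a % b) }
}

/// An exact fraction kept in lowest terms with a positive denominator,
/// so that derived equality is mathematical equality.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ratio {
    num: i64,
    den: i64,
}

impl Ratio {
    fn new(num: i64, den: i64) -> Ratio {
        assert!(den != 0, "denominator must be non-zero");
        let g = gcd(num, den);
        let sign = if den < 0 { -1 } else { 1 };
        Ratio { num: sign * num / g, den: sign * den / g }
    }
}

impl Add for Ratio {
    type Output = Ratio;
    fn add(self, rhs: Ratio) -> Ratio {
        // a/b + c/d = (a*d + c*b) / (b*d), then reduced by Ratio::new.
        Ratio::new(self.num * rhs.den + rhs.num * self.den, self.den * rhs.den)
    }
}

fn main() {
    let sum = Ratio::new(1, 10) + Ratio::new(2, 10);
    // Unlike 0.1 + 0.2 in f64, this comparison holds.
    println!("1/10 + 2/10 == 3/10? {}", sum == Ratio::new(3, 10)); // prints "... true"
}
```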

Drawbacks:

* Slower math by default. A compiler flag could override this, or people could manually specify a faster f32/f64/i64 type, for example, if they care. I do not think this is a significant problem.

Really, why not? What would REALLY be lost? But think of what could be gained.  A few new policies.  Creation of one new type.  Some optimisations, now or later.  But Rust would gain new capabilities.  New correctness. New users.  New communities.  New libraries.  New applications.  And, perhaps most importantly, compatibility between those libraries and applications.
