Oh, I thought this was going to be about floating point comparisons.
The IEEE floating point people thought this through, and a NaN is generated when there's no meaningful numeric value for a result. But they were just doing the floating point unit, which, at the time, wasn't even part of the main CPU. This led to a problem.
In IEEE floating point, the possible results of comparing two values are "greater", "less", "equal", and "none of the above". Any comparison involving a Not a Number is a "none of the above".
When "a > b" has no meaning, the hardware returns False. If a or b is a NaN, "a > b", "a < b", and "a == b" are all false. Most programmers don't expect that.
Booleans and branching don't handle this. There's no "not a boolean" value. A few languages throw an exception on if statements where the condition is undefined, but that's seldom hooked to floating point.
If you compute with non-signalling NaNs, don't compare the results until you've checked with "isnan()" that it's meaningful to do so.
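A minimal Rust sketch of those rules (`f64` follows IEEE 754 here, and `is_nan()` plays the role of C's `isnan()`):

```rust
fn main() {
    let nan = f64::NAN;
    let x = 1.0_f64;
    // Every ordering comparison involving NaN is false...
    assert!(!(x > nan));
    assert!(!(x < nan));
    assert!(!(x == nan));
    // ...including NaN == NaN, so test with is_nan() instead.
    assert!(nan != nan);
    assert!(nan.is_nan());
}
```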
Rust has the `PartialOrd` trait, which allows you to compare two things, and it returns an `Option<Ordering>`. When two things don't have an ordering, you can return `Option::None`. There is also `Ord`, which extends `PartialOrd`, and can be implemented for things that have a total ordering. Naturally, the floating point numbers (`f32` and `f64`) only implement `PartialOrd`. It's quite nice :)
Though sadly casting float conversions are still UB. There's a really good writeup about it here:

https://github.com/rust-lang/rust/issues/10184#issuecomment-...
I too was going to mention this. As a non-Rust programmer that's been following it very closely for a long time now, it's interesting to see the pain points of Rust pop up. It could just be my view and confirmation bias, but almost invariably when programmers express frustration with something harder to do in Rust, it's because they were making incorrect assumptions previously.
For example, I was previously exposed to the floating point comparison problem referenced by Animats through someone complaining about comparison and sorting (IIRC) steps required in Rust to work on floats. The "Woah, you're attempting an operation on a type that doesn't hold for the invariants this operation requires, please add ornamentation to make sure it's what you mean" compile time error is something I appreciate, because I would much rather fix it initially than accidentally run into it in production at runtime and have to shoehorn a fix in later. I've had enough of that for a lifetime.
> I too was going to mention this. As a non-Rust programmer that's been following it very closely for a long time now, it's interesting to see the pain points of Rust pop up. It could just be my view and confirmation bias, but almost invariably when programmers express frustration with something harder to do in Rust, it's because they were making incorrect assumptions previously.
I'd say it's a 50:50(1) split between "the compiler is not advanced enough to understand that this is okay" and "it's okay ... really ... it's ... oh. you're right. Damn, never thought about that before"
Small edit:
(1)Numbers are subject to further analysis.
This is definitely nice. But I'm thinking that NaN is a bit like null, in that it's a surprising value that may pop up which suddenly means something completely different, and code must handle it correctly.
Just as some languages (like Rust) provide a way to say that a variable can't be null, I think it would make sense to have a floating point value that can't be NaN. Comparison of two non-NaNnable floating point values can then just return a boolean, because we know it will never fail.
A key point of IEEE floating point is that NaN propagates. Any arithmetic operation that involves a NaN returns NaN. So you don't have to check until the end of the computation. Unless, of course, branching is involved.
A floating point type that could never be NaN is quite reasonable for a language with exceptions. But in a language without them, what's supposed to happen?
And for the half-initiated in IEEE 754, who think (like I used to) that all operations defined in the standard (including non-arithmetic ones) propagate NaNs, don't forget that pow is also defined (recommended, not obligatory), and that pow(NaN, 0) == 1, as well as pow(1, NaN) == 1, so there is a way to "lose" NaNs when using standard but non-arithmetic operations.
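Both points are easy to check in Rust, assuming the platform's `pow` follows the IEEE 754 / C99 special cases (as mainstream libms do):

```rust
fn main() {
    let nan = f64::NAN;
    // Arithmetic propagates NaN...
    assert!((nan + 1.0).is_nan());
    assert!((nan * 0.0).is_nan());
    // ...but pow's special cases can swallow it:
    // pow(x, 0) == 1 and pow(1, y) == 1 for ANY x, y, even NaN.
    assert_eq!(nan.powf(0.0), 1.0);
    assert_eq!(1.0_f64.powf(nan), 1.0);
}
```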
Well if the language is flexible enough it doesn't have to be a big annoyance to have types for both fp32 and nnfp32 (no-NaN fp32).
nnfp32 guarantees things that a fp32 doesn't, for example that comparison is always legal and works as expected.
This way you can perform calculations on fp32, but then have to sanitize the output before storing it back in the data structure.
One way to use the type system to enforce sanitation could be if values in a database are of type nnfp32, and the only way to produce a nnfp32 is using a function which returns Option<nnfp32>, returning Some only if the value is not NaN.

In fact there is a Rust crate for this:

https://danielkeep.github.io/rust-numeric-float/doc/numeric_...

Wow, that is fantastic! :) It looks like the same idea but developed.
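A minimal Rust sketch of that scheme; the `NnF32` name and its constructor are hypothetical:

```rust
// Hypothetical no-NaN wrapper: the only constructor rejects NaN,
// so comparisons on NnF32 values are always meaningful.
#[derive(PartialEq, PartialOrd, Clone, Copy, Debug)]
struct NnF32(f32);

impl NnF32 {
    // The sanitizing step: NaN never gets past this check.
    fn new(x: f32) -> Option<NnF32> {
        if x.is_nan() { None } else { Some(NnF32(x)) }
    }
}

fn main() {
    assert!(NnF32::new(f32::NAN).is_none());
    let a = NnF32::new(1.0).unwrap();
    let b = NnF32::new(2.0).unwrap();
    assert!(a < b);
}
```

Since NaN can never get inside the wrapper, a hand-written total `Ord` implementation would now be sound as well.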
> Booleans and branching don't handle this.

Branching handles it fine, with the right conditions. As long as you remember that there are four outcomes and code intentionally for all four, you are golden.
You can do isnan() testing on all values before comparing and branch on that, or you can just do, e.g.:
if x > y ...
elsif x ≤ y ...
else ...
What is most clear and appropriate is context dependent.
Or, construct your checks so that a NaN input produces the desired output... and then document that behavior with tests.

I can't believe I'm saying this, but it would be nice if it behaved like null in SQL.
That just means you get the problem even further away from the condition that generated it - perhaps your whole query returns NULL. You want the opposite, fail-fast so that you can see what the specific problem was. Signalling NaNs are the way to go unless you have very specific performance requirements.
You can only rely on the compiler correctly compiling code with signaling NaNs if you use '#pragma STDC FENV_ACCESS ON', but Clang doesn't even support that.
If your language can't use signalling NaNs correctly, get a better language.
Why is that caused by "just doing the [FPU]"? It's not obvious to me how being in a different part of the chip/board lessens the problems with "rule-breaking" comparisons, could you expand?
sigh

This succinctly explains a bug I have on the todo pile. Details like this are why I am always reading new dev books.
Another common class of bugs in C++ is comparison functions that don't respect the strict weak ordering axioms [1]. I've seen a lot of implementations in which cmp(x, x) is true.
This causes undefined behavior in STL algorithms, for example std::sort can easily go out of bounds.
A good rule of thumb to remember the axioms is that the comparator should behave like <, not <=.
[1] http://en.cppreference.com/w/cpp/concept/LessThanComparable
Weirdly enough in mathematics it's far more common to define <= instead of <. With a<b simply being shorthand for 'a<=b and not b<=a'. The definitions also seem to be simpler that way.
Is that so? I thought the "equals" relation was usually well-defined in formal math. What's an instance where this isn't the case? Perhaps the reals?
Equality and equivalence get murky sometimes. Often there is some sense of equivalence that's 'good enough'. Like spaces that are isomorphic or functions that are equal 'almost everywhere'.
Now in by far the majority of cases inequality is defined for partial orders, which have the property that a<=b and b<=a imply a=b, so in that case a<=b and not b<=a implies b=/=a, but (non-strict) weak orders can have a<=b and b<=a without necessarily a=b.
I'm pretty sure there are also a few cases where they just prove reflexivity and transitivity and just make a and b where a<=b and b<=a equal by definition. I think you could use this in the definition for real numbers, but usually the equivalence classes are defined first and inequality is defined afterwards (which might be more work).
Are there type systems where these axioms can be encoded? Dependent typing, perhaps.

As an exercise in learning Coq, I've implemented a proof-carrying version of the partial-ordering relation: https://github.com/aweinstock314/coq-stuff/blob/a1831f9e1e95...
There's four different types there: POSET, POSET_PROOFS, Pord, and POSET'.
- POSET only has an underlying type t and a function "leq" that takes two elements of type t and returns a bool (but doesn't enforce semantics of that bool).
- POSET_PROOFS embeds a POSET, and carries three proofs (reflexivity, antisymmetry, and transitivity) certifying that the underlying "leq" function is actually the less-than-or-equal function of a partial ordering.
- Pord is a result type with 4 values meant to denote the exhaustive possibilities of "strictly less"/"equal"/"strictly greater"/"uncomparable".
- POSET' has a type t', a function "pcomp" that takes two elements of type t' and returns a Pord, and a proof for transitivity in the "strictly less" case.
POSET' has fewer fields to instantiate, but POSET_PROOFS allows writing proofs that are closer to traditional math.
Fortunately, the fact that it's possible to write conversions between POSET_PROOFS and POSET' (done in the code as "from" and "to" in the POSET_ISOMORPHIMS module right below) indicates that they have the same properties.
Instead of having a function that says whether one object is bigger than another, you could define an ordering by having a function that maps objects to say natural numbers or real numbers or some other object with a known good comparison, and then compare the mapped objects.
So instead of implementing cmp in cmp(a, b), you'd implement f in f(a) < f(b).
Also you could resolve hashing in the same way: i.e. making sure that hashing is consistent with equality. Two objects of the same type are equal if f(a) == f(b), and have hashcode hash(type, f(a)).
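In Rust terms this is what `sort_by_key` does: you supply f, and the library compares f(a) against f(b) for you. A sketch:

```rust
fn main() {
    let mut words = vec!["banana", "fig", "pear"];
    // Order by a derived key instead of a hand-written
    // two-argument comparator:
    words.sort_by_key(|w| w.len());
    assert_eq!(words, vec!["fig", "pear", "banana"]);
}
```

Deriving hashes from the same key, as suggested, keeps hashing consistent with equality by construction.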
We need a function "clamp" which pins a value to within two others.
clamp(a,b,c)
One way of writing this is:
clamp(x,min,max) { return x < min ? min : x > max ? max : x; }
and this is a fine way to do it, but sometimes people forget the order of the arguments, so they feel the need to invent an IDE to help them remember.
Another way is writing:
clamp(min,x,max) { return x < min ? min : x > max ? max : x; }
which has the advantage of "visually" putting the unknown between two values. That can help people remember which argument goes where.
Yet another way is:
clamp(min,max,x) { return x < min ? min : x > max ? max : x; }
That one is popular amongst ML, Haskell, F#, etc. programmers because they have partial application. They write something like:
let clampVolume = clamp 0 11
and pat themselves on the back because now they have a function for clamping volumes.
Another reason this one is popular is because the last expression is likely to be the biggest expression. That might pay dividends on keeping bugs out.
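A rough Rust equivalent; there is no auto-currying, so a closure stands in for partial application:

```rust
fn clamp(min: f64, max: f64, x: f64) -> f64 {
    if x < min { min } else if x > max { max } else { x }
}

fn main() {
    // A closure gives the same "clampVolume" effect as currying:
    let clamp_volume = |x| clamp(0.0, 11.0, x);
    assert_eq!(clamp_volume(15.0), 11.0);
    assert_eq!(clamp_volume(-3.0), 0.0);
    assert_eq!(clamp_volume(7.0), 7.0);
}
```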
Something I'd like people to consider is simply taking the middle value.
function clamp(a,b,c) { return [].slice.call(arguments,0).sort(function(p,q){return p-q})[1] }
function clamp($a,$b,$c){ $x=func_get_args();sort($x);return $x[1]; }
clamp(A,B,C) -> lists:nth(2, lists:sort([A,B,C])).
sub clamp { (sort { $a - $b } @_)[1] }
clamp:{*x@&1=<x:(x;y;z)};
clamp:{asc[(x;y;z)]1}
This supports any crazy rationale you might have for a particular order, and it is resistant to a lot of bugs you're likely to make since the code isn't particularly repetitive no matter what language you use. Oh and it doesn't have any comparison operators either.
The only real downside is that it's slow.
Oh well.
Before we go and fix the compilers to recognise these contortions as a "clamp" function and optimise for it, maybe it's still worth meditating on next time you have a bunch of conditions in your code.
GLSL (OpenGL Shading language, for GPUs) implements this language feature[0]. They chose clamp(x,min,max). Although, it's not exactly a language feature, because I think most GPUs implement this instruction directly in hardware.
[0] https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/cl...

Fun fact: the Radeon GCN ISA actually has an instruction for taking the middle value, aka median, called v_med3.

Some languages let you spell out the arguments upon calling a function.

clamp(x:3, min:1, max:5)

Oh look! No mental strain required to think about what the function does nor what the order should be to minimize mistakes; it's clear as day! I wish more languages adopted the option to do this.

Moreover, you could also allow for something like passing a structure of parameters.

clamp({min:2, max:5, x:3})

^parameters can be rearranged as you see fit

There! Now the programmer can decide what's the way they want to call the function that makes the most sense to them.
If you use named arguments, can't you typically rearrange the order?
Don't get me wrong, I use destructured arguments in js and it works fine, but I thought named arguments in other languages solve that particular issue as well.
In languages like Python where you can have keyword args (typically with defaults), these options must come after the first argument, e.g. clamp(x, min=0, max=inf). Then you can do what you like with the order. And often it's good to specify the keyword for your own sanity later, when you have no idea what the third argument to some obscure Numpy call is. Or if you're using something like matplotlib and there are so many kwargs that it's better to specify.
Although in this case min/max probably shouldn't have defaults, I think clamp(min, x, max) would be a bit weird in Python?
Do you happen to work for the C++ committee? http://en.cppreference.com/w/cpp/algorithm/clamp
> Something I'd like people to consider is simply taking the middle value.
> This supports any crazy rationale you might have for a particular order, and it is resistant to a lot of bugs you're likely to make since the code isn't particularly repetitive no matter what language you use. Oh and it doesn't have any comparison operators either.
Most of your examples seem to call a "sort" function. Are you sure you're not using any comparisons?
The thing to focus on is the mental load. What are you more likely to get right?
a = sort(x,y,z); return a[1]
or something like this?
if(x >= y && x <= z) return x;
if(y >= x && y <= z) return y;
if(z >= x && z <= y) return z;
if(x >= z && x <= y) return x;
if(y >= z && y <= x) return y;
if(z >= y && z <= x) return z;
…
This is why I introduced the "Copy & Paste Error Scorecard" to my (small) team. Everybody manages to give away points from time to time. More serious points could be scored for mistakes getting as far as Git, but the best points are the ones it takes about 4-5 minutes of going "huh?" to spot. It's nice to remember that copy-and-paste tooling is a privilege, not a programmers' right.
And while I'm here,
My favorite technique for reducing these mistakes is using "highlight occurrences" in the IDE, with a nice bright color; click the cursor on a variable or method call, and all matching uses jump out at you.
This gives a great visual pattern when working in copy-pasted code blocks.
"In most cases programmers make an error in the last line of similar text blocks."
I've noticed something similar in prose. When there's a sentence in a paragraph that should be cut, it's disproportionately often the last one. I've seen this in my own writing and in other people's. Interestingly, it's particularly noticeable in the writing of Wodehouse; these bad final sentences are one reason his later stuff is not quite as good.
Totally agree with the concerns of the article. When you have to write a class of comparison functions like this in many types, it's arguably a violation of the Don't Repeat Yourself (DRY) principle to implement comparison with unique, bespoke code in each type. Scenarios like this beg for a library or type system to make things easier.
In Java's case, a better solution is Guava's ComparisonChain:
public int compareTo(Foo that) {
    return ComparisonChain.start()
        .compare(this.aString, that.aString)
        .compare(this.anInt, that.anInt)
        .compare(this.anEnum, that.anEnum, Ordering.natural().nullsLast())
        .result();
}
There's still an opportunity to make mistakes like typos here, but due to the visual consistency it's less likely. Actual logic errors in the comparison are also less likely.
A more powerful language than Java could potentially automate the entire implementation of methods like these (rather than just some of it), by reflecting over the set of fields in a class. Project Lombok's annotations can kind of do this for a number of common boilerplate methods, though it looks like not yet for compareTo(): https://projectlombok.org/features/Data
C# properties are an example of an improvement over Java, for example, but C# class implementors are still responsible for dealing with implementing Equals(), GetHashCode(), and ToString() just like Java, as far as I know.
I would be interested to hear if there are any current programming languages that handle this use-case particularly well. Any Haskell, F#, or Scala folks out there who can comment?
F# records and tuples implement equality and comparison based on their members automatically, which handles the part that you mention you're supposed to do in C#. It's not reflection so much as the compiler emitting the correct implementation of these methods at compile time.
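Rust gets a similar effect at compile time with `derive`, which generates the whole comparison chain from field declaration order. A sketch with hypothetical field names:

```rust
// The compiler writes compareTo/equals/hashCode-style methods
// for us, comparing fields in declaration order.
#[derive(PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Foo {
    a_string: String,
    an_int: i32,
}

fn main() {
    let a = Foo { a_string: "x".to_string(), an_int: 1 };
    let b = Foo { a_string: "x".to_string(), an_int: 2 };
    // Equal on a_string, so the ordering is decided by an_int.
    assert!(a < b);
}
```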
Could it be that at least some of the errors are due to autocomplete in the IDE? This came to my mind at the very first example given and I have been guilty of this by pressing "return" at the first option offered by the IDE instead of scrolling down to the correct one first. This feature of an IDE (as much as I like it) makes it even easier to think already about the next task.
I can understand the business reasons behind it, but it would be much better if these checks were implemented directly in mainstream compilers for the benefit of everyone.
Most (all?) functional programming languages implement structural equality automatically. Referential equality only makes sense when records can be mutated. Immutability eliminates whole categories of errors.
Haskell does not quite implement it automatically, but does support auto-generating it. Assuming all of a type's component types support structural equality, a new type can support it by adding "deriving (Eq)" to the type declaration.
Ocaml has both structural and referential equality tests automatically. Unfortunately (for C-style programmers), structural equality uses "=", while referential equality uses "==".
From the Ocaml doc:
>On mutable types such as references, arrays, byte sequences, records with mutable fields and objects with mutable instance variables, e1 == e2 is true if and only if physical modification of e1 also affects e2. On non-mutable types, the behavior of ( == ) is implementation-dependent; however, it is guaranteed that e1 == e2 implies compare e1 e2 = 0.
> Ocaml has both structural and referential equality tests automatically.
Yep, and Ocaml equality is a giant pain. As for ==, referential equality is almost never what you want, so it's (IMO) unwise to expose it except as an unsafe primitive. As for =, OCaml's "polymorphic"/"structural" equality is usually wrong for any non-trivial data structure, so all the comparison-based standard library modules are functorized over comparison anyway. Use of polymorphic `=` is basically banned in the codebase I work on.
Haskell’s Eq/Ord typeclasses are vastly easier to use and harder to mess up. I think this particular use case is a more or less unambiguous win for typeclasses.
I find it interesting that a few of the patterns shown here can (and should) be prevented by the compiler.
Pattern: A < B, B > A
Can be prevented by using a compare/compareTo function. Even better if the data type returned by it is not an Int, but rather an enumeration/sum of LessThan | Equal | GreaterThan (which is pretty common in ML-family languages).
Such a data type will automatically give you warnings if you forget one of the comparison cases (or if you mistakenly go over the same case multiple times)
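Rust's `std::cmp::Ordering` is exactly such a sum type; a minimal sketch of the exhaustiveness guarantee:

```rust
use std::cmp::Ordering;

// Returning a sum type forces the caller to handle every case.
fn sign(a: i32, b: i32) -> i32 {
    match a.cmp(&b) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
        // Omitting any arm here would be a compile-time error.
    }
}

fn main() {
    assert_eq!(sign(1, 2), -1);
    assert_eq!(sign(2, 2), 0);
    assert_eq!(sign(3, 2), 1);
}
```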
Pattern: a Member of the Class is Compared with itself
All of the examples are about writing a definition of compare/compareTo for the specific class.
This can be prevented by having the compiler automatically generate the implementation, as in Haskell's `deriving (Eq, Ord)`.
This is quite interesting, I think it's fine to have a `,` operator that ignores the result on the left. What is not fine is to have such an operator for a side-effects-free context. Or, put another way, C-style `for` loops are too clever for their own good. If a for loop syntax is:
for( initialization; test; mutation)
obviously you want some sort of side effect in `initialization` and `mutation`. But the `test` most often doesn't need side effects at all (and can do without them), and disallowing `,` in tests would prevent these issues.
Similarly, for while loops, even if it's a common idiom/shortcut to have `++value` or `--value` in the test expression, it's not without shortcomings.
Ultimately, a for-each loop, (or better, a map) is less error-prone. That would also prevent the other issues listed under the same pattern (in which an unsigned int had been used for a decreasing counter).
Pattern: Sloppy Copying of the Code
The Compiler::RefCntCmp example ought to be prevented by factoring more of the code out into reusable functions.
MySQL's rr_cmp, unfortunately, is an example of how, even when your tools offer safer ways to do things, you might pass them up in an (often misguided?) attempt to optimize further.
Just like your code might not use the appropriate data types, and thus be unable to rely on the safety guarantees provided by the compiler, plenty of projects don't use static analyzers, and thus miss out on all the checks mentioned in the article.
I'd avoid writing new projects in C family languages, but given that there's plenty of code written in those languages already out there, I'm glad that tools like the one described in the article exist.
The implicit type conversion mess is all over the place in JS. Changing == to === is by far not sufficient.
"<==", "==>", and "truecmp" would still be required to correct the mess a tad. The more correct solution, in my opinion, would be to burn JS to ashes, along with emacs, modern hardware, devops, and startups.
Oh, I thought this was going to be about floating point comparisons.
The IEEE floating point people thought this through, and a NaN is generated when there's no meaningful numeric value for a result. But they were just doing the floating point unit, which, at the time, wasn't even part of the main CPU. This led to a problem.
In IEEE floating point, the possible results of comparing two values are "greater", "less", "equal", and "none of the above". Any comparison involving a Not a Number is a "none of the above".
When "a > b" has no meaning, the hardware returns False. If a or b is a NaN, "a > b", "a < b", and "a == b" are all false. Most programmers don't expect that.
Booleans and branching don't handle this. There's no "not a boolean" value. A few languages throw an exception on if statements where the condition is undefined, but that's seldom hooked to floating point.
If you compute with non-signalling NaNs, don't compare the results until you've checked with "isnan()" that it's meaningful to do so.
Rust has the `PartialOrd` trait, which allows you to compare two things, and it returns an `Option<Ordering>`. When two things don't have an ordering, you can return `Option::None`. There is also `Ord`, which extends `PartialOrd`, and can be implemented for things that have a total ordering. Naturally, the floating point numbers (`f32` and `f64`) only implement `PartialOrd`. It's quite nice :)
I too was going to mention this. As a non-Rust programmer that's been following it very closely for a long time now, it's interesting to see the pain points of Rust pop up. It could just be my view and confirmation bias, but almost invariably when programmers express frustration with something harder to do in Rust, it's because they were making incorrect assumptions previously.
For example, I was previously exposed to the floating point comparison problem referenced by Animats through someone complaining about comparison and sorting (IIRC) steps required in Rust to work on floats. The "Woah, you're attempting an operation on a type that doesn't hold for the invariants this operation requires, please add ornamentation to make sure it's what you mean" compile time error is something I appreciate, because I would much rather fix it initially than accidentally run into it in production at runtime and have to shoehorn a fix in later. I've had enough of that for a lifetime.
> I too was going to mention this. As a non-Rust programmer that's been following it very closely for a long time now, it's interesting to see the pain points of Rust pop up. It could just be my view and confirmation bias, but almost invariably when programmers express frustration with something harder to do in Rust, it's because they were making incorrect assumptions previously.
I'd say it's a 50:50(1) split between "the compiler is not advanced enough to understand that this is okay" and "it's okay ... really ... it's ... oh. you're right. Damn, never thought about that before"
Small edit: (1)Numbers are subject to further analysis.
Though sadly casting float conversions are still UB. There's a really good writeup about it here:
https://github.com/rust-lang/rust/issues/10184#issuecomment-...
This is definitely nice. But I'm thinking that NaN is a bit like null, in that it's a surprising value that may pop up which suddenly means something completely different, and code must handle it correctly.
Just as some languages (like Rust) provide a way to say that a variable can't be null, I think it would make sense to have a floating point value that can't be NaN. Comparison of two non-NaNnable floating point value can then just return a boolean, because we know it will never fail.
A key point of IEEE floating point is that NaN propagates. Any arithmetic operation that involves a NaN returns NaN. So you don't have to check until the end of the computation. Unless, of course, branching is involved.
A floating point type that could never be NaN is quite reasonable for a language with exceptions. But in a language without them, what's supposed to happen?
And for the half-initiated in IEEE 754, who think (like I used to) that all operations defined in the standard (including non-arithmetic ones) propagate NaNs, don't forget that pow is also defined (recommended, not obligatory), and that pow(NaN, 0) == 1, as well as pow(1, NaN) == 1, so there is a way to "lose" NaNs when using standard but non-arithmetic operations.
Well if the language is flexible enough it doesn't have to be a big annoyance to have types for both fp32 and nnfp32 (no-NaN fp32). nnfp32 guarantees things that a fp32 doesn't, for example that comparison is always legal and works as expected.
This way you can perform calculations on fp32, but then have to sanitize the output before storing it back in the data structure. One way to use the type system to enforce sanitation could be if values in a database are of type nnfp32, and the only way to produce a nnfp32 is using a function which returns Option<nnfp32>, returning Some only if the value is not NaN.
In fact there is a Rust crate for this:
https://danielkeep.github.io/rust-numeric-float/doc/numeric_...
Wow, that is fantastic! :) it looks like the same idea but developed.
> Booleans and branching don't handle this.
Branching handles it fine, with the right conditions. As long as you remember that there are four outcomes and code intentionally for all four, you are golden.
You can do isnan() testing on all values before comparing and branch on that, or you can just do, e.g.:
What is most clear and appropriate is context dependent.Or, construct your checks so that a NaN input produces the desired output... and then document that behavior with tests.
I can't believe I'm saying this, but it would be nice if it behaved like null in SQL.
That just means you get the problem even further away from the condition that generated it - perhaps your whole query returns NULL. You want the opposite, fail-fast so that you can see what the specific problem was. Signalling NaNs are the way to go unless you have very specific performance requirements.
You can only rely on the compiler correctly compiling code with signaling NaNs if you use '#pragma STDC FENV_ACCESS ON', but Clang doesn't even support that.
If your language can't use signalling NaNs correctly, get a better language.
Why is that caused by "just doing the [FPU]"? It's not obvious to me how being in a different part of the chip/board lessens the problems with "rule-breaking" comparisons, could you expand?
sigh
This succinctly explains a bug i have on the todo pile.
Details like this is why i am always reading new dev books
Another common class of bugs in C++ is comparison functions that don't respect the strict weak ordering axioms [1]. I've seen a lot of implementations in which cmp(x, x) is true.
This causes undefined behavior in STL algorithms, for example std::sort can easily go out of bounds.
A good rule of thumb to remember the axioms is that the comparator should behave like <, not <=.
[1] http://en.cppreference.com/w/cpp/concept/LessThanComparable
Weirdly enough in mathematics it's far more common to define <= instead of <. With a<b simply being shorthand for 'a<=b and not b<=a'. The definitions also seem to be simpler that way.
Is that so? I thought the “equals” relation was usually well-defined in formal math. What’s an instance where this isn’t the case? Perhaps the reals?
Equality and equivalence get murky sometimes. Often there is some sense of equivalence that's 'good enough'. Like spaces that are isomorphic or functions that are equal 'almost everywhere'.
Now in by far the majority of cases inequality is defined for partial orders, which have the property that a<=b and b<=a imply a=b, so in that case a<=b and not b<=a implies b=/=a, but (non-strict) weak orders can have a<=b and b<=a without necessarily a=b.
I'm pretty sure there are also a few cases where they just prove reflexivity and transitivity and just make a and b where a<=b and b<=a equal by definition. I think you could use this in the definition for real numbers, but usually the equivalence classes are defined first and inequality is defined afterwards (which might be more work).
Are there type systems where these axioms can be encoded? Dependent typing, perhaps.
As an exercise in learning Coq, I've implemented a proof-carrying version of the partial-ordering relation: https://github.com/aweinstock314/coq-stuff/blob/a1831f9e1e95...
There's four different types there: POSET, POSET_PROOFS, Pord, and POSET'.
- POSET only has an underlying type t and a function "leq" that takes two elements of type t and returns a bool (but doesn't enforce semantics of that bool).
- POSET_PROOFS embeds a POSET, and carries three proofs (reflexivity, antisymmetry, and transitivity) certifying that the underlying "leq" function is actually the less-than-or-equal function of a partial ordering.
- Pord is a result type with 4 values meant to denote the exhaustive possiblities of "strictly less"/"equal"/"strictly greater"/"uncomparable".
- POSET' has a type t', a function "pcomp" that takes two elements of type t' and returns a Pord, and a proof for transitivity in the "strictly less" case.
POSET' has fewer fields to instantiate, but POSET_PROOFS allows writing proofs that are closer to traditional math.
Fortunately, the fact that it's possible to write conversions between POSET_PROOFS and POSET' (done in the code as "from" and "to" in the POSET_ISOMORPHIMS module right below) indicate that they have the same properties.
Instead of having a function that says whether one object is bigger than another, you could define an ordering by having a function that maps objects to say natural numbers or real numbers or some other object with a known good comparison, and then compare the mapped objects.
So instead of implementing f in f(a, b), you'd implement f in f(a) < f(b)
Also you could resolve hashing in the same way: i.e. making sure that hashing is consistent with equality. Two objects of the same type are equal if f(a) == f(b), and have hashcode hash(type, f(a)).
We need a function "clamp" which pins a value to within two others.
One way of writing this is: and this is a fine way to do it, but sometimes people forget the order of the arguments, so they feel the need to invent an IDE to help them remember.Another way is writing:
which has the advantage of "visually" putting the unknown between two values. That can help people remember which argument goes where.Yet another way is:
That one is popular amongst ML, Haskell, F# etc, programmers because they have partial application. They write something like: and pat themselves on the back because now they have a function for clamping volumes.Another reason this one is popular is because the last expression is likely to be the biggest expression. That might pay dividends on keeping bugs out.
Something I'd like people to consider is simply taking the middle value.
This supports any crazy rationale you might have for a particular order, and it is resistant to a lot of bugs you're likely to make since the code isn't particularly repetitive no matter what language you use. Oh and it doesn't have any comparison operators either.The only real downside is that it's slow.
Oh well.
Before we go and fix the compilers to recognise these contortions as a "clamp" function and optimise for it, maybe it's still worth meditating on next time you have a bunch of conditions in your code.
GLSL (OpenGL Shading language, for GPUs) implements this language feature[0]. They chose clamp(x,min,max). Although, it's not exactly a language feature, because I think most GPUs implement this instruction directly in hardware.
https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/cl...
Fun fact: the Radeon GCN ISA actually has an instruction for taking the middle value, aka median, called v_med3.
Some languages let you spell out the arguments upon calling a function.
clamp(x:3, min:1, max:5)
Oh look! No mental strain required to figure out what the function does or what order the arguments should go in; it's clear as day!
I wish more languages adopted the option to do this.
Moreover, you could also allow for something like passing a structure of parameters.
clamp({min:2, max:5, x:3})
^parameters can be rearranged as you see fit
There! Now the programmer can decide what’s the way they want to call the function that makes the most sense to them.
If you use named arguments, can't you typically rearrange the order?
Don't get me wrong, I use destructured arguments in js and it works fine, but I thought named arguments in other languages solve that particular issue as well.
In languages like Python where you can have keyword args (typically with defaults), these options must come after the first argument, e.g. clamp(x, min=0, max=inf). Then you can do what you like with the order. And often it's good to specify the keyword for your own sanity later, when you have no idea what the third argument to some obscure Numpy call is. Or if you're using something like matplotlib and there are so many kwargs that it's better to specify.
Although in this case min/max probably shouldn't have defaults, I think clamp(min, x, max) would be a bit weird in Python?
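In Python you can go one step further with keyword-only arguments (everything after a bare `*` must be passed by name); a quick sketch, with the parameter names chosen for illustration:

```python
def clamp(x, *, min, max):
    # min and max here are parameter names that shadow the
    # builtins inside this function; harmless, since we
    # don't call the builtins in the body.
    return min if x < min else max if x > max else x

print(clamp(7, min=1, max=5))   # 5
print(clamp(3, max=5, min=1))   # 3: keyword order is free
# clamp(7, 1, 5) raises TypeError: positional bounds are rejected.
```

This gets the "no mental strain" property from the earlier comment: the call site simply cannot be written with an ambiguous argument order.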
Do you happen to work for the C++ committee? http://en.cppreference.com/w/cpp/algorithm/clamp
> Something I'd like people to consider is simply taking the middle value.
> This supports any crazy rationale you might have for a particular order, and it is resistant to a lot of bugs you're likely to make since the code isn't particularly repetitive no matter what language you use. Oh and it doesn't have any comparison operators either.
Most of your examples seem to call a "sort" function. Are you sure you're not using any comparisons?
The thing to focus on is the mental load. What are you more likely to get right?
...or something like this?
Another option would be:
Sure, and another option would be for a range "class" to do the clamping:
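A sketch of that idea in Python (the `Range` class is hypothetical, not from any particular library):

```python
class Range:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def clamp(self, x):
        # The bounds travel together in one object, so callers
        # can't swap them at each individual call site.
        return max(self.lo, min(x, self.hi))

volume = Range(0, 11)
print(volume.clamp(15))  # 11
print(volume.clamp(-3))  # 0
print(volume.clamp(7))   # 7
```

Constructing the range once and reusing it also mirrors the partial-application trick from the earlier comment.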
but this turns into a conversation about syntax, and not about designing an API so that we guide our users towards success.
This is why I introduced the "Copy & Paste Error Scorecard" to my (small) team. Everybody manages to give away points from time to time. More serious points could be scored for mistakes getting as far as Git, but the best points are the ones it takes about 4-5 minutes of going "huh?" to spot. It's nice to remember that copy-and-paste tooling is a privilege, not a programmers' right.
And while I'm here, my favorite technique for reducing these mistakes is using "highlight occurrences" in the IDE, with a nice bright color: click the cursor on a variable or method call, and all matching uses jump out at you. This gives a great visual pattern when working in copy-pasted code blocks.
This reminds me of a running joke we had in high school: when you learn Calculus, you forget Algebra.
Because the vast majority of our mistakes involved adding or subtracting wrong, something "simple" that we didn't expend much thought on.
"In most cases programmers make an error in the last line of similar text blocks."
I've noticed something similar in prose. When there's a sentence in a paragraph that should be cut, it's disproportionately often the last one. I've seen this in my own writing and in other people's. Interestingly, it's particularly noticeable in the writing of Wodehouse; these bad final sentences are one reason his later stuff is not quite as good.
Taking this heuristic to its ultimate conclusion, one should trim writing until nothing remains.
(I should have followed that heuristic in this comment.)
Trim until the idea to be conveyed disappears. Then ctrl-z once.
”Everything should be made as simple as possible, but no simpler.”
—Unknown, attributed to Einstein
Years ago (2005), there was a Java tool called Guantanamo[0] that would delete or comment out code that is not covered by unit tests.
[0] https://github.com/codehaus/ashcroft/tree/master/guantanamo
Totally agree with the concerns of the article. When you have to write a class of comparison functions like this in many types, it's arguably a violation of the Don't Repeat Yourself (DRY) principle to implement comparison with unique, bespoke code in each type. Scenarios like this beg for a library or type system to make things easier.
In Java's case, a better solution is Guava's ComparisonChain:
There are similar utility methods for computing the object hashCode() and toString() and so on. See: https://github.com/google/guava/wiki/CommonObjectUtilitiesEx...
There's still an opportunity to make mistakes like typos here, but due to the visual consistency it's less likely. Actual logic errors in the comparison are also less likely.
A more powerful language than Java could potentially automate the entire implementation of methods like these (rather than just some of it), by reflecting over the set of fields in a class. Apache Lombok's annotations can kind of do this for a number of common boilerplate methods, though it looks like not yet for compareTo(): https://projectlombok.org/features/Data
C# properties are an example of an improvement over Java, for example, but C# class implementors are still responsible for dealing with implementing Equals(), GetHashCode(), and ToString() just like Java, as far as I know.
I would be interested to hear if there are any current programming languages that handle this use-case particularly well. Any Haskell, F#, or Scala folks out there who can comment?
Java 8 also has `Comparator::comparing` and `Comparator::thenComparing`, so your code example above would be
This makes much more sense than your example, I think, since it will use the same getter on both objects, completely eliminating this class of errors.
F# records and tuples implement equality and comparison based on their members automatically, which handles the part you mention you're supposed to do yourself in C#. It's not reflection so much as the compiler emitting the correct implementation of these methods at compile time.
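For what it's worth, a rough Python analogue of the same idea is a tuple sort key, where each field accessor is named exactly once (the `Person` type here is just an illustration):

```python
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class Person:
    last: str
    first: str
    age: int

# attrgetter with several names builds a tuple key, comparing
# field by field like Comparator.comparing(...).thenComparing(...).
# Each attribute appears exactly once, so it can't accidentally
# be read from only one side of a hand-written comparison.
by_name_then_age = attrgetter("last", "first", "age")

people = [Person("Doe", "Jane", 35), Person("Doe", "Alan", 40)]
people.sort(key=by_name_then_age)
print([p.first for p in people])  # ['Alan', 'Jane']
```

The same key function can be reused for grouping and for building consistent hashes, which addresses the DRY concern above.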
Could it be that at least some of the errors are due to autocomplete in the IDE? This came to my mind at the very first example given and I have been guilty of this by pressing "return" at the first option offered by the IDE instead of scrolling down to the correct one first. This feature of an IDE (as much as I like it) makes it even easier to think already about the next task.
I love these PVS-Studio write ups.
I can understand the business reasons behind it, but it would be much better if these checks were implemented directly in mainstream compilers for the benefit of everyone.
Most (all?) functional programming languages implement structural equality automatically. Referential equality only makes sense when records can be mutated. Immutability eliminates whole categories of errors.
Haskell does not quite implement it automatically, but does support auto-generating it. Assuming all of a type's component types support structural equality, a new type can support it by adding "deriving (Eq)" to the type declaration.
Ocaml has both structural and referential equality tests automatically. Unfortunately (for C-style programmers), structural equality uses "=", while referential equality uses "==".
From the Ocaml doc:
>On mutable types such as references, arrays, byte sequences, records with mutable fields and objects with mutable instance variables, e1 == e2 is true if and only if physical modification of e1 also affects e2. On non-mutable types, the behavior of ( == ) is implementation-dependent; however, it is guaranteed that e1 == e2 implies compare e1 e2 = 0.
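For contrast, the same distinction exists in Python: `==` on a dataclass is derived structural equality, while `is` is referential identity. A quick sketch:

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

a = Point(1, 2)
b = Point(1, 2)

print(a == b)  # True: dataclasses derive field-by-field equality
print(a is b)  # False: two distinct objects in memory
```

Unlike OCaml's polymorphic `=`, the derived `__eq__` here is generated per type, so it can be overridden where structural comparison is wrong for a data structure.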
> Ocaml has both structural and referential equality tests automatically.
Yep, and OCaml equality is a giant pain. As for ==, referential equality is almost never what you want, so it's (IMO) unwise to expose it except as an unsafe primitive. As for =, OCaml's "polymorphic"/"structural" equality is usually wrong for any non-trivial data structure, so all the comparison-based standard library modules are functorized over comparison anyway. Use of polymorphic '=' is basically banned in the codebase I work on.
Haskell’s Eq/Ord typeclasses are vastly easier to use and harder to mess up. I think this particular use case is a more or less unambiguous win for typeclasses.
You are making some covert buffer-overflow-vulnerability-crafters very sad now... their easy subterfuge is revealed :-o
I find it interesting that a few of the patterns shown here can (and should) be prevented by the compiler.
Pattern: A < B, B > A
Can be prevented by using a compare/compareTo function. Even better if the data type it returns is not an Int, but rather an enumeration/sum type of LessThan | Equal | GreaterThan (which is pretty common in ML-family languages).
Such a data type will automatically give you warnings if you forget one of the comparison cases (or if you mistakenly handle the same case multiple times).
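A sketch of such a sum type in Python, with an Enum standing in for the ML-style sum type (Python itself won't warn about missed cases, though external type checkers that understand `match` exhaustiveness can):

```python
from enum import Enum

class Ordering(Enum):
    LESS = -1
    EQUAL = 0
    GREATER = 1

def compare(a, b):
    # Returning a dedicated type instead of a raw int makes the
    # three cases explicit at every use site, instead of callers
    # poking at the sign of a number.
    if a < b:
        return Ordering.LESS
    if a > b:
        return Ordering.GREATER
    return Ordering.EQUAL

print(compare(1, 2))  # Ordering.LESS
```

A caller now has to name the case it handles, so "A < B" and "B > A" can't silently diverge.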
Pattern: a Member of the Class is Compared with itself
All of the examples are about writing a definition of compare/compareTo for the specific class.
This can be prevented by having the compiler automatically generating the implementation, as in Haskell:
https://www.haskell.org/onlinereport/derived.html
Pattern: Incorrect Check of Null References
This is handled by the usual Maybe/Optional types
Pattern: Incorrect Loops
This is quite interesting. I think it's fine to have a `,` operator that ignores the result on the left. What is not fine is to have such an operator in a side-effect-free context. Or, put another way, C-style `for` loops are too clever for their own good. If a for loop's syntax is:
for( initialization; test; mutation)
obviously you want to allow some sort of side effect in `initialization` and `mutation`. But most often you don't want (and can do without) side effects in the `test`, and disallowing `,` in tests would prevent these issues.
Similarly, for while loops, even if it's a common idiom/shortcut to have `++value` or `--value` in the test expression, it's not without shortcomings.
Ultimately, a for-each loop (or better, a map) is less error-prone. That would also prevent the other issues listed under the same pattern (in which an unsigned int had been used for a decreasing counter).
Pattern: Sloppy Copying of the Code
The Compiler::RefCntCmp example ought to be prevented by factoring more of the code out into reusable functions.
MySQL's rr_cmp is unfortunately an example of how, even when your tools offer a safer way to do things, you might pass it up in an (often misguided?) attempt to optimize further.
Just like your code might not use the appropriate data types, and thus be unable to rely on safety guarantees provided by the compiler. Plenty of projects don't use static analyzers, and thus miss out on all the checks mentioned in the article.
I'd avoid writing new projects in C family languages, but given that there's plenty of code written in those languages already out there, I'm glad that tools like the one described in the article exist.
Oh I thought it was about JS insanity.
Well-known caveat: don't use ==, use ===.
What about the implicit comparison operators in JS?
Well, it relies on == :)
same with lt/gt
"0" === 0   // false
"0" < 0     // false
"0" <= 0    // true
"0" > 0     // false
WTFOMGBBQ?!!!!
The implicit type conversion mess is all over the place in JS. Changing == to === is by far not sufficient.
"<==", "==>", and "truecmp" would still be required to correct the mess a tad. The more correct solution, in my opinion, would be to burn JS to ashes, along with emacs, modern hardware, devops, and startups.