A Toronto Data Guy

True, False, & NULL/None/nil/Blank logic in MySQL, Python, Ruby, and Excel

(NOTE: This blog post is extremely old and may give the reader the false impression that I have no idea what I’m doing. I’m keeping it up because I’m not a coward.)

EDIT: see below for an explanation on the “nil and False => nil while nil or False => False”

Being a Data Guy (in the corporate world, a “Business/Market Intelligence Analyst/Manager”) has its advantages. There is always a new question to answer, a new system to optimize, and a new split test to run. Management makes a ton of money through discoveries found in the data, and that means that bonuses are not far behind. But one dark side of being a Data Guy is the sheer number of tools you need to use (quickly!) and the inconsistencies across them. Even if some of them are as cool as Ruby.

Ruby Logo

Allow me to introduce True, False, and NULL problem. On the surface this doesn’t seem that hard. How many inconsistencies could there be across just 3 values (or lack thereof in the case of NULL) represented in 4 languages?

As it turns out: Many.

The old pros out there are already screaming, “hey! NULL in MySQL has nothing to do with nil in Ruby! This discussion is meaningless! Don’t you know anything? Go read some books by W. Richard Stev…”

Well, they have a point. NULL in MySQL is not the same as nil in Ruby, but since this message has been pounded into me the hard way I was hoping to save others the headaches that I’ve befallen upon. Grey beards, go back to kernel hacking for now, we’ll grab beer later.

The “Or” Logical Operator

A table of "or" logical operators

Ok, so far so good, we can see that regardless of the tool used “NULL or TRUE” returns “TRUE.” And it should, given that no matter what is on the other side of an “or” operator we already have a “TRUE” value. The “NULL or False” column shows an inconsistency between MySQL and the rest of the bunch. Why does MySQL return “NULL” when the rest of the group give “FALSE”? Pretty easy answer there is that in MySQL “NULL” basically means “unknown”.

MySQL dolphin saying "oh shit"

If that still doesn’t answer the question in your mind, imagine this: You are a home to home surveyor, asking people about their favorite politician. You get to a house to ask the owner whether they are going to vote for the Libertarian Candidate or the Constitution Party Candidate (there are many surveyors in dreamworld, you see) but you can’t tell if the person is male or female, say because they are behind a screen door. When you get to your trusty MySQL database you leave the value of NULL in the “is_a_man?” column  because you cannot tell one way or another. That is what the MySQL guys and girls had in mind when they decided “NULL or FALSE” returns “NULL”. They basically said the meaning of “NULL” is “unknown”. So when we say “NULL or FALSE” we really can’t tell one way or the other. This is especially important in numeric type fields where we can’t just print the string “couldn’t tell the height in inches due to darkness.”

What about the rest of the bunch, why do they default to “None, nil, etc…”? Because to them the lack of a value indicates exclusion from the logical operator. In other words, imagine you were a computer program interpreter. Whenever you saw the word “nil” (or otherwise) after an “or” operator you basically said “screw it, forget about that guy, he hasn’t called me in months”. That would describe Ruby, Python, and Excel. They only look to the remaining side of the “or” operator.

What about the last column of the table? Well, MySQL stays consistent with the whole NULL = Unknown value situation, basically saying: “here I’m giving you a NULL, but that actually means ‘unknown’ because either of these two values could be a true”. Totally cool and consistent.

Here we also see Ruby and Python try to tell both sides of the “or” operator to go #*&^ itself, but are left with nothing so they say “got nothing” in their own way, nil and None, respectively.

Excel is different. And as we will come to learn, when in doubt Excel will be different. It basically says “Hey! How’s it going? So I’m trying to hide this concept of ‘NULL’ and ‘nil’ from you because you guys are mostly MBAs that have a hard enough grasp dealing with a blank cell, but you are making it really hard for me. So I’m just going to throw up the error ‘#VALUE!’ and you can call over the programmer paid one fourth as much to figure it out for you.”

Excel logo

This is a recurring theme with Excel. Excel doesn’t really have a NULL or even nil field. Excel has a “Blank” field, which it will happily take as input, but will almost never serve as output, so it gets creative. (As a side note, entering a single “’” into an excel cell gets you an “empty non-blank field” useful for when the default behavior isn’t what you want. This comes in handy everywhere, from conditional formating to logical comparisons to type safety.)

The “And” Logical Operator

The "And" logical operator chart

Here is where things start to get batty. Let’s start with the “NULL and TRUE” column. MySQL handles this situation as we would expect, given that in MySQLese “NULL” just means “unknown”. Ruby and Python can’t tell the nil/None boys to stuff it this time, because there is a pesky “and” operator, which foiled some plans for world domination, but that’s a story for another day. They basically say “hey, you may actually want to take a look at this sarge”. Personally, if I were a programming language interpreter I would either say “nil or false => false” OR “nil and true => nil” not both. I’d make it easier on the girls dancing with me. Then again, 80% of my keyboard time is spent with MySQL, so maybe I’m just used to blonde haired girls.

Things were just starting to get boring until Microsoft stepped in. Ladies and gentlemen Microsoft assumes that, if you aren’t FALSE you may as well be TRUE in an “and” operator. Blank, the number 3, “luapnor” - it doesn’t matter. Like republicans in negative land, everyone is their friend until you specifically tell them you hate their guts. Just for fun, try this in cell “B1” in a brand new Excel spreadsheet: ‘=if(A1,”lol_true”,”fffuuu”)’  should come out to “fffuu”. Great. Now replace it with this: ‘=IF(AND(A1,TRUE),”lol_true”,”fuuu”)’ now what does it come to? “lol_true”.

So why is this? It comes back to MS Excel trying to hide complexity to MBAs. Given an “and” operator Excel will ignore any non boolean values to make the lives easier for people just messing around with spreadsheets (these often can have huge holes in them that MBAs want to ignore). But when there is no “and”/”or” present, the if statement NEEDS to try to use the blank cell, which it does in the ’=if(A1,”lol_true”,”fffuuu”)’ Excel cell.

To the next column!

MySQL still acting as it should. Treats “NULL” as “unknown” and predictably says “no matter what ‘NULL’ was supposed to be, I have one FALSE, so I can safely say “FALSE”.

Excel will predictably say “F that NULL, I’m just going to ignore it. This whole thing is False.”

Which brings us to Python and Ruby. I don’t have words for why “nil or false” gives me false in Ruby (and Python), but “nil and false” gives me nil. Even if Ruby told nil to get out town it would still be left with FALSE. I’m going to try to keep my faith in Ruby by saying this fifteen times slowly: “Ruby was made by intelligent people, there’s a logical explanation… Ruby was made by intelligent people…”

The last column makes total sense. NULL and NULL should be NULL. you have nothing to work off of! No idea about anything. True, False? “Is the answer to this question the same as if I asked you if true and false were the same thing?” Of course Excel wants to say NULL, but it can’t because people like Mitt Romney might throw a fit if they see an term so unfamiliar as NULL. So it just throws an error, which makes sense when you look at all the other behavior Excel has exhibited over the past couple paragraphs.

The “!=” Logical Operator

Using the not equal to operator

You know the drill at this point. “TRUE != NULL”: makes sense. NULL for MySQL ‘cause NULL is basically unknown, the rest say “hey, you know what True really isn’t equal to NULL, that MySQL guy is on smack.”

Next Column: False != NULL, MySQL says “NULL could be anything. It could even be a boat! So we can’t tell if FALSE != NULL.” Ruby and Python ask us what MySQL is smoking and say that, without a doubt, that FALSE isn’t nil/None.

Then comes Excel. Apparently, to excel, an empty cell is not not equal to false. Which means it is equal to false. Except when you put it in an “or” operator or an “and” operator,then it isn’t false, otherwise those things wouldn’t have thrown an error, they would have returned false, or when used with an and operator they would have acted like false, not true.


Er…

At this point I don’t even care anymore. Onto the last column: NULL != NULL. Makes sense all round. Even in Excel.

The “Greater than” Logical Operator

One last image showing truefalsenull greaterthanhree

I just included this one to make sure everyone knew that Python and Ruby couldn’t be counted on to return the same thing. Of course I’ve only chosen to look at some aspects of the NULL/TRUE/FALSE problem. I’m not even going to get started on “not nil or nil” problem. Or PHP. Or Ruby’s difference between Case equivalence and normal equivalence.

I don’t really have an overarching conclusion in this blog post. No “wtf Micro$oft engineers are dolts!” message. I look at each tool fulfilling each role extremely well and for their intended audience. There is a difference between NULL and nil, even if it makes it harder on multi-tool Data Guys, like myself. At least business-y people will feel more comfortable rocking a spreadsheet when their whole world is true-false-error, and I mean this sincerely. I must say, though, that I certainly do prefer MySQL’s consistent handling of NULL. Makes me wish that Ruby had its own type of unknown class. Also, don’t think that I don’t have love for MS Excel. Excel is amaaaaazing. By far MS best product. Understanding its quirks is just part of life.

If anyone wants to add to this list, I’ve made a open spreadsheet here. It is a Google spreadsheet, which I guess is kind of funny after all the MS Excel explaining I’ve had to do. Any questions for me? Send ‘em over to p.engineer@gmail.com.

Update from Pavpanchekha, whom emailed me following my post (many thanks, I KNEW there was a reason):

Wanted to explain the strange inconsistencies you saw in your recent essay on Nil in Python/Ruby/Excel/MySQL.


You questioned the sanity of people who decided that nil and False was nil while nil or False was False
It all comes down to short-circuiting operators, that is, a specific optimization/feature that ever programming language known to man has. (I stress programming, as MySQL and Excel really aren’t programming languages).
The basic idea is that an and statement, and an or statement, will only examine as many elements as they need to decide their value.
This is good; if you have, say:
cheap_function_that_is_usually_false() and big_computation()
you will almost always not compute big_computation(). It’s also a feature, not just an optimization. In python:
if len(s) > 1 and s[1] == “bob”:    do_stuff()
if and weren’t short-circuiting, this would cause errors, since it’d try to access s[1] even when len(s) == 1 (and thus there is no s[1]).
Now, how does the short-circuiting work? Well, it’s simple, really. For and: evaluate the first argument; if that’s false, return it, otherwise, evaluate and return the other argument.
So:
nil and False
Well, we evaluate the first. We get nil. Is nil truthy? No, because we want “if nil” to not do anything. So we return nil.
False and nil
Well, we evaluate the first. We get False. Is False truthy? No, so we return False.
And yes, this does mean that and and or are no longer commutative. But the benefits are great enough to justify it.
For or, the algorithm is similar. Evaluate the first. If that’s true (truthy), we’re done, so return it. Otherwise, return the other.
So:
nil or True
We evaluate the first. We get nil. Is nil truthy? No, so we return True
True or nil
We evaluate the first. We get True. Is True truthy? Yes, so we return it.
In this case, both orders were the same, but they’d be different if something truthy was used in place of nil: try True and 1 vs 1 and True
Hope this explains something!