Python’s Measure of Sameness
By Robert Schroll, TDI Data Scientist in Residence and Instructor for our Data Science Fellowship
It may be a bit surprising to those coming to the language, but Python has two ways of measuring “sameness”. There is equality (a == b) and identity (a is b), and new Pythonistas are often confused about which one to use. One way that I like to think of it is that equality is defined by Python objects, while identity is defined by Python itself.
Equality means whatever a Python object wants it to mean. For some things, this is obvious:
1 == 1
For others, it may not be so obvious. For example, Python lists are equal if they contain equal elements.
[1, 2, 3] == [1, 2, 3]
Equality can extend across types. Here, we can see that an integer and a floating point value can be equal.
1 == 1.0
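As a small sketch of how far cross-type equality reaches, here are a few more comparisons using only the standard library. The particular values and variable names are chosen purely for illustration.

```python
from decimal import Decimal
from fractions import Fraction

# Numeric types compare by value, even across types.
int_vs_float = (1 == 1.0)
fraction_vs_float = (Fraction(1, 2) == 0.5)
decimal_vs_int = (Decimal("1") == 1)
bool_vs_int = (True == 1)  # bool is a subclass of int

# Unrelated types are simply unequal; no error is raised.
str_vs_int = ("1" == 1)
list_vs_tuple = ([1, 2, 3] == (1, 2, 3))

print(int_vs_float, fraction_vs_float, decimal_vs_int, bool_vs_int)
print(str_vs_int, list_vs_tuple)
```

The first line printed is all True and the second is all False: each type decides for itself which other types it is willing to be equal to.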
Equality doesn’t have to return a Boolean value. For example, NumPy arrays define equality to work element-wise:
import numpy as np
np.array([1, 2, 3]) == np.array([1, 2, 4])
Out : array([ True, True, False])
Letting each type define equality can produce results that may seem strange. For example, IEEE 754 floating-point numbers are defined such that not-a-number (NaN) doesn’t equal anything, not even itself.
np.nan == np.nan
Out : False
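Because NaN never compares equal to anything, an equality test cannot be used to detect it. A minimal sketch of the usual workaround with the standard library's math.isnan follows; the variable names are illustrative.

```python
import math

nan = float("nan")

# Equality fails even against the very same value.
nan_equals_itself = (nan == nan)

# math.isnan is the dependable way to ask "is this NaN?".
detected = math.isnan(nan)
not_detected = math.isnan(1.0)

print(nan_equals_itself, detected, not_detected)
```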
You can get in on the fun, too. Any class can define a __eq__ method in order to define what equality means for it. For example:
class NotEqualTo:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value != other

not_three = NotEqualTo(3)
not_three == 1, not_three == 2, not_three == 3
Out : (True, True, False)
We now have an object that is equal to everything other than 3.
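For contrast, here is a sketch of a more conventional __eq__. The Point class is hypothetical and exists only for this illustration. It compares by value, returns NotImplemented for types it does not understand, and defines __hash__ to match, since a class that defines __eq__ alone becomes unhashable.

```python
class Point:
    """Hypothetical 2-D point, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            # Let Python give the other operand a chance to answer.
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Keep hashing consistent with equality.
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)

same_value = (p == q)         # equal by value
same_object = (p is q)        # but two distinct objects
versus_tuple = (p == (1, 2))  # neither side knows how to compare

print(same_value, same_object, versus_tuple)
```

When both operands return NotImplemented, Python falls back to an identity comparison, which is why the last comparison is False rather than an error.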
In contrast to the flexibility provided by equality, Python’s identity check is controlled entirely by the Python interpreter. The is operator checks whether two variables refer to the same object. Thus:
a = b = [1, 2, 3]
a is b
Out : True
a and b both refer to the same underlying object, so a change to one will show up in the other. After adding 4 to the list through one name, the other name shows:
Out : [1, 2, 3, 4]
However, another list with the same elements is another object, and therefore is not the same in identity.
c = [1, 2, 3, 4] a == c, a is c
Out : (True, False)
After all, as different objects, modifications to one will not affect the other. After one of them is modified, the other still reads:
Out : [1, 2, 3, 4]
As a result, they are no longer equal:
a == c
Out : False
This provides another way to think about the difference between equality and identity in Python. Equality is provisional: two objects may be equal now but unequal in the future. Identity is permanent and will never change.
(That said, don’t be confused by the fact that a given name in Python can be redirected to refer to another object. If we do b = 0, then a is b will yield False, simply because b now points to the number 0 instead of the list. The identity of the list has not changed.)
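The whole sequence can be sketched in one self-contained snippet. The names original, alias, and duplicate are illustrative and are not the variables used above.

```python
original = [1, 2, 3]
alias = original            # a second name for the same list
duplicate = list(original)  # a new list with equal contents

# Equal now, but never identical.
before = (original == duplicate, original is duplicate)

original.append(4)

# Equality was provisional; identity has not changed.
after = (original == duplicate, original is duplicate)
alias_sees_change = (alias == [1, 2, 3, 4])

# Rebinding a name changes what the name refers to, not the list itself.
alias = 0
alias_is_original = (alias is original)
list_is_intact = (original == [1, 2, 3, 4])

print(before, after, alias_sees_change, alias_is_original, list_is_intact)
```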
How does Python determine identity? Every object managed by the Python interpreter is given an ID. This can be looked up with the id function. For the list above, it returns an integer:
Out : 139942585675264
The is operator simply checks whether the two arguments have the same ID.
a is b, id(a) == id(b)
Out : (True, True)
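A short sketch of the relationship between id and is, using illustrative names. The actual ID values vary from run to run, so the snippet compares them instead of printing them. Note that Python only guarantees an ID to be unique among objects that exist at the same time, which is the case here.

```python
first = [1, 2, 3]
second = [1, 2, 3]  # equal contents, separate object
alias = first

ids_are_ints = isinstance(id(first), int)

# Same object: same ID, and `is` agrees.
alias_matches = (id(alias) == id(first)) and (alias is first)

# Two live objects: different IDs, and `is not` agrees.
distinct_objects = (id(first) != id(second)) and (first is not second)

print(ids_are_ints, alias_matches, distinct_objects)
```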
Which to use when
The simplicity of the identity comparison is its main advantage over equality. When you test the equality of an object, arbitrary code could run, and an arbitrary result could be returned. An identity check does a simple comparison and always returns a Boolean.
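To make "arbitrary code" concrete, here is a sketch with a hypothetical Chatty class whose __eq__ counts its calls and returns a string. The equality test runs that method; the identity test never touches it.

```python
class Chatty:
    """Hypothetical class whose equality check has a visible side effect."""

    calls = 0

    def __eq__(self, other):
        Chatty.calls += 1  # arbitrary code runs on every comparison
        return "maybe"     # and the result need not be a Boolean


obj = Chatty()

eq_result = (obj == obj)
calls_after_eq = Chatty.calls

is_result = (obj is obj)
calls_after_is = Chatty.calls  # unchanged: `is` never calls __eq__

print(eq_result, calls_after_eq, is_result, calls_after_is)
```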
That said, equality will generally be the preferred operation. In most cases, you will care about the value of an object, not its identity. If you want to test if a user entered a particular string, you care about the value of that string, not the details of how the Python interpreter is storing it.
There are two primary cases where identity checking is preferred:
- To understand if the same object shows up with different names in different places. This is relatively rare, but sometimes you do need to know this.
- To compare a value to a flag value. In Python, this is most likely to be the value None (although others exist and modules can create their own). These are used to indicate some condition. The actual value is unimportant; all the information is in which flag is being used. Therefore, it’s generally preferred to test value is None over value == None. Odd data types could define that equality to be true, but the identity check will be false. (This is made possible by the fact that None is a singleton: there is a single None object in the Python interpreter.)
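Both points from the list above can be sketched briefly. AlwaysEqual is a hypothetical "odd data type" that claims to equal everything, and _MISSING with lookup is one illustrative way a module might create its own flag value for cases where None is itself a legitimate value.

```python
class AlwaysEqual:
    """Hypothetical odd type that claims to equal everything."""

    def __eq__(self, other):
        return True


odd = AlwaysEqual()
fooled_by_equality = (odd == None)  # the class says yes
identity_check = (odd is None)      # the interpreter says no

# A module-level flag value: a unique object nothing else can be identical to.
_MISSING = object()


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; use default if given, otherwise raise KeyError."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:
        if default is _MISSING:
            raise KeyError(key)
        return default
    return value


stored_none = lookup({"a": None}, "a")    # None is a real stored value here
fallback = lookup({}, "a", default=None)  # None can also be a valid default

print(fooled_by_equality, identity_check, stored_none, fallback)
```

Because the flag is compared with is, no user-defined __eq__ can ever be mistaken for it.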
Hopefully, this all seems relatively straightforward. If so, please read the follow-up article where we discuss a number of confusing subtleties about how this all works. Stay tuned for part II!
More about the author
Robert studied squishy physics in Chicago, Amherst, and Santiago, Chile, before uniting his love of computers, teaching, and making pretty graphs at The Data Incubator. In his free time, he plays tuba and right field, usually not simultaneously.