Identity vs Equality

Python’s Measure of Sameness

By Robert Schroll, TDI Data Scientist in Residence and Instructor for our Data Science Fellowship

It may surprise those new to the language, but Python has two ways of measuring “sameness”: equality (a == b) and identity (a is b). New Pythonistas are often confused about which one to use. One way I like to think of it is that equality is defined by Python objects, while identity is defined by Python itself.

Equality

Equality means whatever a Python object wants it to mean. For some things, this is obvious:

In [1]:

    1 == 1

Out [1]: True

For others, it may not be so obvious. For example, Python lists are equal if they contain equal elements.

In [2]:

    [1, 2, 3] == [1, 2, 3]

Out [2]: True

Equality can extend across types. Here, we can see that an integer and a floating point value can be equal.

In [3]:

    1 == 1.0

Out [3]: True

Equality doesn’t have to return a Boolean value. For example, NumPy arrays define equality to work element-wise:

In [4]:

    import numpy as np
    np.array([1, 2, 3]) == np.array([1, 2, 4])

Out [4]: array([ True,  True, False])

This can produce results that may seem strange. For example, IEEE floating point numbers are defined such that not-a-number (NaN) doesn’t equal anything, not even itself.

In [5]:

    np.nan == np.nan

Out [5]: False
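Because NaN never compares equal to itself, an equality test cannot be used to detect it. Here is a minimal sketch using only the standard library; it assumes float("nan") and math.isnan as stand-ins for the NumPy value used above, and the variable names are purely illustrative.

```python
import math

nan = float("nan")  # a plain Python float holding not-a-number

# NaN is not equal to anything, including itself.
nan_equals_itself = nan == nan

# math.isnan tests for NaN without relying on ==.
detected_by_isnan = math.isnan(nan)

print(nan_equals_itself, detected_by_isnan)  # False True
```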

You can get in on the fun, too. Any class can define a __eq__ method in order to define what equality means for it. For example:

In [6]:

    class NotEqualTo:

        def __init__(self, value):
            self.value = value

        def __eq__(self, other):
            return self.value != other

    not_three = NotEqualTo(3)

    not_three == 1, not_three == 2, not_three == 3

Out [6]: (True, True, False)

We now have an object that is equal to everything other than 3.

Identity

In contrast to the flexibility provided by equality, Python’s identity check is controlled entirely by the Python interpreter. The is operator checks whether two variables refer to the same object. Thus: 

In [7]:

    a = b = [1, 2, 3]
    a is b

Out [7]: True

a and b both refer to the same underlying object, so a change to one will show up in the other.

In [8]:

    a.append(4)
    b

Out [8]: [1, 2, 3, 4]

However, another list with the same elements is another object, and therefore is not the same in identity.

In [9]:

    c = [1, 2, 3, 4]
    a == c, a is c

Out [9]: (True, False)

After all, as different objects, modifications to one will not affect the other.

In [10]:

    a.append(5)
    c

Out [10]: [1, 2, 3, 4]

As a result, they are no longer equal:

In [11]:

    a == c

Out [11]: False

This provides another way to think about the difference between equality and identity in Python. Equality is provisional: two objects may be equal now but unequal in the future. Identity is permanent and will never change.

(That said, don’t be confused by the fact that a given name in Python can be redirected to refer to another object. If we do b = 0, then a is b will yield False, simply because b now points to the number 0 instead of the list. The identity of the list has not changed.)
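The rebinding described in that parenthetical can be sketched as follows. This is a self-contained illustration that starts from a fresh list rather than continuing the session above; list_id and the same_* names are just labels chosen for the example.

```python
a = b = [1, 2, 3]
list_id = id(a)        # remember which object the list is

same_before = a is b   # both names refer to the one list

b = 0                  # rebind the name b; the list itself is untouched
same_after = a is b

# The list keeps its identity and contents; only the name b moved.
list_unchanged = (id(a) == list_id) and (a == [1, 2, 3])

print(same_before, same_after, list_unchanged)  # True False True
```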

How does Python determine identity? Every object managed by the Python interpreter is given an ID. This can be looked up with the id function.

In [12]:

    id(a)

Out [12]: 139942585675264

The is operator simply checks whether its two operands have the same ID.

In [13]:

    a is b, id(a) == id(b)

Out [13]: (True, True)

Which to use when

The simplicity of the identity comparison is its main advantage over equality. When you test two objects for equality, arbitrary code can run and an arbitrary result can be returned. An identity test does a simple comparison and always returns a Boolean.
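To make that contrast concrete, here is a small sketch. Chatty is a hypothetical class invented for this illustration: its __eq__ has a side effect and returns a string, while the is test on the same object cannot be customized.

```python
class Chatty:
    """Hypothetical class whose __eq__ does more than compare."""

    def __init__(self):
        self.calls = 0

    def __eq__(self, other):
        self.calls += 1   # arbitrary code runs on every == test
        return "maybe"    # and the result need not be a Boolean


chatty = Chatty()
other = object()

eq_result = chatty == other   # dispatches to Chatty.__eq__
is_result = chatty is other   # handled by the interpreter alone

print(eq_result, chatty.calls, is_result)  # maybe 1 False
```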

That said, equality will generally be the preferred operation. In most cases, you will care about the value of an object, not its identity. If you want to test if a user entered a particular string, you care about the value of that string, not the details of how the Python interpreter is storing it.

There are two primary cases where identity checking is preferred:

  1. To determine whether the same object shows up under different names in different places. This is relatively rare, but sometimes you do need to know it.
  2. To compare a value to a flag value. In Python, this is most likely to be None (although others exist, and modules can create their own). Flag values indicate some condition. Their contents are unimportant; all of the information lies in which object is being used. Therefore, it’s generally preferred to test value is None rather than value == None. An odd data type could define that equality to be true, but the identity check will still be false. (This works because None is a singleton: there is a single None object in the Python interpreter.)
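The second case can be sketched as follows. AlwaysEqual, _MISSING, and describe are hypothetical names made up for this example: the first shows an “odd” type that fools == None, and the other two show one way a module might create its own flag value.

```python
class AlwaysEqual:
    """Hypothetical 'odd' type that claims to equal everything."""

    def __eq__(self, other):
        return True


odd = AlwaysEqual()
equals_none = odd == None   # True: AlwaysEqual.__eq__ decides the answer
is_none = odd is None       # False: odd is not the None object

# A module can create its own flag value in the same spirit.
_MISSING = object()         # hypothetical sentinel; only its identity matters


def describe(value=_MISSING):
    """Distinguish 'no argument given' from an explicit None."""
    if value is _MISSING:
        return "no value given"
    if value is None:
        return "explicitly None"
    return f"got {value!r}"


print(equals_none, is_none)                      # True False
print(describe(), describe(None), describe(0))   # no value given explicitly None got 0
```

Using a private object() as the sentinel means no caller can accidentally pass an equal value; only the identity test can match it.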

 

Hopefully, this all seems relatively straightforward. If so, please read the follow-up article where we discuss a number of confusing subtleties about how this all works. Stay tuned for part II!

More about the author

Robert Schroll

Robert studied squishy physics in Chicago, Amherst, and Santiago, Chile, before uniting his love of computers, teaching, and making pretty graphs at The Data Incubator. In his free time, he plays tuba and right field, usually not simultaneously.
