Python Generators and Comprehension¶
Digging into generators and comprehensions, from the basics through to implementation, in a comprehensive tutorial. This is a walkthrough for beginners that builds up to real-world examples.
Note
This is an in-progress draft.
import numpy as np
import string
Here we will build a dictionary of items that we can use for examples. Let’s make it keyed on alphabetical characters with random integer values.
The algorithm for this will be:
Do something 20 times so we can have 20 (key, value) items. For each of the 20 iterations, choose a random lowercase alphabetical character as the key and a positive integer up to 100 as the value.
alpha = list(string.ascii_lowercase)
collection = {
np.random.choice(alpha): np.random.randint(100)
for _ in range(20)
}
collection
{'i': 15,
'u': 14,
'h': 74,
'x': 93,
'r': 0,
'd': 64,
'v': 31,
'k': 17,
'm': 93,
'p': 18,
'l': 80,
'o': 31,
'c': 48,
'q': 45,
'b': 55,
's': 40,
't': 53}
We did this through a method called dictionary comprehension, where we build a dictionary object on the fly. You can spot these comprehension methods by iteration code within {} or [] for dictionary or list comprehension, respectively.
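To make the bracket distinction concrete, here is a minimal side-by-side sketch (the nums list is just illustrative):

```python
nums = [1, 2, 3]
squares_list = [n ** 2 for n in nums]      # [] builds a list
squares_dict = {n: n ** 2 for n in nums}   # {} with key: value builds a dict
print(squares_list)   # [1, 4, 9]
print(squares_dict)   # {1: 1, 2: 4, 3: 9}
```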
Building our dictionary collection, we iterate over range(20) so we will have 20 key: value pairs. Since we are not using the number yielded from the range function, we use the conventional Python variable name _ to indicate to the reader that we are not utilizing this variable; our range is simply letting us do something 20 times.
For each iteration over range, we randomly sample alpha using numpy.random.choice. alpha contains the English-language alphabet in lower case by way of string.ascii_lowercase, and we call list on this because numpy.random.choice expects a one-dimensional sequence of choices; a plain string would be treated as a single scalar value rather than a sequence of characters to sample from.
Next, our value is assigned by randomly selecting an integer up to 100 with numpy.random.randint.
In order to tell our comprehension these are key: value pairs, our key (numpy.random.choice(alpha)) comes first, followed by :, then our value (numpy.random.randint(100)). This gives us our key: value. These will all be collected within {} and assigned to the variable collection, which we can call so that the __repr__ method of our collection object (a dict) returns a string representation.
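Incidentally, the output above has only 17 entries even though we looped 20 times: when numpy.random.choice draws a letter that is already a key, the new pair overwrites the old one, since dictionary keys are unique. A small sketch of the same build (the seed value is arbitrary, purely for reproducibility):

```python
import string
import numpy as np

np.random.seed(0)  # arbitrary seed so the run is reproducible
alpha = list(string.ascii_lowercase)
collection = {
    np.random.choice(alpha): np.random.randint(100)
    for _ in range(20)
}
# duplicate keys collapse, so we end up with at most 20 entries
print(len(collection))
```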
Now let’s demonstrate a few things we can do with this collection
dictionary.
from collections import Counter
value_counts = Counter(collection.values())
value_counts.most_common()
[(93, 2),
(31, 2),
(15, 1),
(14, 1),
(74, 1),
(0, 1),
(64, 1),
(17, 1),
(18, 1),
(80, 1),
(48, 1),
(45, 1),
(55, 1),
(40, 1),
(53, 1)]
collections.Counter allows us to feed it an iterable of data and have it tabulate occurrences. We call the most_common method to sort the counts descending by occurrence. We could have also called most_common(5) to get the top 5, for example.
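As a quick sketch of that count argument, using a throwaway string of letters:

```python
from collections import Counter

letter_counts = Counter('abracadabra')   # tallies each character
print(letter_counts.most_common(1))      # [('a', 5)]
```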
This is really doing something like the following.
value_counts = {}
for val in collection.values():
value_counts[val] = value_counts.get(val, 0) + 1
sorted(value_counts.items(), key=lambda count: count[1], reverse=True)
[(93, 2),
(31, 2),
(15, 1),
(14, 1),
(74, 1),
(0, 1),
(64, 1),
(17, 1),
(18, 1),
(80, 1),
(48, 1),
(45, 1),
(55, 1),
(40, 1),
(53, 1)]
First we set up an empty dictionary in which we will tabulate our value occurrences. We then iterate over the values via the collection.values() view. For each value, we assign value_counts a key of that value and increment its count by 1 for each observation. To do this, we get the current count by calling value_counts.get(val). If we instead indexed with value_counts[val] and the key did not yet exist, we would get a KeyError, so we use a default value of zero by calling value_counts.get(val, 0). Then we can take the actual count, or the default starting point of 0, and add 1 for this observation.
Next, to get things sorted like collections.Counter.most_common, we sort our list using sorted. We iterate over key: value pairs, and we tell it that the sorting key is the value: count represents the (key, value) tuple, and we use the value by setting the key as count[1], which is the value position of our iteration tuple. lambda lets us define an inline function, and we could do any sort of operation in it. Maybe this value is an error and we need to square it: lambda count: count[1] ** 2. However, we would be better off doing that in a separate operation, since it hides from the reader that we are sorting on the squared error and not the original value. Finally, reverse=True tells sorted we want max -> min.
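The "separate operation" suggested above might look like the following sketch, where the squared values are hypothetical stand-ins for an error metric:

```python
counts = {93: 2, 31: 2, 15: 1}  # hypothetical value -> occurrence counts

# compute the squared metric in its own, clearly named step
squared_counts = {val: cnt ** 2 for val, cnt in counts.items()}

# then sort on that metric, max -> min, with nothing hidden in the key
ordered = sorted(squared_counts.items(), key=lambda pair: pair[1], reverse=True)
print(ordered)  # [(93, 4), (31, 4), (15, 1)]
```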
So we have taken our random collection of alphabetical keys and tabulated the most common integer value occurrences. We could have easily done the same for the alphabetical keys by calling collections.Counter(collection.keys()).most_common().
Now let’s do some conditional selection. We will first find all keys whose value is greater than 40.
gt40keys = [
k for (k, v) in collection.items()
if v > 40
]
gt40keys
['h', 'x', 'd', 'm', 'l', 'c', 'q', 'b', 't']
That used list comprehension (the list analogue of the dictionary comprehension we used to generate collection) to iterate through the key: value pairs of collection and collect the key if the value is greater than 40. The result is a list of keys.
We could have returned the full key: value pair with (k, v) for (k, v) in collection.items() ..., or we could have made a new dictionary of only those key: value pairs matching that condition. Let’s do that, because it is a common routine.
gt40collection = {
k: v for (k, v) in collection.items()
if v > 40
}
gt40collection
{'h': 74,
'x': 93,
'd': 64,
'm': 93,
'l': 80,
'c': 48,
'q': 45,
'b': 55,
't': 53}
This looks just like our previous list comprehension, but to generate a dictionary we use {} to make it a dictionary comprehension. We assign key k the value v with k: v as we normally do, but only if the value is greater than 40. The result is a dictionary that is a subset of collection with only the entries matching our criteria.
But what if we want to search for a value? Let’s say we want to create a new dictionary that is a subset of collection where the value equals 93. We will do this just like above.
equals93collection = {
k: v for (k, v) in collection.items()
if v == 93
}
equals93collection
{'x': 93, 'm': 93}
So we have 2 key: value pairs where the value met the condition we set. The result is a dictionary.
Before we go on, as a bit of an aside, what if we simply wanted to double-check how many values are equal to 93? Nothing more, nothing less. We can combine a few of the methods we’ve highlighted to get something like the following.
number_values_equal_to_93 = sum(
1 for v in collection.values()
if v == 93
)
number_values_equal_to_93
2
This might look slightly different, but it’s really the same as what we have done. Let’s start inside and work out. First we iterate over the values of collection, and since we only need the values, not the key: value pairs, we use collection.values() (if we were looking at keys we would use collection.keys(), but we would never be counting key occurrences with an equality test because, remember, we cannot have two of the same key in a dictionary). We test whether the value equals 93 and accumulate the integer 1 rather than the value itself. We use 1 because we want to count the occurrences, not sum the values. Finally, we count up these ones with sum.
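The same count can be read straight out of collections.Counter, which agrees with the sum approach; a sketch on a throwaway list of values:

```python
from collections import Counter

values = [15, 93, 31, 93, 40]  # stand-in for collection.values()
by_sum = sum(1 for v in values if v == 93)
by_counter = Counter(values)[93]
print(by_sum, by_counter)  # 2 2
```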
Okay, I just wanted to highlight searching with comprehension and how we can use sum and specific values. Let’s move on.
While this was a “search” for values that equaled 93, we simply iterated over the entire collection to test each and every value. While this is common, another more “search”-like operation is to identify key: value entries where we know a condition is met, and we know how many such entries exist. This could be finding the single user whose hash equals a specific value.
Let’s first use our collection to find the two keys whose value equals 93. Since we know there exist only two such entries, let’s stop as soon as we have found our two data points. There’s no reason we should search further than that; if 93 occurs at positions 0 and 1, why should we search through the remainder of collection?
found = []
for (k, v) in collection.items():
    if v == 93:
        found.append(k)
    if len(found) == 2:
        # stop as soon as both known matches are found
        break
found
['x', 'm']
As you can see, this is just like the previous method where we accumulated keys whose value equaled 93. The syntax is different, but the operation is very similar. The difference here is that we stop iterating over the items as soon as we find our known count of 2 matches.
However, there are some issues here if we were working with much larger data objects. If we materialized the full list of key: value pairs up front, say with list(collection.items()), we would put all of those pairs in memory before we even started iterating. (In Python 3, collection.items() itself returns a lightweight view, but the point stands for any approach that builds the complete pairing first.) While it’s better in that we stop iterating as soon as our 2 are found, we would rather only produce one pair at a time.
Let’s make this better by only creating one key: value pair, check it, then move on if we need to do so.
found = []
for (i, (k, v)) in enumerate(collection.items()):
if v == 93:
print(f'Found a 93 on loop {i=} for {k=}')
found.append(k)
if len(found) == 2:
break
else:
continue
print(f'collection is {len(collection)=} and we searched through {i=} key: value pairs')
found
Found a 93 on loop i=3 for k='x'
Found a 93 on loop i=8 for k='m'
collection is len(collection)=17 and we searched through i=8 key: value pairs
['x', 'm']
So you can see this is almost identical to our last attempt, but we have wrapped collection.items() in enumerate, which gives us a lazy iterator. We then iterate this as it yields an integer telling us the loop number (like a counter) along with the (key, value) tuple. So we assign each yield as (i, (k, v)), check if the value v equals 93, and if so, print out which loop count we are at and the key k it was associated with, then append the key to our found list. If we have found 2, as len(found) == 2, then we break out of our loop; otherwise we continue to the next item.
You can then see that of our collection of length 17, we only searched through 9 pairs (enumerate started at 0 and ended at i=8, so that is 8 + 1 pairs). Pretty cool, especially if we are searching through billions of entries, or our search criteria is a lot more complex than an identity test. What if we were searching for the key where some costly function f(x), which took several {seconds/minutes/MBs/GBs} to compute, yielded the resultant value v of our (k, v) pair?
Another important feature is that we only hold the current item in RAM; we are not inherently accumulating everything we have already iterated over. We are looking at things one at a time and moving on. It’s like we are at a market searching for a good fruit. We sample many fruits by taking one, looking it over, and either keeping it because it matches our criteria for ripeness, or putting it back. We do not stash in our arms every fruit we sample, only to put them down after exhausting the entire bin, keeping our selections. For shopping the market, iterating by yielding one fruit at a time, our arms (and the farmer) are thankful. For our programming, our RAM capacity is thankful.
We see this commonly when operating on file objects, where we can readily encounter files >> the size of our usable memory. This is also extremely common with large arrays and networks when using a GPU.
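As a sketch of that file case: an open file object is itself a lazy iterator that yields one line at a time, so we can stop at the first match without reading the whole file into memory (the file contents here are fabricated for the example):

```python
import os
import tempfile

# write a throwaway file: one needle followed by lots of hay
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as fh:
    fh.write('needle\n')
    fh.write('hay\n' * 1000)

# iterating the file handle yields lines lazily; next() stops at the first hit
with open(path) as fh:
    first_match = next(line.strip() for line in fh if line.strip() == 'needle')
print(first_match)  # needle

os.remove(path)
```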
Let’s make a quick generator just to see what’s really different here. First, though, let’s make a version like our list comprehension to see what that looks like, and then do the same thing with a generator to compare.
def list_building(n=10):
    """Build and return a complete list of n integers"""
    created_array = []
    for i in range(n):
        created_array.append(i)
    return created_array

def generator_list_building(n=10):
    """Yield n integers one at a time"""
    for i in range(n):
        yield i
complete_list = list_building(10)
print(f'{complete_list=} is {type(complete_list)=}')
iterable_list = generator_list_building(10)
print(f'{iterable_list=} is {type(iterable_list)=}')
complete_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] is type(complete_list)=<class 'list'>
iterable_list=<generator object generator_list_building at 0x7f8f315e9f90> is type(iterable_list)=<class 'generator'>
As you can see both by method and by inspection, the first list-building routine creates, then returns, the entire array of 10 integers. However, the generator method creates a generator object which can later be iterated over. You can see the difference by printing out the result: complete_list contains a complete 10-integer array, while iterable_list holds no data and is instead a generator object.
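We can pull values from such a generator one at a time with next, or drain whatever remains with list; a quick sketch using the same generator_list_building idea:

```python
def generator_list_building(n=10):
    for i in range(n):
        yield i

gen = generator_list_building(5)
first = next(gen)           # pulls a single yielded value
second = next(gen)
rest = list(gen)            # list() drains everything still unyielded
print(first, second, rest)  # 0 1 [2, 3, 4]
```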
But how do we get something out of the generator? Let’s say for our list of objects, we wanted an array of squared values.
squared_list = [
n ** 2 for n in complete_list
]
print(squared_list)
squared_generator_list = [
n ** 2 for n in iterable_list
]
print(squared_generator_list)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Both resulted in our list of squared original values. However, with the first method, what do we have in memory? We have the original list complete_list, which is of length 10. We now also have a squared list squared_list, also of length 10. These are small, but what if we simply wanted our squared list and it was to be a billion integers? Further, do we need the original if we just want squares? We could use del complete_list, but we still had both complete lists in memory at some point for some duration.
The second, though, started only with a generator. When we were done, we only had a resultant list and an exhausted generator. That’s because we told it what it will yield, we then iterated it, having it yield a particular integer, we operated on that item and accumulated it, and we then threw away the yielded item before moving on. So as we iterate over the generator, we yield and then move on. Just like our fruit inspection at the market.
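That exhausted-generator claim is easy to verify: once fully iterated, a generator yields nothing further. A minimal sketch:

```python
iterable_list = (i for i in range(10))
squared = [n ** 2 for n in iterable_list]   # consumes every yielded value
leftover = list(iterable_list)              # nothing remains to yield
print(squared)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(leftover)  # []
```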
The great thing is, our squaring operation looks the same! Nice and simple, right? But our functions were a bit crude — can we write them in our comprehension style? Can we make a comprehension generator? Yes!
comprehension_based_complete_list = [
i for i in range(10)
]
print(f'{comprehension_based_complete_list=} is {type(comprehension_based_complete_list)=}')
comprehension_based_iterable_list = (
i for i in range(10)
)
print(f'{comprehension_based_iterable_list=} is {type(comprehension_based_iterable_list)=}')
comprehension_based_complete_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] is type(comprehension_based_complete_list)=<class 'list'>
comprehension_based_iterable_list=<generator object <genexpr> at 0x7f8f308f4890> is type(comprehension_based_iterable_list)=<class 'generator'>
Okay, there we have it. The same functions we already wrote, but with list and generator comprehension. Remember, this is done through the subtle difference of using either [] for a list or () for a generator around the comprehension logic.
Now, this is not the best example. That’s because in Python 3, range is already a lazy sequence: it produces its values on demand rather than building a list up front (it is not technically a generator, but it shares that memory-friendly behavior). So these would simply be list(range(10)) and range(10), respectively. But you should be able to see that what we are doing to make the data could be anything. Finally, as an exercise, let’s combine everything in the last few functions to make a squared value list and generator. This should be a better example than range alone, and it will allow us to write something more succinct than the multi-step functions above.
comprehension_based_squared_list = [
i ** 2 for i in range(10)
]
print(f'{comprehension_based_squared_list=} is {type(comprehension_based_squared_list)=}')
comprehension_based_squared_iterable = (
i ** 2 for i in range(10)
)
print(f'{comprehension_based_squared_iterable=} is {type(comprehension_based_squared_iterable)=}')
comprehension_based_squared_list=[0, 1, 4, 9, 16, 25, 36, 49, 64, 81] is type(comprehension_based_squared_list)=<class 'list'>
comprehension_based_squared_iterable=<generator object <genexpr> at 0x7f8f308f4ac0> is type(comprehension_based_squared_iterable)=<class 'generator'>
So we have a squared list and a squared generator.
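One practical difference between range and a generator worth seeing: a range can be iterated repeatedly, while a generator is one-shot. A small sketch:

```python
r = range(3)
first_pass = list(r)
second_pass = list(r)    # a range can be walked again

g = (i for i in range(3))
gen_first = list(g)
gen_second = list(g)     # the generator is already exhausted

print(first_pass, second_pass)  # [0, 1, 2] [0, 1, 2]
print(gen_first, gen_second)    # [0, 1, 2] []
```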
Let’s get back to our “searching”. Remember, way back we used enumerate to create an iterator-based version of our data over which we could iterate. Let’s take a look at another way we can do this.
collections_generator = iter(collection.items())
found = []
while len(found) < 2:
k, v = next(collections_generator)
if v == 93:
print(f'Found a 93 for {k=}')
found.append(k)
found
Found a 93 for k='x'
Found a 93 for k='m'
['x', 'm']
Here we created our own iterator using iter, which lets us work through any collection of data (list, array, dict, tuple). We then use a while loop to iterate through the data until we have found our 2 matches, collected under found and assessed with len(found). We yield one key: value pair at a time by calling next on our iterator, check if the value v is equal to 93, and if so, append the key to found.
Great, this works just like before but is more concise using while, and because we wrap our own data object with iter, it is adaptable. The benefit of enumerate is that we don’t have to pull the data ourselves using next, and we also get the added feature of a built-in counter as the first value in the yielded (count, iterable_item) tuple, where our iterable_item is a key: value pair as (k, v) since we are iterating collection.items().
Either works fine, and there are some pros and cons to each beyond what we just discussed. Generally it comes down to habit, and I find myself using both almost equally.
Now, one caveat of our iterator approach: what if we couldn’t find our second match? Either we were not working on immutable data and it changed since we began the search, or our match criteria did not hold the way our prior knowledge suggested. We will exhaust the iterator and hit a StopIteration error. We always want to catch such exceptions, deal with them gracefully, and move on so we do not break our code.
So how do we go about doing this here?
collections_generator = iter(collection.items())
found = []
while len(found) < 2:
try:
k, v = next(collections_generator)
if v == 53:
print(f'Found a 53 for {k=}')
found.append(k)
except StopIteration:
print('We ran out of data to search')
break
found
Found a 53 for k='t'
We ran out of data to search
['t']
To catch this issue, we nest our item assignment and condition check within a try/except routine. Here, we try to get the next item and test it. However, if we run out because there is nothing left to yield, next will raise StopIteration. We catch this specific (and we always want to be specific) exception with except StopIteration.
To ensure we had a scenario where this exception would be thrown, we searched for a value that we know from our previous exercises occurred only once, here 53.
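An alternative to catching StopIteration is to give next a default sentinel to return when the iterator is exhausted; a sketch on a throwaway iterator standing in for our collection:

```python
pairs = iter([('t', 53)])  # stand-in for iter(collection.items())

found = []
while len(found) < 2:
    item = next(pairs, None)   # None is returned instead of raising StopIteration
    if item is None:
        print('We ran out of data to search')
        break
    k, v = item
    if v == 53:
        found.append(k)
print(found)  # ['t']
```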
Pretty simple. But two important things to note.
First, we explicitly use except StopIteration to catch only this error. We do not want to catch any other issues that might arise, as that would obfuscate other errors; we assume running out of items is the only thing that could happen here.
More critically, whatever we do in our exception handling, whether it’s simply printing a warning, like here, or trying something else, we have to call break.
If we do not call break when we run out of items, what will happen with our while loop? Well, we’ll never meet the condition len(found) < 2, and we will loop forever, spinning on an exception that fires on every pass. Think about that happening in production!
Another consideration is that we can use else and finally with our try/except routine. While I’ll leave that for another time, in brief: else runs only if no exception was caught with except, and finally will always run no matter the prior outcome (think closing a file object, etc.). This gives us a way to run a secondary operation on success and a cleanup regardless of whether we succeeded.
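A compact sketch of all four clauses together, with throwaway values:

```python
events = []
numbers = iter([1])
try:
    value = next(numbers)
except StopIteration:
    events.append('except')   # only on an exhausted iterator
else:
    events.append('else')     # only when no exception was raised
finally:
    events.append('finally')  # always runs, success or not
print(events)  # ['else', 'finally']
```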
Okay, I think we hit a lot of good stuff so far. We discussed constructing objects through comprehension building (list or dictionary), and generating some random toy data in doing so. We ran through a few different ways to summarize and understand our data using either built-ins or from scratch methods. We saw how we can “search” conditionally to find data or simply count occurrences. We then dove into using generators and iterating to optimize our searching. Finally, we briefly touched on adding some error handling.
Let’s take a look at one last method. What if we knew our data had only one match, for example, our user with a specific hash? Let’s create a simple class object, User, and highlight two things we can do to wrap up everything we’ve gone over thus far.
class User:
def __init__(self, name, age, active=True):
self.name = name
self.age = age
self.active = active
def toggle_active(self):
self.active = not self.active
return True
def __repr__(self):
return f'<User> {self.name=} | {self.age=} | {self.active}'
Okay, we now have a User class that represents a user with a name, age, and an active status. We also have a method that we can call on a User to toggle the active status. We also set a __repr__ method so we can get a string representation of our users.
Let’s add some users, and we will store them in a list. So let’s do this with some list comprehension!
user_names = ['Patrick', 'Matthew', 'Linux Admin', 'Operating Doctor', 'Data Scientist']
users = [
User(name=name, age=np.random.randint(80))
for name in user_names
]
users
[<User> self.name='Patrick' | self.age=3 | True,
<User> self.name='Matthew' | self.age=45 | True,
<User> self.name='Linux Admin' | self.age=12 | True,
<User> self.name='Operating Doctor' | self.age=46 | True,
<User> self.name='Data Scientist' | self.age=74 | True]
So we used our list comprehension to build out a list of users as users from our preset list of names, user_names. In doing so, we assigned them a random age up to 80 with numpy.random.randint(80), similar to what we did with collection. Also note that since we did not supply an active argument, it defaulted to True as we set in User.__init__. And because we made a nice __repr__, we get an informative string representation of all of our users in users.
Now let’s use what we have done already to perform a few somewhat real world operations.
I see two users whose age is under 18. Patrick and Linux Admin should not be active on our platform without additional parental consent. Let’s go ahead and take care of that. But let’s highlight a few methods to do this. I will comment out all but the first approach and only run that one, since we only want to toggle each status once, but we want to highlight them all.
# Pretty standard approach
for user in users:
    if user.age < 18:
        user.toggle_active()

# # This works, but map is lazy, so to perform the mapped function we
# # have to iterate it - for our simple purpose this is a bit terse
# list(map(lambda user: user.toggle_active(), (user for user in users if user.age < 18)))

# # Since toggle_active returns True as long as no error is thrown,
# # we could add a check that all requested toggles ran successfully
# assert all(user.toggle_active() for user in users if user.age < 18)
users
[<User> self.name='Patrick' | self.age=3 | False,
<User> self.name='Matthew' | self.age=45 | True,
<User> self.name='Linux Admin' | self.age=12 | False,
<User> self.name='Operating Doctor' | self.age=46 | True,
<User> self.name='Data Scientist' | self.age=74 | True]
As you can see, we simply ran a very readable loop to call User.toggle_active() on any user we encounter whose age is less than 18. And we can see that, yes, it did indeed work.
I also commented out a few other approaches using more advanced, or in some cases merely more confusing, methods. The version with an assertion is pretty handy. You’d do something like this if you were unit testing; however, you don’t usually want assertions in your production code. In production, we might instead say if not all(blah): send_alert("status toggle failures") or something like that. And again, we can do that without inspecting the users simply because we return True if the code executes properly. But remember, that’s not checking that the toggle resulted in the correct status state, only that it ran without raising an exception.
Okay, so we have our users set up, and their statuses are now age appropriate. Let’s come back to our techniques. Let’s find users whose age is over 30 (similar to what we just did), but now we want to do a bit more than a simple one-time function execution.
Remember, we can do this with a list or a generator. Let’s pretend that our set of patients should now go into a new class object. Let’s also set the caveat that the number of users we find will be >> the actual 2 in our example. We might also assume that each User in users contains additional attributes, one of which holds a health history that might be megabytes in size. So because we have a few things we want to do with our found users, and each found user is memory intensive and already exists somewhere in the users list, we do not want to copy an entire subset of that list.
We are also going to pretend that our matching patients can only accumulate in our intensive care unit until we fill our capacity. Every bit of memory is important (remember all that patient history we naively read into memory when making our Users?), and because we can only take so many, let’s use our iterator approach.
Note: if we were really memory constrained here and speed was important, we could use indexing and other tricks. But let’s assume we are somewhere between our example of 5 and Google-scale big data.
users_iter = iter(users)
max_capacity = 2
intensive_care_patients = []
while len(intensive_care_patients) < max_capacity:
    try:
        patient = next(users_iter)
        if patient.age > 30:
            intensive_care_patients.append(patient.name)
    except StopIteration:
        # without this break we would loop forever, as discussed above
        print('We still have capacity!')
        break
intensive_care_patients
['Matthew', 'Operating Doctor']
This should all seem standard fare now. So I will leave the breakdown to you at this point.
Notice how we only used the patient name in our resultant list? Again, we are assuming it’s incredibly expensive to hold a User in memory, so let’s not duplicate things unless we have to. In reality, we might reference a database table primary key, or maybe a unique user hash since names are common. We also do not want people causing havoc on the system by guessing actual data references (thus why we normally use hashes or other unpredictable identifiers), but I digress.
Okay, remember what I just said about copying. In reality, we could just reference the original object, so we would have a list of objects and another list of references to those objects. But for our purposes, let’s assume we are dealing with copies above, where perhaps we would otherwise attach additional information, making the reference and the original no longer equivalent and leaving us with two objects for one user.
If we wanted to make ICU patient objects, we could have simply done so above. Rather than appending the User.name to a list, we could have made those objects and collected them. Let’s do that quickly here just as an example.
class VulnerablePatient:
def __init__(self, patient):
self.patient = patient
def __repr__(self):
return f'<VulnerablePatient> {self.patient}'
users_iter = iter(users)
max_capacity = 2
intensive_care_patients = []
while len(intensive_care_patients) < max_capacity:
    try:
        patient = next(users_iter)
        if patient.age > 30:
            intensive_care_patients.append(
                VulnerablePatient(patient)
            )
    except StopIteration:
        print('We still have capacity!')
        break
intensive_care_patients
intensive_care_patients
[<VulnerablePatient> <User> self.name='Matthew' | self.age=45 | True,
<VulnerablePatient> <User> self.name='Operating Doctor' | self.age=46 | True]
Okay, so one thing we did differently here is that we passed the entire User object as the patient to our VulnerablePatient object. I will also point out that, whether or not it is obvious, here we rely on User.__repr__ to provide the actual patient information in VulnerablePatient.__repr__.
Here is where we can check whether we have made two patients, one the standalone User and one a second User inside VulnerablePatient, or whether we are referencing the same User object and thus the same memory block. Let’s check.
check_user = next(
user for user in users
if user.age > 30
)
compare_vulnerable_patient = next(
patient for patient in intensive_care_patients
if patient.patient.name == check_user.name
)
print(check_user, compare_vulnerable_patient)
<User> self.name='Matthew' | self.age=45 | True <VulnerablePatient> <User> self.name='Matthew' | self.age=45 | True
Okay, two things here. First, the simplest: jump down to the last line of code and the output after it. We can see that the user we grabbed as check_user, who met the condition for which we made ICU patients, is properly found and corresponds to the same VulnerablePatient. How we condition that should be familiar now, if it wasn’t before we started.
The second thing to note is that we combine a few of our methods, finally, into a nice and concise example.
We first want to get a user that we know would be a VulnerablePatient by conditioning the same way, here age > 30. We do this with a generator because we now love them, but more importantly, we want to get just the first user that meets the condition. We do not want to generate a full list of matching users because, remember, they are expensive, and we only need one, fast. So we use a generator that will yield matches, and we call next one time so we get the first item that is yielded. You will see we do this as a shortcut by wrapping a generator comprehension in next. In Python 2 you could call the generator’s own .next() method, but in modern Python the built-in next function is the clean and supported way.
You’ll then see we do the same with our ICU patients list, but here we condition on the name matching our single check_user we just found. So we search for a single match (even though we know there are more), and then again we search for a single match (but here we know there should be only one; yet our syntax is the same for our purposes).
You might note we did not wrap these in a try/except to catch a StopIteration, because we know the data exists and each call will successfully return. (We say that all the time in production, and then things blow up, don’t they?)
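If we could not assume a match exists, the safer variant is again next with a default, so a miss returns a sentinel instead of blowing up; a sketch with fabricated users:

```python
users = [('Matthew', 45), ('Patrick', 3)]  # fabricated (name, age) pairs

# no user is over 100, so the generator yields nothing and the default is used
match = next((name for name, age in users if age > 100), None)
print(match)  # None
```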
Okay, back to it. We now have a User and a VulnerablePatient that should be the same person. But is this a copy, or a reference to one memory block? Let’s check.
print(f'{id(check_user)=}, {id(compare_vulnerable_patient)=}, and {id(compare_vulnerable_patient.patient)=})', end='\n\n')
def assert_equivalency(obj1, obj2):
"""Assert the equivalency of two objects"""
try:
assert id(obj1) == id(obj2)
except AssertionError:
print(f'{obj1=} and {obj2=} are not equivalent', end='\n\n')
else:
print(f'{obj1=} and {obj2=} are equivalent', end='\n\n')
assert_equivalency(check_user, compare_vulnerable_patient)
assert_equivalency(check_user, compare_vulnerable_patient.patient)
id(check_user)=140252981884720, id(compare_vulnerable_patient)=140252983924240, and id(compare_vulnerable_patient.patient)=140252981884720)
obj1=<User> self.name='Matthew' | self.age=45 | True and obj2=<VulnerablePatient> <User> self.name='Matthew' | self.age=45 | True are not equivalent
obj1=<User> self.name='Matthew' | self.age=45 | True and obj2=<User> self.name='Matthew' | self.age=45 | True are equivalent
So, a few ways to look at it. First, we simply print the result of the id function, which tells us each object’s unique identifier in memory, and compare by eye. Second, we assert the equivalency of the ids inside assert_equivalency. We could equally check identity directly with the is operator, which compares the same thing.
As you can see, the checks agree. While the User and the VulnerablePatient are not the same object, which is what we expect, the VulnerablePatient.patient and the User are the same. So we did not create a second in-memory patient when we put them in a VulnerablePatient object.
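The same identity-versus-copy distinction in a tiny standalone sketch:

```python
original = ['patient history']
reference = original           # another name for the same object
shallow_copy = list(original)  # a new object with equal contents

print(reference is original)      # True: same memory block
print(shallow_copy is original)   # False: distinct objects
print(shallow_copy == original)   # True: equal contents nonetheless
```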
Okay, this last part was a bit of a tangent from what we were going through up to this point. However, it was a nice culmination of our methods in getting check_user and compare_vulnerable_patient, and hopefully the explanation of in-memory objects and referencing, and how we can check these things, proves helpful.
I think that should do it for generators and generator comprehensions, and when we might want to use them over lists and list comprehensions.