.. Python Generators and Comprehension

.. post:: Nov 13, 2020
   :tags: atag
   :author: Matthew Martz

Python Generators and Comprehension
========================================

Digging into generators and comprehension - from basics to implementation in a comprehensive tutorial. This is a walkthrough for beginners that will build up to real world examples.

.. panels::
    :column: col-lg-12 p-0
    :header: font-weight-bold bg-info

    Note
    ^^^^^^^^^^^^^^

    This is an in-progress draft.

.. code:: python

    import numpy as np
    import string

Here we will build a dictionary of items that we can use for examples. Let's make it keyed on alphabetical characters and random integers. The algorithm for this will be: do something 20 times so we can have 20 items as (key, value). For each of the 20 iterations, choose a random lowercase alphabetical character as a string and a positive integer up to 100.

.. code:: python

    alpha = list(string.ascii_lowercase)
    collection = {
        np.random.choice(alpha): np.random.randint(100)
        for _ in range(20)
    }
    collection

.. parsed-literal::

    {'i': 15, 'u': 14, 'h': 74, 'x': 93, 'r': 0, 'd': 64, 'v': 31, 'k': 17, 'm': 93, 'p': 18, 'l': 80, 'o': 31, 'c': 48, 'q': 45, 'b': 55, 's': 40, 't': 53}

We did this through a method called dictionary comprehension, where we build a dictionary object on the fly. You can spot these comprehension methods by iteration code within ``{}`` or ``[]`` for dictionary or list comprehension, respectively. Building our dictionary collection, we iterate over ``range(20)`` so we will have up to 20 key:value pairs (duplicate keys overwrite earlier ones, which is why our result above holds only 17 entries). Since we are not using the number yielded by the ``range`` function, we use the conventional variable name ``_`` to signal to the reader that this variable is unused; our range is simply letting us do something 20 times. For each iteration, we randomly sample ``alpha`` using ``numpy.random.choice``.
``alpha`` contains the English-language alphabet in lower case by way of ``string.ascii_lowercase``. We call ``list`` on it because ``string.ascii_lowercase`` is a plain string, and ``numpy.random.choice`` expects a 1-dimensional array-like of choices, so we break the string into a list of single characters to sample from. Next, our value is assigned by randomly selecting an integer up to 100 with ``numpy.random.randint``. In order to tell our comprehension these are key:value pairs, our key (``numpy.random.choice(alpha)``) comes first, followed by ``:``, then our value (``numpy.random.randint(100)``). This gives us our ``key: value``. These are all collected within ``{}`` and assigned to the variable ``collection``, which we can call so that the ``__repr__`` function of our ``collection`` object (a ``dict``) returns a string representation. Now let's demonstrate a few things we can do with this ``collection`` dictionary.

.. code:: python

    from collections import Counter

    value_counts = Counter(collection.values())
    value_counts.most_common()

.. parsed-literal::

    [(93, 2), (31, 2), (15, 1), (14, 1), (74, 1), (0, 1), (64, 1), (17, 1), (18, 1), (80, 1), (48, 1), (45, 1), (55, 1), (40, 1), (53, 1)]

``collections.Counter`` lets us feed it an array of data and have it tabulate occurrences. We call the ``most_common`` function to sort the counts descending by occurrence. We could have also called ``most_common(5)`` to get just the top 5, for example. This is really doing something like the following.

.. code:: python

    value_counts = {}
    for val in collection.values():
        value_counts[val] = value_counts.get(val, 0) + 1

    sorted(value_counts.items(), key=lambda count: count[1], reverse=True)

.. parsed-literal::

    [(93, 2), (31, 2), (15, 1), (14, 1), (74, 1), (0, 1), (64, 1), (17, 1), (18, 1), (80, 1), (48, 1), (45, 1), (55, 1), (40, 1), (53, 1)]

First we set an empty dictionary object in which we will tabulate our value occurrences.
We then iterate over our collection values through ``collection.values()``. For each value, we assign ``value_counts`` a key of that value and increment its count by 1 for each observation. To do this, we fetch the current tally by calling ``value_counts.get(val)``. But if the key does not yet exist, ``get`` returns ``None``, and ``None + 1`` raises an error, so we supply a default of zero by calling it as ``value_counts.get(val, 0)``. Then we can take the current tally, or the default starting point of 0, and add 1 for this observation. Next, to get things sorted like ``collections.Counter.most_common``, we sort our list using ``sorted``. We iterate over key:value pairs, and we tell ``sorted`` that the sorting key is the value: ``count`` represents the ``tuple(key, value)``, and we use the value by setting the key as ``count[1]``, the value position of that tuple. ``lambda`` lets us define an inline function, and we could do any sort of operation there. Maybe this value is an error and we need to square it: ``lambda count: count[1] ** 2``. However, we would be better off doing that in a separate operation, since it obscures from the reader that we are sorting on the error and not the original value. Finally, ``reverse=True`` tells ``sorted`` we want max -> min. So we have taken our random collection of alphabetical keys and tallied the most common integer value occurrences. We could have easily done the same for the alphabetical keys by calling ``collections.Counter(collection.keys()).most_common()``. Now let's do some conditional selection. We will first find all keys whose value is greater than 40.

.. code:: python

    gt40keys = [
        k for (k, v) in collection.items()
        if v > 40
    ]
    gt40keys

.. parsed-literal::

    ['h', 'x', 'd', 'm', 'l', 'c', 'q', 'b', 't']

That used list comprehension (the list analogue of the dictionary comprehension we used to generate our collection) to iterate through the key: value pairs of ``collection`` and collect the key if the value is greater than 40.
The result is a list of keys. We could have returned the full key: value pair with ``(k, v) for (k, v) in collection.items() ...``, or we could have made a new dictionary of only those key: value pairs matching that condition. Let's do that, because it is a common routine.

.. code:: python

    gt40collection = {
        k: v for (k, v) in collection.items()
        if v > 40
    }
    gt40collection

.. parsed-literal::

    {'h': 74, 'x': 93, 'd': 64, 'm': 93, 'l': 80, 'c': 48, 'q': 45, 'b': 55, 't': 53}

This looks just like our previous list comprehension, but to generate a dictionary we use ``{}`` to make it a dictionary comprehension; we assign key ``k`` the value ``v`` by ``k: v`` as we normally do, but only if the value is greater than 40. The result is a dictionary that is a subset of ``collection``, with only the entries matching our criteria. But what if we want to search for a value? Let's say we want to create a new dictionary that is a subset of ``collection`` where the values equal 93. We will do this just like above.

.. code:: python

    equals93collection = {
        k: v for (k, v) in collection.items()
        if v == 93
    }
    equals93collection

.. parsed-literal::

    {'x': 93, 'm': 93}

So we have 2 key: value pairs where the value met the condition we set, and the result is a dictionary. Before we go on, as a bit of an aside, what if we simply wanted to double check how many values are equal to 93? Nothing more, nothing less. We can combine a few of the methods we've highlighted to get something like the following.

.. code:: python

    number_values_equal_to_93 = sum(
        1 for v in collection.values()
        if v == 93
    )
    number_values_equal_to_93

.. parsed-literal::

    2

This might look slightly different, but it's really all the same as what we have done. Let's start inside and work out.
First, we are iterating over the values of ``collection``, and since we only need the values rather than full key: value pairs, we use ``collection.values()`` (if we were looking at keys we would use ``collection.keys()``, though we would never count key occurrences with an equality test because, remember, we cannot have two of the same key in a dictionary). We test whether the value equals 93 and accumulate the integer 1 rather than the value itself. We use 1 because we want to count the occurrences, not sum the values themselves. Finally, we tally up these ones with ``sum``. Okay, I just wanted to highlight searching with comprehension and how we can use ``sum`` with specific values. Let's move on. While this was a "search" for values that equaled 93, we simply iterated over the entire collection to test each and every value. While this is common, another more "search"-like operation is to identify key: value entries where we know a condition is met, and we know how many matches exist. This could be finding the single user whose hash equals a specific value. Let's first use our collection to find the two keys whose value equals 93, and since we know there exist only two entries, let's stop as soon as we have found our two data points. There's no reason we should search further than that; if 93 occurs at positions 0 and 1, why should we be searching through the remainder of the collection?

.. code:: python

    found = []
    while len(found) < 2:
        for (k, v) in collection.items():
            if v == 93:
                found.append(k)
            if len(found) == 2:
                break
    found

.. parsed-literal::

    ['x', 'm']

As you see, this is just like the previous method where we accumulated key: value pairs whose value equaled 93. The syntax is different, but the operation is very similar. The difference here is that we ``break`` out of the loop as soon as we find our known count of 2 matches. However, there are some issues here if we were working with much larger data objects.
When we call ``collection.items()`` we produce the complete set of key: value pairs for ``collection`` (in Python 2 this materialized a full list in memory before iteration even began; in Python 3 it is a lazy view). While it's better in that we stop iterating as soon as our 2 are found, we can make this cleaner by yielding one key: value pair at a time, checking it, then moving on, with a counter so we can see exactly how far we searched.

.. code:: python

    found = []
    for (i, (k, v)) in enumerate(collection.items()):
        if v == 93:
            print(f'Found a 93 on loop {i=} for {k=}')
            found.append(k)
            if len(found) == 2:
                break
            else:
                continue
    print(f'collection is {len(collection)=} and we searched through {i=} key: value pairs')
    found

.. parsed-literal::

    Found a 93 on loop i=3 for k='x'
    Found a 93 on loop i=8 for k='m'
    collection is len(collection)=17 and we searched through i=8 key: value pairs

.. parsed-literal::

    ['x', 'm']

So you can see this is almost identical to our last attempt, but we have created an iterator using ``enumerate`` over our items. We then iterate as it yields an integer telling us the loop number (like a counter) and the ``tuple(key, value)``. So we unpack each yield as ``(i, (k, v))``, check whether the value ``v`` equals 93, and if so, we print out which loop count we are at and the key ``k`` it was associated with, then append the key to our ``found`` list. If we have found 2, as ``len(found) == 2``, then we ``break`` out of our loop; otherwise we ``continue`` to the next iteration. You can then see that of our collection of length 17, we only searched through 9 pairs (``enumerate`` started at 0 and ended at i=8, so that is 8 + 1 pairs). Pretty cool, especially if we are searching in the billions, or our search criteria is a lot more complex than an equality test.
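To make that savings concrete, here is a small standalone sketch (the dictionary literal is simply a fixed stand-in for a sample of our random ``collection``) that tallies how many pairs an early-exit search actually inspects:

```python
# A fixed stand-in for a sample of our random `collection`
collection = {'i': 15, 'u': 14, 'h': 74, 'x': 93, 'r': 0,
              'd': 64, 'v': 31, 'k': 17, 'm': 93, 'p': 18}

touched = 0
found = []
for k, v in collection.items():
    touched += 1          # count every pair we actually inspect
    if v == 93:
        found.append(k)
    if len(found) == 2:
        break             # stop as soon as both matches are in hand

print(found)    # ['x', 'm']
print(touched)  # 9 of the 10 pairs -- 'p' was never inspected
```

With only 10 pairs the saving is trivial, but the same shape of loop skips billions of inspections when the match sits early in a huge collection.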
What if we were searching for the key where some costly function ``f(x)``, taking several seconds/minutes (or MBs/GBs) to compute, yielded the resultant value ``v`` of our ``tuple(k, v)`` pair? Another important feature is that we only hold the current item in RAM; we are not inherently accumulating everything we have already iterated over. We are looking at things one at a time and moving on. It's like we are at a market searching for a good fruit. We sample many fruits by taking one, looking it over, and either keeping it because it matches our criteria for ripeness, or putting it back. We do not stash in our arms every fruit we sample, only putting them down (keeping our selections) after exhausting the entire bin. For shopping the market, iterating by ``yield``\ ing one fruit at a time, our arms (and the farmer) are thankful. For our programming, our RAM capacity is thankful. We see this commonly when operating on file objects, where we readily encounter files far larger than our usable memory. It is also extremely common with large arrays and networks when using a GPU. Let's make a quick generator just to see what's really different here. First, though, let's make a version like our list comprehension to see what that looks like, and then do the same thing with a generator to compare.

.. code:: python

    def list_building(n=10):
        """Build and return a complete list of n integers"""
        created_array = []
        for i in range(n):
            created_array.append(i)
        return created_array

    def generator_list_building(n=10):
        """Yield n integers one at a time"""
        for i in range(n):
            yield i

    complete_list = list_building(10)
    print(f'{complete_list=} is {type(complete_list)=}')

    iterable_list = generator_list_building(10)
    print(f'{iterable_list=} is {type(iterable_list)=}')

.. parsed-literal::

    complete_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] is type(complete_list)=<class 'list'>
    iterable_list=<generator object generator_list_building at 0x...> is type(iterable_list)=<class 'generator'>

As you can see both by method and by inspection, the first list building routine creates, then returns, the entire array of 10 integers. The generator method, however, creates an in-memory generator object which can later be iterated over. You can see the difference by printing out the result: ``complete_list`` contains a complete, 10 integer array, while ``iterable_list`` holds no data and is instead a generator object. But how do we get something out of the generator? Let's say, for our list of objects, we wanted an array of squared values.

.. code:: python

    squared_list = [
        n ** 2 for n in complete_list
    ]
    print(squared_list)

    squared_generator_list = [
        n ** 2 for n in iterable_list
    ]
    print(squared_generator_list)

.. parsed-literal::

    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Both resulted in our list of squared original values. However, with the first method, what do we have in memory? We have the original list ``complete_list``, which is of length 10, and now also a squared list ``squared_list``, also of length 10. These are small, but what if we simply wanted our squared list and it was to be a billion integers? Further, do we need the original if we just want squares? We could use ``del complete_list``, but we still had both complete lists in memory at some point, for some duration. The second approach, though, started with only a generator. When we were done, we had only the resultant list and an exhausted generator. That's because we told it what it *will* ``yield``; we then iterated it, having it ``yield`` a particular integer, operated on that item and accumulated the result, and threw away the yielded item before moving on. So as we iterate over the generator, we yield and then move on. Just like our fruit inspection at the market. The great thing is, our squaring operation looks the same! Nice and simple, right?
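We can put a rough number on the memory difference with ``sys.getsizeof``, which reports an object's shallow size in bytes; the exact figures vary by Python version and platform, but the contrast holds:

```python
import sys

n = 1_000_000
squared_list = [i ** 2 for i in range(n)]        # a million results, all in memory
squared_generator = (i ** 2 for i in range(n))   # just a paused frame waiting to run

# The list's container alone holds a million references; the generator
# object stays the same tiny size no matter how many items it will yield.
print(sys.getsizeof(squared_list))       # millions of bytes
print(sys.getsizeof(squared_generator))  # on the order of a hundred bytes
```

Note that ``sys.getsizeof`` on the list does not even count the integer objects themselves, so the true footprint of the list is larger still.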
But our functions were a bit crude --- can we write them in our comprehension style? Can we make a comprehension generator? Yes!

.. code:: python

    comprehension_based_complete_list = [
        i for i in range(10)
    ]
    print(f'{comprehension_based_complete_list=} is {type(comprehension_based_complete_list)=}')

    comprehension_based_iterable_list = (
        i for i in range(10)
    )
    print(f'{comprehension_based_iterable_list=} is {type(comprehension_based_iterable_list)=}')

.. parsed-literal::

    comprehension_based_complete_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] is type(comprehension_based_complete_list)=<class 'list'>
    comprehension_based_iterable_list=<generator object <genexpr> at 0x7f8f308f4890> is type(comprehension_based_iterable_list)=<class 'generator'>

Okay, there we have it. The same functions we already wrote, but with list and generator comprehension. Remember, this is done through the subtle difference of using either ``[]`` for a list or ``()`` for a generator around the comprehension logic. Now, this is not the best example. That's because in Python 3, ``range`` is already lazy (it returns a ``range`` object, which is not technically a generator but likewise does not materialize its values up front). So these would simply be ``list(range(10))`` and ``range(10)``, respectively. But you should be able to see that what we are doing to make the data could be anything. Finally, as an exercise, let's combine everything in the last few functions to make a squared value list and generator. This should be a better example than ``range`` alone, and it allows us to write something more succinct than the multi-step functions above.

.. code:: python

    comprehension_based_squared_list = [
        i ** 2 for i in range(10)
    ]
    print(f'{comprehension_based_squared_list=} is {type(comprehension_based_squared_list)=}')

    comprehension_based_squared_iterable = (
        i ** 2 for i in range(10)
    )
    print(f'{comprehension_based_squared_iterable=} is {type(comprehension_based_squared_iterable)=}')

.. parsed-literal::

    comprehension_based_squared_list=[0, 1, 4, 9, 16, 25, 36, 49, 64, 81] is type(comprehension_based_squared_list)=<class 'list'>
    comprehension_based_squared_iterable=<generator object <genexpr> at 0x7f8f308f4ac0> is type(comprehension_based_squared_iterable)=<class 'generator'>

So we have a squared list and a squared generator. Let's get back to our "searching". Remember way back we used ``enumerate`` to create an iterator-based version of our data over which we could iterate. Let's take a look at another way we can do this.

.. code:: python

    collections_generator = iter(collection.items())

    found = []
    while len(found) < 2:
        k, v = next(collections_generator)
        if v == 93:
            print(f'Found a 93 for {k=}')
            found.append(k)
    found

.. parsed-literal::

    Found a 93 for k='x'
    Found a 93 for k='m'

.. parsed-literal::

    ['x', 'm']

Here we created our own iterator using ``iter``, which lets us work through any collection of data (list, array, dict, tuple). We then use a ``while`` loop to iterate through the data until we have found our 2 matches, collected under ``found`` and assessed with ``len(found)``. We yield one key: value pair by calling ``next`` on our iterator, check whether value ``v`` is equal to 93, and if so, append the key to ``found``. Great: this works just like before but is more concise using ``while``, and by using ``iter`` on our own data object, we make it adaptable. The benefit of ``enumerate`` is that we don't have to pull the data ourselves using ``next``, and we also get the added feature of a built-in counter as the first value in the yielded ``tuple(count, iterable_item)``, where our iterable_item is a key: value pair as ``tuple(k, v)`` since we are iterating ``collection.items()``. Either works fine, and there are some pros and cons to each beyond what we just discussed. Generally it comes down to habit, and I find myself using both almost equally. Now, one caveat of our iterator approach: what if we couldn't find our second match?
Either we were not working on immutable data and it changed since we began the search, or we set our match criteria incorrectly based on our prior knowledge. We will run out of items and crash. We always want to catch such exceptions, deal with them gracefully, and move on so we do not break our code. So how do we go about doing this here?

.. code:: python

    collections_generator = iter(collection.items())

    found = []
    while len(found) < 2:
        try:
            k, v = next(collections_generator)
            if v == 53:
                print(f'Found a 53 for {k=}')
                found.append(k)
        except StopIteration:
            print('We ran out of data to search')
            break
    found

.. parsed-literal::

    Found a 53 for k='t'
    We ran out of data to search

.. parsed-literal::

    ['t']

To catch this issue, we nest our iterable assignment and condition check within a ``try/except`` routine. Here, we try to get the next item and test it. However, if there is nothing left to yield when we call ``next``, the iterator will ``raise StopIteration``. We catch this specific (and we always want to be specific) exception with ``except StopIteration``. To ensure we had a scenario where this exception would be thrown, we searched for a value which we know from our previous exercises occurred only once, here ``53``. Pretty simple, but two important things to note. 1) We explicitly use ``except StopIteration`` to only catch this error. We do not want to catch any other issues that might arise, as that would obfuscate other errors, since we assume running out of items is all that could happen. 2) **More critically**, we must make sure that whatever we do in our exception handling, whether it's simply printing a warning, like here, or trying something else, we have to call ``break``. If we do not call ``break`` when we run out of iterables, what will happen with our ``while`` routine? Well, we'll never meet the condition ``len(found) < 2`` and we will loop to infinity. (Okay, not really.
The loop will simply spin forever, repeatedly raising and catching ``StopIteration`` until someone kills the process, which is equally bad. Think about this happening in production!) Another consideration is that we can use ``else`` and ``finally`` with our ``try/except`` routine. While I'll leave that for another time, in brief: ``else`` lets us run a secondary operation when no exception is thrown, and ``finally`` gives us a way to always run a cleanup regardless of whether we succeeded (think closing a file object, etc). So ``else`` only runs if no exception was caught with ``except``, and ``finally`` will always run no matter the prior outcome. Okay, I think we hit a lot of good stuff so far. We discussed constructing objects through comprehension building (list or dictionary), generating some random toy data in doing so. We ran through a few different ways to summarize and understand our data using either built-ins or from-scratch methods. We saw how we can "search" conditionally to find data or simply count occurrences. We then dove into using generators and iterators to optimize our searching. Finally, we briefly touched on adding some error handling. Let's take a look at one last method. What if we knew our data had only one match, for example, our user with a specific hash? Let's create a simple class object, ``User``, and highlight things we can do to wrap up everything we've gone over thus far.

.. code:: python

    class User:

        def __init__(self, name, age, active=True):
            self.name = name
            self.age = age
            self.active = active

        def toggle_active(self):
            self.active = not self.active
            return True

        def __repr__(self):
            return f' {self.name=} | {self.age=} | {self.active}'

Okay, we now have a ``User`` class that represents a user with a ``name``, ``age``, and an ``active`` status. We also have a function we can call on a ``User`` that will toggle the active status. We also set a ``__repr__`` function so we get a string representation of our users. Let's add some users, and we will store them in a list.
So let's do this with some list comprehension!

.. code:: python

    user_names = ['Patrick', 'Matthew', 'Linux Admin', 'Operating Doctor', 'Data Scientist']

    users = [
        User(name=name, age=np.random.randint(80))
        for name in user_names
    ]
    users

.. parsed-literal::

    [ self.name='Patrick' | self.age=3 | True,
      self.name='Matthew' | self.age=45 | True,
      self.name='Linux Admin' | self.age=12 | True,
      self.name='Operating Doctor' | self.age=46 | True,
      self.name='Data Scientist' | self.age=74 | True]

So we used our list comprehension to build out a list of users, ``users``, from our preset user names array ``user_names``. In doing so we assigned them a random age up to 80 with ``numpy.random.randint(80)``, similar to what we did with ``collection``. Also note that since we did not supply an ``active`` argument, it defaulted to ``True`` as we set in ``User.__init__``. And because we made a nice ``__repr__``, we get an informative string representation of all of our users in ``users``. Now let's use what we have done already to perform a few somewhat real world operations. I see two users whose age is under 18. Patrick and Linux Admin should not be active on our platform without additional parental consent. Let's go ahead and take care of that, but let's highlight a few methods for doing so. I will comment out the first few methods and only run the last, since we only want to toggle this once, but we want to highlight them all.

.. code:: python

    # # Pretty standard approach
    # for user in users:
    #     if user.age < 18:
    #         user.toggle_active()

    # # This works, but map is lazy, so to perform the mapped function we
    # # have to iterate it - for our simple purpose this is a bit terse
    # list(map(lambda user: user.toggle_active(), (user for user in users if user.age < 18)))

    # If we wanted the mapped approach, we could do it in a comprehension.
    # Further, since toggle_active returns True as long as no error is thrown,
    # we can add a check that all requested toggling ran successfully.
    assert all(user.toggle_active() for user in users if user.age < 18)

    users

.. parsed-literal::

    [ self.name='Patrick' | self.age=3 | False,
      self.name='Matthew' | self.age=45 | True,
      self.name='Linux Admin' | self.age=12 | False,
      self.name='Operating Doctor' | self.age=46 | True,
      self.name='Data Scientist' | self.age=74 | True]

The first, commented-out approach is a very readable loop running ``User.toggle_active()`` on any user we encounter whose age is less than 18; the version we actually executed does the same work inside the ``assert``\ ed comprehension, and we can see that, yes, it did indeed work. The other commented approaches are more advanced, or in some cases merely more confusing. The version with an ``assert``\ ion is pretty handy. You'd do something like this if you were unit testing; however, you don't usually want assertions in your production code. In production, we might instead say ``if not all(blah): send_alert("status toggle failures")`` or something like that. And again, we can do that without inspecting the users, simply because we ``return True`` if the code executes properly. But remember, that's not checking that the toggle resulted in the correct status state, only that it ran without raising an exception. Okay, so we have our users set up, and their status is now age appropriate. Let's come back to our techniques.
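As a follow-up to that caveat: if we did want to verify the resulting state, not just that the calls ran, a minimal standalone sketch (re-declaring a stripped-down ``User`` with fixed ages so it runs on its own) could assert the statuses directly:

```python
class User:
    """Stripped-down version of our User class, only what we need here."""
    def __init__(self, name, age, active=True):
        self.name = name
        self.age = age
        self.active = active

    def toggle_active(self):
        self.active = not self.active
        return True

users = [User('Patrick', 3), User('Matthew', 45), User('Linux Admin', 12)]

# Toggle every minor, then check the *state*, not just that the calls ran
assert all(user.toggle_active() for user in users if user.age < 18)
assert all(not user.active for user in users if user.age < 18)   # minors now inactive
assert all(user.active for user in users if user.age >= 18)      # adults untouched
```

The second and third assertions are the state checks the ``return True`` pattern cannot give us.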
Let's find users whose age is over 30 (similar to what we just did), but now we want to do a bit more than a simple one-time function execution. Remember, we can do this with a list or a generator. Let's pretend that our set of patients should now go into a new class object. Let's also set the caveat that the number of users we find will be far greater than the actual 2 in our example. We might also assume that each ``User`` in ``users`` contains additional attributes, one of which holds health history that might be megabytes in size. So because we have a few things we want to do with our found users, and each found user is memory intensive and already exists somewhere in the ``users`` list, we do not want to copy an entire subset of that list. We are also going to pretend that our conditional patients can only accumulate in our intensive care unit until we fill our capacity. So, because every bit of memory is important (remember all that patient history we naively read into memory when making our ``User``\ s?), and because we can only take so many, let's use our generator approach. *Note: if we were really memory intensive here and speed was important, we could use indexing and other tricks. But let's assume we are somewhere between our example of 5 and Google big data.*

.. code:: python

    users_iter = iter(users)

    max_capacity = 2
    intensive_care_patients = []
    while len(intensive_care_patients) < max_capacity:
        try:
            patient = next(users_iter)
            if patient.age > 30:
                intensive_care_patients.append(patient.name)
        except StopIteration:
            print('We still have capacity!')
            break
    intensive_care_patients

.. parsed-literal::

    ['Matthew', 'Operating Doctor']

This should all seem standard fare now, so I will leave the breakdown to you at this point. Notice how we only kept the patient name in our resultant list? Again, we are assuming it's incredibly expensive to hold a ``User`` in memory, so let's not duplicate things unless we have to do so.
In reality, we might reference a database table primary key, or maybe a unique user hash, since names are common. We also do not want people causing havoc on the system by guessing actual data references (thus why we normally use hashes or other unpredictable identifiers), but I digress. Okay, remember what I just said about copying? Well, in reality we could just reference the original object, so we would have a list of objects and another list of references to those objects. But for our purposes, let's assume we are dealing with copies above, where maybe we would otherwise attach additional information, making the reference and the original no longer equivalent and causing us to have two objects for one user. If we wanted to make ICU patient objects, we could have simply done so above: rather than appending the ``User.name`` to a list, we could have made those objects and collected them. Let's do that quickly here, just as an example.

.. code:: python

    class VulnerablePatient:

        def __init__(self, patient):
            self.patient = patient

        def __repr__(self):
            return f' {self.patient}'

    users_iter = iter(users)

    max_capacity = 2
    intensive_care_patients = []
    while len(intensive_care_patients) < max_capacity:
        try:
            patient = next(users_iter)
            if patient.age > 30:
                intensive_care_patients.append(
                    VulnerablePatient(patient)
                )
        except StopIteration:
            print('We still have capacity!')
            break
    intensive_care_patients

.. parsed-literal::

    [ self.name='Matthew' | self.age=45 | True,
      self.name='Operating Doctor' | self.age=46 | True]

Okay, so the one thing we did differently here is that we passed the entire ``User`` object as the patient to our ``VulnerablePatient`` object. I will also point out, as it may or may not be obvious, that here we rely on the ``User.__repr__`` to provide the actual patient information in ``VulnerablePatient.__repr__``.
Here is where we can check whether we have made two patients (one a ``User`` and one a ``User`` within ``VulnerablePatient``), or whether we are referencing the same ``User`` object and thus the same memory block. Let's check.

.. code:: python

    check_user = next(
        user for user in users
        if user.age > 30
    )

    compare_vulnerable_patient = next(
        patient for patient in intensive_care_patients
        if patient.patient.name == check_user.name
    )

    print(check_user, compare_vulnerable_patient)

.. parsed-literal::

    self.name='Matthew' | self.age=45 | True  self.name='Matthew' | self.age=45 | True

Okay, two things here. First, the simplest: jump down to the last line of code and the output after it. We can see that the user we grabbed as ``check_user``, which met the condition for which we made ICU patients, is properly found and matches the same ``VulnerablePatient``. How we condition that should be familiar now, if it wasn't before we started. The second thing to note is that we finally combine a few of our methods into a nice and concise example. We first want to get a user that we know would be a ``VulnerablePatient`` by conditioning the same way, here age > 30. We do this with a generator because we now love them, but more importantly we want to get just the first user that meets the condition. We *do not* want to generate a full list of condition-matching users because, remember, they are expensive, and we only need one, fast. So we use a generator that will yield matches, and we call ``next`` one time to get the first item yielded. You will see we do this as a shortcut by wrapping a generator comprehension in ``next``. (In Python 2 we could have also written this as ``generator.next()``, but that method is gone in modern Python, where the ``next`` built-in is the way to go.) You'll then see we do the same with our ICU patients list, but here we condition on the name being the same as the single ``check_user`` we just found.
So we search for a single match (even though we know there are more), and then again we search for a single match (here we know there should be only one, yet our syntax is the same for our purposes). You might note that we did not wrap these to catch a ``StopIteration``, because we know the data exists and each call will successfully return. (*We say that all the time in production and then things blow up, don't they?*) Okay, back to it. We now have a ``User`` and a ``VulnerablePatient`` that should be the same person. But is this a copy, or a reference to one memory block? Let's check.

.. code:: python

    print(f'{id(check_user)=}, {id(compare_vulnerable_patient)=}, and {id(compare_vulnerable_patient.patient)=})', end='\n\n')

    def assert_equivalency(obj1, obj2):
        """Assert the equivalency of two objects"""
        try:
            assert id(obj1) == id(obj2)
        except AssertionError:
            print(f'{obj1=} and {obj2=} are not equivalent', end='\n\n')
        else:
            print(f'{obj1=} and {obj2=} are equivalent', end='\n\n')

    assert_equivalency(check_user, compare_vulnerable_patient)
    assert_equivalency(check_user, compare_vulnerable_patient.patient)

.. parsed-literal::

    id(check_user)=140252981884720, id(compare_vulnerable_patient)=140252983924240, and id(compare_vulnerable_patient.patient)=140252981884720)

    obj1= self.name='Matthew' | self.age=45 | True and obj2= self.name='Matthew' | self.age=45 | True are not equivalent

    obj1= self.name='Matthew' | self.age=45 | True and obj2= self.name='Matthew' | self.age=45 | True are equivalent

So, a few ways to look at it. The first: we simply print the result of the ``id`` function, which tells us the object's unique identifier in memory. The second way is to ``assert`` the equivalency of the ``id``\ s. And a third way is to check identity directly with the ``is`` operator (``obj1 is obj2``), which performs the same comparison as ``id(obj1) == id(obj2)``. As you can see, these methods agree and give us the same result.
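To see that identity check in isolation, here is a tiny standalone sketch (``Wrapper`` is a hypothetical stand-in for ``VulnerablePatient``) showing that ``is`` and comparing ``id``\ s agree:

```python
class Wrapper:
    """Hypothetical stand-in for VulnerablePatient: stores a reference, not a copy."""
    def __init__(self, patient):
        self.patient = patient

original = {'name': 'Matthew', 'age': 45}
wrapped = Wrapper(original)

# The wrapper is a distinct object...
print(wrapped is original)                   # False
# ...but its attribute points at the very same memory block.
print(wrapped.patient is original)           # True
print(id(wrapped.patient) == id(original))   # True -- the same check as `is`
```

No second copy of ``original`` ever exists; ``wrapped.patient`` is just another name for the same object.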
While the ``User`` and the ``VulnerablePatient`` are not the same object, which is what we expect, the ``VulnerablePatient.patient`` and the ``User`` are the same. So we did not create a second in-memory patient when we put them in a ``VulnerablePatient`` object. Okay, this last part was a bit of a tangent from what we were going through up to this point. However, it was a nice culmination of our methods in getting ``check_user`` and ``compare_vulnerable_patient``, and hopefully the explanation of in-memory copies and references, and how we can check these things, proves to be helpful. I think that should do it for generators and generator comprehensions, and when we might want to use them over lists and list comprehensions.