.. Python Generators and Comprehension

.. post:: Nov 13, 2020
   :tags: atag
   :author: Matthew Martz

Python Generators and Comprehension
========================================

Digging into generators and comprehension - from basics to implementation in a comprehensive tutorial. This is a walkthrough for beginners that will build up to real world examples.

.. panels::
    :column: col-lg-12 p-0
    :header: font-weight-bold bg-info

    Note
    ^^^^^^^^^^^^^^

    This is an in-progress draft.

.. code:: python

    import numpy as np
    import string

Here we will build a dictionary of items that we can use for examples. Let's make it keyed on alphabetical characters and random integers. The algorithm for this will be: do something 20 times so we can have 20 items as (key, value). For each of the 20 iterations, choose a random lowercase alphabetical character as a string and a positive integer up to 100.

.. code:: python

    alpha = list(string.ascii_lowercase)
    collection = {
        np.random.choice(alpha): np.random.randint(100)
        for _ in range(20)
    }
    collection

.. parsed-literal::

    {'i': 15, 'u': 14, 'h': 74, 'x': 93, 'r': 0, 'd': 64, 'v': 31, 'k': 17, 'm': 93, 'p': 18, 'l': 80, 'o': 31, 'c': 48, 'q': 45, 'b': 55, 's': 40, 't': 53}

We did this through a method called dictionary comprehension, where we build a dictionary object on the fly. You can spot these comprehension methods by iteration code within ``{}`` or ``[]`` for dictionary or list comprehension, respectively. Building our dictionary collection, we iterate over ``range(20)`` so we will have up to 20 key:value pairs (duplicate keys overwrite earlier ones, which is why our result above holds only 17 entries). Since we are not using the number yielded by the ``range`` function, we use the conventional variable name ``_`` to signal to the reader that this variable is unused; our range is simply letting us do something 20 times. For each iteration, we randomly sample ``alpha`` using ``numpy.random.choice``.
``alpha`` contains the English-language alphabet in lower case by way of ``string.ascii_lowercase``. We call ``list`` on it because ``string.ascii_lowercase`` is a plain string, and ``numpy.random.choice`` expects a 1-dimensional array-like of choices, so we break the string into a list of single characters to sample from. Next, our value is assigned by randomly selecting an integer up to 100 with ``numpy.random.randint``. In order to tell our comprehension these are key:value pairs, our key (``numpy.random.choice(alpha)``) comes first, followed by ``:``, then our value (``numpy.random.randint(100)``). This gives us our ``key: value``. These are all collected within ``{}`` and assigned to the variable ``collection``, which we can call so that the ``__repr__`` function of our ``collection`` object (a ``dict``) returns a string representation. Now let's demonstrate a few things we can do with this ``collection`` dictionary.

.. code:: python

    from collections import Counter

    value_counts = Counter(collection.values())
    value_counts.most_common()

.. parsed-literal::

    [(93, 2), (31, 2), (15, 1), (14, 1), (74, 1), (0, 1), (64, 1), (17, 1), (18, 1), (80, 1), (48, 1), (45, 1), (55, 1), (40, 1), (53, 1)]

``collections.Counter`` lets us feed it an array of data and have it tabulate occurrences. We call the ``most_common`` function to sort the counts descending by occurrence. We could have also called ``most_common(5)`` to get just the top 5, for example. This is really doing something like the following.

.. code:: python

    value_counts = {}
    for val in collection.values():
        value_counts[val] = value_counts.get(val, 0) + 1

    sorted(value_counts.items(), key=lambda count: count[1], reverse=True)

.. parsed-literal::

    [(93, 2), (31, 2), (15, 1), (14, 1), (74, 1), (0, 1), (64, 1), (17, 1), (18, 1), (80, 1), (48, 1), (45, 1), (55, 1), (40, 1), (53, 1)]

First we set an empty dictionary object in which we will tabulate our value occurrences.
We then iterate over our collection values through ``collection.values()``. For each value, we assign ``value_counts`` a key of that value and increment its count by 1 for each observation. To do this, we fetch the current tally by calling ``value_counts.get(val)``. But if the key does not yet exist, ``get`` returns ``None``, and ``None + 1`` raises an error, so we supply a default of zero by calling it as ``value_counts.get(val, 0)``. Then we can take the current tally, or the default starting point of 0, and add 1 for this observation. Next, to get things sorted like ``collections.Counter.most_common``, we sort our list using ``sorted``. We iterate over key:value pairs, and we tell ``sorted`` that the sorting key is the value: ``count`` represents the ``tuple(key, value)``, and we use the value by setting the key as ``count[1]``, the value position of that tuple. ``lambda`` lets us define an inline function, and we could do any sort of operation there. Maybe this value is an error and we need to square it: ``lambda count: count[1] ** 2``. However, we would be better off doing that in a separate operation, since it obscures from the reader that we are sorting on the error and not the original value. Finally, ``reverse=True`` tells ``sorted`` we want max -> min. So we have taken our random collection of alphabetical keys and tallied the most common integer value occurrences. We could have easily done the same for the alphabetical keys by calling ``collections.Counter(collection.keys()).most_common()``. Now let's do some conditional selection. We will first find all keys whose value is greater than 40.

.. code:: python

    gt40keys = [
        k for (k, v) in collection.items()
        if v > 40
    ]
    gt40keys

.. parsed-literal::

    ['h', 'x', 'd', 'm', 'l', 'c', 'q', 'b', 't']

That used list comprehension (the list analogue of the dictionary comprehension we used to generate our collection) to iterate through the key: value pairs of ``collection`` and collect the key if the value is greater than 40.
The result is a list of keys. We could have returned the full key: value pair with ``(k, v) for (k, v) in collection.items() ...``, or we could have made a new dictionary of only those key: value pairs matching that condition. Let's do that, because it is a common routine.

.. code:: python

    gt40collection = {
        k: v for (k, v) in collection.items()
        if v > 40
    }
    gt40collection

.. parsed-literal::

    {'h': 74, 'x': 93, 'd': 64, 'm': 93, 'l': 80, 'c': 48, 'q': 45, 'b': 55, 't': 53}

This looks just like our previous list comprehension, but to generate a dictionary we use ``{}`` to make it a dictionary comprehension; we assign key ``k`` the value ``v`` by ``k: v`` as we normally do, but only if the value is greater than 40. The result is a dictionary that is a subset of ``collection``, with only the entries matching our criteria. But what if we want to search for a value? Let's say we want to create a new dictionary that is a subset of ``collection`` where the values equal 93. We will do this just like above.

.. code:: python

    equals93collection = {
        k: v for (k, v) in collection.items()
        if v == 93
    }
    equals93collection

.. parsed-literal::

    {'x': 93, 'm': 93}

So we have 2 key: value pairs where the value met the condition we set, and the result is a dictionary. Before we go on, as a bit of an aside, what if we simply wanted to double check how many values are equal to 93? Nothing more, nothing less. We can combine a few of the methods we've highlighted to get something like the following.

.. code:: python

    number_values_equal_to_93 = sum(
        1 for v in collection.values()
        if v == 93
    )
    number_values_equal_to_93

.. parsed-literal::

    2

This might look slightly different, but it's really all the same as what we have done. Let's start inside and work out.
First, we are iterating over the values of ``collection``, and since we only need the values rather than full key: value pairs, we use ``collection.values()`` (if we were looking at keys we would use ``collection.keys()``, though we would never count key occurrences with an equality test because, remember, we cannot have two of the same key in a dictionary). We test whether the value equals 93 and accumulate the integer 1 rather than the value itself. We use 1 because we want to count the occurrences, not sum the values themselves. Finally, we tally up these ones with ``sum``. Okay, I just wanted to highlight searching with comprehension and how we can use ``sum`` with specific values. Let's move on. While this was a "search" for values that equaled 93, we simply iterated over the entire collection to test each and every value. While this is common, another more "search"-like operation is to identify key: value entries where we know a condition is met, and we know how many matches exist. This could be finding the single user whose hash equals a specific value. Let's first use our collection to find the two keys whose value equals 93, and since we know there exist only two entries, let's stop as soon as we have found our two data points. There's no reason we should search further than that; if 93 occurs at positions 0 and 1, why should we be searching through the remainder of the collection?

.. code:: python

    found = []
    while len(found) < 2:
        for (k, v) in collection.items():
            if v == 93:
                found.append(k)
            if len(found) == 2:
                break
    found

.. parsed-literal::

    ['x', 'm']

As you see, this is just like the previous method where we accumulated key: value pairs whose value equaled 93. The syntax is different, but the operation is very similar. The difference here is that we ``break`` out of the loop as soon as we find our known count of 2 matches. However, there are some issues here if we were working with much larger data objects.
When we call ``collection.items()`` we produce the complete set of key: value pairs for ``collection`` (in Python 2 this materialized a full list in memory before iteration even began; in Python 3 it is a lazy view). While it's better in that we stop iterating as soon as our 2 are found, we can make this cleaner by yielding one key: value pair at a time, checking it, then moving on, with a counter so we can see exactly how far we searched.

.. code:: python

    found = []
    for (i, (k, v)) in enumerate(collection.items()):
        if v == 93:
            print(f'Found a 93 on loop {i=} for {k=}')
            found.append(k)
            if len(found) == 2:
                break
            else:
                continue
    print(f'collection is {len(collection)=} and we searched through {i=} key: value pairs')
    found

.. parsed-literal::

    Found a 93 on loop i=3 for k='x'
    Found a 93 on loop i=8 for k='m'
    collection is len(collection)=17 and we searched through i=8 key: value pairs

.. parsed-literal::

    ['x', 'm']

So you can see this is almost identical to our last attempt, but we have created an iterator using ``enumerate`` over our items. We then iterate as it yields an integer telling us the loop number (like a counter) and the ``tuple(key, value)``. So we unpack each yield as ``(i, (k, v))``, check whether the value ``v`` equals 93, and if so, we print out which loop count we are at and the key ``k`` it was associated with, then append the key to our ``found`` list. If we have found 2, as ``len(found) == 2``, then we ``break`` out of our loop; otherwise we ``continue`` to the next iteration. You can then see that of our collection of length 17, we only searched through 9 pairs (``enumerate`` started at 0 and ended at i=8, so that is 8 + 1 pairs). Pretty cool, especially if we are searching in the billions, or our search criteria is a lot more complex than an equality test.
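To make that savings concrete, here is a small standalone sketch (the dictionary literal is simply a fixed stand-in for a sample of our random ``collection``) that tallies how many pairs an early-exit search actually inspects:

```python
# A fixed stand-in for a sample of our random `collection`
collection = {'i': 15, 'u': 14, 'h': 74, 'x': 93, 'r': 0,
              'd': 64, 'v': 31, 'k': 17, 'm': 93, 'p': 18}

touched = 0
found = []
for k, v in collection.items():
    touched += 1          # count every pair we actually inspect
    if v == 93:
        found.append(k)
    if len(found) == 2:
        break             # stop as soon as both matches are in hand

print(found)    # ['x', 'm']
print(touched)  # 9 of the 10 pairs -- 'p' was never inspected
```

With only 10 pairs the saving is trivial, but the same shape of loop skips billions of inspections when the match sits early in a huge collection.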
What if we were searching for the key where some costly function ``f(x)``, taking several seconds/minutes (or MBs/GBs) to compute, yielded the resultant value ``v`` of our ``tuple(k, v)`` pair? Another important feature is that we only hold the current item in RAM; we are not inherently accumulating everything we have already iterated over. We are looking at things one at a time and moving on. It's like we are at a market searching for a good fruit. We sample many fruits by taking one, looking it over, and either keeping it because it matches our criteria for ripeness, or putting it back. We do not stash in our arms every fruit we sample, only putting them down (keeping our selections) after exhausting the entire bin. For shopping the market, iterating by ``yield``\ ing one fruit at a time, our arms (and the farmer) are thankful. For our programming, our RAM capacity is thankful. We see this commonly when operating on file objects, where we readily encounter files far larger than our usable memory. It is also extremely common with large arrays and networks when using a GPU. Let's make a quick generator just to see what's really different here. First, though, let's make a version like our list comprehension to see what that looks like, and then do the same thing with a generator to compare.

.. code:: python

    def list_building(n=10):
        """Build and return a complete list of n integers"""
        created_array = []
        for i in range(n):
            created_array.append(i)
        return created_array

    def generator_list_building(n=10):
        """Yield n integers one at a time"""
        for i in range(n):
            yield i

    complete_list = list_building(10)
    print(f'{complete_list=} is {type(complete_list)=}')

    iterable_list = generator_list_building(10)
    print(f'{iterable_list=} is {type(iterable_list)=}')

.. parsed-literal::

    complete_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] is type(complete_list)=<class 'list'>
    iterable_list=<generator object generator_list_building at 0x...> is type(iterable_list)=<class 'generator'>

As you can see both by method and by inspection, the first list building routine creates, then returns, the entire array of 10 integers. The generator method, however, creates an in-memory generator object which can later be iterated over. You can see the difference by printing out the result: ``complete_list`` contains a complete, 10 integer array, while ``iterable_list`` holds no data and is instead a generator object. But how do we get something out of the generator? Let's say, for our list of objects, we wanted an array of squared values.

.. code:: python

    squared_list = [
        n ** 2 for n in complete_list
    ]
    print(squared_list)

    squared_generator_list = [
        n ** 2 for n in iterable_list
    ]
    print(squared_generator_list)

.. parsed-literal::

    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Both resulted in our list of squared original values. However, with the first method, what do we have in memory? We have the original list ``complete_list``, which is of length 10, and now also a squared list ``squared_list``, also of length 10. These are small, but what if we simply wanted our squared list and it was to be a billion integers? Further, do we need the original if we just want squares? We could use ``del complete_list``, but we still had both complete lists in memory at some point, for some duration. The second approach, though, started with only a generator. When we were done, we had only the resultant list and an exhausted generator. That's because we told it what it *will* ``yield``; we then iterated it, having it ``yield`` a particular integer, operated on that item and accumulated the result, and threw away the yielded item before moving on. So as we iterate over the generator, we yield and then move on. Just like our fruit inspection at the market. The great thing is, our squaring operation looks the same! Nice and simple, right?
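We can put a rough number on the memory difference with ``sys.getsizeof``, which reports an object's shallow size in bytes; the exact figures vary by Python version and platform, but the contrast holds:

```python
import sys

n = 1_000_000
squared_list = [i ** 2 for i in range(n)]        # a million results, all in memory
squared_generator = (i ** 2 for i in range(n))   # just a paused frame waiting to run

# The list's container alone holds a million references; the generator
# object stays the same tiny size no matter how many items it will yield.
print(sys.getsizeof(squared_list))       # millions of bytes
print(sys.getsizeof(squared_generator))  # on the order of a hundred bytes
```

Note that ``sys.getsizeof`` on the list does not even count the integer objects themselves, so the true footprint of the list is larger still.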
But our functions were a bit crude --- can we write them in our comprehension style? Can we make a comprehension generator? Yes!

.. code:: python

    comprehension_based_complete_list = [
        i for i in range(10)
    ]
    print(f'{comprehension_based_complete_list=} is {type(comprehension_based_complete_list)=}')

    comprehension_based_iterable_list = (
        i for i in range(10)
    )
    print(f'{comprehension_based_iterable_list=} is {type(comprehension_based_iterable_list)=}')

.. parsed-literal::

    comprehension_based_complete_list=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] is type(comprehension_based_complete_list)=<class 'list'>
    comprehension_based_iterable_list=<generator object <genexpr> at 0x7f8f308f4890> is type(comprehension_based_iterable_list)=<class 'generator'>

Okay, there we have it. The same functions we already wrote, but with list and generator comprehension. Remember, this is done through the subtle difference of using either ``[]`` for a list or ``()`` for a generator around the comprehension logic. Now, this is not the best example. That's because in Python 3, ``range`` is already lazy (it returns a ``range`` object, which is not technically a generator but likewise does not materialize its values up front). So these would simply be ``list(range(10))`` and ``range(10)``, respectively. But you should be able to see that what we are doing to make the data could be anything. Finally, as an exercise, let's combine everything in the last few functions to make a squared value list and generator. This should be a better example than ``range`` alone, and it allows us to write something more succinct than the multi-step functions above.

.. code:: python

    comprehension_based_squared_list = [
        i ** 2 for i in range(10)
    ]
    print(f'{comprehension_based_squared_list=} is {type(comprehension_based_squared_list)=}')

    comprehension_based_squared_iterable = (
        i ** 2 for i in range(10)
    )
    print(f'{comprehension_based_squared_iterable=} is {type(comprehension_based_squared_iterable)=}')

.. parsed-literal::

    comprehension_based_squared_list=[0, 1, 4, 9, 16, 25, 36, 49, 64, 81] is type(comprehension_based_squared_list)=<class 'list'>
    comprehension_based_squared_iterable=<generator object <genexpr> at 0x7f8f308f4ac0> is type(comprehension_based_squared_iterable)=<class 'generator'>

So we have a squared list and a squared generator. Let's get back to our "searching". Remember way back we used ``enumerate`` to create an iterator-based version of our data over which we could iterate. Let's take a look at another way we can do this.

.. code:: python

    collections_generator = iter(collection.items())

    found = []
    while len(found) < 2:
        k, v = next(collections_generator)
        if v == 93:
            print(f'Found a 93 for {k=}')
            found.append(k)
    found

.. parsed-literal::

    Found a 93 for k='x'
    Found a 93 for k='m'

.. parsed-literal::

    ['x', 'm']

Here we created our own iterator using ``iter``, which lets us work through any collection of data (list, array, dict, tuple). We then use a ``while`` loop to iterate through the data until we have found our 2 matches, collected under ``found`` and assessed with ``len(found)``. We yield one key: value pair by calling ``next`` on our iterator, check whether value ``v`` is equal to 93, and if so, append the key to ``found``. Great: this works just like before but is more concise using ``while``, and by using ``iter`` on our own data object, we make it adaptable. The benefit of ``enumerate`` is that we don't have to pull the data ourselves using ``next``, and we also get the added feature of a built-in counter as the first value in the yielded ``tuple(count, iterable_item)``, where our iterable_item is a key: value pair as ``tuple(k, v)`` since we are iterating ``collection.items()``. Either works fine, and there are some pros and cons to each beyond what we just discussed. Generally it comes down to habit, and I find myself using both almost equally. Now, one caveat of our iterator approach: what if we couldn't find our second match?
Either we were not working on immutable data and it changed since we began the search, or we set our match criteria incorrectly based on our prior knowledge. We will run out of items and crash. We always want to catch such exceptions, deal with them gracefully, and move on so we do not break our code. So how do we go about doing this here?

.. code:: python

    collections_generator = iter(collection.items())

    found = []
    while len(found) < 2:
        try:
            k, v = next(collections_generator)
            if v == 53:
                print(f'Found a 53 for {k=}')
                found.append(k)
        except StopIteration:
            print('We ran out of data to search')
            break
    found

.. parsed-literal::

    Found a 53 for k='t'
    We ran out of data to search

.. parsed-literal::

    ['t']

To catch this issue, we nest our iterable assignment and condition check within a ``try/except`` routine. Here, we try to get the next item and test it. However, if there is nothing left to yield when we call ``next``, the iterator will ``raise StopIteration``. We catch this specific (and we always want to be specific) exception with ``except StopIteration``. To ensure we had a scenario where this exception would be thrown, we searched for a value which we know from our previous exercises occurred only once, here ``53``. Pretty simple, but two important things to note. 1) We explicitly use ``except StopIteration`` to only catch this error. We do not want to catch any other issues that might arise, as that would obfuscate other errors, since we assume running out of items is all that could happen. 2) **More critically**, we must make sure that whatever we do in our exception handling, whether it's simply printing a warning, like here, or trying something else, we have to call ``break``. If we do not call ``break`` when we run out of iterables, what will happen with our ``while`` routine? Well, we'll never meet the condition ``len(found) < 2`` and we will loop to infinity. (Okay, not really.
The loop will simply spin forever, repeatedly raising and catching ``StopIteration`` until someone kills the process, which is equally bad. Think about this happening in production!) Another consideration is that we can use ``else`` and ``finally`` with our ``try/except`` routine. While I'll leave that for another time, in brief: ``else`` lets us run a secondary operation when no exception is thrown, and ``finally`` gives us a way to always run a cleanup regardless of whether we succeeded (think closing a file object, etc). So ``else`` only runs if no exception was caught with ``except``, and ``finally`` will always run no matter the prior outcome. Okay, I think we hit a lot of good stuff so far. We discussed constructing objects through comprehension building (list or dictionary), generating some random toy data in doing so. We ran through a few different ways to summarize and understand our data using either built-ins or from-scratch methods. We saw how we can "search" conditionally to find data or simply count occurrences. We then dove into using generators and iterators to optimize our searching. Finally, we briefly touched on adding some error handling. Let's take a look at one last method. What if we knew our data had only one match, for example, our user with a specific hash? Let's create a simple class object, ``User``, and highlight things we can do to wrap up everything we've gone over thus far.

.. code:: python

    class User:

        def __init__(self, name, age, active=True):
            self.name = name
            self.age = age
            self.active = active

        def toggle_active(self):
            self.active = not self.active
            return True

        def __repr__(self):
            return f' {self.name=} | {self.age=} | {self.active}'

Okay, we now have a ``User`` class that represents a user with a ``name``, ``age``, and an ``active`` status. We also have a function we can call on a ``User`` that will toggle the active status. We also set a ``__repr__`` function so we get a string representation of our users. Let's add some users, and we will store them in a list.
So let's do this with some list comprehension!

.. code:: python

    user_names = ['Patrick', 'Matthew', 'Linux Admin', 'Operating Doctor', 'Data Scientist']

    users = [
        User(name=name, age=np.random.randint(80))
        for name in user_names
    ]
    users

.. parsed-literal::

    [ self.name='Patrick' | self.age=3 | True,
      self.name='Matthew' | self.age=45 | True,
      self.name='Linux Admin' | self.age=12 | True,
      self.name='Operating Doctor' | self.age=46 | True,
      self.name='Data Scientist' | self.age=74 | True]

So we used our list comprehension to build out a list of users, ``users``, from our preset user names array ``user_names``. In doing so we assigned them a random age up to 80 with ``numpy.random.randint(80)``, similar to what we did with ``collection``. Also note that since we did not supply an ``active`` argument, it defaulted to ``True`` as we set in ``User.__init__``. And because we made a nice ``__repr__``, we get an informative string representation of all of our users in ``users``. Now let's use what we have done already to perform a few somewhat real world operations. I see two users whose age is under 18. Patrick and Linux Admin should not be active on our platform without additional parental consent. Let's go ahead and take care of that, but let's highlight a few methods for doing so. I will comment out the first few methods and only run the last, since we only want to toggle this once, but we want to highlight them all.

.. code:: python

    # # Pretty standard approach
    # for user in users:
    #     if user.age < 18:
    #         user.toggle_active()

    # # This works, but map is lazy, so to perform the mapped function we
    # # have to iterate it - for our simple purpose this is a bit terse
    # list(map(lambda user: user.toggle_active(), (user for user in users if user.age < 18)))

    # If we wanted the mapped approach, we could do it in a comprehension.
    # Further, since toggle_active returns True as long as no error is thrown,
    # we can add a check that all requested toggling ran successfully.
    assert all(user.toggle_active() for user in users if user.age < 18)

    users

.. parsed-literal::

    [ self.name='Patrick' | self.age=3 | False,
      self.name='Matthew' | self.age=45 | True,
      self.name='Linux Admin' | self.age=12 | False,
      self.name='Operating Doctor' | self.age=46 | True,
      self.name='Data Scientist' | self.age=74 | True]

The first, commented-out approach is a very readable loop running ``User.toggle_active()`` on any user we encounter whose age is less than 18; the version we actually executed does the same work inside the ``assert``\ ed comprehension, and we can see that, yes, it did indeed work. The other commented approaches are more advanced, or in some cases merely more confusing. The version with an ``assert``\ ion is pretty handy. You'd do something like this if you were unit testing; however, you don't usually want assertions in your production code. In production, we might instead say ``if not all(blah): send_alert("status toggle failures")`` or something like that. And again, we can do that without inspecting the users, simply because we ``return True`` if the code executes properly. But remember, that's not checking that the toggle resulted in the correct status state, only that it ran without raising an exception. Okay, so we have our users set up, and their status is now age appropriate. Let's come back to our techniques.
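As a follow-up to that caveat: if we did want to verify the resulting state, not just that the calls ran, a minimal standalone sketch (re-declaring a stripped-down ``User`` with fixed ages so it runs on its own) could assert the statuses directly:

```python
class User:
    """Stripped-down version of our User class, only what we need here."""
    def __init__(self, name, age, active=True):
        self.name = name
        self.age = age
        self.active = active

    def toggle_active(self):
        self.active = not self.active
        return True

users = [User('Patrick', 3), User('Matthew', 45), User('Linux Admin', 12)]

# Toggle every minor, then check the *state*, not just that the calls ran
assert all(user.toggle_active() for user in users if user.age < 18)
assert all(not user.active for user in users if user.age < 18)   # minors now inactive
assert all(user.active for user in users if user.age >= 18)      # adults untouched
```

The second and third assertions are the state checks the ``return True`` pattern cannot give us.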
Let's find users whose age is over 30 (similar to what we just did), but now we want to do a bit more than a simple one-time function execution. Remember, we can do this with a list or a generator. Let's pretend that our set of patients should now go into a new class object. Let's also set the caveat that the number of users we find will be far greater than the actual 2 in our example. We might also assume that each ``User`` in ``users`` contains additional attributes, one of which holds health history that might be megabytes in size. So because we have a few things we want to do with our found users, and each found user is memory intensive and already exists somewhere in the ``users`` list, we do not want to copy an entire subset of that list. We are also going to pretend that our conditional patients can only accumulate in our intensive care unit until we fill our capacity. So, because every bit of memory is important (remember all that patient history we naively read into memory when making our ``User``\ s?), and because we can only take so many, let's use our generator approach. *Note: if we were really memory intensive here and speed was important, we could use indexing and other tricks. But let's assume we are somewhere between our example of 5 and Google big data.*

.. code:: python

    users_iter = iter(users)

    max_capacity = 2
    intensive_care_patients = []
    while len(intensive_care_patients) < max_capacity:
        try:
            patient = next(users_iter)
            if patient.age > 30:
                intensive_care_patients.append(patient.name)
        except StopIteration:
            print('We still have capacity!')
            break
    intensive_care_patients

.. parsed-literal::

    ['Matthew', 'Operating Doctor']

This should all seem standard fare now, so I will leave the breakdown to you at this point. Notice how we only kept the patient name in our resultant list? Again, we are assuming it's incredibly expensive to hold a ``User`` in memory, so let's not duplicate things unless we have to do so.
In reality, we might reference a database table primary key, or maybe a unique user hash, since names are common. We also do not want people causing havoc on the system by guessing actual data references (thus why we normally use hashes or other unpredictable identifiers), but I digress. Okay, remember what I just said about copying? Well, in reality we could just reference the original object, so we would have a list of objects and another list of references to those objects. But for our purposes, let's assume we are dealing with copies above, where maybe we would otherwise attach additional information, making the reference and the original no longer equivalent and causing us to have two objects for one user. If we wanted to make ICU patient objects, we could have simply done so above: rather than appending the ``User.name`` to a list, we could have made those objects and collected them. Let's do that quickly here, just as an example.

.. code:: python

    class VulnerablePatient:

        def __init__(self, patient):
            self.patient = patient

        def __repr__(self):
            return f' {self.patient}'

    users_iter = iter(users)

    max_capacity = 2
    intensive_care_patients = []
    while len(intensive_care_patients) < max_capacity:
        try:
            patient = next(users_iter)
            if patient.age > 30:
                intensive_care_patients.append(
                    VulnerablePatient(patient)
                )
        except StopIteration:
            print('We still have capacity!')
            break
    intensive_care_patients

.. parsed-literal::

    [ self.name='Matthew' | self.age=45 | True,
      self.name='Operating Doctor' | self.age=46 | True]

Okay, so the one thing we did differently here is that we passed the entire ``User`` object as the patient to our ``VulnerablePatient`` object. I will also point out, as it may or may not be obvious, that here we rely on the ``User.__repr__`` to provide the actual patient information in ``VulnerablePatient.__repr__``.
Here is where we can check whether we have made two patients (one a ``User`` and one a ``User`` within ``VulnerablePatient``), or whether we are referencing the same ``User`` object and thus the same memory block. Let's check.

.. code:: python

    check_user = next(
        user for user in users
        if user.age > 30
    )

    compare_vulnerable_patient = next(
        patient for patient in intensive_care_patients
        if patient.patient.name == check_user.name
    )

    print(check_user, compare_vulnerable_patient)

.. parsed-literal::

    self.name='Matthew' | self.age=45 | True  self.name='Matthew' | self.age=45 | True

Okay, two things here. First, the simplest: jump down to the last line of code and the output after it. We can see that the user we grabbed as ``check_user``, which met the condition for which we made ICU patients, is properly found and matches the same ``VulnerablePatient``. How we condition that should be familiar now, if it wasn't before we started. The second thing to note is that we finally combine a few of our methods into a nice and concise example. We first want to get a user that we know would be a ``VulnerablePatient`` by conditioning the same way, here age > 30. We do this with a generator because we now love them, but more importantly we want to get just the first user that meets the condition. We *do not* want to generate a full list of condition-matching users because, remember, they are expensive, and we only need one, fast. So we use a generator that will yield matches, and we call ``next`` one time to get the first item yielded. You will see we do this as a shortcut by wrapping a generator comprehension in ``next``. (In Python 2 we could have also written this as ``generator.next()``, but that method is gone in modern Python, where the ``next`` built-in is the way to go.) You'll then see we do the same with our ICU patients list, but here we condition on the name being the same as the single ``check_user`` we just found.
So we search for a single match (even though we know there are more), and then again we search for a single match (here we know there should be only one, yet our syntax is the same for our purposes). You might note that we did not wrap these to catch a ``StopIteration``, because we know the data exists and each call will successfully return. (*We say that all the time in production and then things blow up, don't they?*) Okay, back to it. We now have a ``User`` and a ``VulnerablePatient`` that should be the same person. But is this a copy, or a reference to one memory block? Let's check.

.. code:: python

    print(f'{id(check_user)=}, {id(compare_vulnerable_patient)=}, and {id(compare_vulnerable_patient.patient)=})', end='\n\n')

    def assert_equivalency(obj1, obj2):
        """Assert the equivalency of two objects"""
        try:
            assert id(obj1) == id(obj2)
        except AssertionError:
            print(f'{obj1=} and {obj2=} are not equivalent', end='\n\n')
        else:
            print(f'{obj1=} and {obj2=} are equivalent', end='\n\n')

    assert_equivalency(check_user, compare_vulnerable_patient)
    assert_equivalency(check_user, compare_vulnerable_patient.patient)

.. parsed-literal::

    id(check_user)=140252981884720, id(compare_vulnerable_patient)=140252983924240, and id(compare_vulnerable_patient.patient)=140252981884720)

    obj1= self.name='Matthew' | self.age=45 | True and obj2= self.name='Matthew' | self.age=45 | True are not equivalent

    obj1= self.name='Matthew' | self.age=45 | True and obj2= self.name='Matthew' | self.age=45 | True are equivalent

So, a few ways to look at it. The first: we simply print the result of the ``id`` function, which tells us the object's unique identifier in memory. The second way is to ``assert`` the equivalency of the ``id``\ s. And a third way is to check identity directly with the ``is`` operator (``obj1 is obj2``), which performs the same comparison as ``id(obj1) == id(obj2)``. As you can see, these methods agree and give us the same result.
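To see that identity check in isolation, here is a tiny standalone sketch (``Wrapper`` is a hypothetical stand-in for ``VulnerablePatient``) showing that ``is`` and comparing ``id``\ s agree:

```python
class Wrapper:
    """Hypothetical stand-in for VulnerablePatient: stores a reference, not a copy."""
    def __init__(self, patient):
        self.patient = patient

original = {'name': 'Matthew', 'age': 45}
wrapped = Wrapper(original)

# The wrapper is a distinct object...
print(wrapped is original)                   # False
# ...but its attribute points at the very same memory block.
print(wrapped.patient is original)           # True
print(id(wrapped.patient) == id(original))   # True -- the same check as `is`
```

No second copy of ``original`` ever exists; ``wrapped.patient`` is just another name for the same object.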
While the ``User`` and the ``VulnerablePatient`` are not the same object, which is what we expect, the ``VulnerablePatient.patient`` and the ``User`` are the same. So we did not create a second in-memory patient when we put them in a ``VulnerablePatient`` object. Okay, this last part was a bit of a tangent from what we were going through up to this point. However, it was a nice culmination of our methods in getting ``check_user`` and ``compare_vulnerable_patient``, and hopefully the explanation of in-memory copies and references, and how we can check these things, proves to be helpful. I think that should do it for generators and generator comprehensions, and when we might want to use them over lists and list comprehensions.