In Python, itertools.groupby is a convenient function for grouping together consecutive elements from an iterable which share some property.

from itertools import groupby

locations = [
    {"state": "MA", "city": "Boston"},
    {"state": "NY", "city": "New York"},
    {"state": "NY", "city": "Albany"},
]

def get_state(location):
    return location["state"]

# Materialize each group as a list; the grouping key itself is not needed here.
locations_grouped_by_state = [
    list(group)
    for _, group in groupby(locations, key=get_state)
]

locations_grouped_by_state will have the following value.

[
    [
        {"state": "MA", "city": "Boston"},
    ],
    [
        {"state": "NY", "city": "New York"},
        {"state": "NY", "city": "Albany"},
    ],
]

Element order matters

groupby groups on consecutive elements. What if we reversed the order of Boston and New York?

locations = [
    {"state": "NY", "city": "New York"},
    {"state": "MA", "city": "Boston"},
    {"state": "NY", "city": "Albany"},
]

Because the two elements with "state": "NY" are no longer consecutive, they are not grouped together anymore. Now we end up with three separate groups.

[
    [{"state": "NY", "city": "New York"}], 
    [{"state": "MA", "city": "Boston"}], 
    [{"state": "NY", "city": "Albany"}],
]

To ensure that all like elements get grouped together, we first have to sort the elements using the same key function we're going to pass to groupby.

# Sort first so that elements with the same state become adjacent, then group.
locations_sorted_by_state = sorted(locations, key=get_state)
locations_grouped_by_state = [
    list(group)
    for _, group in groupby(locations_sorted_by_state, key=get_state)
]
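
To check the sort-then-group approach end to end, here is a small self-contained sketch that reuses the reordered list from above. The name cities_by_state and the choice to collect city names into a dict are illustrative assumptions, not part of the original example.

```python
from itertools import groupby

locations = [
    {"state": "NY", "city": "New York"},
    {"state": "MA", "city": "Boston"},
    {"state": "NY", "city": "Albany"},
]

def get_state(location):
    return location["state"]

# sorted() is stable, so cities keep their original relative order within a state.
locations_sorted_by_state = sorted(locations, key=get_state)

# Hypothetical helper structure: map each state to the cities grouped under it.
cities_by_state = {
    state: [location["city"] for location in group]
    for state, group in groupby(locations_sorted_by_state, key=get_state)
}

print(cities_by_state)
# {'MA': ['Boston'], 'NY': ['New York', 'Albany']}
```

Both NY entries now land in a single group, even though they were not adjacent in the input.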

itertools is for iterators

When I first learned of this caveat, I was rather confused. The documentation does make this behavior clear, but the name groupby can be confusing if you're thinking about it in terms of a list and not a generic iterable. When I think about grouping elements of a list, I don't expect order to matter. What groupby does for lists makes me think of splitting, not grouping.

On the other hand, when I think about a generic iterable, then the behavior of groupby makes much more sense to me. Suppose we're operating on a stream instead of a list. With a stream, we don't know how many elements we're going to get ahead of time. The stream might even be infinite! In that case, we will never be able to return a list that we are sure contains all locations with "state": "NY". The ambiguity of what "group by" could mean for a list does not exist when thinking only about an iterable.
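
The stream case can be sketched with an infinite iterator. Here itertools.count stands in for the stream, and the key function bucket is a made-up example that puts every three consecutive integers in the same group. Because groupby only ever looks at consecutive elements, it can hand us groups one at a time without reading the whole stream.

```python
from itertools import count, groupby, islice

def bucket(n):
    # Hypothetical key: consecutive integers fall into buckets of three.
    return n // 3

# count() yields 0, 1, 2, ... forever; groupby consumes it lazily.
groups = groupby(count(), key=bucket)

# Take only the first three groups; each group is consumed before moving on.
first_three_groups = [
    (key, list(group))
    for key, group in islice(groups, 3)
]

print(first_three_groups)
# [(0, [0, 1, 2]), (1, [3, 4, 5]), (2, [6, 7, 8])]
```

Sorting this input first would never finish, which is why a group-regardless-of-order behavior could not work for iterables in general.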

Personally, I think it's asking a bit much of us to provide a sorted list whenever we want to group like elements regardless of order. For that, we could define our own function that does the sorting for us.

def groupby_regardless_of_order(list_, key):
    return groupby(sorted(list_, key=key), key=key)

This function will group all elements with the same grouping key together, even if they are not consecutive.
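
As a quick usage sketch, here is the helper applied to the reordered list from earlier. The variable name grouped is just for illustration. Two things to keep in mind: the helper still returns a lazy groupby iterator, so each group has to be consumed before moving on to the next one, and because it relies on sorted, the keys must be comparable with one another.

```python
from itertools import groupby

def groupby_regardless_of_order(list_, key):
    return groupby(sorted(list_, key=key), key=key)

def get_state(location):
    return location["state"]

locations = [
    {"state": "NY", "city": "New York"},
    {"state": "MA", "city": "Boston"},
    {"state": "NY", "city": "Albany"},
]

# Consume each group immediately by turning it into a list.
grouped = [
    (state, list(group))
    for state, group in groupby_regardless_of_order(locations, key=get_state)
]

print(grouped)
# [('MA', [{'state': 'MA', 'city': 'Boston'}]),
#  ('NY', [{'state': 'NY', 'city': 'New York'}, {'state': 'NY', 'city': 'Albany'}])]
```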