A Tour of Python Collections

In this tutorial, I'm going to cover several different types of collections in Python.

Before we get started, let's define what a collection is. A collection is similar to a basket that you can add and remove items from. In some cases, they are the same types of items, and in others they are different. Basically, it's a storage construct that allows you to collect things.

For example, you might have a car type. You create several instances of the car and want some way to group all of those cars together and access them easily. This is the perfect scenario for a collection.

The collection lives in memory. You don't need to build the collection type yourself or create any scaffolding; Python provides all of that for free. Just create a collection instance and start adding your cars. When you're ready, you can retrieve them by name (key) or by index (position within the collection), depending on the type of collection.
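To make the car scenario concrete, here is a minimal sketch. The Car class, its fields, and the names "work" and "family" are made up for illustration; lists and dictionaries are covered in detail later in this tutorial.

```python
from dataclasses import dataclass


@dataclass
class Car:
    # Hypothetical type used only for this example.
    make: str
    year: int


# Group the cars in a list and access them by index (position).
cars = [Car("Ford", 1998), Car("Honda", 2015)]
first_car = cars[0]

# Group the same cars in a dictionary and access them by name (key).
cars_by_name = {"work": cars[0], "family": cars[1]}
family_car = cars_by_name["family"]

print(first_car.make)    # Ford
print(family_car.year)   # 2015
```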

Python offers several built-in types that fall under the loose category of collections. There isn't a single built-in type called collection in Python; instead, there are sequences (such as lists), mappings, and sets.

In this tutorial, we're going to cover the following types:

lists
strings
dictionaries
sets

List

Lists are one of Python's built-in sequence types. Lists are mutable and can hold items of the same type or of different types, which makes them very versatile. Python lists are similar to arrays in other languages.

Python lists also allow duplicate (non-unique) elements.

The following is an example of creating a list and adding items to it:

alist = ["item1", "item2", 4]

Notice the list is heterogeneous, containing string and numeric types.

To retrieve an item from a list, simply reference the item's index. Python lists are zero indexed. If I want the last item, which is position 3, I need to use an index of 2:

alist[2]
> 4

The number 4 is returned. When referencing a list item, just subtract one from its position to get the correct index value.

Checking the length of a list can be done using the built-in len() function:

len(alist)
> 3

To add more items to the list, use the append() method:

alist.append(False)
len(alist)
> 4

We've increased the list's length by one and added a value of a different type, a boolean. The list doesn't complain at all.

We can delete elements by calling remove():

alist.remove("item2")

remove() doesn't return a value. The list will be updated and now contains three items:

['item1', 4, False]
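Two details of remove() are worth knowing: it removes only the first matching element, and it raises a ValueError if the value isn't in the list. Here's a small sketch using a throwaway list named items (not part of the example above):

```python
items = ["a", "b", "a"]

# Only the first "a" is removed; the second one stays.
items.remove("a")
print(items)    # ['b', 'a']

# Removing a value that isn't present raises ValueError.
missing = False
try:
    items.remove("z")
except ValueError:
    missing = True
    print("z was not in the list")
```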

There are a couple of other ways to get items out of a list. We saw how to access an item using its index. If I access index 2, I'll get item 3:

thevalue = alist[2]
print(thevalue)
> False

The above code gives us a reference to the item, not a copy, and the item is still in the list. The list's length isn't affected.

However, if we use pop(), we get the item, but it is also removed from the list:

thevalue = alist.pop(1)
print(thevalue)
> 4
print("after pop", alist)
> after pop ['item1', False]
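The index passed to pop() is optional. As a quick sketch (the list name stack is just for illustration), calling pop() with no argument removes and returns the last item:

```python
stack = ["a", "b", "c"]

last = stack.pop()      # no index: removes and returns the last item
first = stack.pop(0)    # index 0: removes and returns the first item

print(last, first, stack)    # c a ['b']
```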

Lists can also be sorted. Say you have the following list of strings:

alpha = ["z", "b", "a", "c"]

You can sort it using the sort() method:

alpha.sort()

sort() doesn't return a value. However, alpha is now sorted. You can see this by printing the list:

print(alpha)
> ['a', 'b', 'c', 'z']

Elements can be reversed just as easily by calling reverse():

alpha.reverse()

reverse() also doesn't return a value, and will reverse the current list.
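Because sort() and reverse() change the list in place, a common mistake is to assign their result to a variable. If you want a sorted copy and need the original left alone, the built-in sorted() function is one option. A short sketch:

```python
alpha = ["z", "b", "a", "c"]

# sort() works in place and returns None.
result = alpha.sort()
print(result)    # None
print(alpha)     # ['a', 'b', 'c', 'z']

# sorted() returns a new list and leaves the original untouched.
original = ["z", "b", "a", "c"]
ordered = sorted(original)
descending = sorted(original, reverse=True)
print(original)      # ['z', 'b', 'a', 'c']
print(ordered)       # ['a', 'b', 'c', 'z']
print(descending)    # ['z', 'c', 'b', 'a']
```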

Is a String a List?

Strings have some similarities to lists. However, strings are immutable, while lists are mutable.

Strings are index based like a list. You can also get a count of characters in a string, just like you can get a count of items in a list.

For example:

mystring = "The quick brown fox." 
print(len(mystring)) 
> 20
print(mystring[4])
> q

Unlike a list, you can't add another character by appending it. You also can't update a specific element within the string.

Notice what happens if we try to assign a character to a specific position within the string:

mystring[4] = 'z'
> TypeError: 'str' object does not support item assignment

This is where the immutable part of strings comes into play.
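Since a string can't be changed in place, the usual workaround is to build a new string. Here is a sketch of two ways to do that; the variable names changed, chars, and rebuilt are my own:

```python
mystring = "The quick brown fox."

# Slicing plus concatenation creates a brand new string.
changed = mystring[:4] + "z" + mystring[5:]
print(changed)     # The zuick brown fox.
print(mystring)    # The quick brown fox.  (unchanged)

# Alternatively, convert to a list (mutable), edit it, and join it back.
chars = list(mystring)
chars[4] = "z"
rebuilt = "".join(chars)
print(rebuilt == changed)    # True
```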

Depending on the string, we can convert a string into a list. Take our mystring variable from above. If we split() the string with no arguments, it splits on whitespace:

stringlist = mystring.split()
stringlist
> ['The', 'quick', 'brown', 'fox.']
type(stringlist)
> <class 'list'>

Each word in the string becomes an element in a list. We can also see the type is clearly a list.

If the string has no spaces, we can still split it. But what will the result be? Let's check it:

mystring2 = "Thequickbrownfox."
stringlist2 = mystring2.split()
stringlist2
> ['Thequickbrownfox.']
type(stringlist2)
> <class 'list'>

We still get a list but this time with only one element. Ultimately, there isn't much utility to splitting a string in this case.
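There are other ways to turn a string into a list. The sketch below uses made-up sample strings to show splitting on a different separator and breaking a string into individual characters with list():

```python
csv_line = "red,green,blue"

# Pass a separator to split on something other than whitespace.
colors = csv_line.split(",")
print(colors)    # ['red', 'green', 'blue']

# list() turns a string into a list of its characters.
letters = list("fox")
print(letters)    # ['f', 'o', 'x']

# With no arguments, split() treats any run of whitespace as one separator.
words = "  The   quick\tfox  ".split()
print(words)    # ['The', 'quick', 'fox']
```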

Mappings

Mappings are another built-in type. The only built-in mapping type in Python is the dictionary. Dictionaries are key/value based. Unlike a list, which is index based, we don't access dictionary values using indexes. Instead, we access values using keys.

Creating a dictionary is similar to creating a list with the exception of adding key/value pairs rather than single items. Here's an example:

mydictionary = {"item1":45, "item2":76, "item3":145}

Each key is separated from its value by a colon, and the pairs are separated by commas. The first part is the key and the second part is the value. In the first item, item1 is the key and 45 is the value. Also, notice we use braces instead of brackets to enclose our items.

When getting items from a dictionary, we think in terms of the key, since accessing via index isn't possible. If we want item2, we use:

mydictionary["item2"]
> 76

We can check a dictionary's length, in the same way we check a list's length:

len(mydictionary)
> 3

To update item2, we do the following:

mydictionary["item2"] = 100

Adding an item uses the same syntax as updating:

mydictionary["item62"] = 433

item62 now exists in the dictionary, and the total count has increased by one.

Dictionary items can be deleted by referencing a specific key:

del mydictionary["item2"]

item2 is now removed. As you can see, dictionary operations are fairly straightforward.
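One thing to watch for: reading a key that doesn't exist (such as item2 after deleting it) raises a KeyError. The sketch below recreates the dictionary in its current state and shows a few ways to check for a key safely:

```python
mydictionary = {"item1": 45, "item3": 145, "item62": 433}

# Use "in" to test whether a key exists.
has_item1 = "item1" in mydictionary    # True
has_item2 = "item2" in mydictionary    # False

# get() returns None (or a default you supply) instead of raising an error.
missing_value = mydictionary.get("item2")    # None
fallback = mydictionary.get("item2", 0)      # 0

# Accessing a missing key with brackets raises KeyError.
raised = False
try:
    mydictionary["item2"]
except KeyError:
    raised = True

print(has_item1, has_item2, missing_value, fallback, raised)
```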

As mentioned earlier, dictionaries have key/value pairs. If you'd like to access only keys, you can do this (since Python 3.7, they come back in insertion order):

mydictionary.keys()
> dict_keys(['item1', 'item3', 'item62'])

Values are accessed the same way:

mydictionary.values()
> dict_values([45, 145, 433])
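If you want keys and values together, the items() method returns them as pairs. A brief sketch, again recreating the dictionary in its current state:

```python
mydictionary = {"item1": 45, "item3": 145, "item62": 433}

# items() yields (key, value) tuples.
pairs = list(mydictionary.items())
print(pairs)    # [('item1', 45), ('item3', 145), ('item62', 433)]

# Looping over items() unpacks each pair into two variables.
for key, value in mydictionary.items():
    print(key, "->", value)

# values() works with built-ins such as sum().
total = sum(mydictionary.values())
print(total)    # 623
```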

Sets

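Sets are unordered collections that can't have duplicate elements. A set can't be sorted in place, since the sort() method isn't available for sets, although the built-in sorted() function will return a new sorted list of a set's elements.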

Sets can check for the existence of an element faster than lists can, because a set looks elements up by hash rather than scanning every item.
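Here's a rough sketch of that membership check. The sizes and repeat counts are arbitrary choices of mine, and the actual timings will vary by machine, so treat the numbers you see as illustrative only:

```python
import timeit

numbers_list = list(range(10_000))
numbers_set = set(numbers_list)
target = 9_999    # worst case for the list: the last element

# Both collections support the same "in" syntax.
in_list = target in numbers_list
in_set = target in numbers_set
print(in_list, in_set)    # True True

# Time 200 lookups in each; the set is typically much faster.
list_time = timeit.timeit(lambda: target in numbers_list, number=200)
set_time = timeit.timeit(lambda: target in numbers_set, number=200)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```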

To create a set, simply do the following:

myset = {3,4,5,1}

Or use the set() constructor and supply an existing structure. For example:

mylist = [0,1,5,4,3,7,6,6]
myset = set(mylist)
myset
> {0, 1, 3, 4, 5, 6, 7}

Since sets can only contain unique items, notice that one of the duplicate 6s was removed. Using set() is great for creating a unique collection of items from existing data.
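A common use of this is removing duplicates from a list. The sketch below shows two approaches; which one fits depends on whether you care about the original order:

```python
mylist = [0, 1, 5, 4, 3, 7, 6, 6]

# set() drops duplicates; sorted() turns the result back into an ordered list.
unique_sorted = sorted(set(mylist))
print(unique_sorted)    # [0, 1, 3, 4, 5, 6, 7]

# To keep first-seen order instead, dict.fromkeys() is one option,
# since dictionary keys are unique and preserve insertion order.
unique_in_order = list(dict.fromkeys(mylist))
print(unique_in_order)    # [0, 1, 5, 4, 3, 7, 6]
```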

If I try to add the 6 back, it doesn't have any effect:

myset.add(6)
myset
> {0, 1, 3, 4, 5, 6, 7}

To remove an element from a set, you can call the remove() method:

myset.remove(4)

4 is no longer in myset.
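Calling remove() again for an element that is no longer in the set raises a KeyError. If you'd rather not handle that error, sets also have a discard() method, which does nothing when the element is missing. A quick sketch:

```python
myset = {0, 1, 3, 5, 6, 7}    # 4 has already been removed

# remove() raises KeyError when the element isn't present.
raised = False
try:
    myset.remove(4)
except KeyError:
    raised = True

# discard() silently ignores a missing element.
myset.discard(4)

# discard() still removes elements that are present.
myset.discard(3)
print(raised, myset == {0, 1, 5, 6, 7})    # True True
```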

Sets also don't support indexing. Trying to access an element within a set by index raises an error:

myset[2]
> TypeError: 'set' object is not subscriptable

Sets have some methods unique to them. If you're familiar with the mathematical operations on sets (difference, intersection, and union), these methods will look familiar to you.

We'll start with difference(). Suppose I have these two sets:

set1 = {1,3,6,7}
set2 = {1,3,6,8,10}

Starting from set1, the difference with set2 is {7}, because 7 is in set1 but not in set2. In Python, this looks like the following:

set1.difference(set2)
> {7}

To go the other way:

set2.difference(set1)
> {8, 10}

What about finding what is common to both sets?

set1.intersection(set2)
> {1, 3, 6}

And finally, combine both sets to create a new set:

set3 = set1.union(set2)
set3
> {1, 3, 6, 7, 8, 10}

The union set includes all items from both sets.
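Each of these methods also has an operator form, which you may see in other people's code. The sketch below checks that the operators give the same results as the methods, and that none of them modify the original sets:

```python
set1 = {1, 3, 6, 7}
set2 = {1, 3, 6, 8, 10}

diff = set1 - set2      # same as set1.difference(set2)
common = set1 & set2    # same as set1.intersection(set2)
both = set1 | set2      # same as set1.union(set2)

print(diff == set1.difference(set2))        # True
print(common == set1.intersection(set2))    # True
print(both == set1.union(set2))             # True

# The originals are unchanged; each operation returns a new set.
print(set1 == {1, 3, 6, 7}, set2 == {1, 3, 6, 8, 10})    # True True
```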

Summary

We've gone through Python lists, strings (a sequence, although not a list), dictionaries, and sets. We've looked at specific operations and unique abilities of each. I hope this tutorial has further enhanced your knowledge of collections in Python.