Intro to Python Data Structures
Strings, Lists, Tuples, Sets, Dicts
Created in Python 3.7
© Joe James, 2019.
Sequences: String, List, Tuple
Documentation
indexing - access any item in the sequence using its index.
Indexing starts with 0 for the first element.
# string
x = 'frog'
print (x[3])
# list
x = ['pig', 'cow', 'horse']
print (x[1])
# tuple
x = ('Kevin', 'Niklas', 'Jenny', 'Craig')
print(x[0])
g
cow
Kevin
slicing - slice out substrings, sublists, subtuples using indexes.
[start : end+1 : step]
x = 'computer'
print(x[1:4])
print(x[1:6:2])
print(x[3:])
print(x[:5])
print(x[-1])
print(x[-3:])
print(x[:-2])
omp
opt
puter
compu
r
ter
comput
adding / concatenating - combine 2 sequences of the same type by using +
# string
x = 'horse' + 'shoe'
print(x)
# list
y = ['pig', 'cow'] + ['horse']
print(y)
# tuple
z = ('Kevin', 'Niklas', 'Jenny') + ('Craig',)
print(z)
horseshoe
['pig', 'cow', 'horse']
('Kevin', 'Niklas', 'Jenny', 'Craig')
multiplying - multiply a sequence using *
# string
x = 'bug' * 3
print(x)
# list
y = [8, 5] * 3
print(y)
# tuple
z = (2, 4) * 3
print(z)
bugbugbug
[8, 5, 8, 5, 8, 5]
(2, 4, 2, 4, 2, 4)
checking membership - test whether an item is or is not in a sequence.
# string
x = 'bug'
print('u' in x)
# list
y = ['pig', 'cow', 'horse']
print('cow' not in y)
# tuple
z = ('Kevin', 'Niklas', 'Jenny', 'Craig')
print('Niklas' in z)
True
False
True
iterating - iterating through the items in a sequence
# item
x = [7, 8, 3]
for item in x:
print(item)
# index & item
y = [7, 8, 3]
for index, item in enumerate(y):
print(index, item)
7
8
3
0 7
1 8
2 3
number of items - count the number of items in a sequence
# string
x = 'bug'
print(len(x))
# list
y = ['pig', 'cow', 'horse']
print(len(y))
# tuple
z = ('Kevin', 'Niklas', 'Jenny', 'Craig')
print(len(z))
3
3
4
minimum - find the minimum item in a sequence lexicographically.
Alpha or numeric types, but cannot mix types.
# string
x = 'bug'
print(min(x))
# list
y = ['pig', 'cow', 'horse']
print(min(y))
# tuple
z = ('Kevin', 'Niklas', 'Jenny', 'Craig')
print(min(z))
b
cow
Craig
maximum - find the maximum item in a sequence lexicographically.
Alpha or numeric types, but cannot mix types.
# string
x = 'bug'
print(max(x))
# list
y = ['pig', 'cow', 'horse']
print(max(y))
# tuple
z = ('Kevin', 'Niklas', 'Jenny', 'Craig')
print(max(z))
u
pig
Niklas
sum - find the sum of items in a sequence.
Entire sequence must be numeric.
# string -> error
# x = [5, 7, 'bug']
# print(sum(x)) # generates an error
# list
y = [2, 5, 8, 12]
print(sum(y))
print(sum(y[-2:]))
# tuple
z = (50, 4, 7, 19)
print(sum(z))
27
20
80
sorting - returns a new list of items in sorted order.
Does not change the original list.
# string
x = 'bug'
print(sorted(x))
# list
y = ['pig', 'cow', 'horse']
print(sorted(y))
# tuple
z = ('Kevin', 'Niklas', 'Jenny', 'Craig')
print(sorted(z))
['b', 'g', 'u']
['cow', 'horse', 'pig']
['Craig', 'Jenny', 'Kevin', 'Niklas']
sorting - sort by second letter
Add a key parameter and a lambda function to return the second character.
(the word key here is a defined parameter name, k is an arbitrary variable name).
z = ('Kevin', 'Niklas', 'Jenny', 'Craig')
print(sorted(z, key=lambda k: k[1]))
['Kevin', 'Jenny', 'Niklas', 'Craig']
count(item) - returns count of an item
# string
x = 'hippo'
print(x.count('p'))
# list
y = ['pig', 'cow', 'horse', 'cow']
print(y.count('cow'))
# tuple
z = ('Kevin', 'Niklas', 'Jenny', 'Craig')
print(z.count('Kevin'))
2
2
1
index(item) - returns the index of the first occurence of an item.
# string
x = 'hippo'
print(x.index('p'))
# list
y = ['pig', 'cow', 'horse', 'cow']
print(y.index('cow'))
# tuple
z = ('Kevin', 'Niklas', 'Jenny', 'Craig')
print(z.index('Jenny'))
2
1
2
unpacking - unpack the n items of a sequence into n variables
x = ['pig', 'cow', 'horse']
a, b, c = x
print(a, b, c)
pig cow horse
Lists
- General purpose
- Most widely used data structure
- Grow and shrink size as needed
- Sequence type
- Sortable
constructors - creating a new list
x = list()
y = ['a', 25, 'dog', 8.43]
tuple1 = (10, 20)
z = list(tuple1)
# list comprehension
a = [m for m in range(8)]
print(a)
b = [i**2 for i in range(10) if i>4]
print(b)
[0, 1, 2, 3, 4, 5, 6, 7]
[25, 36, 49, 64, 81]
delete - delete a list or an item in a list
x = [5, 3, 8, 6]
del(x[1])
print(x)
del(x) # list x no longer exists
[5, 8, 6]
append - append an item to a list
x = [5, 3, 8, 6]
x.append(7)
print(x)
[5, 3, 8, 6, 7]
extend - append a sequence to a list
x = [5, 3, 8, 6]
y = [12, 13]
x.extend(y)
print(x)
[5, 3, 8, 6, 12, 13]
insert - insert an item at a given index
x = [5, 3, 8, 6]
x.insert(1, 7)
print(x)
x.insert(1, ['a', 'm'])
print(x)
[5, 7, 3, 8, 6]
[5, ['a', 'm'], 7, 3, 8, 6]
pop - pops last item off list and returns item
x = [5, 3, 8, 6]
x.pop() # pop off the 6
print(x)
print(x.pop())
[5, 3, 8]
8
remove - remove first instance of an item
x = [5, 3, 8, 6, 3]
x.remove(3)
print(x)
[5, 8, 6, 3]
reverse - reverse the order of the list. It is an in-place sort, meaning it changes the original list.
x = [5, 3, 8, 6]
x.reverse()
print(x)
[6, 8, 3, 5]
sort - sort the list in place.
Note:
sorted(x) returns a new sorted list without changing the original list x.
x.sort() puts the items of x in sorted order (sorts in place).
x = [5, 3, 8, 6]
x.sort()
print(x)
[3, 5, 6, 8]
reverse sort - sort items descending.
Use reverse=True parameter to the sort function.
x = [5, 3, 8, 6]
x.sort(reverse=True)
print(x)
[8, 6, 5, 3]
Tuples
- Immutable (can’t add/change)
- Useful for fixed data
- Faster than Lists
- Sequence type
constructors - creating new tuples.
x = ()
x = (1, 2, 3)
x = 1, 2, 3
x = 2, # the comma tells Python it's a tuple
print(x, type(x))
list1 = [2, 4, 6]
x = tuple(list1)
print(x, type(x))
(2,) <class 'tuple'>
(2, 4, 6) <class 'tuple'>
tuples are immutable, but member objects may be mutable.
x = (1, 2, 3)
# del(x[1]) # fails
# x[1] = 8 # fails
print(x)
y = ([1, 2], 3) # a tuple where the first item is a list
del(y[0][1]) # delete the 2
print(y) # the list within the tuple is mutable
y += (4,) # concatenating two tuples works
print(y)
(1, 2, 3)
([1], 3)
([1], 3, 4)
Sets
- Store non-duplicate items
- Very fast access vs Lists
- Math Set ops (union, intersect)
- Sets are Unordered
constructors - creating new sets
x = {3, 5, 3, 5}
print(x)
y = set()
print(y)
list1 = [2, 3, 4]
z = set(list1)
print(z)
{3, 5}
set()
{2, 3, 4}
set operations
x = {3, 8, 5}
print(x)
x.add(7)
print(x)
x.remove(3)
print(x)
# get length of set x
print(len(x))
# check membership in x
print(5 in x)
# pop random item from set x
print(x.pop(), x)
# delete all items from set x
x.clear()
print(x)
{8, 3, 5}
{8, 3, 5, 7}
{8, 5, 7}
3
True
8 {5, 7}
set()
Mathematical set operations
intersection (AND): set1 & set2
union (OR): set1 | set2
symmetric difference (XOR): set1 ^ set2
difference (in set1 but not set2): set1 - set2
subset (set2 contains set1): set1 <= set2
superset (set1 contains set2): set1 >= set2
s1 = {1, 2, 3}
s2 = {3, 4, 5}
print(s1 & s2)
print(s1 | s2)
print(s1 ^ s2)
print(s1 - s2)
print(s1 <= s2)
print(s1 >= s2)
{3}
{1, 2, 3, 4, 5}
{1, 2, 4, 5}
{1, 2}
False
False
Dictionaries (dict)
- Key/Value pairs
- Associative array, like Java HashMap
- Dicts are Unordered
constructors - creating new dictionaries
x = {'pork':25.3, 'beef':33.8, 'chicken':22.7}
print(x)
x = dict([('pork', 25.3),('beef', 33.8),('chicken', 22.7)])
print(x)
x = dict(pork=25.3, beef=33.8, chicken=22.7)
print(x)
{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7}
{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7}
{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7}
dict operations
x['shrimp'] = 38.2 # add or update
print(x)
# delete an item
del(x['shrimp'])
print(x)
# get length of dict x
print(len(x))
# delete all items from dict x
x.clear()
print(x)
# delete dict x
del(x)
{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7, 'shrimp': 38.2}
{'pork': 25.3, 'beef': 33.8, 'chicken': 22.7}
3
{}
accessing keys and values in a dict
Not compatible with Python 2.
y = {'pork':25.3, 'beef':33.8, 'chicken':22.7}
print(y.keys())
print(y.values())
print(y.items()) # key-value pairs
# check membership in y_keys (only looks in keys, not values)
print('beef' in y)
# check membership in y_values
print('clams' in y.values())
dict_keys(['pork', 'beef', 'chicken'])
dict_values([25.3, 33.8, 22.7])
dict_items([('pork', 25.3), ('beef', 33.8), ('chicken', 22.7)])
True
False
iterating a dict - note, items are in random order.
for key in y:
print(key, y[key])
for k, v in y.items():
print(k, v)
pork 25.3
beef 33.8
chicken 22.7
pork 25.3
beef 33.8
chicken 22.7