Tuple

Learning Objectives

At the end of this sub-unit, students should

understand tuples.
know how to work with tuples.
know how to use tuples in a for-loop.

Sequence of Anything

A string is a sequence of characters, where each character is itself a string. A range¹ is a sequence of integers only. Is there a sequence of anything?

A sequence of anything is called a tuple, and its data type in Python is written as tuple. It is a data type with built-in support from Python. To create a tuple, we enclose values in parentheses. If there are multiple values, we separate them with commas.

Tuple

The syntax for tuple is as follows.

(❬ expression ❭, ❬ expression ❭, ❬ expression ❭, ..., ❬ expression ❭)

To avoid an overly complicated syntax², we will simply explain the few cases of interest.

Empty Tuple: ()
Singleton: ( expr, ) (a singleton is a tuple with a single value; the comma at the end is required)
General #1: ( expr1, expr2, expr3 ) (can have as many expressions as needed)
General #2: ( expr1, expr2, expr3, ) (can have trailing comma)

Why Different Singleton?

The problem with parentheses is that they are used in many different places. We will list some of these places. Understanding the context in which parentheses occur is a prerequisite to understanding tuples.

1. Parentheses to disambiguate order of operation: (expr1 + expr2) * expr3.
- There must be no comma inside the outer parentheses.
- There may be commas inside expr1 or expr2.
2. Parentheses to invoke functions: function(arg1, arg2).
- The name function must be a function.
3. Parentheses for tuple: (expr1, expr2).
- There must not be a name before the parentheses.
- Note that (expr) cannot create a tuple because that is already the same as case 1.
  - Hence, we use (expr,) to differentiate the two cases.

Given that, let us give some concrete examples of tuples.

Tuple

>>> ()   # empty tuple
()
>>> (1,) # singleton
(1,)
>>> (1)  # not a tuple!  has no comma!
1
>>> (1, 2, 3)
(1, 2, 3)
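As a quick check of the cases above, the built-in type function can confirm which expressions actually produce a tuple. This is only a small sketch; the variable names are ours and do not come from the notes.

```python
# Illustrative names only (empty, single, not_tuple, triple).
empty = ()
single = (1,)
not_tuple = (1)      # just the integer 1 wrapped in parentheses
triple = (1, 2, 3,)  # a trailing comma is allowed

print(type(empty))          # <class 'tuple'>
print(type(single))         # <class 'tuple'>
print(type(not_tuple))      # <class 'int'>
print(triple == (1, 2, 3))  # True
```

The only difference between single and not_tuple is the comma, which is what makes the singleton case special.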

Bad Practice

Although we can have a trailing comma, we cannot have a blank expression. A blank expression usually shows up as two consecutive commas, or as a lone comma where the first element should be. So the following examples cause errors.

>>> (,)
SyntaxError
>>> (1,,)
SyntaxError
>>> (1,,2)
SyntaxError
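A SyntaxError is reported before any of the program runs, so we cannot simply place a blank expression in a script and observe it. One possible way to experiment, sketched below, is to pass the text to the built-in compile function. The helper name is_valid_expression is hypothetical and only for illustration.

```python
def is_valid_expression(source):
    """Hypothetical helper: True if the text is a syntactically valid expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

print(is_valid_expression("(1, 2,)"))  # True  (trailing comma is fine)
print(is_valid_expression("(,)"))      # False (blank first element)
print(is_valid_expression("(1,,)"))    # False (two consecutive commas)
print(is_valid_expression("(1,,2)"))   # False (two consecutive commas)
```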

Anything?

We say that a tuple is a sequence of anything, but what is anything? We can actually put any values inside a tuple. These values do not have to be of one type either. So we can have a tuple containing both a string and an integer as shown below.

Mixed Type

>>> ('CS', 1010, 'S')
('CS', 1010, 'S')

But that is not yet anything. What is the latest value we have learnt? That's right, the tuple itself, so we can also create a tuple within a tuple. This is a glimpse of the nested structures we will explore in the future. A possible use is a matrix, which is a 2-dimensional structure.

Nested Tuple

>>> (('CS', 1010, 'S'), ('is', 'easy'))
(('CS', 1010, 'S'), ('is', 'easy'))
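To make the matrix remark a little more concrete, here is a minimal sketch of a 2-by-3 matrix written as a tuple of rows. The name matrix and the layout (one inner tuple per row) are our assumptions; the notes do not prescribe a particular representation.

```python
# Each inner tuple is one row of the matrix (an assumed convention).
matrix = ((1, 2, 3),
          (4, 5, 6))

print(matrix)          # ((1, 2, 3), (4, 5, 6))
print(len(matrix))     # 2  (number of rows)
print(len(matrix[0]))  # 3  (number of values in the first row)
```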

Box-and-Arrow Diagram

Before we look at the operations that can be performed on a tuple, it will be easier to explain them if we can visually represent a tuple. The representation we choose is as follows. Consider the tuple tpl = (1, 2, 3) first.

BoxArrow01

Let us break down the meaning of the box-and-arrow diagram a little here. On the leftmost, we have the variable tpl. This is represented by an open box because we can modify the value of tpl via an assignment (e.g., tpl = (3, 2, 1)). So the value can be changed.

It then has an arrow to a sequence of three elements. Whenever the value inside the box is an arrow, we need to indicate where it starts. So we put the symbol ✱ as the origin of this arrow.

The arrow and the symbol ✱ have another meaning. The actual value inside the box is the address of the tuple. But remember that we are abstracting the address away. The address is managed by the computer so we do not care where it is located in memory. We are only interested in the fact that the value should be whatever the address of that tuple is. So we also abstract this idea away and simply connect the two locations via an arrow. In short, when we put ✱ inside the box, that is the actual value in the box. This will have an impact on function calls.

Now, the positive index is shown at the top for clarity, but it is not necessary. The element at index 0 is 1, the element at index 1 is 2, and the element at index 2 is 3. This corresponds to the expression (1, 2, 3).

Notice how each box in the sequence is closed. This is because we cannot re-assign the value of an element once the tuple is created. We choose the term re-assign carefully as modify is ambiguous. There are ways of modifying a value without re-assigning it, but for now we have no way of doing that.
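The difference between the open box (the variable) and the closed boxes (the elements) can be observed directly. The sketch below uses try/except, which may not have been covered yet, only to keep the script running after the error.

```python
tpl = (1, 2, 3)

# Closed box: re-assigning an element of a tuple is rejected.
try:
    tpl[0] = 99
    element_changed = True
except TypeError:
    element_changed = False

print(element_changed)  # False
print(tpl)              # (1, 2, 3)

# Open box: re-assigning the variable itself is allowed.
tpl = (3, 2, 1)
print(tpl)              # (3, 2, 1)
```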

Hopefully that is simple enough if we ignore the open/closed box convention, which will only be useful later. What about a nested tuple? Consider the tuple tpl = (('CS', 1010, 'S'), ('is', 'easy')). The box-and-arrow diagram is shown below.

BoxArrow02

Note the difference. The first element of tpl is the entire expression ('CS', 1010, 'S') (shown at the bottom to avoid crossing arrows). So the first element itself is a tuple! Similarly, the second element is also a tuple, but a different one, from the expression ('is', 'easy').

Box-and-Arrow for String

You may notice something odd here: why don't we represent a string such as 'CS' with a sequence of boxes too? The reason is simple. The weird expression 'X'[0][0][0][0] evaluates to 'X', which would require a loop from the element back to itself. At the same time, the value inside the box would have to be both the ✱, to indicate that it has an arrow, and the character 'X'. This becomes more and more like string theory, with the superposition of values and the string forming a loop. So we choose to simplify the representation of a string by writing the entire value inside the box. The indexes are left as an exercise to the reader.
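The looping behaviour of a single-character string is easy to confirm. In this short sketch the name s is ours.

```python
s = 'X'

print(s[0])           # X
print(s[0][0][0][0])  # X
print(type(s[0]))     # <class 'str'>
print(len(s[0]))      # 1
```

Indexing a one-character string gives back a one-character string, so the chain of [0] can be extended indefinitely.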

Sequence Operations

With the help of the box-and-arrow diagram, we can figure out most of the operations.

Indexing

The first operation we are going to discuss is indexing. We will use the example of the nested tuple (i.e., tpl = (('CS', 1010, 'S'), ('is', 'easy'))). Say we want to get the first element. How can we do it? As before, we simply write tpl[0]. What is the result? Well, the first element is a tuple, so we have this entire tuple ('CS', 1010, 'S').

>>> tpl = (('CS', 1010, 'S'), ('is', 'easy'))
>>> tpl[0]
('CS', 1010, 'S')

Okay, that is easy. But now note that since tpl[0] is a tuple, we can also ask what is the second element of this tuple. If the resulting tuple is in variable y, then we can get the second element as y[1]. However, because y is in fact tpl[0], we should be able to write it as tpl[0][1]. And indeed, that is the case.

>>> tpl[0][1]
1010

This will get more complicated with more nesting. Luckily, we have the box-and-arrow diagram. At the very least, we will assume that we can draw the box-and-arrow diagram correctly. If not, we need to practice drawing it correctly. But let us assume we can. Then we can describe the operation of seq[idx] as simply the following assuming that there is no error.

Indexing

Evaluate seq by finding the variable called seq.
- This variable should have the value of ✱.
Follow the arrow from the value ✱ inside the box.
- This should lead to a sequence where idx is a valid index.
Retrieve the value at idx.
- If idx is negative, use negative index.

Then we can explain tpl[0][1] as first evaluating the leftmost operation tpl[0] before evaluating the result with the remaining [1]. Each one will be evaluated based on the indexing procedure above. We will highlight the full steps below in tabs. Follow the tabs to see the progression.

With this procedure, we can evaluate any indexing as long as we can draw the box-and-arrow diagram.
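The left-to-right evaluation of chained indexing can also be sketched as a loop that applies one index at a time. The function follow_indexes is a hypothetical helper written for this illustration; it is not a Python built-in.

```python
def follow_indexes(seq, indexes):
    """Hypothetical helper: apply each index in turn, from left to right."""
    current = seq
    for idx in indexes:
        current = current[idx]  # follow one arrow, then retrieve one element
    return current

tpl = (('CS', 1010, 'S'), ('is', 'easy'))

print(follow_indexes(tpl, (0,)))     # ('CS', 1010, 'S')
print(follow_indexes(tpl, (0, 1)))   # 1010
print(follow_indexes(tpl, (1, -1)))  # easy
```

Here follow_indexes(tpl, (0, 1)) performs the same two steps as tpl[0][1].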

Length

There is another advantage of the box-and-arrow diagram. Using the diagram, the answer to len(tpl) is quite obvious. In this case, it really is just the number of elements in the tuple represented by tpl, which is 2.

BoxArrow04

We do not look further inside to count the endpoints. This corresponds nicely to the idea from before that for a sequence of n elements, the valid indexes are shown below.

Positive: 0 to n - 1
Negative: -1 to -n

Connecting back to the nested tuple above, tpl[2] (and higher) as well as tpl[-3] (and lower) will give us IndexError.

>>> tpl = (('CS', 1010, 'S'), ('is', 'easy'))
>>> tpl[2]
IndexError
>>> tpl[-3]
IndexError
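The range of valid indexes can be checked against what Python actually accepts. The two helpers below (is_valid_index and can_index) are hypothetical names used only for this sketch.

```python
tpl = (('CS', 1010, 'S'), ('is', 'easy'))

def is_valid_index(seq, idx):
    """Hypothetical helper: the rule from the text, -n to n - 1."""
    n = len(seq)
    return -n <= idx <= n - 1

def can_index(seq, idx):
    """Hypothetical helper: try the index and report whether it worked."""
    try:
        seq[idx]
        return True
    except IndexError:
        return False

# Both columns should agree for every index tried.
for idx in (-3, -2, -1, 0, 1, 2):
    print(idx, is_valid_index(tpl, idx), can_index(tpl, idx))
```

Since len(tpl) is 2, only the indexes -2, -1, 0, and 1 are accepted, no matter how many values sit inside the nested tuples.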

Slicing

As for slicing, there is actually no change to the procedure. Also note that we will be constructing another tuple as the result. The original tuple will not be modified in any way. Our procedure simply states the following.

... we start from the element at the index determined by start ... until we reach the element determined by stop.

So as long as we can do indexing, we can do slicing. But it is still instructive to show the behavior since we have nesting. Let us expand the tuple into the following.

tpl1 = (('CS', 1010, 'S'), 'is', ('very', 'very'), 'easy')
tpl2 = tpl1[1:-1:1]
print(tpl2)

('is', ('very', 'very'))

Notice that the element at stop is the entire string 'easy'. Therefore we exclude this but we will include the element before this, which is a tuple ('very', 'very'). This tuple is copied as it is. The box-and-arrow diagram before and after the slicing is shown below.

Before

BoxArrow05A

After

BoxArrow05B

The tuple created by slicing (i.e., tpl2) is a new tuple, as seen from the new sequence of boxes. Also, the value of 'is' is copied. Can we say the same thing for the value of tpl1[2]? If we say that the value is whatever is inside the box, then this value is exactly the arrow starting from ✱. We are, in fact, copying this arrow.

This has a very important implication. The actual tuple ('very', 'very') is not duplicated. The two arrows (i.e., one from tpl1[2] and the other from tpl2[1]) are pointing to the same location. This is called aliasing and will be the source of many problems in the future. For now, this will not really cause us problems, but it is good to know about potential problems in advance.

Aliasing

Aliasing happens when we have two different arrows pointing to the same location. This means that the data is shared between the two locations. If we have two variables v1 and v2 pointing to the same location, then we say that v1 is an alias of v2 (and vice versa).
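Aliasing can be observed with the is operator, which checks whether two expressions refer to the same object (the same location, in box-and-arrow terms). The notes have not introduced is at this point, so treat this as a side sketch using the slicing example from above.

```python
tpl1 = (('CS', 1010, 'S'), 'is', ('very', 'very'), 'easy')
tpl2 = tpl1[1:-1:1]

print(tpl2 is tpl1)        # False: slicing built a new sequence of boxes
print(tpl2[1] == tpl1[2])  # True:  the two elements have equal values
print(tpl2[1] is tpl1[2])  # True:  both arrows point to the same tuple
```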

Iteration

With all the work we have done, the explanation for iteration is also simple. We simply index the elements one by one from left to right. The potential confusion is the same as before: what is an element? But hopefully by now it is clear that an entire tuple can be a single element of another tuple.

tpl = (('CS', 1010, 'S'), 'is', ('very', 'very'), 'easy')
for elem in tpl:
  print(elem)

('CS', 1010, 'S')
is
('very', 'very')
easy
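Since iteration is described as indexing the elements one by one, the same traversal can be sketched with explicit indexes. The names count and i are ours.

```python
tpl = (('CS', 1010, 'S'), 'is', ('very', 'very'), 'easy')

# Index-based traversal: visits tpl[0], tpl[1], ... in order.
for i in range(len(tpl)):
    print(i, tpl[i])

# Counting the visited elements confirms a nested tuple counts as one element.
count = 0
for elem in tpl:
    count = count + 1
print(count)  # 4
```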

Tuple Operation

Similar to string, there are arithmetic-like operations for tuple.

Arithmetic-Like Operations

Operation	Symbol	Example Code	Result
Repetition	`*`	`2 * ('CS', 1010, 'S')`	`('CS', 1010, 'S', 'CS', 1010, 'S')`
Concatenation	`+`	`('CS', 1010) + ('S',)`	`('CS', 1010, 'S')`
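The two operations in the table can be tried on a nested tuple to see what is newly created and what is shared. This is a small sketch with names of our own choosing; the is operator checks whether two expressions refer to the same object.

```python
inner = ('very', 'very')
outer = (inner,)

repeated = 2 * outer
joined = outer + ('easy',)

print(repeated)  # (('very', 'very'), ('very', 'very'))
print(joined)    # (('very', 'very'), 'easy')

print(repeated is outer)           # False: the result is a new tuple
print(repeated[0] is repeated[1])  # True:  both elements are the same inner tuple
print(joined[0] is inner)          # True:  the inner tuple is shared, not duplicated
```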

Also similar to string, these will always create a new tuple. However, the elements may be shared (i.e., aliased). Relational operations on tuples also follow the same procedure as for strings. But because tuples can be nested, the comparison is also nested in a way. Our example below uses < but the same nested comparison also works for other relational operators like ==³.

Tuple Relational Operation

>>> t1 = (1, 2, 3)
>>> t2 = (1, 2)
>>> t3 = (2, 1, 3)
>>> t1 < t2         # (1, 2, 3) < (1, 2)  [False: t2 is a prefix of t1 and t1 is longer]
False
>>> t2 < t3         # (1, 2) < (2, 1, 3)  [True: the first elements differ and 1 < 2]
True
>>> (t1, t2) < (t2, t3)  # False because t2 < t1
False
>>> (t2, t1) < (t3, t2)  # True because t2 < t3
True
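The nested comparison can be sketched as a function that looks for the first pair of elements that differ. The function tuple_less is a hypothetical re-implementation for illustration only; in practice we simply write a < b. It uses zip and isinstance, which the notes may not have introduced.

```python
def tuple_less(a, b):
    """Hypothetical sketch of a < b for tuples (element by element, left to right)."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing pair decides; nested tuples are compared the same way.
            if isinstance(x, tuple) and isinstance(y, tuple):
                return tuple_less(x, y)
            return x < y
    # No differing pair: one tuple is a prefix of the other, so the shorter is smaller.
    return len(a) < len(b)

t1 = (1, 2, 3)
t2 = (1, 2)
t3 = (2, 1, 3)

print(tuple_less(t1, t2))              # False
print(tuple_less(t2, t3))              # True
print(tuple_less((t1, t2), (t2, t3)))  # False
print(tuple_less((t2, t1), (t3, t2)))  # True
```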

There is an additional problem that may arise because a tuple can be a sequence of anything. Two elements may be incomparable. This was not a problem for strings because an element of a string is always a string. This is not so for tuples. This may lead to the following common mistakes.

Common Mistakes

>>> (1, 2, 3) < ("1", "2", "3")  # cannot compare int and str
TypeError
>>> (1, 2, 3) < ((1,), (2,))     # cannot compare int and tuple
TypeError

What this also means is that certain built-in functions that require comparison may not always work. For instance, the built-in function min or max may not work if there are incomparable elements in the tuple.
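A short sketch of min and max on tuples, with and without incomparable elements. The variable comparable and the try/except are ours, used only so the script keeps running after the error.

```python
print(min((3, 1, 2)))         # 1
print(max((3, 1, 2)))         # 3
print(max(((1, 2), (1, 3))))  # (1, 3)  nested tuples are compared element by element

# Mixing int and str makes the elements incomparable.
try:
    max((1, '2', 3))
    comparable = True
except TypeError:
    comparable = False

print(comparable)  # False
```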

Recall that there is another relational operator for sequences called the in operator. There is also the corresponding not in operator. This operator behaves differently for tuples than for strings. In the case of a string, we are looking for a consecutive substring.

Recap of String in Operator

>>> "bcd" in "abcd"
True
>>> "ad" in "abcd"
False
>>> "abcd" in "abcd"
True
>>> "" in "abcd"
True
>>> "" in ""
True

However, for a tuple, we are looking only for a specific element and not for a consecutive subsequence. This specific element must also be directly in the tuple and not nested deeper inside.

Tuple in Operator

>>> 2 in (1, 2, 3)
True
>>> (2, 3) in (1, 2, 3)
False
>>> (2, 3) in (1, (2, 3), 4)
True
>>> 2 in (1, (2, 3), 4)
False
>>> () in (1, (2, 3), 4)
False
>>> () in (1, (), 4)
True
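The behaviour of in on a tuple can be roughly sketched as a loop over the top-level elements. Both functions below are hypothetical helpers: contains approximates what target in tpl does, while contains_nested shows what a deeper search would look like if we wanted one (Python's in does not do this).

```python
def contains(tpl, target):
    """Hypothetical sketch of `target in tpl`: checks top-level elements only."""
    for elem in tpl:
        if elem == target:
            return True
    return False

def contains_nested(tpl, target):
    """Hypothetical helper: also searches inside nested tuples."""
    for elem in tpl:
        if elem == target:
            return True
        if isinstance(elem, tuple) and contains_nested(elem, target):
            return True
    return False

print(contains((1, (2, 3), 4), 2))         # False
print(contains((1, (2, 3), 4), (2, 3)))    # True
print(contains_nested((1, (2, 3), 4), 2))  # True
```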

¹ We will no longer use monospace font for range.
² The overly complicated syntax: (❲ ❬ expression ❭, ❳ ❲ ❲ ❬ expression ❭ ❳ ❲, ❬ expression ❭ ❳✱). Not really much explanation will be given.
³ And if we have these two we can easily construct the rest.