Scoping
Learning Objectives
At the end of this sub-unit, students should
- appreciate the benefit of scoping of variables.
- understand lexical scoping of variables.
- know how to resolve variables to values based on scoping.
- know how to evaluate functions with access to global variables.
Naming Conflict
Without functions, we have to rely on conventions to resolve name conflicts.
In particular, variables that are deemed local to a sequence of code should not be used outside of that code.
This becomes more and more complicated as our code gets bigger.
Imagine not being able to name a variable student_name because the same name has been used in other, unrelated parts of the code.
We represent a function as a black box, where we do not care about the names of the variables local to the function. How does this work? To answer this, we need to understand how variables are created.
There is no variable declaration in Python. But based on the rule of assignment, a variable is created if the name does not exist. Unfortunately, this rule is incomplete. To make the rule complete, we need to know where a variable is located.
In Python, an assignment to a variable¹ inside a function creates the variable within the scope of that function. This creation happens even before the assignment is executed. We call this the local scope, as it is local to the function. On the other hand, if a variable is assigned outside of any function, we say it is in the global scope. The global scope is available to all functions, but a local scope is only available within its own function.
This can lead to rather confusing code if we are not careful, so let us illustrate it with several examples. The first set of examples shows that we can access variables in the global scope.
Here, n is created in the global scope.
The function f can access the global scope, so it will print 3.
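The original listing is not reproduced in this copy, so here is a minimal sketch consistent with the description: n is assigned at the top level and f only reads it.

```python
n = 3  # n is created in the global scope

def f():
    # f never assigns to n, so n is not local to f;
    # looking up n falls through to the global scope
    print(n)

f()  # prints 3
```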
Similar to before, but note that defining f does not evaluate its body.
So as long as n is assigned before f is invoked, there is no error.
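A sketch of this second case, assuming the only difference from the first one is that the assignment to n comes after the definition of f:

```python
def f():
    print(n)  # n does not exist yet when f is defined, and that is fine

# Defining f did not run its body, so nothing has looked up n yet.
n = 3  # n is created in the global scope before f is called

f()  # prints 3
```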
The second set of examples shows that an assignment inside the function creates a new variable in the local scope. If this variable has the same name as a variable in the global scope, the global variable is shadowed by the local variable: when we refer to this name, we access the local version.
At Line 3, we create a new variable that is available only locally within f.
So the global n is unchanged by the assignment n = 99 at Line 3.
This can be seen from Line 8, where 3 is still printed.
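A sketch of this kind of example follows; it is not the original listing, so its line numbers do not match the ones mentioned above.

```python
n = 3  # global n

def f():
    n = 99    # this assignment makes n local to f
    print(n)  # refers to the local n, so this prints 99

f()       # prints 99
print(n)  # the global n is untouched, so this prints 3
```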
This is confusing behavior because print(n) appears before n = 99, so we might expect print(n) to refer to the global version.
Instead, it still refers to the local version.
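A minimal sketch of this situation: because of the later assignment, n is local to f for the whole body, so the earlier print(n) fails instead of falling back to the global n. The exact error message differs between Python versions, so only the error type is shown.

```python
n = 3  # global n

def f():
    print(n)  # n is local to f (because of the line below) but not yet bound
    n = 99

try:
    f()
except UnboundLocalError as err:
    caught = err
    print(type(err).__name__)  # prints UnboundLocalError
```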
Since call by value behaves as if the argument is assigned to the parameter, we can think of it as an assignment that creates a local variable with the same name as the parameter. So a parameter is treated like a local variable.
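A small sketch of a parameter behaving like a local variable: the parameter n shadows the global n, and reassigning it inside f leaves the global one alone.

```python
n = 3  # global n

def f(n):
    # Calling f(10) behaves roughly as if n = 10 were executed
    # at the start of f, creating a local n.
    n = n + 1
    return n

print(f(10))  # prints 11
print(n)      # the global n is unchanged, so this prints 3
```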
Think of it as a "summarizing" process. When we define a function, we analyze it so that we know which variables are local to the function. So if we look at the example in "Local Variable #2", we can summarize it as follows.
From this summary, it is hopefully clear that print(n) refers to the local n regardless of whether a variable n has been declared globally.
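In CPython, the result of this summarizing step can be inspected: a function's code object has a co_varnames attribute listing its local variable names, parameters first. This is an implementation detail, used here only as an illustration; the functions below are hypothetical examples.

```python
def f():
    print(n)
    n = 99

def g(x):
    y = x * 2
    return y + z  # z is never assigned in g, so it is not local

# The locals were determined when the functions were defined,
# before either function has ever been called.
print(f.__code__.co_varnames)  # ('n',)
print(g.__code__.co_varnames)  # ('x', 'y')
```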
Here we have a variable n declared locally but without any value.
Its value should be treated as unbound rather than as nothing, because we represent nothing with None.
However, the value is not even None; it is simply unbound.
Hence, we get UnboundLocalError if we try to get the value of the variable via substitution².
The illustration with a toilet roll is shown below.
An unbound variable does not even have an empty toilet roll, but at least there is still a place to eventually hold one.
An undeclared variable does not even have a hypothetical space.
This corresponds to a NameError.
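The two situations can be put side by side in a short sketch (the function and variable names are made up for illustration). In Python, UnboundLocalError is a subclass of NameError, so one except clause catches both and the exception type tells them apart.

```python
def undeclared():
    return missing_name  # assigned nowhere: no "space" for it at all

def unbound():
    value = k  # k is local (assigned below) but has no value yet
    k = 1
    return value

results = {}
for func in (undeclared, unbound):
    try:
        func()
    except NameError as err:  # also catches UnboundLocalError
        results[func.__name__] = type(err).__name__

print(results)  # {'undeclared': 'NameError', 'unbound': 'UnboundLocalError'}
```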
Local and Global State
With functions, we now have two different kinds of scope: one is available globally and the other is available locally within each function. The same name can exist in both scopes, so we need a different way to represent the state of the program at a particular line in the code: a state that captures both the global and the local scope.
The actual memory model of Python is much more complicated than the one we are going to use here. But that complication comes from function closures, which arise from higher-order functions. As we are not going into the details of higher-order functions, this representation is sufficient for our purpose.
We will now represent our state as a "chain" of scopes.
We use lambda (i.e., λ) to represent a local scope and gamma (i.e., γ) to represent the global scope³. If the context is unclear, we may also add the function name as a subscript (e.g., λ_factorial(5)) to indicate that we are referring to the local scope of factorial when invoked with the argument 5. This last part is quite important because a scope is only used when we are actually executing the function body.
So now, we can explain the behavior of the global and local variables above in greater detail.
The highlighted parts are function definitions.
We highlight them because the function body is not executed.
Instead, we treat the definition as a whole and add a mapping from the name to the function definition.
Also note the use of the symbol n ↦ ∅ to represent that n is in the current scope but currently unbound.
In particular, if we try to substitute the name n with the value ∅, we get UnboundLocalError.
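The chain-of-scopes state can be modeled with ordinary dictionaries. This is only a sketch of the notation, not Python's real memory model: UNBOUND, lookup, gamma and lam_f are hypothetical names, with UNBOUND playing the role of ∅. As a simplification, the sketch raises UnboundLocalError for any unbound name it finds.

```python
UNBOUND = object()  # plays the role of ∅: the name is in scope but has no value

def lookup(chain, name):
    """Resolve name by walking a chain of scopes, innermost scope first."""
    for scope in chain:
        if name in scope:
            if scope[name] is UNBOUND:
                raise UnboundLocalError(f"{name} is in scope but unbound")
            return scope[name]
    raise NameError(f"{name} is not in any scope")

gamma = {"n": 3}        # γ: the global scope
lam_f = {"n": UNBOUND}  # λ_f: the local n exists but is not yet bound
chain = [lam_f, gamma]  # λ_f ⟶ γ

try:
    lookup(chain, "n")  # like print(n) before n = 99 in "Local Variable #2"
except UnboundLocalError as err:
    first_error = err
    print(err)          # n is in scope but unbound

lam_f["n"] = 99              # the state after n = 99 executes
print(lookup(chain, "n"))    # 99
print(lookup([gamma], "n"))  # 3: the global n is unchanged
```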
The executions for the global-variable examples are shown below. Because some executions produce errors, we stop each such execution at the point of the error.
The executions for the local-variable examples are shown below. We leave "Local Variable #3" as an exercise for the reader.
Observe from the executions of the local-variable examples that we have a variable n in both the local and the global scope.
Any changes made to the local n do not affect the global n.
This is good for making a self-contained function.
But it is not good if we want our function to modify global values.
The consensus on best practice is that we want our functions to be self-contained, so making a self-contained function should be easier than the opposite. This is why the behavior is as shown above. Another reason is that there is no explicit variable declaration, so a convention has to be adopted, and the convention adopted is the one that makes writing self-contained functions easier.
Bad Practice
We mentioned that finding all the locals is like a "summarizing" process. In fact, we have shown that even if the assignment is never executed, the variable is still considered local to the function.
So in the example below, f will not produce an error but g will.
In both cases, the execution exits the function immediately when it encounters return n.
This means you need to be really really careful with the indentation.
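The example itself is not reproduced here. A sketch of the kind of code this describes, assuming that f and g differ only in the indentation of the assignment n = 5:

```python
def f():
    return n
n = 5  # NOT indented: a global assignment, so f sees the global n

def g():
    return n
    n = 5  # indented: never executed, yet it makes n local to g

print(f())  # prints 5
try:
    g()
except UnboundLocalError as err:
    g_error = err
    print("g raised", type(err).__name__)  # g raised UnboundLocalError
```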
Function-Level Scoping
A little bit of clarification is needed about the local scope. We mentioned that a local scope is created for each function. This means that a variable assigned within a block inside the function (e.g., inside an if-statement or a while-loop) does not create a new variable that exists only within that block. That is why our if-statements and while-loops work: a variable assigned inside them is still available after the block ends.
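A sketch of this point (sum_to is a hypothetical function): last and size are first assigned inside a while-loop and an if-statement, yet both remain visible after those blocks because the whole function body is a single local scope.

```python
def sum_to(k):
    # Assumes k >= 1; otherwise last is never assigned and stays unbound.
    i = 1
    total = 0
    while i <= k:
        total = total + i
        last = i        # last is first created inside the loop body
        i = i + 1
    if total > 10:
        size = "big"    # size is created inside an if-statement
    else:
        size = "small"
    # Both last and size are still visible here, after their blocks.
    return total, last, size

print(sum_to(5))  # (15, 5, 'big')
```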
We may have taken this for granted earlier, but there are languages where scoping is per block instead of per function. Usually, languages with block-level scoping have explicit variable declarations. JavaScript, for instance, supports both: block-level scoping (with let and const) and function-level scoping (with var).
Python Function-Level Scoping
Python uses function-level scoping. In particular, if there is an assignment to a variable anywhere inside a function, the variable is in the local scope of that function, regardless of whether the assignment is executed (or ever will be).
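A sketch of the "regardless of whether the assignment is executed" part: the assignment sits inside an if-block that may be skipped, but it still makes n local to all of f.

```python
n = 3  # global n

def f(flag):
    if flag:
        n = 99  # this makes n local to all of f, not just to the if-block
    return n    # always the local n, never the global one

print(f(True))  # prints 99
try:
    f(False)    # the assignment is skipped, so the local n is still unbound
except UnboundLocalError as err:
    skipped_error = err
    print(type(err).__name__)  # prints UnboundLocalError
```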
Lexical Scoping
In Python, the scope of a variable depends only on the code. In particular, it depends only on the location of the variable in the code; more specifically, on where the assignment is located.
If the assignment is located outside of any function, we say that the variable is in the global scope. Otherwise, it must be within some function, in which case the variable is in the local scope of that function.
That is merely a rephrasing of what we have said before, so let us focus on what is not said instead. As this is difficult, we will guide you through it. What the definition leaves unsaid is that the scope (and hence the existence) of a variable does not depend on the function that called it. Consider the following code.
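The code itself is not reproduced in this copy. The sketch below is a reconstruction consistent with the scopes listed later in this section (a global n = 3, a local m = 2 in f, and a local n = 5 in g), so treat the exact statements as an assumption. The call is wrapped in try only so that the sketch runs to completion.

```python
n = 3

def f():
    m = 2     # m is local to f
    g()       # g is called from inside f...

def g():
    n = 5     # this n is local to g and shadows the global n
    print(m)  # ...but m is neither local to g nor global

try:
    f()
except NameError as err:
    error_name = type(err).__name__
    print(error_name)  # prints NameError
```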
We can evaluate the code above and arrive at the following trace, which produces a NameError.
To simplify our state, we will use λf instead of the longer <def f>.
This is like mathematics: we invent simpler notations when necessary. Do not be afraid of notation; it is a powerful tool to have. You can invent your own notation for your own work when necessary, but be sure to use the common notation when answering questions.
In this execution, we arrive at a NameError because we follow the execution from the global scope to λf and then to λg.
This can be quite troublesome, especially when all we get out of it is a NameError.
Even worse, if there is a loop, we have to go through the loop until it completes.
What we want is a simpler analysis that allows us to quickly determine whether a variable exists. That way, we may be able to check for errors more quickly. This analysis can be thought of as the inverse of the scoping rule above: given a variable, can we determine which assignment produces it?
This is the summarizing procedure we had before.
Putting it in context, we can visually represent the functions as boxes of scope containing their local variables.
The actual values will be written in place of the underscore (i.e., _).
As a summary, this is sufficient.
But we will typically deal only with the functions that have not finished executing yet.
After a function has completed, we can safely remove its box.
As you summarize this, note that the scope of each function is attached directly to the global scope and is not prepended to the front of the caller's scope chain. This is why we have the following series of scopes.
- γ:{ n ↦ 3 , f ↦ λf , g ↦ λg }
- λf:{ m ↦ 2 } ⟶ γ:{ n ↦ 3 , f ↦ λf , g ↦ λg }
- λg:{ n ↦ 5 } ⟶ γ:{ n ↦ 3 , f ↦ λf , g ↦ λg }
Here, the arrow ⟶ points to the scope directly enclosing the current scope. Note that the third chain is the important one: if we were merely prepending scopes, we would have gotten the following instead.
Incorrect Scoping
λg:{ n ↦ 5 } ⟶ λf:{ m ↦ 2 } ⟶ γ:{ n ↦ 3 , f ↦ λf , g ↦ λg }
This distinction is important because, if the incorrect scoping were used, we would not get the NameError, since we would have the mapping m ↦ 2.
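The two chains can be compared in a small sketch using plain dictionaries. The names lookup, lexical and dynamic are hypothetical, and the function values in γ are placeholder strings. Looking up m succeeds only in the incorrect, prepended chain.

```python
def lookup(chain, name):
    """Return the value of name from the first scope in chain that has it."""
    for scope in chain:
        if name in scope:
            return scope[name]
    raise NameError(f"name {name!r} is not defined")

gamma = {"n": 3, "f": "λf", "g": "λg"}  # global scope (functions as placeholders)
lam_f = {"m": 2}                        # local scope of f
lam_g = {"n": 5}                        # local scope of g

lexical = [lam_g, gamma]         # λg ⟶ γ (what Python does)
dynamic = [lam_g, lam_f, gamma]  # λg ⟶ λf ⟶ γ (the incorrect, prepended chain)

print(lookup(dynamic, "m"))  # 2: the caller's m would leak into g
try:
    lookup(lexical, "m")
except NameError as err:
    print(err)               # name 'm' is not defined
```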
A lot of things can be learnt even from an error⁴.
Try to do this kind of reasoning to fully understand the behavior of Python.
At some point, explanation alone becomes insufficient due to the sheer number of interactions with other constructs.
Global
If we really really want to modify the global variable, what can we do?
Since every assignment only modifies the current scope by either creating a new variable or modifying its value, how do we modify a variable from outside of the scope?
First, we must not have a local variable with the same name shadowing the one in the outer scope.
Second, we need to add the keyword global to indicate that a variable is supposed to come from the global scope.
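A minimal sketch of the global keyword (increment is a hypothetical function): after the declaration, the assignment inside the function updates the global n instead of creating a local one.

```python
n = 3  # global n

def increment():
    global n   # n now refers to the global n, not a new local variable
    n = n + 1

increment()
increment()
print(n)  # prints 5
```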
Note that this is bad practice, as it makes reasoning about a function more difficult. In general, we want a function to use only the information available from its parameters. This makes the function behave like a mathematical function. Such a function is called a pure function.
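A sketch contrasting the two styles; both functions are hypothetical. The impure one reads and changes global state, so the same call gives different results; the pure one depends only on its parameter.

```python
counter = 0  # global state

def bump_global():
    # Impure: the result depends on, and changes, the global counter.
    global counter
    counter = counter + 1
    return counter

def bump(value):
    # Pure: the result depends only on the parameter, and nothing else changes.
    return value + 1

print(bump_global(), bump_global())  # 1 2: same call, different results
print(bump(0), bump(0))              # 1 1: same input, same output
```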
Call Tree
To fully understand lexical scoping, let us show the behavior on an execution that does not produce an error. This will also illustrate how functions prevent clashes between variable names. We will use the following function definitions.
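The definitions are not reproduced in this copy. As a hypothetical stand-in, the sketch below defines hypot through two helpers, square and sum_of_squares, that deliberately reuse the names x and y. (Python's math module already has a math.hypot; the point here is only the scoping.)

```python
import math

def square(x):
    return x * x

def sum_of_squares(x, y):
    # These x and y are unrelated to the x in square or in hypot:
    # every call gets its own local scope.
    x = square(x)
    y = square(y)
    return x + y

def hypot(x, y):
    return math.sqrt(sum_of_squares(x, y))

print(hypot(3, 4))  # 5.0
```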
Let us evaluate hypot(3, 4).
We use a small font size as we need to show the full evaluation.
Also, we will omit the function name from λ as the context is clear.
At this point, you may be wondering whether there is a simpler way to understand the behavior of a function call so that we do not have to go through such long steps. There is, but it requires us to know clearly what each function does. If the function depends only on its input parameters and has no other side effects, then we can treat it as a true black box, called a pure function. We should strive to make all our functions this way. If we need more information, we can add more parameters if the problem permits. However, the problem often already specifies the required parameters, which cannot be changed.
Assuming that all our functions are pure functions, we can simply draw them as black boxes, as we did before.
We put the way the function is invoked inside the box to form our call tree.
This way, we do not need to put the input on multiple incoming arrows.
Additionally, we put the return value as a note on the outgoing dashed arrow.
The call tree for the function call hypot(3, 4) above is shown below.
1. This part will be important later when we have an assignment that updates the content of a mutable element.
2. Notice how the error differs between local and global scope. If the variable is not declared at all (not even globally), we get NameError. But if the variable is declared locally and not yet bound to any value, we get UnboundLocalError.
3. If you know your Greek, just remember λ → lambda → l → local and γ → gamma → g → global.
4. There is a name for this kind of scoping mechanism: it is called dynamic scoping, as opposed to our lexical scoping, which is also called static scoping.