CS2109S AY 2024/25 Semester 2
Lecture 4 Math Explained - Decision Trees
Entropy and Information Gain
Entropy measures the amount of information associated with a random variable. For example, since the answer to a True/False question has 2 possible outcomes, it conveys 1 bit of information, while the answer to a multiple-choice question with 4 options conveys 2 bits. (We assume the outcomes are uniformly distributed, and all logarithms in this section are base 2, so information is measured in bits.)
Now, suppose an event happens with probability $p$. Since the event happens, on average, once every $\dfrac{1}{p}$ trials, observing this event should intuitively convey $\log \left( \dfrac{1}{p} \right)$ bits of information. Using this intuition, we can define the information content (also known as surprisal) of an event $e$ as:
$$I(e) = \log \left( \dfrac{1}{p} \right) = - \log p$$
- If $p = 1$, then $I(e) = 0$ (the fact that the event happens conveys no information, i.e. we are not surprised at all).
- If $p = 0$, then $I(e) = \log \dfrac{1}{0}$ is undefined (the event never happens). As $p$ approaches 0, $I(e)$ tends to infinity (observing such a rare event conveys a great deal of information, i.e. we would be very surprised).
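To make the definition concrete, here is a minimal Python sketch of the surprisal function; the name `information_content` is an illustrative choice, and it assumes base-2 logarithms so the result is in bits.

```python
import math

def information_content(p: float) -> float:
    """Surprisal I(e) = -log2(p) in bits, for an event with probability p > 0."""
    if p <= 0:
        raise ValueError("I(e) is undefined for p = 0 (and meaningless for p < 0)")
    return -math.log2(p)

print(information_content(0.5))   # 1.0 bit  (answer to a True/False question)
print(information_content(0.25))  # 2.0 bits (one of 4 equally likely options)
print(information_content(1.0))   # 0.0 bits (a certain event is no surprise)
```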
Entropy measures the expected amount of information conveyed by identifying the outcome of a random trial.
$$H(X) = \sum_{e \in E} P(e) I(e) = - \sum_{e \in E} P(e) \log P(e)$$
where $E$ is the set of possible outcomes of $X$.
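As a sketch under the same assumptions, entropy is just the probability-weighted average of the surprisal values above; the helper name `entropy` is illustrative, and zero-probability outcomes are skipped since they contribute nothing to the sum.

```python
import math

def entropy(probs: list[float]) -> float:
    """H(X) = -sum_e P(e) * log2(P(e)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                # 1.0 bit  (fair coin)
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits (4 uniform outcomes)
print(entropy([1.0, 0.0]))                # 0.0 bits (no uncertainty at all)
```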
Information Gain measures the reduction in entropy achieved by splitting the data on an attribute. It is the difference between the entropy before the split and the weighted average entropy of the subsets after the split (a larger subset is given more weight).
$$IG(D, A) = H(D) - \sum_{v \in \text{values}(A)} \dfrac{|D_v|}{|D|} H(D_v)$$
where $D$ is the dataset before the split, $\text{values}(A)$ is the set of values that attribute $A$ can take, and $D_v$ is the subset of $D$ whose value of attribute $A$ is $v$.
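The following sketch puts the two pieces together, assuming a toy dataset represented as a list of (feature-dict, label) pairs; the names `label_entropy` and `info_gain`, and the example data, are illustrative rather than taken from the lecture.

```python
import math
from collections import Counter, defaultdict

def label_entropy(labels) -> float:
    """H(D): entropy of the class labels in a (sub)dataset."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(data, attribute) -> float:
    """IG(D, A) = H(D) - sum_v |D_v|/|D| * H(D_v)."""
    subsets = defaultdict(list)
    for features, label in data:
        subsets[features[attribute]].append(label)
    weighted = sum(len(sub) / len(data) * label_entropy(sub)
                   for sub in subsets.values())
    return label_entropy([label for _, label in data]) - weighted

# Toy dataset: does splitting on 'outlook' reduce entropy?
data = [
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "rain"}, "yes"),
    ({"outlook": "rain"}, "no"),
]
print(info_gain(data, "outlook"))  # about 0.571 bits
```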
Table of Entropy Values
Individual Terms
Each cell gives the individual term $-\dfrac{k}{n} \log \dfrac{k}{n}$, where $n$ is the total number of samples (rows) and $k$ is the number of positive samples (columns).

| No. of samples | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | | | | | | | | | |
| 2 | 0 | 0.5 | 0 | | | | | | | | |
| 3 | 0 | 0.5283 | 0.3900 | 0 | | | | | | | |
| 4 | 0 | 0.5 | 0.5 | 0.3113 | 0 | | | | | | |
| 5 | 0 | 0.4644 | 0.5288 | 0.4422 | 0.2575 | 0 | | | | | |
| 6 | 0 | 0.4308 | 0.5283 | 0.5 | 0.3900 | 0.2192 | 0 | | | | |
| 7 | 0 | 0.4011 | 0.5164 | 0.5239 | 0.4613 | 0.3467 | 0.1906 | 0 | | | |
| 8 | 0 | 0.3750 | 0.5 | 0.5306 | 0.5 | 0.4238 | 0.3113 | 0.1686 | 0 | | |
| 9 | 0 | 0.3522 | 0.4822 | 0.5283 | 0.5200 | 0.4711 | 0.3900 | 0.2820 | 0.1510 | 0 | |
| 10 | 0 | 0.3322 | 0.4644 | 0.5211 | 0.5288 | 0.5 | 0.4422 | 0.3602 | 0.2575 | 0.1368 | 0 |
Entropy with Two Outcomes
Each cell gives the entropy $-\dfrac{k}{n} \log \dfrac{k}{n} - \dfrac{n-k}{n} \log \dfrac{n-k}{n}$ of a dataset with $k$ positive samples (columns) out of $n$ samples (rows).

| No. of samples | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | | | | | | | | | |
| 2 | 0 | 1 | 0 | | | | | | | | |
| 3 | 0 | 0.9183 | 0.9183 | 0 | | | | | | | |
| 4 | 0 | 0.8113 | 1 | 0.8113 | 0 | | | | | | |
| 5 | 0 | 0.7219 | 0.9710 | 0.9710 | 0.7219 | 0 | | | | | |
| 6 | 0 | 0.6500 | 0.9183 | 1 | 0.9183 | 0.6500 | 0 | | | | |
| 7 | 0 | 0.5917 | 0.8631 | 0.9852 | 0.9852 | 0.8631 | 0.5917 | 0 | | | |
| 8 | 0 | 0.5436 | 0.8113 | 0.9544 | 1 | 0.9544 | 0.8113 | 0.5436 | 0 | | |
| 9 | 0 | 0.5033 | 0.7642 | 0.9183 | 0.9911 | 0.9911 | 0.9183 | 0.7642 | 0.5033 | 0 | |
| 10 | 0 | 0.4690 | 0.7219 | 0.8813 | 0.9710 | 1 | 0.9710 | 0.8813 | 0.7219 | 0.4690 | 0 |
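The entries in both tables can be reproduced with a short script (a sketch, assuming base-2 logarithms and rounding to 4 decimal places); `term` and `binary_entropy` are illustrative names.

```python
import math

def term(k: int, n: int) -> float:
    """Individual term -(k/n) * log2(k/n); taken as 0 when k == 0."""
    p = k / n
    return 0.0 if p == 0 else -p * math.log2(p)

def binary_entropy(k: int, n: int) -> float:
    """Entropy of a dataset with k positive samples out of n."""
    return term(k, n) + term(n - k, n)

print(round(term(2, 3), 4))            # 0.3900 (row 3, column 2 of the first table)
print(round(binary_entropy(3, 8), 4))  # 0.9544 (row 8, column 3 of the second table)
```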
Last updated: 30 January 2025