CS2109S AY 2024/25 Semester 2
Lecture 4 Math Explained - Decision Trees
Entropy and Information Gain
Entropy measures the amount of information associated with a random variable. For example, since the answer to a True/False question has 2 possible outcomes, it conveys 1 bit of information, while the answer to a multiple-choice question with 4 options conveys 2 bits. (We assume the outcomes are uniformly distributed, and all logarithms in this section are base 2, so information is measured in bits.)
Now, suppose an event happens with probability $p$. Since the event happens, on average, once every $\dfrac{1}{p}$ trials, observing this event should intuitively convey $\log \left( \dfrac{1}{p} \right)$ bits of information. Using this intuition, we can define the information content (also known as surprisal) of an event $e$ as:
$$I(e) = \log \left( \dfrac{1}{p} \right) = - \log p$$
- If $p = 1$, then $I(e) = 0$ (the fact that the event happens conveys no information, i.e. we are not surprised at all).
- If $p = 0$, then $I(e) = \log \dfrac{1}{0}$ is undefined (the event never happens). As $p$ approaches 0, $I(e)$ tends to infinity (observing such a rare event conveys a great deal of information, i.e. we would be very surprised).
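To make the definition concrete, here is a minimal Python sketch of the surprisal function; the name `information_content` is an illustrative choice, and it assumes base-2 logarithms so the result is in bits.

```python
import math

def information_content(p: float) -> float:
    """Surprisal I(e) = -log2(p) in bits, for an event with probability p > 0."""
    if p <= 0:
        raise ValueError("I(e) is undefined for p = 0 (and meaningless for p < 0)")
    return -math.log2(p)

print(information_content(0.5))   # 1.0 bit  (answer to a True/False question)
print(information_content(0.25))  # 2.0 bits (one of 4 equally likely options)
print(information_content(1.0))   # 0.0 bits (a certain event is no surprise)
```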
Entropy measures the expected amount of information conveyed by identifying the outcome of a random trial.
$$H(X) = \sum_{e \in E} P(e) I(e) = - \sum_{e \in E} P(e) \log P(e)$$
where $E$ is the set of possible outcomes of $X$.
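As a sketch under the same assumptions, entropy is just the probability-weighted average of the surprisal values above; the helper name `entropy` is illustrative, and zero-probability outcomes are skipped since they contribute nothing to the sum.

```python
import math

def entropy(probs: list[float]) -> float:
    """H(X) = -sum_e P(e) * log2(P(e)), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))                # 1.0 bit  (fair coin)
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits (4 uniform outcomes)
print(entropy([1.0, 0.0]))                # 0.0 bits (no uncertainty at all)
```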
Information Gain measures the reduction in entropy achieved by splitting the data on an attribute. It is the difference between the entropy before the split and the weighted average entropy of the subsets after the split (a larger subset is given more weight).
$$IG(D, A) = H(D) - \sum_{v \in \text{values}(A)} \dfrac{|D_v|}{|D|} H(D_v)$$
where $D$ is the dataset before the split, $\text{values}(A)$ is the set of values that attribute $A$ can take, and $D_v$ is the subset of $D$ whose value of attribute $A$ is $v$.
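The following sketch puts the two pieces together, assuming a toy dataset represented as a list of (feature-dict, label) pairs; the names `label_entropy` and `info_gain`, and the example data, are illustrative rather than taken from the lecture.

```python
import math
from collections import Counter, defaultdict

def label_entropy(labels) -> float:
    """H(D): entropy of the class labels in a (sub)dataset."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(data, attribute) -> float:
    """IG(D, A) = H(D) - sum_v |D_v|/|D| * H(D_v)."""
    subsets = defaultdict(list)
    for features, label in data:
        subsets[features[attribute]].append(label)
    weighted = sum(len(sub) / len(data) * label_entropy(sub)
                   for sub in subsets.values())
    return label_entropy([label for _, label in data]) - weighted

# Toy dataset: does splitting on 'outlook' reduce entropy?
data = [
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "rain"}, "yes"),
    ({"outlook": "rain"}, "no"),
]
print(info_gain(data, "outlook"))  # about 0.571 bits
```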
Table of Entropy Values
Individual Terms
Each cell gives the individual term $-\dfrac{k}{n} \log \dfrac{k}{n}$, where $n$ is the total number of samples (rows) and $k$ is the number of positive samples (columns).

| No. of samples | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | | | | | | | | | |
| 2 | 0 | 0.5 | 0 | | | | | | | | |
| 3 | 0 | 0.5283 | 0.3900 | 0 | | | | | | | |
| 4 | 0 | 0.5 | 0.5 | 0.3113 | 0 | | | | | | |
| 5 | 0 | 0.4644 | 0.5288 | 0.4422 | 0.2575 | 0 | | | | | |
| 6 | 0 | 0.4308 | 0.5283 | 0.5 | 0.3900 | 0.2192 | 0 | | | | |
| 7 | 0 | 0.4011 | 0.5164 | 0.5239 | 0.4613 | 0.3467 | 0.1906 | 0 | | | |
| 8 | 0 | 0.3750 | 0.5 | 0.5306 | 0.5 | 0.4238 | 0.3113 | 0.1686 | 0 | | |
| 9 | 0 | 0.3522 | 0.4822 | 0.5283 | 0.5200 | 0.4711 | 0.3900 | 0.2820 | 0.1510 | 0 | |
| 10 | 0 | 0.3322 | 0.4644 | 0.5211 | 0.5288 | 0.5 | 0.4422 | 0.3602 | 0.2575 | 0.1368 | 0 |
Entropy with Two Outcomes
Each cell gives the entropy $-\dfrac{k}{n} \log \dfrac{k}{n} - \dfrac{n-k}{n} \log \dfrac{n-k}{n}$ of a dataset with $k$ positive samples (columns) out of $n$ samples (rows).

| No. of samples | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | | | | | | | | | |
| 2 | 0 | 1 | 0 | | | | | | | | |
| 3 | 0 | 0.9183 | 0.9183 | 0 | | | | | | | |
| 4 | 0 | 0.8113 | 1 | 0.8113 | 0 | | | | | | |
| 5 | 0 | 0.7219 | 0.9710 | 0.9710 | 0.7219 | 0 | | | | | |
| 6 | 0 | 0.6500 | 0.9183 | 1 | 0.9183 | 0.6500 | 0 | | | | |
| 7 | 0 | 0.5917 | 0.8631 | 0.9852 | 0.9852 | 0.8631 | 0.5917 | 0 | | | |
| 8 | 0 | 0.5436 | 0.8113 | 0.9544 | 1 | 0.9544 | 0.8113 | 0.5436 | 0 | | |
| 9 | 0 | 0.5033 | 0.7642 | 0.9183 | 0.9911 | 0.9911 | 0.9183 | 0.7642 | 0.5033 | 0 | |
| 10 | 0 | 0.4690 | 0.7219 | 0.8813 | 0.9710 | 1 | 0.9710 | 0.8813 | 0.7219 | 0.4690 | 0 |
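The entries in both tables can be reproduced with a short script (a sketch, assuming base-2 logarithms and rounding to 4 decimal places); `term` and `binary_entropy` are illustrative names.

```python
import math

def term(k: int, n: int) -> float:
    """Individual term -(k/n) * log2(k/n); taken as 0 when k == 0."""
    p = k / n
    return 0.0 if p == 0 else -p * math.log2(p)

def binary_entropy(k: int, n: int) -> float:
    """Entropy of a dataset with k positive samples out of n."""
    return term(k, n) + term(n - k, n)

print(round(term(2, 3), 4))            # 0.3900 (row 3, column 2 of the first table)
print(round(binary_entropy(3, 8), 4))  # 0.9544 (row 8, column 3 of the second table)
```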
Last updated: 30 January 2025