The online calculator below parses a set of training examples and then builds a decision tree, using Information Gain as the split criterion. If you are unsure what this is all about, read the short recap on decision trees below the calculator.
Note: Training examples should be entered as a CSV list, with a semicolon as the separator. The first row is treated as the header row: the attribute (feature) labels come first, followed by the class label. All other rows are examples. The default data in this calculator is the well-known "Play Tennis" decision tree example.
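The input format described in the note can be read with a few lines of Python. This is only a sketch of how such input might be parsed, not the calculator's actual code; the three-row sample and the variable names (`raw`, `feature_names`, `class_name`) are made up for illustration.

```python
import csv
import io

# Hypothetical three-row sample in the format the note describes;
# this is not the calculator's full default data.
raw = """Outlook;Temperature;Humidity;Wind;Play
Sunny;Hot;High;False;No
Overcast;Hot;High;False;Yes
Rain;Mild;High;False;Yes
"""

reader = csv.reader(io.StringIO(raw), delimiter=";")

# First row: attribute labels, with the class label in the last column.
header = next(reader)
feature_names, class_name = header[:-1], header[-1]

# All remaining non-empty rows are training examples.
examples = [row for row in reader if row]

print(feature_names)
print(class_name)
print(len(examples))
```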
A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (decision taken after computing all attributes). The paths from root to leaf represent classification rules.[1]
Let's look at the calculator's default data.
Attributes to be analyzed are:
- Outlook: Sunny/Overcast/Rain
- Humidity: High/Normal
- Wind: True/False
- Temperature: Hot/Mild/Cool
Class label is:
- Play: Yes/No
So, by analyzing the attributes one by one, the algorithm should effectively answer the question: "Should we play tennis?" To do this in as few steps as possible, we need to choose the best decision attribute at each step, that is, the one that gives us the maximum information. This attribute is used for the first split. The process then continues on each branch until there is no need to split anymore (after the split all remaining samples are homogeneous, in other words, we can identify the class label) or there are no more attributes to split on.
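To make "maximum information" concrete, here is a minimal sketch that computes the Information Gain of each attribute on the full training set. It assumes the classic 14-row "Play Tennis" table as it is commonly reproduced, with Wind encoded as True/False; the calculator's own default data may differ in spelling or column order, and the function names are illustrative.

```python
import math
from collections import Counter

# Assumed copy of the classic "Play Tennis" table (class label in last column).
DATA = """Outlook;Temperature;Humidity;Wind;Play
Sunny;Hot;High;False;No
Sunny;Hot;High;True;No
Overcast;Hot;High;False;Yes
Rain;Mild;High;False;Yes
Rain;Cool;Normal;False;Yes
Rain;Cool;Normal;True;No
Overcast;Cool;Normal;True;Yes
Sunny;Mild;High;False;No
Sunny;Cool;Normal;False;Yes
Rain;Mild;Normal;False;Yes
Sunny;Mild;Normal;True;Yes
Overcast;Mild;High;True;Yes
Overcast;Hot;Normal;False;Yes
Rain;Mild;High;True;No"""

rows = [line.split(";") for line in DATA.strip().splitlines()]
header, examples = rows[0], rows[1:]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(examples, attr_index):
    """Entropy before the split minus the weighted entropy after it."""
    labels = [ex[-1] for ex in examples]
    groups = {}
    for ex in examples:
        groups.setdefault(ex[attr_index], []).append(ex[-1])
    remainder = sum(len(g) / len(examples) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


gains = {name: information_gain(examples, i)
         for i, name in enumerate(header[:-1])}
best = max(gains, key=gains.get)

for name, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {g:.3f}")
print("Best first split:", best)
```

On this assumed data, the attribute with the highest gain is the one chosen for the first split.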
The generated decision tree first splits on "Outlook". If the answer is "Sunny", it then checks the "Humidity" attribute: if the answer is "High", the decision for "Play" is "No"; if the answer is "Normal", the decision is "Yes". If "Outlook" is "Overcast", the decision for "Play" is "Yes" immediately. If "Outlook" is "Rain", the tree needs to check the "Wind" attribute. Note that this decision tree does not need to check the "Temperature" feature at all!
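The tree described above can be reproduced with a short recursive procedure in the style of ID3. The sketch below is not the calculator's implementation; it assumes the classic 14-row "Play Tennis" table with Wind encoded as True/False, and `build_tree` and `attributes_used` are illustrative names.

```python
import math
from collections import Counter
from pprint import pprint

# Assumed copy of the classic "Play Tennis" table (class label in last column).
DATA = """Outlook;Temperature;Humidity;Wind;Play
Sunny;Hot;High;False;No
Sunny;Hot;High;True;No
Overcast;Hot;High;False;Yes
Rain;Mild;High;False;Yes
Rain;Cool;Normal;False;Yes
Rain;Cool;Normal;True;No
Overcast;Cool;Normal;True;Yes
Sunny;Mild;High;False;No
Sunny;Cool;Normal;False;Yes
Rain;Mild;Normal;False;Yes
Sunny;Mild;Normal;True;Yes
Overcast;Mild;High;True;Yes
Overcast;Hot;Normal;False;Yes
Rain;Mild;High;True;No"""

lines = DATA.strip().splitlines()
header = lines[0].split(";")
TARGET = header[-1]
rows = [dict(zip(header, line.split(";"))) for line in lines[1:]]


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain(rows, attr):
    """Information Gain of splitting `rows` on `attr`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[TARGET])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[TARGET] for r in rows]) - remainder


def build_tree(rows, attrs):
    labels = [r[TARGET] for r in rows]
    if len(set(labels)) == 1:      # homogeneous subset: emit a leaf
        return labels[0]
    if not attrs:                  # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {value: build_tree([r for r in rows if r[best] == value], rest)
                   for value in sorted({r[best] for r in rows})}}


def attributes_used(tree):
    """Collect every attribute that appears as a test in the tree."""
    if not isinstance(tree, dict):
        return set()
    (attr, branches), = tree.items()
    return {attr}.union(*(attributes_used(sub) for sub in branches.values()))


tree = build_tree(rows, header[:-1])
pprint(tree)
print("Attributes tested:", sorted(attributes_used(tree)))
```

The nested dictionary maps each tested attribute to its branches, and a plain string is a leaf holding the class label.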
Different metrics can serve as the split criterion, for example entropy (via Information Gain or Gain Ratio), the Gini index, or classification error. This particular calculator uses Information Gain.
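For comparison, the sketch below evaluates three of these impurity measures on a single node. The class counts of 9 "Yes" and 5 "No" are an assumption taken from the classic "Play Tennis" table, and the function names are illustrative.

```python
import math


def entropy(counts):
    """Shannon entropy in bits; empty classes contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def classification_error(counts):
    """1 minus the proportion of the majority class."""
    return 1.0 - max(counts) / sum(counts)


counts = [9, 5]  # assumed: 9 "Yes" and 5 "No" at the root node

print(f"entropy: {entropy(counts):.3f}")
print(f"gini:    {gini(counts):.3f}")
print(f"error:   {classification_error(counts):.3f}")
```

All three measures are zero for a pure node and largest when the classes are evenly mixed; they differ in how strongly they penalize intermediate mixtures.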
You might wonder why we need a decision tree at all if we can simply provide the decision for each combination of attributes. Of course you can, but even for this small example the total number of combinations is 3*2*2*3=36. On the other hand, we have used just a subset of the combinations (14 examples) to train our algorithm (by building the decision tree), and now it can classify all the other combinations without our help. That's the point of machine learning. Of course, there are many caveats regarding non-robustness, overfitting, bias, etc., and for more information you may want to check the Decision tree learning article on Wikipedia.
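The count of 36 can be checked by enumerating the combinations directly. This is a small sketch using the attribute values listed earlier; the variable names are illustrative.

```python
from itertools import product

# Attribute values as listed in the article.
attributes = {
    "Outlook": ["Sunny", "Overcast", "Rain"],
    "Humidity": ["High", "Normal"],
    "Wind": ["True", "False"],
    "Temperature": ["Hot", "Mild", "Cool"],
}

# Every possible combination of attribute values.
combinations = list(product(*attributes.values()))
print(len(combinations))

# The training set has 14 examples, so it covers at most 14 distinct
# combinations and leaves at least this many unseen.
training_size = 14
min_unseen = len(combinations) - training_size
print(min_unseen)
```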