ID3
Information gain has a higher tendency to choose an attribute containing more distinct values
C4.5
SplitInfo grows with the number of distinct values, so the gain ratio penalizes an attribute containing more values
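The difference between the two criteria can be sketched in a few lines. This is a minimal illustration, not either algorithm's full implementation; the function names (`entropy`, `split_stats`) and the toy data are assumptions made up for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_stats(values, labels):
    """Return (information gain, SplitInfo, gain ratio) for one attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the partitions created by the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # SplitInfo is the entropy of the attribute's own value distribution.
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, split_info, ratio

# Toy data (hypothetical): a unique row ID versus a two-valued attribute.
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "yes"]
row_id = list(range(8))              # 8 distinct values
useful = ["a"] * 4 + ["b"] * 4       # 2 distinct values

id_gain, id_split, id_ratio = split_stats(row_id, labels)
u_gain, u_split, u_ratio = split_stats(useful, labels)
```

On this data the information gain of `row_id` is higher than that of `useful`, while the gain ratio ranks them the other way round, because the SplitInfo of `row_id` is 3 bits against 1 bit for `useful`.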
CART
Measurements
Lift Chart
- Sort by predicted value then actual yes
- Draw lift curve (cumulative actual yes)
- The larger the area between lift curve and base line the better
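The lift chart steps above can be sketched as follows. This assumes that "predicted value" is a predicted probability of yes sorted in descending order, that ties are broken by putting actual yes first, and that the base line is the cumulative yes count expected under random ordering; the function names and the data are hypothetical.

```python
def lift_curve(predicted, actual):
    """Cumulative count of actual yes (1) after sorting by predicted value."""
    # Sort by predicted value descending, then by actual yes first (assumed tie-break).
    order = sorted(zip(predicted, actual), key=lambda pa: (-pa[0], -pa[1]))
    cumulative, running = [], 0
    for _, a in order:
        running += a
        cumulative.append(running)
    return cumulative

def base_line(actual):
    """Expected cumulative yes count if records were picked at random."""
    n, total = len(actual), sum(actual)
    return [total * (i + 1) / n for i in range(n)]

predicted = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
actual = [1, 1, 0, 1, 0, 0]

curve = lift_curve(predicted, actual)
base = base_line(actual)
# Discrete stand-in for the area between the lift curve and the base line.
area = sum(c - b for c, b in zip(curve, base))
```

A larger `area` means the model ranks the actual yes records nearer the top of the list.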
Decile-wise Lift Chart
- Sort by predicted value then actual yes
- Draw vertical line by actual yes
- Draw vertical line by decile
- Compute decile mean
- Compute global mean
- Draw chart
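The decile-wise computation can be sketched as below, assuming each bar in the chart is the decile mean of actual yes divided by the global mean, which is the usual definition of decile lift. The function name `decile_lift` and the data are hypothetical, and the sketch assumes at least one actual yes so that the global mean is not zero.

```python
def decile_lift(predicted, actual, bins=10):
    """Ratio of each decile's mean actual value to the global mean."""
    # Sort by predicted value descending, then by actual yes first (assumed tie-break).
    order = sorted(zip(predicted, actual), key=lambda pa: (-pa[0], -pa[1]))
    n = len(order)
    global_mean = sum(actual) / n
    lifts = []
    for d in range(bins):
        start, end = d * n // bins, (d + 1) * n // bins
        chunk = [a for _, a in order[start:end]]
        if not chunk:
            continue  # fewer records than bins
        decile_mean = sum(chunk) / len(chunk)
        lifts.append(decile_mean / global_mean)
    return lifts

# 20 records with distinct, descending predicted values.
predicted = [(20 - i) / 20 for i in range(20)]
actual = [1, 1, 1, 1, 1, 0, 1, 0] + [0] * 12

lifts = decile_lift(predicted, actual)
```

A bar above 1 means that decile contains proportionally more yes records than the data set as a whole; a useful model shows tall bars in the first deciles and short ones in the last.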
XLMiner
Glossaries
- Success probability cutoff: the probability cutoff that decides whether a terminal node is classified as yes or no
Partitioning
- Training set
  - Create the model
- Validation set
  - Adjust/fine-tune the model
- Test set
  - Test the final model
A typical split is 50% training, 30% validation, and 20% test.
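A random three-way partition with these proportions can be sketched as below. This is a generic illustration, not XLMiner's own procedure; the function name `partition`, its parameters, and the fixed seed are assumptions.

```python
import random

def partition(rows, fractions=(0.5, 0.3, 0.2), seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_train = round(n * fractions[0])
    n_valid = round(n * fractions[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]    # whatever remains
    return train, valid, test

train, valid, test = partition(range(100))
```

Giving the test set "whatever remains" ensures every row lands in exactly one partition even when the fractions do not divide the row count evenly.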
Categories Reduction
- Automatic
  - Sort values by frequency
  - Assign categories in descending order of frequency
  - Remaining values will be assigned to the last category
- Manual
This might be needed if an input/output attribute has too many distinct values.
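The automatic reduction can be sketched as follows. This mirrors the steps listed above rather than XLMiner's actual implementation; the function name `reduce_categories`, the `max_categories` parameter, and the `"Other"` label for the last category are assumptions.

```python
from collections import Counter

def reduce_categories(values, max_categories, other_label="Other"):
    """Keep the most frequent values; lump the rest into one last category."""
    counts = Counter(values)
    # Distinct values in descending order of frequency.
    ranked = [v for v, _ in counts.most_common()]
    # Reserve one slot for the catch-all last category.
    keep = set(ranked[:max_categories - 1])
    return [v if v in keep else other_label for v in values]

cities = ["NY", "NY", "NY", "LA", "LA", "SF", "Austin", "Reno"]
reduced = reduce_categories(cities, max_categories=3)
```

Here five distinct values are reduced to three: the two most frequent values keep their own category, and the rest share the last one.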