ID3

Information gain tends to favor attributes that have more distinct values
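
The bias can be seen on a toy example. The sketch below is only an illustration: the data and the attribute names (`weather`, `row_id`) are made up, and the helpers are a minimal entropy-based information gain rather than a full ID3 implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy data, invented for illustration only.
labels = ["yes", "yes", "no", "no", "yes", "no"]
weather = ["sun", "sun", "rain", "rain", "rain", "sun"]  # 2 distinct values
row_id = ["a", "b", "c", "d", "e", "f"]                   # 6 distinct values

gain_weather = information_gain(weather, labels)
gain_row_id = information_gain(row_id, labels)
print(gain_weather, gain_row_id)
```

The `row_id` attribute says nothing useful about new records, yet it gets the maximum possible gain because every value isolates a single record.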

C4.5

Gain ratio divides information gain by SplitInfo, which grows with the number of distinct values, so an attribute containing more values is penalized
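
A minimal sketch of the gain ratio, assuming SplitInfo is the entropy of the attribute's own value distribution. The data and the names `outlook` and `row_id` are invented for illustration; both attributes separate the classes perfectly, so they have the same information gain and differ only in SplitInfo.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of a list of items."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())

def gain_ratio(values, labels):
    """Information gain divided by SplitInfo (entropy of the attribute values)."""
    split_info = entropy(values)
    if split_info == 0:  # attribute has a single value, so it cannot split anything
        return 0.0
    return information_gain(values, labels) / split_info

# Toy data, invented for illustration only.
labels = ["yes", "yes", "yes", "no", "no", "no"]
outlook = ["sun", "sun", "sun", "rain", "rain", "rain"]  # 2 distinct values
row_id = ["a", "b", "c", "d", "e", "f"]                   # 6 distinct values

ratio_outlook = gain_ratio(outlook, labels)
ratio_row_id = gain_ratio(row_id, labels)
print(ratio_outlook, ratio_row_id)
```

With equal information gain, the attribute with six values ends up with the lower gain ratio because its SplitInfo is larger.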

CART

Measurements

Lift Chart

  1. Sort records by predicted value in descending order, then by actual yes
  2. Draw the lift curve (cumulative number of actual yes)
  3. The larger the area between the lift curve and the baseline, the better
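
The three steps can be sketched without a plotting library. This is an illustration under assumptions: actual yes is coded as 1 and no as 0, ties in the predicted value are broken by putting actual yes first, the baseline is the yes count expected when records are picked at random, and the area is approximated by a simple sum of differences. The function name `cumulative_lift` and the data are made up.

```python
def cumulative_lift(predicted, actual):
    """Return (lift_curve, baseline) as lists of cumulative 'yes' counts.

    predicted: predicted success probabilities
    actual:    1 for an actual yes, 0 for an actual no
    """
    # Step 1: sort by predicted value (descending); ties put actual yes first.
    rows = sorted(zip(predicted, actual), key=lambda r: (-r[0], -r[1]))
    # Step 2: cumulative count of actual yes along the sorted records.
    lift_curve, running = [], 0
    for _, y in rows:
        running += y
        lift_curve.append(running)
    # Baseline: yes count expected when records are picked at random.
    n, total_yes = len(rows), sum(actual)
    baseline = [total_yes * (k + 1) / n for k in range(n)]
    return lift_curve, baseline

# Toy data, invented for illustration only.
predicted = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
actual = [1, 1, 0, 1, 0, 0]
lift_curve, baseline = cumulative_lift(predicted, actual)

# Step 3: a rough area between the two curves; larger is better.
area = sum(c - b for c, b in zip(lift_curve, baseline))
print(lift_curve, baseline, area)
```

The two lists can be passed to any charting tool to draw the lift curve against the baseline.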

Decile-wise Lift Chart

  1. Sort records by predicted value in descending order, then by actual yes
  2. Draw vertical line by actual yes
  3. Draw vertical line by decile
  4. Compute the decile mean
  5. Compute the global mean
  6. Draw the chart
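
A sketch of the computation behind the chart, assuming the common convention that each bar shows the decile mean of actual yes divided by the global mean. The function name `decile_lift` and the data are hypothetical; the sketch also assumes at least ten records and at least one actual yes.

```python
def decile_lift(predicted, actual, n_groups=10):
    """Return one lift value per decile: decile mean / global mean of actual yes."""
    # Sort by predicted value (descending); ties put actual yes first.
    rows = sorted(zip(predicted, actual), key=lambda r: (-r[0], -r[1]))
    n = len(rows)
    global_mean = sum(actual) / n
    lifts = []
    for g in range(n_groups):
        # Integer bounds split the sorted records into n_groups near-equal slices.
        start, end = g * n // n_groups, (g + 1) * n // n_groups
        group = [y for _, y in rows[start:end]]
        decile_mean = sum(group) / len(group)
        lifts.append(decile_mean / global_mean)
    return lifts

# Toy data, invented for illustration only: 20 records, 8 actual yes.
predicted = [i / 20 for i in range(20, 0, -1)]
actual = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

lifts = decile_lift(predicted, actual)
print(lifts)
```

A value above 1 in the first deciles means the model concentrates actual yes records among its highest predictions.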

XLMiner

Glossaries

  • Success probability cutoff: the cutoff used to label a terminal node yes or no
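
As a small illustration of how such a cutoff might be applied, assuming a node is labeled yes when its success probability is at or above the cutoff. The function name `classify`, the default of 0.5, and the probabilities are assumptions, not XLMiner settings.

```python
def classify(success_probability, cutoff=0.5):
    """Label 'yes' when the success probability reaches the cutoff, else 'no'."""
    return "yes" if success_probability >= cutoff else "no"

# Hypothetical success probabilities of three terminal nodes.
node_probabilities = [0.82, 0.50, 0.31]

default_labels = [classify(p) for p in node_probabilities]
strict_labels = [classify(p, cutoff=0.6) for p in node_probabilities]
print(default_labels, strict_labels)
```

Raising the cutoff turns borderline nodes from yes to no.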

Partitioning

  • Training set
    • Create the model
  • Validation set
    • Adjust/fine-tune the model
  • Test set
    • Test the final model

A typical split is training 50%, validation 30%, test 20%
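
A minimal sketch of such a split using only the standard library. The function name `partition`, the fixed seed, and the use of a random shuffle are assumptions for illustration, not how XLMiner does it.

```python
import random

def partition(records, train=0.5, validation=0.3, seed=0):
    """Shuffle the records and cut them into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n = len(shuffled)
    n_train = round(n * train)
    n_valid = round(n * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],  # whatever is left becomes the test set
    )

train_set, validation_set, test_set = partition(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

Every record lands in exactly one of the three sets, so the test set stays unseen while the model is created and tuned.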

Categories Reduction

  • Automatic
    1. Sort values by frequency
    2. Assign categories in descending order of frequency
    3. Remaining values will be assigned to the last category
  • Manual

This might be needed if an input/output attribute has too many distinct values.
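
The automatic steps can be sketched as follows. The function name `reduce_categories` and the data are hypothetical, and the sketch assumes that categories are numbered from 1 and that every value beyond the most frequent ones shares the last category.

```python
from collections import Counter

def reduce_categories(values, max_categories):
    """Map each distinct value to a category number in 1..max_categories."""
    # Step 1: sort distinct values by frequency, most frequent first.
    ranked = [v for v, _ in Counter(values).most_common()]
    # Steps 2-3: assign categories in that order; once the last category
    # is reached, all remaining values are assigned to it.
    return {v: min(rank + 1, max_categories) for rank, v in enumerate(ranked)}

# Toy data, invented for illustration only.
colors = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2 + ["pink"] + ["gold"]
mapping = reduce_categories(colors, max_categories=3)
print(mapping)
```

Five distinct values are reduced to three categories, with the rare values grouped together.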