E
The rationale for this is as follows: –log2(p) is the amount of information, in bits, associated with an event of probability p. For example, for an event of probability ½, like flipping a fair coin, –log2(p) is –log2(½) = 1, so there is one bit of information. This should coincide with our intuition of what a bit means (if we have one). If there is a range of possible outcomes with associated probabilities, then to work out the average number of bits, we multiply the number of bits for each outcome (–log2(p)) by its probability p and sum over all the outcomes. This is where the formula comes from.
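To make the averaging concrete, here is a minimal Python sketch; the function name and the example probabilities are our own illustration, not part of this entry:

    import math

    def entropy(probabilities):
        # Average number of bits per outcome: sum of p * -log2(p)
        # over all outcomes (terms with p = 0 contribute nothing).
        return sum(-p * math.log2(p) for p in probabilities if p > 0)

    print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
    print(entropy([0.9, 0.1]))   # about 0.469 bits: a biased coin is more predictable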
Entropy is used in the ID3 decision tree induction algorithm.
Error backpropagation learning is often referred to informally as just backprop.
The "point" defined by the current set of weights is termed a point in weight space. Thus weight space is the set of all possible values of the weights.
See also local minimum and gradient descent.
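As an illustrative sketch only (the quadratic error function, the starting weights, and the step size below are assumptions made for this example, not part of this entry), gradient descent can be pictured as repeatedly nudging the current point in weight space downhill on the error surface:

    # The current point in weight space is just the current weight vector.
    weights = [0.5, -1.2]
    learning_rate = 0.1

    def error_gradient(w):
        # Gradient of an illustrative error function E(w) = w1^2 + w2^2.
        return [2 * w[0], 2 * w[1]]

    for _ in range(50):
        grad = error_gradient(weights)
        # Move the point in weight space a small step against the gradient.
        weights = [wi - learning_rate * gi for wi, gi in zip(weights, grad)]

    print(weights)  # very close to the minimum at [0, 0]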
S is the set of instances in a node
k is the number of classes (e.g. 2 if instances are just being classified into two classes: say positive and negative)
N is the number of instances in S
C is the majority class in S
n out of the N examples in S belong to C
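For concreteness, a small Python sketch of how these quantities could be computed for a node; the instance labels and the closing error estimate (N - n) / N are our own illustration, not taken from this entry:

    from collections import Counter

    S = ["pos", "pos", "neg", "pos", "neg"]   # instances in a node, by class label
    k = len(set(S))                            # number of classes (here 2)
    N = len(S)                                 # number of instances in S
    C, n = Counter(S).most_common(1)[0]        # majority class and how many instances belong to it

    print(k, N, C, n)     # 2 5 pos 3
    print((N - n) / N)    # 0.4: error rate if the node simply predicts C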