C
See also propositional learning systems and covering algorithm.
The algorithm - given a set of examples:
D
The local gradient, for an output node, is the product of the derivative of the squashing function, evaluated at the total net input to node j, and the error signal (i.e. the difference between the target output and the actual output). In the case of a hidden node, the local gradient is the product of the derivative of the squashing function (as above) and the weighted sum of the local gradients of the nodes to which node j is connected in subsequent layers of the net.
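The two cases above can be sketched in Python. This is a minimal illustration, not a definitive implementation: it assumes the squashing function is the logistic sigmoid (so its derivative at net input x is sigmoid(x) * (1 - sigmoid(x))), and the function and variable names (output_local_gradient, hidden_local_gradient, net_j, and so on) are hypothetical, chosen only to mirror the description.

```python
import math

def sigmoid(x):
    # Logistic squashing function (assumed; the text does not name one).
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(net):
    # Derivative of the squashing function at the total net input.
    s = sigmoid(net)
    return s * (1.0 - s)

def output_local_gradient(net_j, target, actual):
    # Output node: derivative of the squashing function at net_j
    # times the error signal (target output minus actual output).
    return sigmoid_deriv(net_j) * (target - actual)

def hidden_local_gradient(net_j, downstream_gradients, downstream_weights):
    # Hidden node: derivative at net_j times the weighted sum of the
    # local gradients of the nodes node j feeds in subsequent layers.
    weighted_sum = sum(g * w for g, w in
                       zip(downstream_gradients, downstream_weights))
    return sigmoid_deriv(net_j) * weighted_sum

# Example with net input 0, where sigmoid'(0) = 0.25:
delta_out = output_local_gradient(0.0, target=1.0, actual=sigmoid(0.0))
delta_hidden = hidden_local_gradient(0.0, [delta_out], [2.0])
```

These local gradients are what the weight-update step of backpropagation then multiplies by each connection's input activation (and the learning rate) to adjust the weights.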