Download example (4.44MB) - project and data files.
Short introduction:
In most real-life cases, neural networks will classify better and faster than simple algorithms such as kNN, BayesG or LBGBins (estimators of conditional likelihood). These techniques are implemented here mainly for comparison; they may be helpful if you have no idea how good the network's results should be, or if you experience difficulties in network training.
This example shows how to configure the classification block in the simplest way. The input data has been generated in the same way as for the previous examples, but the training set is much bigger to avoid statistical effects (these algorithms are less resistant to poor statistics than networks). The results are also compared with a neural network (MLP, dynamic structure) trained on the same big training set (this took quite a long time and such a big set is not necessary, but I wanted to keep the same conditions).
In the second part the same comparison is done for real-life data from the COMPASS experiment (this data is not included in the project files).
Remember: in most real-life cases it is necessary to use some preprocessing (N(0,1) normalization at least); this is true for all implemented algorithms.
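The N(0,1) normalization mentioned above can be sketched as follows. This is only an illustration in Python with NumPy, not the tool's own preprocessing code; the function names `fit_standardizer` and `standardize` and the generated data are assumptions made for the example. The important point is that the mean and standard deviation are taken from the training set and then reused unchanged for the testing set.

```python
import numpy as np

def fit_standardizer(train):
    """Return the per-input mean and standard deviation of the training set."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0.0] = 1.0  # a constant input is only shifted, not scaled
    return mean, std

def standardize(x, mean, std):
    """Shift and scale every input to zero mean and unit variance."""
    return (x - mean) / std

# Hypothetical 2-input data with very different scales per input.
rng = np.random.default_rng(0)
train = rng.normal(loc=[5.0, -2.0], scale=[3.0, 0.5], size=(1000, 2))
test = rng.normal(loc=[5.0, -2.0], scale=[3.0, 0.5], size=(200, 2))

mean, std = fit_standardizer(train)
train_n = standardize(train, mean, std)
test_n = standardize(test, mean, std)  # same parameters as for training
```

After this step each input of the training set has mean 0 and standard deviation 1, so a single width parameter or a single distance measure treats all inputs on an equal footing.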
What is in the project:
The project contains a classification block set up to perform Bayesian estimation (BayesG) with a width parameter optimal for the training data (this training set is used as testing data in the previous examples). Classification is done on a testing set of another 290k events. Calculation time for such big sets is very long; if you want to see results quickly, use the smaller sets from the previous examples (the file format is the same).
As usual, a set of uniformly distributed events is attached to the project (2D test block). Use it to visualize the output of the classifiers as a function of the inputs in XY space. Classified events are forwarded to DataSets connected to the output of the Classify block (true "signal" and true "background" events are separated using forwarding filter rules).
How to run this example:
Open the Setup dialog window, push the Go button of the Classify block and wait until the red indicator on it turns green again. Classification results are stored in the output vectors of the testing_set events and in the true signal / true background blocks (separated based on the t1 target value). Use testing_set as the data source for the Signal Selection graph to obtain the purity-efficiency curve. Connect the 2D test block to the input of the Classify block if you want to see the classifier output on the XY plane. Change the classification algorithm if you like and run the calculations again.
The default algorithm configured in the project is the BayesG estimator of conditional likelihood. It attaches a multidimensional Gaussian function with a specified width parameter to each event in the training set. The estimated likelihood is calculated as the sum of Gaussians for "signal" events over the sum of Gaussians for all events. The result approaches the analytical solution (conditional likelihood calculated from the true probability density functions of all classes) as the number of training events grows (and the width decreases). The influence of the width parameter is shown below:
Wider functions smooth the output, but also lose more detail. The best value should be chosen based on the purity-efficiency plots in the range of interest; curves may cross each other, as in this example:

whole range

moderate purities and efficiencies

highest purities
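The BayesG estimate described above (sum of Gaussians on "signal" events over the sum of Gaussians on all events) can be sketched in a few lines of Python with NumPy. This is an illustration under assumptions, not the tool's implementation: the function name `bayesg_estimate`, the isotropic Gaussian shape, the generated data and the 0.5 fallback for points with no kernel contribution are all choices made for the example. Because every Gaussian has the same width, the normalization constants cancel in the ratio and are omitted.

```python
import numpy as np

def bayesg_estimate(x, train_x, train_is_signal, width):
    """Estimate the conditional likelihood of "signal" at each query point."""
    x = np.atleast_2d(x)
    # Squared distances between queries and training events: (n_query, n_train).
    d2 = ((x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    kernels = np.exp(-0.5 * d2 / width**2)
    total = kernels.sum(axis=1)
    signal = (kernels * train_is_signal[None, :]).sum(axis=1)
    # Assumption: return 0.5 where all kernels underflow to zero.
    return np.divide(signal, total, out=np.full_like(total, 0.5), where=total > 0)

# Hypothetical 2D training set: two well separated classes.
rng = np.random.default_rng(1)
signal_x = rng.normal(loc=[0.3, 0.3], scale=0.1, size=(500, 2))
background_x = rng.normal(loc=[0.7, 0.7], scale=0.1, size=(500, 2))
train_x = np.vstack([signal_x, background_x])
train_is_signal = np.concatenate([np.ones(500), np.zeros(500)])

queries = np.array([[0.3, 0.3], [0.7, 0.7]])
p = bayesg_estimate(queries, train_x, train_is_signal, width=0.06)
```

Calling the function with a larger `width` averages over more training events and gives a smoother map; a smaller `width` follows the training events more closely and becomes noisy when few events are nearby. Every query is compared with every training event, which is consistent with the long calculation times on big sets.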
The performance of the kNN, LBGBins and BayesG algorithms is comparable to the neural network only when a big enough training set is used; results become rough when the statistics are lowered. It is hard to obtain a significantly different result for this simple 2-dimensional example, but the effect is clearly visible in the results presented in the second part of this section. The following images show the classifier output for the kNN algorithm with the number of neighbors set to 32, the LBGBins algorithm with 512 sectors and the BayesG algorithm with width = 0.06. For comparison, a neural network was also trained on the same training set and its output is also shown. The weights of this network are included in the zip file (network_weights.NetBin).

kNN | LBGBins
BayesG | Neural network
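For readers unfamiliar with kNN used as a likelihood estimator, a common formulation takes the fraction of "signal" events among the k nearest training events. The sketch below assumes that formulation with a Euclidean distance; the source does not state exactly how the tool's kNN is defined, and the function name `knn_estimate` and the generated data are hypothetical.

```python
import numpy as np

def knn_estimate(x, train_x, train_is_signal, k=32):
    """Fraction of "signal" events among the k nearest training events."""
    x = np.atleast_2d(x)
    # Squared Euclidean distances: (n_query, n_train).
    d2 = ((x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k smallest distances in each row (order within k is arbitrary).
    nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
    return train_is_signal[nearest].mean(axis=1)

# Hypothetical 2D training set: two well separated classes.
rng = np.random.default_rng(2)
signal_x = rng.normal(loc=[0.3, 0.3], scale=0.1, size=(500, 2))
background_x = rng.normal(loc=[0.7, 0.7], scale=0.1, size=(500, 2))
train_x = np.vstack([signal_x, background_x])
train_is_signal = np.concatenate([np.ones(500), np.zeros(500)])

queries = np.array([[0.3, 0.3], [0.7, 0.7]])
p_knn = knn_estimate(queries, train_x, train_is_signal, k=32)
```

With k = 32 the output can take only 33 distinct values, which is one reason the map looks rough when the training statistics are low.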
And finally, purity-efficiency curves for the various classifiers:

whole range

highest purities

highest efficiencies
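A purity-efficiency curve of this kind can be computed from the classifier output and the true class of each event. The sketch below assumes the usual definitions, which the text does not spell out: for a cut on the output, efficiency is the fraction of all true signal events that pass the cut, and purity is the fraction of passing events that are true signal. The function name `purity_efficiency` and the tiny data set are hypothetical.

```python
import numpy as np

def purity_efficiency(output, is_signal, thresholds):
    """Return purity and efficiency for each cut 'output >= threshold'."""
    output = np.asarray(output, dtype=float)
    is_signal = np.asarray(is_signal, dtype=bool)
    n_signal = is_signal.sum()
    purity, efficiency = [], []
    for t in thresholds:
        selected = output >= t
        n_selected = selected.sum()
        n_true = (selected & is_signal).sum()
        efficiency.append(n_true / n_signal if n_signal else 0.0)
        # Assumption: an empty selection is reported with purity 1.
        purity.append(n_true / n_selected if n_selected else 1.0)
    return np.array(purity), np.array(efficiency)

# Six events: classifier output and true class (1 = signal, 0 = background).
output = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
is_signal = np.array([1, 1, 0, 1, 0, 0])
purity, efficiency = purity_efficiency(output, is_signal, [0.0, 0.5, 0.85])
# purity     -> 1/2, 2/3, 1
# efficiency -> 1,   2/3, 1/3
```

Scanning the threshold from low to high trades efficiency for purity. Plotting the resulting pairs for two classifiers on the same axes shows where their curves cross.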