NetMaker

Data Sets

Formatting string syntax
Math and filter expression syntax

DataSet setup (Setup button)
   - reading data from the file
   - writing data to the file
   - displaying DataSet contents
Receiving data from Transform / Network block (>> button)
Connecting DataSet to Transform / Network block (I/O button)

A DataSet is a collection of events. Each event consists of an input, an output and a target vector. When events are processed, the network reads the input vectors and stores its answers in the output vectors. In supervised training, target vectors hold the desired network answer for each event in the training DataSet. When the network is tested, target vectors can also be used to calculate the error or to plot Signal Selection graphs (for classification tasks). Each event can also carry a vector of not-used values, which are available to filter calculations, can be passed on to the next blocks, or can be used in various plots.

All data (network inputs, outputs, desired answers and not-used values) can be stored in an ASCII or binary file. To add a DataSet block to the project, choose Edit - Add Component - Data Set from the menu.

Formatting string syntax

Formatting strings describe the format of data files and the mapping information. Each string consists of semicolon-separated entries (vector elements), each composed of an entry type and an index. The available types are: i (input), o (output), t (target), b (uncertainty), n (nothing, or not-used) and e (error). The 1-based index gives the position in the corresponding input / output / target / error-bar / not-used vector. The n entries make up a vector that is used neither for training nor for testing the network; however, this vector can be used to filter other data when creating graphs or writing output files, to keep and pass forward values that are not used at the current stage, or as event weights during training.

Examples:

i1; i2; i3; t1; t2 - if applied to the input/output file format: the file contains events with input vectors (length 3) followed by target vectors (length 2); the corresponding output vectors (length 2) are created automatically and initialized to 0.
i1; o1; t1; i2; o2; t2 - if applied to the input/output file format: all data (input / output / target) is stored in the file, but the vector components are interleaved.
n1; i1; i2; n2; t1 - if applied to the input/output file format: two of the numbers are not used and go into the not-used vector; in addition, a two-element input vector and a single target value are available for each event.
ri:1-10; o1; t1 - a sequence of entries can be replaced with a single range entry of the form rX:start-stop, where X is the entry type and start and stop are the indices of the first and last entries, respectively; this format string is equivalent to i1; i2; i3; i4; i5; i6; i7; i8; i9; i10; o1; t1
ri:1-1000; rt:1-16; ri:1001-1024 - another example of range entries: input elements 1-1000, then 16 target values, then input elements 1001-1024.
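To make the entry and range-entry rules concrete, here is a minimal sketch of how a formatting string could be expanded into a flat list of (type, index) pairs. The function name expand_format is hypothetical and not part of NetMaker; ignoring empty entries (for example after a trailing semicolon) is also an assumption.

```python
import re

# Entry types from the formatting string syntax:
# input, output, target, uncertainty, not-used, error.
ENTRY_TYPES = set("iotbne")

def expand_format(fmt):
    """Expand a formatting string into (type, 1-based index) pairs.

    Hypothetical helper: single entries look like 'i3', range entries
    like 'ri:1-10'. Empty entries are skipped (an assumption).
    """
    entries = []
    for raw in fmt.split(";"):
        item = raw.strip()
        if not item:
            continue
        # Range entry rX:start-stop expands to X<start> ... X<stop>.
        m = re.fullmatch(r"r([a-z]):(\d+)-(\d+)", item)
        if m:
            kind, start, stop = m.group(1), int(m.group(2)), int(m.group(3))
            if kind not in ENTRY_TYPES or start > stop:
                raise ValueError(f"bad range entry: {item!r}")
            entries.extend((kind, k) for k in range(start, stop + 1))
            continue
        # Single entry: type letter followed by a 1-based index.
        m = re.fullmatch(r"([a-z])(\d+)", item)
        if not m or m.group(1) not in ENTRY_TYPES:
            raise ValueError(f"bad entry: {item!r}")
        entries.append((m.group(1), int(m.group(2))))
    return entries

# The range form is equivalent to listing i1 ... i10 explicitly.
short = expand_format("ri:1-10; o1; t1")
long = expand_format("i1; i2; i3; i4; i5; i6; i7; i8; i9; i10; o1; t1")
print(short == long)  # True
```

The second range example above expands to 1000 + 16 + 24 = 1040 entries per event.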

Math and filter expression syntax

The math expression parser is available for:

all filter-expression text boxes (the expression should evaluate to a boolean value);

the Output Format tab of all Setup dialog windows (a numerical value should be returned);

the weight variable in the network training setup (a numerical value should be returned);

the user-defined error function and its derivative (a numerical value should be returned; use the symbol x as the function argument);

histogram and scatter plot variables.

C# syntax is used (practically the same as C++). Remember that data vector elements are stored as single-precision floating-point values, so a correct comparison is, for example, t1 == 0.05F (without the F suffix the number is treated as double precision, which differs slightly from 0.05F). Vector elements are accessed in the standard way described above: i1, o5, etc. All math functions have intuitive aliases: sin(), log(), sqrt(), etc. Two additional variables are available for each event: index, the 0-based index of the event in the DataSet, and size, the number of events in the DataSet.
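The single- versus double-precision pitfall can be reproduced outside NetMaker. The sketch below uses Python, which has no native single-precision type, so a value is rounded to 32 bits with a struct round trip; the helper name to_float32 is illustrative only. In C#, comparing a float with a double literal promotes the float to double, which is what the second comparison mimics.

```python
import struct

def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

t1 = to_float32(0.05)           # a data vector element stored as float

print(t1 == to_float32(0.05))   # True:  like t1 == 0.05F
print(t1 == 0.05)               # False: like t1 == 0.05 (double literal)
```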

Examples:

i1 > 3.0F - all events whose first input vector element is above 3.0F meet this condition and will be put on the graph or passed to the next block, depending on where this filter expression is used.
(i2 >= 2*i3)&&(t1 == 0.9F) - another example of a filter expression.
(index % 2) == 1 - only events with odd 0-based indices (the 2nd, 4th, 6th, ... events) pass through this filter expression.
size - index <= 10 - only the last ten events pass through this filter expression.
0.5*o1 + 2*log(n1) - calculates a value that may be used as a vector element for events passed to the next block (via the output connection and mapping).
x * sqrt(x) - an example of an expression that may be used to define the error function.
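Filters based on index and size are easy to get off by one, so it can help to check them on a small DataSet. In this sketch each filter is written as a Python predicate of the index and size variables; the helper passing is hypothetical and only mimics how a filter selects events. It shows that size - index <= 10 keeps exactly ten events, while size - index < 10 keeps only nine.

```python
def passing(size, predicate):
    """Return the 0-based indices of events that pass a filter.

    predicate(index, size) plays the role of a filter expression that
    uses the built-in index and size variables.
    """
    return [index for index in range(size) if predicate(index, size)]

size = 100
odd = passing(size, lambda index, size: index % 2 == 1)
last_ten = passing(size, lambda index, size: size - index <= 10)
strict = passing(size, lambda index, size: size - index < 10)

print(odd[:3])                      # [1, 3, 5]: the 2nd, 4th, 6th, ... events
print(last_ten[0], len(last_ten))   # 90 10
print(strict[0], len(strict))       # 91 9
```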

DataSet setup (Setup button)

The DataSet setup options allow input/output operations and viewing of the event data. The block and project names can be changed here (names should not contain the "\" character). All other options are grouped into tabs.

reading data from the file

dialog window - read data

Source Files: List of files to be read into the DataSet's event collection. Use the Add / Remove buttons to change the list. All files in the list must have the same type (ASCII or Binary) and the same format.

Format: Describes the layout of the data in the source file, which may be either Binary or ASCII.
In ASCII files, each line should contain one event, with consecutive lines holding consecutive events. Values on a line are separated by white space (commas are also accepted); the decimal separator is "." (dot).
In Binary files, data should be written as raw 32-bit (single-precision) floating-point values. There are no event separators; the data is read and interpreted based on the format string alone.
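The sketch below illustrates how the ASCII and Binary rules could be applied together with a format string. All function names are hypothetical. It handles only single entries such as n1 or i2 (range entries are omitted for brevity), assumes little-endian byte order for binary files (the byte order is not specified above), and skips blank ASCII lines, which is also an assumption.

```python
import re
import struct

def parse_entries(fmt):
    """'n1; i1; i2; n2; t1' -> [('n', 1), ('i', 1), ...]; range entries not handled."""
    return [(m.group(1), int(m.group(2)))
            for m in re.finditer(r"([iotbne])(\d+)", fmt)]

def split_event(values, entries):
    """Distribute one event's values into vectors keyed by entry type."""
    if len(values) != len(entries):
        raise ValueError("value count does not match the format string")
    vectors = {}
    for value, (kind, index) in zip(values, entries):
        vec = vectors.setdefault(kind, [])
        if len(vec) < index:
            vec.extend([0.0] * (index - len(vec)))
        vec[index - 1] = value
    return vectors

def read_ascii(lines, fmt):
    """One event per line; whitespace or commas separate values; '.' is the decimal point."""
    entries = parse_entries(fmt)
    events = []
    for line in lines:
        tokens = line.replace(",", " ").split()
        if tokens:  # skipping blank lines is an assumption
            events.append(split_event([float(t) for t in tokens], entries))
    return events

def read_binary(data, fmt):
    """Raw 32-bit floats, no separators: the format string alone splits events."""
    entries = parse_entries(fmt)
    n = len(entries)
    if len(data) % 4 or (len(data) // 4) % n:
        raise ValueError("data does not hold a whole number of events")
    count = len(data) // 4
    floats = struct.unpack(f"<{count}f", data)  # little-endian: an assumption
    return [split_event(list(floats[k:k + n]), entries)
            for k in range(0, count, n)]

events = read_ascii(["7, 0.5 1.5 9 1"], "n1; i1; i2; n2; t1")
print(events[0])  # {'n': [7.0, 9.0], 'i': [0.5, 1.5], 't': [1.0]}
```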

Error Fn: The error function to be minimized during training. See the manual page on error functions for more details.

invert: Flips an asymmetric error function, e_inverted(x) = e(-x); has no effect on symmetric functions.

err: Shows the last calculated error value.

err( x ): Error function expression, enabled only for the User Defined error function.

d[err( x )] / dx: Error function derivative expression, enabled only for the User Defined error function.
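Since the error function and its derivative are entered separately, they must agree with each other; a mismatched derivative would presumably mislead gradient-based training. The sketch below checks such a pair with a central finite difference before it is typed into the dialog. It uses the x * sqrt(x) example, whose derivative is 1.5 * sqrt(x), and also shows the invert option, whose derivative follows from the chain rule as -e'(-x). All function names are hypothetical.

```python
import math

def err(x):
    # Same expression as the example x * sqrt(x); only defined for x >= 0.
    return x * math.sqrt(x)

def d_err(x):
    # Its derivative: d/dx x^(3/2) = 1.5 * sqrt(x).
    return 1.5 * math.sqrt(x)

def inverted(e):
    """The invert option: e_inverted(x) = e(-x)."""
    return lambda x: e(-x)

def inverted_derivative(de):
    """Chain rule: d/dx e(-x) = -e'(-x)."""
    return lambda x: -de(-x)

def check_derivative(f, df, x, h=1e-6):
    """Compare an analytic derivative with a central finite difference at x."""
    numeric = (f(x + h) - f(x - h)) / (2 * h)
    return abs(numeric - df(x)) < 1e-5 * max(1.0, abs(df(x)))

print(check_derivative(err, d_err, 2.0))                     # True
print(check_derivative(err, lambda x: math.sqrt(x), 2.0))    # False: wrong derivative
```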

#outputs: If the format string has no target or output entries (only network inputs are stored in the source file), this option lets you allocate memory for the specified number of outputs.

Load: Loads (or reloads) data from the file(s). If reading succeeds, the total number of samples and the allocated memory are shown in the label at the bottom-right corner of the dialog window.

Shuffle: Shuffles events in the DataSet.

Internal source: When checked, disables the Source Files group box and enables receiving data from Transform or Network blocks.

writing data to the file

dialog window - save data

Name: Name of the output file. Click the "..." button to choose the file name in a file dialog.

Format: Describes the layout of the data in the destination file.

Filter: Events are written to the file only if they meet the specified condition (see Math and filter expression syntax). To disable the filter, enter just the character n (a not-used entry without an index) in the text box.

Sample range: Range of the indices of the events to be written to the file.

Save: Writes the data to the file.
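The Filter and Sample range options combine roughly as in the sketch below: only events inside the range that also pass the filter are written, one ASCII line per event. The helper names are hypothetical, events are modeled as flat lists of values in format-string order, and treating the sample range as 0-based and inclusive is an assumption; passing predicate=None stands in for the n entry that disables the filter.

```python
def select_for_saving(events, predicate, first, last):
    """Pick events with indices first..last (0-based, inclusive: an assumption)
    that also pass the filter; predicate=None disables the filter."""
    size = len(events)
    chosen = []
    for index in range(max(first, 0), min(last, size - 1) + 1):
        event = events[index]
        if predicate is None or predicate(event, index, size):
            chosen.append(event)
    return chosen

def to_ascii_line(values):
    # One event per line, whitespace separated, '.' as decimal separator.
    return " ".join(repr(float(v)) for v in values)

events = [[float(k), float(k % 2)] for k in range(10)]
saved = select_for_saving(events, lambda e, index, size: e[1] == 1.0, 2, 7)
print([to_ascii_line(e) for e in saved])  # ['3.0 1.0', '5.0 1.0', '7.0 1.0']
```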

displaying DataSet contents

The Preview tab shows the contents of the data vectors. The output column shows "-" if the event has not been processed by the network. Variable names (displayed in the column headers) can be changed: click a column header, type the new name in the text box, and click the "√" button to accept. Then save the project file to make these changes permanent.

dialog window - view data

<- previous 50 / next 50 ->: shows previous / next group of 50 events from the DataSet's event collection.

<< first / last >>: shows first / last group of 50 events from the DataSet's event collection.

Receiving data from Transform / Network block (">>" button)

The >> button is enabled only when the Internal source option is checked. It opens the following dialog window:

Common Connection Add dialog window.

Double-clicking an item in the Available list adds a new connection. Double-clicking an item in the Added list removes that connection. When a new connection is created, a default mapping for the connecting DataSet is assigned in the Transform / Network block. The mapping can be changed through the Setup button and the Output Format tab of the corresponding Transform / Network block.

Connecting DataSet to Transform / Network block (I/O button)

Connects the input of a Transform / Network block to the DataSet. The I/O button opens the common Connection Add dialog window.
