Formatting string syntax

DataSet setup (Setup button)
   - reading data from a file
   - writing data to a file
   - displaying DataSet contents
Receiving data from Transform/Network block (">>" button)
Connecting DataSet to Transform/Network block (I/O button)

A DataSet is a collection of events. Each event consists of an input, an output and a target vector. When events are processed, the network reads the input vectors and stores its answers in the output vectors. Target vectors are used in supervised training to hold the desired network answers for each event in the training DataSet. They can also be used to calculate the error or to plot the Signal Selection graphs (in classification tasks) when the network is tested. Each event can be accompanied by a vector of not-used values, which can be plotted on graphs or passed unmodified to the next blocks.
All data (network inputs, outputs, desired answers and not-used values) can be stored in an ASCII or Binary file.

To add the DataSet block to the project, choose the menu:

  • Edit - Add Component - Data Set

Formatting string syntax

Formatting strings describe the format of the data files and the mapping information. Each string consists of comma-separated entries (vector elements), each composed of an entry type and an index. The available types are: "i" (input), "o" (output), "t" (target), "n" (nothing, or not-used) and "e" (error). The 1-based index gives the position in the corresponding input / output / target / not-used vector. The "nx" entries make up a vector that is used neither in training nor in testing the network. This vector can still be used to filter other data when creating graphs or writing output files, to keep and pass forward values that are not needed at the current stage, or to hold event weights during training.

Examples:

"i1, i2, i3, t1, t2" - Applied to an input/output file format: the file contains events with input vectors (length = 3) followed by target vectors (length = 2); matching output vectors (length = 2) are created automatically and initialized to "0". Applied to a forward mapping: the destination event vector is built from the source event's input vector (its first, second and third elements) and target vector (its first and second elements).
"i1, o1, t1, i2, o2, t2" - Applied to an input/output file format: all data (input / output / target) is stored in the file, but the vector components are interleaved. Applied to a forward mapping: the destination event vector receives, in order, the first element of the source input vector, the first element of the source output vector, the first element of the source target vector, then the second element of the source input vector, and so on.
"n1, i1, i2, n2, t1" - Applied to an input/output file format: two of the numbers are not used and are put into the not-used vector. Applied to a forward mapping: the not-used values are mixed with the input values and the target value of the source event, and all of them are put into the destination event vector.
"ri:1-10, o1, t1" - A sequence of entries of any type can be replaced with a single range entry rt:start-stop, where t is the entry type and start and stop are the indexes of the first and last entries; this format string is equivalent to "i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, o1, t1".
"ri:1-1000, rt:1-16, ri:1001-1024" - Another example of range entries: inputs 1 to 1000, then targets 1 to 16, then inputs 1001 to 1024.
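
The entry and range rules above can be sketched in a few lines of Python. This is only an illustration of the syntax as described here, not the program's own parser; the function names expand_format, vector_lengths and map_event are hypothetical, and whitespace handling and error reporting are assumptions.

```python
import re

def expand_format(fmt):
    """Expand a formatting string into a list of (type, index) entries."""
    entries = []
    for raw in fmt.split(","):
        token = raw.strip()
        if not token:
            continue
        # Range entry, e.g. "ri:1-10" stands for i1, i2, ..., i10.
        m = re.fullmatch(r"r([iotne]):(\d+)-(\d+)", token)
        if m:
            kind, start, stop = m.group(1), int(m.group(2)), int(m.group(3))
            if start > stop:
                raise ValueError(f"empty range in entry: {token!r}")
            entries.extend((kind, k) for k in range(start, stop + 1))
            continue
        # Plain entry, e.g. "t2": type letter followed by a 1-based index.
        m = re.fullmatch(r"([iotne])(\d+)", token)
        if m is None:
            raise ValueError(f"unrecognized entry: {token!r}")
        entries.append((m.group(1), int(m.group(2))))
    return entries

def vector_lengths(entries):
    """Length of each vector implied by the highest index seen per type."""
    lengths = {}
    for kind, index in entries:
        lengths[kind] = max(lengths.get(kind, 0), index)
    return lengths

def map_event(entries, source):
    """Forward mapping: build the flat destination vector from a source event.

    `source` is assumed to be a dict such as {"i": [...], "o": [...], "t": [...]}.
    """
    return [source[kind][index - 1] for kind, index in entries]

if __name__ == "__main__":
    print(vector_lengths(expand_format("ri:1-1000, rt:1-16, ri:1001-1024")))
    src = {"i": [1.0, 2.0], "o": [0.1, 0.2], "t": [0.0, 1.0]}
    print(map_event(expand_format("i1, o1, t1, i2, o2, t2"), src))
```

Representing an event as a dictionary keyed by the type letter is a convenience of this sketch; the documentation does not say how events are stored internally.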

Filter string syntax

Filter strings are sets of conditions used to select events when creating graphs, writing data to a file or exchanging events between blocks. A simplified C syntax is used. Each condition consists of two operands (each a number or a vector element, encoded as in formatting strings) and a relational operator: ==, !=, >, >=, <, <=. Brackets and the logical operators & (AND) and | (OR) may be used to combine conditions.
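
To make the filter syntax concrete, here is a minimal Python sketch of an evaluator for such strings. It is not the program's implementation. Several details are assumptions: the names tokenize and event_passes are hypothetical, & is given higher precedence than | (as in C), an event is a dictionary keyed by the type letter, and a bare "n" is treated as "no filtering".

```python
import operator
import re

RELATIONS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
             ">=": operator.ge, "<": operator.lt, "<=": operator.le}
# Two-character operators are listed first so ">=" is not split into ">" and "=".
TOKEN_RE = re.compile(
    r"==|!=|>=|<=|[<>&|()]|[iotne]\d+|-?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

def event_passes(filter_text, event):
    """Return True if `event` (e.g. {"i": [...], "t": [...]}) meets the filter."""
    if filter_text.strip() == "n":      # bare "n": no filtering (assumption)
        return True
    tokens = tokenize(filter_text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of filter string")
        pos += 1
        return tokens[pos - 1]

    def operand():
        tok = take()
        if tok[0] in "iotne":           # vector element, 1-based index
            return event[tok[0]][int(tok[1:]) - 1]
        return float(tok)               # raises ValueError for non-numbers

    def primary():
        if peek() == "(":
            take()
            value = or_expr()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        left = operand()
        relation = take()
        if relation not in RELATIONS:
            raise ValueError(f"expected a relational operator, got {relation!r}")
        return RELATIONS[relation](left, operand())

    def and_expr():
        value = primary()
        while peek() == "&":
            take()
            rhs = primary()
            value = value and rhs
        return value

    def or_expr():
        value = and_expr()
        while peek() == "|":
            take()
            rhs = and_expr()
            value = value or rhs
        return value

    result = or_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

A call such as event_passes("i1 > 3.0", {"i": [5.0]}) selects events whose first input exceeds 3.0.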

Examples:
"i1 > 3.0";  "(i2 <= i3)&(t1 == 0.9)".

DataSet setup (Setup button)

Setup has only one independent control: the block's name. The name should contain neither white spaces nor "\" symbols, since these characters may cause errors in some cases.
The other controls are grouped in three tabs that allow input/output operations and viewing the contents of the DataSet.

   - reading data from a file

File Names: List of files to be read into the DataSet's event collection. Click the Add / Remove buttons to change the contents. All files in the list must be of the same type (ASCII or Binary) and format.

File Format: Describes the data layout in the source file (either Binary or ASCII).
Data in ASCII files should be arranged in lines containing consecutive events. Values in each line should be separated by white space; the decimal separator is "." (dot).
Data in Binary files should be written as raw 32-bit (single precision) floating point values. There are no event separators; data is read and interpreted based on the format string only.
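
As an illustration of the two layouts, the Python sketch below reads both into the same event structure. It is not the program's loader and rests on stated assumptions: one event per line in ASCII files, little-endian byte order in Binary files (the documentation does not specify it), and a format given as an already expanded list of plain entries such as ["i1", "i2", "t1"]. The names split_values, read_ascii and read_binary are hypothetical.

```python
import struct

def split_values(values, entries):
    """Distribute one event's flat values into per-type vectors (1-based indexes)."""
    event = {}
    for value, entry in zip(values, entries):
        kind, index = entry[0], int(entry[1:])
        vector = event.setdefault(kind, [])
        if len(vector) < index:
            vector.extend([0.0] * (index - len(vector)))
        vector[index - 1] = value
    return event

def read_ascii(text, entries):
    """Parse whitespace-separated values, assuming one event per line."""
    events = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if len(fields) != len(entries):
            raise ValueError("line does not match the format string")
        events.append(split_values([float(f) for f in fields], entries))
    return events

def read_binary(data, entries, byte_order="<"):
    """Parse raw 32-bit floats; record boundaries come from the format only.

    byte_order "<" (little-endian) is an assumption of this sketch.
    """
    record = struct.Struct(f"{byte_order}{len(entries)}f")
    if len(data) % record.size:
        raise ValueError("file size is not a multiple of the event size")
    return [split_values(record.unpack_from(data, offset), entries)
            for offset in range(0, len(data), record.size)]

if __name__ == "__main__":
    entries = ["i1", "i2", "t1"]
    print(read_ascii("1.0 2.0 0.5\n3.0 4.0 1.0\n", entries))
    print(read_binary(struct.pack("<6f", 1.0, 2.0, 0.5, 3.0, 4.0, 1.0), entries))
```

Because a Binary file carries no separators, a wrong format string silently shifts every value; checking the file size against the event size, as above, catches only some of these mistakes.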

Error Function: The error function to be minimized during the training process. After changing this option you need to reload the source files (push the Load button). See the description of the error functions for more details.

invert: Flips asymmetric error functions: e_inverted(x) = e(-x). Has no effect on symmetric functions.
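
The effect of the invert option can be shown with a short Python sketch. The two error functions below are hypothetical examples chosen for illustration, not the functions offered by the program.

```python
def invert(error_fn):
    """Return the flipped error function: e_inverted(x) = e(-x)."""
    return lambda x: error_fn(-x)

def squared(x):
    # Symmetric example: e(x) == e(-x), so inverting changes nothing.
    return x * x

def asymmetric(x):
    # Hypothetical asymmetric example: positive deviations cost twice as much.
    return 2.0 * x * x if x > 0 else x * x

if __name__ == "__main__":
    for x in (-1.0, 1.0):
        print(x, squared(x), invert(squared)(x),
              asymmetric(x), invert(asymmetric)(x))
```

For the symmetric function the inverted and original values coincide at every x, while for the asymmetric one the heavier penalty moves from positive to negative deviations.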

#outputs: If there are no target or output entries in the format string (only network inputs are written in the source file), this option lets you allocate memory for the specified number of outputs.

Load: Push this button to load data from the file(s). If the load is successful, information about the total number of samples and the amount of allocated memory appears in the label at the bottom-right corner of the dialog window.

Shuffle: Randomly reorders events in the DataSet.

Internal source: When checked, disables the Source File group box and enables receiving data from Transform or Network blocks.

   - writing data to a file

Name: Name of the output file. Click button "..." to select the file name with a dialog window.

Format: Represents data alignment in the destination file.

Filter: Events are written to the file only if they meet the specified condition (see filter syntax). To disable the filter, put the "n" character (a not-used entry without an index) in the text box.

Sample range: Range of the indexes of the events to be written to the file.

Save: Push this button to write the data to the file.

   - displaying DataSet contents

The Preview tab shows the contents of the data vectors. If an event has not been processed by the network, the output column shows "-". Variable names (displayed in the column headers) may be changed: click on the column header, type the new name in the text box and click the "√" button to accept the change. Then save the project file to make these changes permanent.

<- previous 50 / next 50 ->: shows previous / next group of 50 events from the DataSet's event collection.

<< first / last >>: shows the first / last group of 50 events from the DataSet's event collection.

Receiving data from Transform/Network block (">>" button)

The ">>" button is enabled only if the Internal source setup option is checked. It opens a dialog window with two lists, Available and Added.

Double-clicking an item in the Available list adds a new connection. Double-clicking an item in the Added list removes the connection. When a new connection is created, a default forward mapping is assigned at the Transform / Network block for the DataSet that is establishing the connection. To change this mapping, use the Setup button and the Forwarding tab of the corresponding Transform / Network block.

Connecting DataSet to Transform/Network block (I/O button)

Connects the input of the Transform / Network block to the DataSet. The I/O button opens the common Connection Add dialog window.
