SYNOPSIS
svm-subset [ -s method ] dataset number [ output1 ] [ output2 ]
DESCRIPTION
Training on a large data set is time-consuming, so it is often useful to work on a smaller subset first. The Python script subset.py randomly selects a specified number of samples. For classification data, it offers stratified selection, which keeps the class distribution of the subset the same as that of the full data set.
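The difference between the two selection methods can be illustrated with a minimal sketch. It is not the code of subset.py; the function names stratified_subset and random_subset are hypothetical, and the per-class rounding used here is an assumption, so the stratified total may differ slightly from the requested number when the class shares do not divide evenly.

```python
import random
from collections import defaultdict


def stratified_subset(labels, n, rng=None):
    """Return sorted indices of roughly n samples, keeping class proportions.

    Hypothetical helper for illustration; not taken from subset.py.
    """
    if rng is None:
        rng = random.Random()
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    total = len(labels)
    chosen = []
    for idx in by_class.values():
        # Each class contributes in proportion to its share of the data.
        k = min(round(n * len(idx) / total), len(idx))
        chosen.extend(rng.sample(idx, k))
    return sorted(chosen)


def random_subset(labels, n, rng=None):
    """Return sorted indices of n samples drawn uniformly, ignoring labels."""
    if rng is None:
        rng = random.Random()
    return sorted(rng.sample(range(len(labels)), n))
```

With 60 samples of one class and 40 of another, asking the stratified sketch for 10 samples yields 6 and 4 respectively, whereas the uniform draw gives no such guarantee.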
OPTIONS
-s method
    0 -- stratified selection (classification only) (default)
    1 -- random selection
output1
    The subset. If output1 is omitted, the subset is printed to the screen.
output2
    The rest of the data.
EXAMPLES
- svm-subset heart_scale 100 file1 file2
Selects 100 samples from heart_scale and stores them in file1; all remaining instances are stored in file2. Because -s is not given, the default stratified selection is used.
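One way to check the result is to compare the class distribution of the two output files. The sketch below assumes the files are in LIBSVM format, where each line begins with the class label; class_distribution is a hypothetical helper and not part of svm-subset.

```python
from collections import Counter


def class_distribution(lines):
    """Return the fraction of samples per label.

    Assumes LIBSVM format: the first whitespace-separated token on each
    non-empty line is the class label. Hypothetical helper for illustration.
    """
    counts = Counter(line.split()[0] for line in lines if line.strip())
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}


# Example use with the files from the command above:
#   with open("file1") as f:
#       print(class_distribution(f))
#   with open("file2") as f:
#       print(class_distribution(f))
```

If stratified selection was used, the two distributions should be close to each other.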