pymvpa2-mkds(1) - create a PyMVPA dataset from various sources

SYNOPSIS

pymvpa2 mkds [--version] [-h] [-i [dataset [dataset ...]]] [--txt-data VALUE [VALUE ...] | --npy-data VALUE [VALUE ...] | --mri-data IMAGE [IMAGE ...] | --openfmri-modelbold SPEC SPEC SPEC SPEC] [--add-sa VALUE [VALUE ...]] [--add-fa VALUE [VALUE ...]] [--add-sa-txt VALUE [VALUE ...]] [--add-fa-txt VALUE [VALUE ...]] [--add-sa-attr FILENAME] [--add-sa-npy VALUE [VALUE ...]] [--add-fa-npy VALUE [VALUE ...]] [--mask IMAGE] [--add-vol-attr ARG ARG] [--add-fsl-mcpar FILENAME] -o OUTPUT [--hdf5-compression TYPE]

DESCRIPTION

Create a PyMVPA dataset from various sources.

This command converts data from various sources, such as text files, NumPy's NPY files, and MR (magnetic resonance) images, into a PyMVPA dataset that is stored in HDF5 format. An arbitrary number of sample and feature attributes can be added to a dataset, and individual attributes can be read from heterogeneous sources (e.g. they need not all come from text files).

For datasets from MR images this command also supports automatic conversion of additional images into (volumetric) feature attributes. This can be useful for describing features with, for example, atlas labels.

COMPOSE ATTRIBUTES ON THE COMMAND LINE

Options --add-sa and --add-fa can be used to compose dataset attributes directly on the command line. The syntax is:

... --add-sa <attribute name> <comma-separated values> [DTYPE]

where the optional 'DTYPE' is any identifier of a NumPy data type (e.g. 'int', or 'float32'). If no data type is specified, the attribute values will be strings.

If only one attribute value is given, it will be copied and assigned to all entries in the dataset.
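
The rules above can be illustrated with a small Python sketch. It only mirrors the documented behavior (comma splitting, optional DTYPE, copying a single value to all entries) and is not PyMVPA's own parsing code; the function `compose_attribute`, the type table standing in for NumPy dtypes, and the error raised on a length mismatch are assumptions made for illustration.

```python
# Hypothetical sketch of the documented --add-sa/--add-fa rules;
# not PyMVPA's implementation. A few Python types stand in for
# NumPy dtype identifiers (assumption).
TYPES = {"int": int, "float": float, "float32": float, "str": str}


def compose_attribute(name, values, dtype=None, n_entries=None):
    """Return (name, list_of_values) for a command-line attribute spec."""
    convert = TYPES[dtype] if dtype else str  # no DTYPE -> strings
    parsed = [convert(v) for v in values.split(",")]
    # A single value is copied to every entry of the dataset.
    if len(parsed) == 1 and n_entries is not None:
        parsed = parsed * n_entries
    # Assumption: a value count that does not match the dataset is an error.
    if n_entries is not None and len(parsed) != n_entries:
        raise ValueError("expected %d values for %r, got %d"
                         % (n_entries, name, len(parsed)))
    return name, parsed


# ... --add-sa chunks 0,0,1,1 int   (dataset with 4 samples)
print(compose_attribute("chunks", "0,0,1,1", "int", n_entries=4))
# ... --add-sa subject s01          (single value, copied to all samples)
print(compose_attribute("subject", "s01", n_entries=4))
```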

LOAD DATA FROM TEXT FILES

All options for loading data from text files support optional parameters to tweak the conversion:

... --add-sa-txt <mandatory values> [DELIMITER [DTYPE [SKIPROWS [COMMENTS]]]]

where 'DELIMITER' is the string used to separate values in the input file, 'DTYPE' is any identifier of a NumPy data type (e.g. 'int', or 'float32'), 'SKIPROWS' is an integer indicating how many lines at the beginning of the respective file shall be ignored, and 'COMMENTS' is the string that prefixes comment lines, which are ignored.
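
The optional parameters can be pictured as follows. This hypothetical `load_txt` helper is a stdlib-only sketch of the documented DELIMITER, DTYPE, SKIPROWS and COMMENTS semantics, not the loader PyMVPA actually uses; treating a missing DELIMITER as "any whitespace" and the '#' default for COMMENTS are assumptions.

```python
import io

# Stand-ins for NumPy dtype identifiers (assumption for this sketch).
TYPES = {"int": int, "float": float, "str": str}


def load_txt(lines, delimiter=None, dtype="float", skiprows=0, comments="#"):
    """Parse text lines into rows of values (hypothetical helper)."""
    convert = TYPES[dtype]
    rows = []
    for i, line in enumerate(lines):
        if i < int(skiprows):            # SKIPROWS: ignore leading lines
            continue
        line = line.strip()
        if not line or line.startswith(comments):  # COMMENTS: skip prefixed lines
            continue
        # DELIMITER: None splits on any whitespace
        rows.append([convert(v) for v in line.split(delimiter)])
    return rows


# A file with one header line and one comment line, as loaded by
# ... --txt-data data.csv , int 1
data = io.StringIO("onset,duration\n# run 1\n1,2\n3,4\n")
print(load_txt(data, delimiter=",", dtype="int", skiprows=1))  # [[1, 2], [3, 4]]
```

Note that on the command line every optional value arrives as a string, which is why the sketch converts SKIPROWS with int().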

LOAD DATA FROM NUMPY NPY FILES

All options for loading data from NumPy NPY files support an optional parameter:

... --add-fa-npy <mandatory values> [MEMMAP]

where 'MEMMAP' is a flag that determines whether the respective file shall be read by memory-mapping, i.e. not read (immediately) into memory. Enable it with one of: 'yes', '1', 'true', 'enable', or 'on'.
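
A sketch of how the optional MEMMAP value might be interpreted. The helper `memmap_requested` is hypothetical, and exact, case-sensitive matching of the enabling words is an assumption. In NumPy, memory-mapped reading of an NPY file is what numpy.load(filename, mmap_mode='r') provides; whether this command uses exactly that call is not stated here.

```python
# Hypothetical sketch: interpret the optional MEMMAP value as documented.
# Exact, case-sensitive matching is an assumption.
ENABLE_WORDS = ("yes", "1", "true", "enable", "on")


def memmap_requested(extra_values):
    """Return True if the value after the filename enables memory-mapping."""
    return bool(extra_values) and extra_values[0] in ENABLE_WORDS


# ... --add-fa-npy weights weights.npy yes
print(memmap_requested(["yes"]))   # True
# ... --add-fa-npy weights weights.npy
print(memmap_requested([]))        # False
```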

OPTIONS

--version
show program's version and license information and exit
-h, --help, --help-np
show this help message and exit. --help-np forcefully disables the use of a pager for displaying the help.
-i [dataset [dataset ...]], --input [dataset [dataset ...]]
path(s) to one or more PyMVPA dataset files. All datasets will be merged into a single dataset (vstack'ed) in order of specification. In some cases this option may need to be specified more than once if multiple, but separate, input datasets are required.
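
Merging by vstack'ing means that samples (rows) are concatenated in the order the files are given. The sketch below represents datasets as plain dictionaries rather than PyMVPA Dataset objects; the helper `vstack` and the requirement that all inputs share the same features and the same sample attributes are assumptions for illustration.

```python
# Hypothetical sketch of vstack'ed merging; datasets are plain dicts
# here, not PyMVPA Dataset objects.
def vstack(datasets):
    """Concatenate samples (rows) of datasets in order of specification."""
    first = datasets[0]
    n_features = len(first["samples"][0])
    merged = {"samples": [], "sa": {key: [] for key in first["sa"]}}
    for ds in datasets:
        # Assumption: every dataset has the same features and
        # the same set of sample attributes.
        if any(len(row) != n_features for row in ds["samples"]):
            raise ValueError("datasets differ in number of features")
        if set(ds["sa"]) != set(merged["sa"]):
            raise ValueError("datasets differ in sample attributes")
        merged["samples"].extend(ds["samples"])
        for key, values in ds["sa"].items():
            merged["sa"][key].extend(values)
    return merged


run1 = {"samples": [[1, 2], [3, 4]], "sa": {"chunks": [0, 0]}}
run2 = {"samples": [[5, 6]], "sa": {"chunks": [1]}}
print(vstack([run1, run2]))
# {'samples': [[1, 2], [3, 4], [5, 6]], 'sa': {'chunks': [0, 0, 1]}}
```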

Input data sources:

--txt-data VALUE [VALUE ...]
load samples from a text file. The first value is the filename the data will be loaded from. Additional values modifying the way the data is loaded are described in the section "Load data from text files".
--npy-data VALUE [VALUE ...]
load samples from a NumPy .npy file. Compressed files (i.e. .npy.gz) are supported as well. The first value is the filename the data will be loaded from. Additional values modifying the way the data is loaded are described in the section "Load data from NumPy NPY files".
--mri-data IMAGE [IMAGE ...]
load data from an MR image, such as a NIfTI file. This can either be a single 4D image, or a list of 3D images, or a combination of both.
--openfmri-modelbold SPEC SPEC SPEC SPEC
load all data associated with a stimulation model in an OpenFMRI-compliant dataset. This option needs 4 argument values: <path> <model ID> <subj ID> <flavor>. The first value is the base directory of the dataset. The next two are the (integer) IDs of the desired stimulus model and subject. The last argument is either a string indicating the data flavor to load, or an empty string for the default image (bold.nii.gz).
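
The following sketch shows how a mix of 3D and 4D images given to --mri-data could translate into samples, assuming the fourth axis of a 4D image indexes volumes (one sample each) and that all images must share the same spatial dimensions. Images are represented only by their shapes; `count_samples` is a hypothetical helper, not part of PyMVPA.

```python
# Hypothetical sketch: map a list of 3D/4D image shapes to samples.
def count_samples(image_shapes):
    """Return (n_samples, spatial_shape) for a list of 3D/4D image shapes."""
    spatial_shape = None
    n_samples = 0
    for shape in image_shapes:
        if len(shape) == 3:          # a single volume -> one sample
            spatial, n_volumes = tuple(shape), 1
        elif len(shape) == 4:        # a time series -> one sample per volume
            spatial, n_volumes = tuple(shape[:3]), shape[3]
        else:
            raise ValueError("expected a 3D or 4D image, got %dD" % len(shape))
        # Assumption: all images must share the same spatial dimensions.
        if spatial_shape is None:
            spatial_shape = spatial
        elif spatial != spatial_shape:
            raise ValueError("images differ in spatial dimensions")
        n_samples += n_volumes
    return n_samples, spatial_shape


# e.g. a 4D run with 100 volumes followed by one extra 3D image
print(count_samples([(64, 64, 32, 100), (64, 64, 32)]))  # (101, (64, 64, 32))
```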

Options for attributes from the command line:

--add-sa VALUE [VALUE ...]
compose a sample attribute from the command line input. The first value is the desired attribute name, the second value is a comma-separated list (appropriately quoted) of actual attribute values. An optional third value can be given to specify a data type. Additional information on defining dataset attributes on the command line is given in the section "Compose attributes on the command line".
--add-fa VALUE [VALUE ...]
compose a feature attribute from the command line input. The first value is the desired attribute name, the second value is a comma-separated list (appropriately quoted) of actual attribute values. An optional third value can be given to specify a data type. Additional information on defining dataset attributes on the command line is given in the section "Compose attributes on the command line".

Options for attributes from text files:

--add-sa-txt VALUE [VALUE ...]
load sample attribute from a text file. The first value is the desired attribute name, the second value is the filename the attribute will be loaded from. Additional values modifying the way the data is loaded are described in the section "Load data from text files".
--add-fa-txt VALUE [VALUE ...]
load feature attribute from a text file. The first value is the desired attribute name, the second value is the filename the attribute will be loaded from. Additional values modifying the way the data is loaded are described in the section "Load data from text files".
--add-sa-attr FILENAME
load sample attribute values from a legacy 'attributes file'. Column data is read as "literal". Only two-column files ('targets' + 'chunks') without headers are supported. This option allows for reading attributes files from early PyMVPA versions.
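
A sketch of reading such a legacy file, assuming whitespace-separated columns and taking "literal" to mean that values are kept as the strings found in the file. The helper `read_legacy_attributes` is hypothetical; consult the PyMVPA sources if the exact conversion matters.

```python
import io


# Hypothetical sketch: two columns (targets, chunks), no header.
# Keeping every value as a string is an assumption based on "literal".
def read_legacy_attributes(lines):
    """Return {'targets': [...], 'chunks': [...]} from a 2-column file."""
    targets, chunks = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue                 # ignore blank lines
        if len(fields) != 2:
            raise ValueError("expected 2 columns, got %d" % len(fields))
        targets.append(fields[0])
        chunks.append(fields[1])
    return {"targets": targets, "chunks": chunks}


legacy = io.StringIO("face 0\nhouse 0\nface 1\nhouse 1\n")
print(read_legacy_attributes(legacy))
# {'targets': ['face', 'house', 'face', 'house'], 'chunks': ['0', '0', '1', '1']}
```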

Options for attributes from stored NumPy arrays:

--add-sa-npy VALUE [VALUE ...]
load sample attribute from a NumPy .npy file. Compressed files (i.e. .npy.gz) are supported as well. The first value is the desired attribute name, the second value is the filename the data will be loaded from. Additional values modifying the way the data is loaded are described in the section "Load data from NumPy NPY files".
--add-fa-npy VALUE [VALUE ...]
load feature attribute from a NumPy .npy file. Compressed files (i.e. .npy.gz) are supported as well. The first value is the desired attribute name, the second value is the filename the data will be loaded from. Additional values modifying the way the data is loaded are described in the section "Load data from NumPy NPY files".

Options for input from MR images:

--mask IMAGE
mask image file with the same dimensions as an input data sample. Only voxels corresponding to non-zero mask elements will be included in the dataset.
--add-vol-attr ARG ARG
attribute name (1st argument) and image file with the same dimensions as an input data sample (2nd argument). The image data will be added as a feature attribute under the specified name.
--add-fsl-mcpar FILENAME
6-column motion parameter file in FSL's McFlirt format. Six additional sample attributes will be created: mc_{x,y,z} and mc_rot{1-3}, for translation and rotation estimates respectively.
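
The MR-specific options can be sketched with flattened voxel lists. The helpers `apply_mask` and `read_mcpar` are hypothetical; `apply_mask` handles a single volume, and the mapping of the six motion columns assumes McFLIRT's usual order of three rotations followed by three translations, which is an assumption rather than something stated above.

```python
# Hypothetical sketches, not PyMVPA code. Images are flattened into
# plain lists of voxel values that share one voxel order.


def apply_mask(volume, mask, **vol_attrs):
    """Keep voxels with non-zero mask values; carry per-voxel attributes."""
    keep = [i for i, m in enumerate(mask) if m != 0]
    features = [volume[i] for i in keep]
    # Each --add-vol-attr image becomes a feature attribute, masked alike.
    fa = {name: [img[i] for i in keep] for name, img in vol_attrs.items()}
    return features, fa


# Assumed McFLIRT .par column order: three rotations, then three
# translations (check against your FSL version).
MCPAR_NAMES = ("mc_rot1", "mc_rot2", "mc_rot3", "mc_x", "mc_y", "mc_z")


def read_mcpar(lines):
    """Turn a 6-column motion file into six sample attributes."""
    attrs = {name: [] for name in MCPAR_NAMES}
    for line in lines:
        fields = line.split()
        if not fields:
            continue                 # ignore blank lines
        if len(fields) != 6:
            raise ValueError("expected 6 columns, got %d" % len(fields))
        for name, value in zip(MCPAR_NAMES, fields):
            attrs[name].append(float(value))
    return attrs


features, fa = apply_mask([5, 6, 7, 8], [0, 1, 1, 0], area=[10, 20, 30, 40])
print(features, fa)                                        # [6, 7] {'area': [20, 30]}
print(read_mcpar(["0.01 0.0 0.0 0.5 -0.2 1.0"])["mc_x"])   # [0.5]
```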

Output options:

-o OUTPUT, --output OUTPUT
output filename ('.hdf5' extension is added automatically if necessary). NOTE: The output format is suitable for data exchange between PyMVPA commands, but is not recommended for long-term storage or exchange as its specific content may vary depending on the actual software environment. For long-term storage consider conversion into other data formats (see 'dump' command).
--hdf5-compression TYPE
compression type for HDF5 storage. Available values depend on the specific HDF5 installation. Typical values are: 'gzip', 'lzf', 'szip', or integers from 1 to 9 indicating gzip compression levels.
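
A sketch of how a TYPE value could be interpreted, with integers 1 to 9 taken as gzip levels and any other value passed on as a filter name. The helper `parse_compression` is hypothetical. If the value ends up in h5py, these correspond to the `compression` and `compression_opts` arguments of `create_dataset`, but that plumbing is an assumption here.

```python
# Hypothetical sketch of interpreting --hdf5-compression TYPE.
# Which filters actually work depends on the HDF5 installation.
def parse_compression(value):
    """Return (filter_name, level) for a --hdf5-compression value."""
    if value.isdigit():
        level = int(value)
        if not 1 <= level <= 9:
            raise ValueError("gzip level must be between 1 and 9")
        return "gzip", level         # integers select a gzip level
    return value, None               # e.g. 'gzip', 'lzf', 'szip'


print(parse_compression("lzf"))   # ('lzf', None)
print(parse_compression("4"))     # ('gzip', 4)
```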

EXAMPLES

Load 4D MRI image, assign atlas labels to a feature attribute, and attach class labels from a text file. The resulting dataset is stored as 'ds.hdf5' in the current directory.
$ pymvpa2 mkds -o ds --mri-data bold.nii.gz --add-vol-attr area harvox.nii.gz --add-sa-txt targets labels.txt

AUTHOR

Written by Michael Hanke & Yaroslav Halchenko, and numerous other contributors.

COPYRIGHT

Copyright © 2006-2016 PyMVPA developers

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.