logistic_regression(1) L2-regularized logistic regression and prediction

SYNOPSIS


logistic_regression [-h] [-v] [-d double] [--info string] [-i string] [-r string] [-l double] [-M int] [-m string] [-O string] [-o string] [-p string] [-s double] [-t string] [-T double] [-V]

DESCRIPTION

An implementation of L2-regularized logistic regression using either the L-BFGS optimizer or SGD (stochastic gradient descent). This solves the regression problem

  y = 1 / (1 + e^(-X * b))
where y takes values 0 or 1. The model is trained on labeled data by iteratively optimizing the parameter vector b. The matrix of predictors (or features) X is specified with the --input_file option, and the vector of responses y is either the last column of the matrix given with --input_file or a separate one-column vector given with the --input_responses option. After training, the calculated b is saved to the file specified by --output_file. An initial guess for b can be supplied by passing --model_file together with --input_file or --input_responses.

The tolerance of the optimizer can be set with --tolerance, the maximum number of optimizer iterations with --max_iterations, and the type of optimizer (SGD or L-BFGS) with --optimizer. Both optimizers have further options, but those are available only through the C++ interface. For the SGD optimizer, the --step_size parameter controls the size of the step taken at each iteration. If the objective function for your data oscillates between Inf and 0, the step size is probably too large.
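As an illustration of the SGD path described above, here is a minimal pure-Python sketch of plain stochastic gradient descent on the L2-regularized log-loss. It is not mlpack's implementation: the function names, the choice to count one shuffled pass over the data as an "iteration", the stopping test on the largest per-parameter update, and the assumption that X already carries a column of ones when an intercept is wanted are all made up for this example. The arguments step_size, max_iterations (0 meaning no limit), tolerance and lam loosely mirror --step_size, --max_iterations, --tolerance and --lambda.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_sgd(X, y, step_size=0.01, max_iterations=0, tolerance=1e-10,
              lam=0.0, seed=0):
    """Fit b with plain SGD on the L2-regularized negative log-likelihood.

    X is a list of rows (lists of floats) and y a list of 0/1 labels.
    One "iteration" here is one shuffled pass over the data (an assumption);
    max_iterations == 0 means no limit, as with --max_iterations.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    b = [0.0] * d
    order = list(range(n))
    epoch = 0
    while max_iterations == 0 or epoch < max_iterations:
        rng.shuffle(order)
        largest_update = 0.0
        for i in order:
            p = sigmoid(sum(xj * bj for xj, bj in zip(X[i], b)))
            for j in range(d):
                # Per-point log-loss gradient plus 1/n of the gradient of (lam/2)*||b||^2.
                grad = (p - y[i]) * X[i][j] + lam * b[j] / n
                update = step_size * grad
                b[j] -= update
                largest_update = max(largest_update, abs(update))
        epoch += 1
        # Stop once no parameter moved by more than `tolerance` in a full pass.
        if largest_update < tolerance:
            break
    return b
```

With a fixed step size, SGD rarely shrinks its updates below a tolerance as small as 1e-10, so in this sketch a positive max_iterations is usually what ends the loop. A step size that is too large can make the parameters diverge instead of settling.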

This implementation of logistic regression supports L2-regularization, which can help prevent the parameter vector b from overfitting the training data. The regularization strength is specified with the --lambda option; by default it is 0, which means no regularization is performed.
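The effect of --lambda can be made concrete by looking at the objective it scales. The sketch below evaluates one common form of the L2-regularized negative log-likelihood, f(b) = -sum_i [y_i log s_i + (1 - y_i) log(1 - s_i)] + (lambda / 2) ||b||^2 with s_i = 1 / (1 + e^(-x_i * b)). This page does not state the exact scaling of the penalty used by mlpack, or whether an intercept term is penalized, so treat both the form and the function name regularized_objective as assumptions.

```python
import math


def regularized_objective(X, y, b, lam):
    """Assumed objective: negative log-likelihood plus (lam / 2) * ||b||^2."""
    nll = 0.0
    for row, label in zip(X, y):
        z = sum(xj * bj for xj, bj in zip(row, b))
        # Stable log(1 + e^z); then -[y log s + (1 - y) log(1 - s)] = log(1 + e^z) - y * z.
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        nll += softplus - label * z
    penalty = 0.5 * lam * sum(bj * bj for bj in b)
    return nll + penalty
```

With lam = 0 (the default for --lambda) the penalty term vanishes and only the plain log-loss remains; larger values pull b toward zero.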

Optionally, if --test_file is specified, the calculated value of b is used to predict the responses for another matrix of data points. The --test_file option can be given without --input_file, as long as an existing logistic regression model is supplied with --model_file. The predictions are stored in the file given with --output_predictions.
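Prediction as described above amounts to evaluating the logistic function at each test point and thresholding it with the rule given for --decision_boundary (class 0 below the boundary, class 1 otherwise; default 0.5). A minimal sketch, where the function name predict and the list-of-rows data layout are assumptions for illustration only:

```python
import math


def predict(X, b, decision_boundary=0.5):
    """Label each row of X as 0 or 1 by thresholding the logistic function."""
    labels = []
    for row in X:
        z = sum(xj * bj for xj, bj in zip(row, b))
        # Numerically stable 1 / (1 + e^(-z)).
        if z >= 0:
            p = 1.0 / (1.0 + math.exp(-z))
        else:
            ez = math.exp(z)
            p = ez / (1.0 + ez)
        labels.append(0 if p < decision_boundary else 1)
    return labels
```

Raising decision_boundary makes the sketch more reluctant to predict class 1: a point whose logistic value is exactly 0.5 becomes class 0 once the boundary exceeds 0.5.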

This implementation of logistic regression supports only the two-class case, not the general multi-class case: every response must be either 0 or 1.
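When --input_responses is not given, the responses are taken from the last column of the --input_file matrix, and each response must be 0 or 1. The sketch below performs that split on comma-separated text and rejects other labels. It assumes one data point per row, no header line and purely numeric values; the helper name split_predictors_and_responses is hypothetical and not part of mlpack.

```python
import csv
import io


def split_predictors_and_responses(text):
    """Split CSV text into X (all but the last column) and y (last column)."""
    X, y = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row]
        X.append(values[:-1])
        label = values[-1]
        # Only the two-class case is supported, so responses must be 0 or 1.
        if label not in (0.0, 1.0):
            raise ValueError(f"response {label} is not 0 or 1")
        y.append(int(label))
    return X, y
```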

OPTIONS

--decision_boundary (-d) [double]
Decision boundary for prediction; if the logistic function for a point is less than the boundary, the class is taken to be 0; otherwise, the class is 1. Default value 0.5.
--help (-h)
Display the default help information.
--info [string]
Get help on a specific module or option. Default value ''.
--input_file (-i) [string]
File containing X (predictors). Default value ''.
--input_responses (-r) [string]
Optional file containing y (responses). If not given, the responses are assumed to be the last column of the input file. Default value ''.
--lambda (-l) [double]
L2-regularization parameter for training. Default value 0.
--max_iterations (-M) [int]
Maximum iterations for optimizer (0 indicates no limit). Default value 0.
--model_file (-m) [string]
File containing existing model (parameters). Default value ''.
--optimizer (-O) [string]
Optimizer to use for training ('lbfgs' or 'sgd'). Default value 'lbfgs'.
--output_file (-o) [string]
File where parameters (b) will be saved. Default value ''.
--output_predictions (-p) [string]
If --test_file is specified, this file is where the predicted responses will be saved. Default value 'predictions.csv'.
--step_size (-s) [double]
Step size for SGD optimizer. Default value 0.01.
--test_file (-t) [string]
File containing test dataset. Default value ''.
--tolerance (-T) [double]
Convergence tolerance for optimizer. Default value 1e-10.
--verbose (-v)
Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V)
Display the version of mlpack.

ADDITIONAL INFORMATION

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.