DESCRIPTION
VW options:
- --random_seed arg
 - seed random number generator
 - --ring_size arg
 - size of example ring
 
Update options:
- -l [ --learning_rate ] arg
 - Set learning rate
 - --power_t arg
 - t power value
 - --decay_learning_rate arg
 - Set Decay factor for learning_rate between passes
 - --initial_t arg
 - initial t value
 - --feature_mask arg
 - Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.
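
Taken together, a typical invocation of these update options might look like the following hedged sketch (the file names `train.vw` and `model.vw` are placeholders; multiple passes require a cache, enabled with `-c`):

```shell
vw -d train.vw -l 0.5 --power_t 0.5 --decay_learning_rate 0.97 \
   --passes 5 -c -f model.vw
```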
 
Weight options:
- -i [ --initial_regressor ] arg
 - Initial regressor(s)
 - --initial_weight arg
 - Set all weights to an initial value of arg.
 - --random_weights arg
 - make initial weights random
 - --input_feature_regularizer arg
 - Per feature regularization input file
 
Parallelization options:
- --span_server arg
 - Location of server for setting up spanning tree
 - --threads
 - Enable multi-threading
 - --unique_id arg (=0)
 - unique id used for cluster parallel jobs
 - --total arg (=1)
 - total number of nodes used in cluster parallel job
 - --node arg (=0)
 - node number in cluster parallel job
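
A hedged sketch of a two-node cluster run (the host name, data shards, and id are placeholders; a spanning-tree server is assumed to be running on `span.example.com`):

```shell
# Each node processes its own shard of the data under a shared unique_id:
vw -d part0.vw --span_server span.example.com --total 2 --node 0 --unique_id 1234
vw -d part1.vw --span_server span.example.com --total 2 --node 1 --unique_id 1234
```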
 
Diagnostic options:
- --version
 - Version information
 - -a [ --audit ]
 - print weights of features
 - -P [ --progress ] arg
 - Progress update frequency. int: additive, float: multiplicative
 - --quiet
 - Don't output diagnostics and progress updates
 - -h [ --help ]
 - Look here: http://hunch.net/~vw/ and click on Tutorial.
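
For example (the file name is a placeholder), per-feature audit output with a progress line every 1000 examples:

```shell
vw -d train.vw --audit --progress 1000
```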
 
Feature options:
- --hash arg
 - how to hash the features. Available options: strings, all
 - --ignore arg
 - ignore namespaces beginning with character <arg>
 - --keep arg
 - keep namespaces beginning with character <arg>
 - --redefine arg
 - redefine namespaces beginning with characters of string S as namespace N. <arg> shall be in form 'N:=S' where := is operator. Empty N or S are treated as default namespace. Use ':' as a wildcard in S.
 - -b [ --bit_precision ] arg
 - number of bits in the feature table
 - --noconstant
 - Don't add a constant feature
 - -C [ --constant ] arg
 - Set initial value of constant
 - --ngram arg
 - Generate N grams. To generate N grams for a single namespace 'foo', arg should be fN.
 - --skips arg
 - Generate skips in N grams. This in conjunction with the ngram tag can be used to generate generalized n-skip-k-gram. To generate n-skips for a single namespace 'foo', arg should be fN.
 - --feature_limit arg
 - limit to N features. To apply to a single namespace 'foo', arg should be fN
 - --affix arg
 - generate prefixes/suffixes of features; argument '+2a,-3b,+1' means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace
 - --spelling arg
 - compute spelling features for a given namespace (use '_' for default namespace)
 - --dictionary arg
 - read a dictionary for additional features (arg either 'x:file' or just 'file')
 - --dictionary_path arg
 - look in this directory for dictionaries; defaults to current directory or env{PATH}
 - --interactions arg
 - Create feature interactions of any level between namespaces.
 - --permutations
 - Use permutations instead of combinations for feature interactions of same namespace.
 - --leave_duplicate_interactions
 - Don't remove interactions with duplicate combinations of namespaces. For ex. this is a duplicate: '-q ab -q ba' and a lot more in '-q ::'.
 - -q [ --quadratic ] arg
 - Create and use quadratic features
 - --q: arg
 - : corresponds to a wildcard for all printable characters
 - --cubic arg
 - Create and use cubic features
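
A hedged sketch combining several of these flags (the file name is a placeholder): quadratic interactions between namespaces `a` and `b`, bigrams with one skip, and a 24-bit feature table:

```shell
vw -d train.vw -q ab --ngram 2 --skips 1 -b 24
```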
 
Example options:
- -t [ --testonly ]
 - Ignore label information and just test
 - --holdout_off
 - no holdout data in multiple passes
 - --holdout_period arg
 - holdout period for test only, default 10
 - --holdout_after arg
 - holdout after n training examples, default off (disables holdout_period)
 - --early_terminate arg
 - Specify the number of passes tolerated when holdout loss doesn't decrease before early termination, default is 3
 - --passes arg
 - Number of Training Passes
 - --initial_pass_length arg
 - initial number of examples per pass
 - --examples arg
 - number of examples to parse
 - --min_prediction arg
 - Smallest prediction to output
 - --max_prediction arg
 - Largest prediction to output
 - --sort_features
 - turn this on to disregard order in which features have been defined. This will lead to smaller cache sizes
 - --loss_function arg (=squared)
 - Specify the loss function to be used, uses squared by default. Currently available ones are squared, classic, hinge, logistic and quantile.
 - --quantile_tau arg (=0.5)
 - Parameter \tau associated with Quantile loss. Defaults to 0.5
 - --l1 arg
 - l_1 lambda
 - --l2 arg
 - l_2 lambda
 - --named_labels arg
 - use names for labels (multiclass, etc.) rather than integers; argument specifies all possible labels, comma-separated, e.g. "--named_labels Noun,Verb,Adj,Punc"
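
For instance, quantile regression for the 90th percentile over multiple passes (the file name is a placeholder; `--passes` needs a cache, and `--holdout_off` disables the holdout mechanism described above):

```shell
vw -d train.vw -c --passes 10 --holdout_off \
   --loss_function quantile --quantile_tau 0.9
```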
 
Output model:
- -f [ --final_regressor ] arg
 - Final regressor
 - --readable_model arg
 - Output human-readable final regressor with numeric features
 - --invert_hash arg
 - Output human-readable final regressor with feature names. Computationally expensive.
 - --save_resume
 - save extra state so learning can be resumed later with new data
 - --save_per_pass
 - Save the model after every pass over data
 - --output_feature_regularizer_binary arg
 - Per feature regularization output file
 - --output_feature_regularizer_text arg
 - Per feature regularization output file, in text
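
A hedged sketch saving the model in several forms at once (file names are placeholders): a binary regressor, a human-readable dump, and the extra state needed to resume training later:

```shell
vw -d train.vw -f model.vw --readable_model model.txt --save_resume
```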
 
Output options:
- -p [ --predictions ] arg
 - File to output predictions to
 - -r [ --raw_predictions ] arg
 - File to output unnormalized predictions to
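
For example, scoring a held-out file with a previously saved model (file names are placeholders; `-t` and `-i` are described in the sections above):

```shell
vw -t -i model.vw -d test.vw -p predictions.txt
```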
 
Reduction options, use [option] --help for more info:
- --bootstrap arg
 - k-way bootstrap by online importance resampling
 - --search arg
 - Use learning to search, argument=maximum action id or 0 for LDF
 - --replay_c arg
 - use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
 - --cbify arg
 - Convert multiclass on <k> classes into a contextual bandit problem
 - --cb_adf
 - Do Contextual Bandit learning with multiline action dependent features.
 - --cb arg
 - Use contextual bandit learning with <k> costs
 - --csoaa_ldf arg
 - Use one-against-all multiclass learning with label dependent features. Specify singleline or multiline.
 - --wap_ldf arg
 - Use weighted all-pairs multiclass learning with label dependent features. Specify singleline or multiline.
 - --interact arg
 - Put weights on feature products from namespaces <n1> and <n2>
 - --csoaa arg
 - One-against-all multiclass with <k> costs
 - --multilabel_oaa arg
 - One-against-all multilabel with <k> labels
 - --log_multi arg
 - Use online tree for multiclass
 - --ect arg
 - Error correcting tournament with <k> labels
 - --boosting arg
 - Online boosting with <N> weak learners
 - --oaa arg
 - One-against-all multiclass with <k> labels
 - --top arg
 - top k recommendation
 - --replay_m arg
 - use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
 - --binary
 - report loss as binary classification on -1,1
 - --link arg (=identity)
 - Specify the link function: identity, logistic or glf1
 - --stage_poly
 - use stagewise polynomial feature learning
 - --lrqfa arg
 - use low rank quadratic features with field aware weights
 - --lrq arg
 - use low rank quadratic features
 - --autolink arg
 - create link function with polynomial d
 - --new_mf arg
 - rank for reduction-based matrix factorization
 - --nn arg
 - Sigmoidal feedforward network with <k> hidden units
 - --confidence
 - Get confidence for binary predictions
 - --active_cover
 - enable active learning with cover
 - --active
 - enable active learning
 - --replay_b arg
 - use experience replay at a specified level [b=classification/regression, m=multiclass, c=cost sensitive] with specified buffer size
 - --bfgs
 - use bfgs optimization
 - --conjugate_gradient
 - use conjugate gradient based optimization
 - --lda arg
 - Run lda with <int> topics
 - --noop
 - do no learning
 - --print
 - print examples
 - --rank arg
 - rank for matrix factorization.
 - --sendto arg
 - send examples to <host>
 - --svrg
 - Streaming Stochastic Variance Reduced Gradient
 - --ftrl
 - FTRL: Follow the Proximal Regularized Leader
 - --pistol
 - FTRL: Parameter-free Stochastic Learning
 - --ksvm
 - kernel svm
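
As one example of enabling a reduction (file names are placeholders), one-against-all over 3 classes, assuming the labels in the data are 1..3:

```shell
vw -d multi.vw --oaa 3 -f multi.model
```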
 
Gradient Descent options:
- --sgd
 - use regular stochastic gradient descent update.
 - --adaptive
 - use adaptive, individual learning rates.
 - --invariant
 - use safe/importance aware updates.
 - --normalized
 - use per feature normalized updates
 - --sparse_l2 arg (=0)
 - degree of l2 regularization applied to activated sparse parameters
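
By default vw combines the adaptive, invariant, and normalized updates; `--sgd` reverts to the plain stochastic gradient update (the file name is a placeholder):

```shell
vw -d train.vw --sgd
```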
 
Input options:
- -d [ --data ] arg
 - Example Set
 - --daemon
 - persistent daemon mode on port 26542
 - --port arg
 - port to listen on; use 0 to pick unused port
 - --num_children arg
 - number of children for persistent daemon mode
 - --pid_file arg
 - Write pid file in persistent daemon mode
 - --port_file arg
 - Write port used in persistent daemon mode
 - -c [ --cache ]
 - Use a cache. The default is <data>.cache
 - --cache_file arg
 - The location(s) of cache_file.
 - -k [ --kill_cache ]
 - do not reuse existing cache: create a new one always
 - --compressed
 - use gzip format whenever possible. If a cache file is being created, this option creates a compressed cache file. A mixture of raw-text & compressed inputs are supported with autodetection.
 - --no_stdin
 - do not default to reading from stdin
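
The expected input is one example per line, roughly `label [importance] [tag]|namespace feature[:value] ...`. A minimal runnable sketch (all names are placeholders; the `vw` invocations are shown as comments since they assume vw is installed):

```shell
# Write a two-example VW-format data file.
printf '1 |animal height:1.2 has_fur\n-1 |animal height:0.3\n' > train.vw

# With vw installed, train using a cache (defaults to train.vw.cache):
# vw -d train.vw -c -f model.vw
# Serve the model as a persistent daemon on the default port:
# vw --daemon --port 26542 -i model.vw --num_children 4
```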
 

