DESCRIPTION
Sequence Element Enrichment Analysis
The .pheno file format is tab-separated with three columns: two containing the sample name and one containing the phenotype. If the only phenotype values are 0 and 1, the phenotype is treated as binary; if any other value is present, it is treated as quantitative. Samples with a missing phenotype value should therefore simply be excluded from this file.
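As a minimal sketch of the format described above, the following writes a small .pheno file and reports how its phenotype column would be interpreted. The sample names, the phenotype values and the helper function pheno_type are invented for illustration and are not part of seer. The check is a plain string comparison against 0 and 1, so it only approximates whatever parsing seer itself performs.

```shell
# Write a hypothetical three-column, tab-separated .pheno file.
# Sample names and phenotype values are made up for this example.
printf 'sample_1\tsample_1\t0\n'  > metadata.pheno
printf 'sample_2\tsample_2\t1\n' >> metadata.pheno
printf 'sample_3\tsample_3\t1\n' >> metadata.pheno

# pheno_type is an illustrative helper, not a seer command.
# It applies the rule from the description: if the third column holds
# only 0 and 1 the phenotype is binary, otherwise it is quantitative.
pheno_type() {
    awk -F '\t' '
        $3 != "0" && $3 != "1" { quantitative = 1 }
        END { print (quantitative ? "quantitative" : "binary") }
    ' "$1"
}

pheno_type metadata.pheno
```

With the three rows above the helper prints binary. Adding a row with a phenotype such as 2.5 would make it print quantitative.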
OPTIONS
Required options:
- -k [ --kmers ] arg
- dsm kmer output file
- -p [ --pheno ] arg
- .pheno metadata
Covariate options:
- --struct arg
- mds values from kmds
- --covar_file arg
- file containing covariates
- --covar_list arg
- list of covariate columns to use. Format is 1,2q,3 (append q to a column number to mark it as quantitative)
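To make the --covar_list format concrete, the sketch below splits a value such as 1,2q,3 into its column numbers and reports which ones carry the q suffix. The function describe_covars is a hypothetical helper written for this example, not part of seer, and the label "non-quantitative" for columns without a q is an assumption, since the option text only defines the meaning of q.

```shell
# describe_covars is an illustrative helper, not a seer command.
# It prints one line per entry: column number, a tab, then its type.
describe_covars() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r col; do
        case "$col" in
            # A trailing q marks the column as quantitative; strip it.
            *q) printf '%s\tquantitative\n' "${col%q}" ;;
            # Assumption: anything without q is labelled non-quantitative.
            *)  printf '%s\tnon-quantitative\n' "$col" ;;
        esac
    done
}

describe_covars '1,2q,3'
```

For 1,2q,3 this lists columns 1 and 3 as non-quantitative and column 2 as quantitative.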
Performance options:
- --threads arg (=1)
- number of threads. Suggested: 4
Filtering options:
- --no_filtering
- turn off all filtering and perform tests on all input kmers
- --max_length arg (=100)
- maximum kmer length
- --maf arg (=0.01)
- minimum kmer frequency
- --min_words arg
- minimum kmer occurrences. Overrides --maf
- --positive_only
- only test words with a predicted positive effect direction
- --chisq arg (=10e-5)
- p-value threshold for initial chi squared test. Set to 1 to show all
- --pval arg (=10e-8)
- p-value threshold for final logistic test. Set to 1 to show all
Other options:
- --print_samples
- print lists of samples significant kmers were found in
- -h [ --help ]
- full help message
EXAMPLES
Basic usage:
- seer -k dsm_input.txt.gz --pheno metadata.pheno > significant_kmers.txt
To use the kmds output, increase execution speed and give the most complete output:
- seer -k filtered.gz --pheno metadata.pheno --struct filtered.dsm --threads 4 --print_samples
AUTHOR
This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.