DESCRIPTION
usage: rdkit2fps [-h] [--fpSize INT] [--RDK] [--minPath INT] [--maxPath INT] [--nBitsPerHash INT] [--useHs 0|1] [--morgan] [--radius INT] [--useFeatures 0|1] [--useChirality 0|1] [--useBondTypes 0|1] [--torsions] [--targetSize INT] [--pairs] [--minLength INT] [--maxLength INT] [--maccs166] [--substruct] [--rdmaccs] [--id-tag NAME] [--in FORMAT] [-o FILENAME] [--errors {strict,report,ignore}] [filenames [filenames ...]]
Generate FPS fingerprints from a structure file using RDKit
positional arguments:
- filenames
- input structure files (default is stdin)
optional arguments:
- -h, --help
- show this help message and exit
- --fpSize INT
- number of bits in the fingerprint; applies to RDK, Morgan, topological torsion, and atom pair fingerprints (default=2048)
- --id-tag NAME
- tag name containing the record id (SD files only)
- --in FORMAT
- input structure format (default guesses from filename)
- -o FILENAME, --output FILENAME
- save the fingerprints to FILENAME (default=stdout)
- --errors {strict,report,ignore}
- how should structure parse errors be handled? (default=strict)
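As a minimal sketch of how the general options above combine, the following Python helper assembles an rdkit2fps argument list. The helper name `build_rdkit2fps_command`, the file names, and the `CATALOG_ID` tag are hypothetical and chosen only for illustration; only the flags themselves come from the usage text. The sketch builds the command without running it.

```python
import shlex

# The three values listed for --errors in the usage text.
ERROR_MODES = ("strict", "report", "ignore")


def build_rdkit2fps_command(filenames, output=None, fp_size=None,
                            id_tag=None, in_format=None, errors=None):
    """Assemble an rdkit2fps argument list (hypothetical helper).

    Options left as None are omitted, so rdkit2fps falls back to its
    documented defaults (stdin, stdout, fpSize=2048, errors=strict).
    """
    argv = ["rdkit2fps"]
    if fp_size is not None:
        argv += ["--fpSize", str(fp_size)]
    if id_tag is not None:
        argv += ["--id-tag", id_tag]
    if in_format is not None:
        argv += ["--in", in_format]
    if output is not None:
        argv += ["-o", output]
    if errors is not None:
        if errors not in ERROR_MODES:
            raise ValueError(f"--errors must be one of {ERROR_MODES}")
        argv += ["--errors", errors]
    argv += list(filenames)
    return argv


# Hypothetical input: an SD file whose record ids live in a CATALOG_ID tag.
command = build_rdkit2fps_command(
    ["compounds.sdf"],
    output="compounds.fps",
    fp_size=1024,
    id_tag="CATALOG_ID",
    errors="report",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a system where rdkit2fps is installed.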
RDKit topological fingerprints:
- --RDK
- generate RDK fingerprints (default)
- --minPath INT
- minimum number of bonds to include in the subgraph (default=1)
- --maxPath INT
- maximum number of bonds to include in the subgraph (default=7)
- --nBitsPerHash INT
- number of bits to set per path (default=4)
- --useHs 0|1
- include information about the number of hydrogens on each atom (default=1)
RDKit Morgan fingerprints:
- --morgan
- generate Morgan fingerprints
- --radius INT
- radius for the Morgan algorithm (default=2)
- --useFeatures 0|1
- use chemical-feature invariants (default=0)
- --useChirality 0|1
- include chirality information (default=0)
- --useBondTypes 0|1
- include bond type information (default=1)
RDKit Topological Torsion fingerprints:
- --torsions
- generate Topological Torsion fingerprints
- --targetSize INT
- number of atoms in each torsion path (default=4); the number of bits is set with --fpSize
RDKit Atom Pair fingerprints:
- --pairs
- generate Atom Pair fingerprints
- --minLength INT
- minimum bond count for a pair (default=1)
- --maxLength INT
- maximum bond count for a pair (default=30)
166 bit MACCS substructure keys:
- --maccs166
- generate MACCS fingerprints
881 bit substructure keys:
- --substruct
- generate ChemFP substructure fingerprints
ChemFP version of the 166 bit RDKit/MACCS keys:
- --rdmaccs
- generate 166 bit RDKit/MACCS fingerprints
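The option groups above pair each fingerprint type with its own tuning flags. The Python sketch below records that grouping in a table and refuses a tuning flag that is listed under a different type. The names `FINGERPRINT_OPTIONS` and `fingerprint_args` are hypothetical, and the strict check is an assumption of this sketch: the help text does not say how rdkit2fps itself treats a mismatched flag.

```python
# Fingerprint type flags and the tuning options listed under each group.
# --fpSize is a general option and is deliberately not repeated here.
FINGERPRINT_OPTIONS = {
    "--RDK": {"--minPath", "--maxPath", "--nBitsPerHash", "--useHs"},
    "--morgan": {"--radius", "--useFeatures", "--useChirality", "--useBondTypes"},
    "--torsions": {"--targetSize"},
    "--pairs": {"--minLength", "--maxLength"},
    "--maccs166": set(),
    "--substruct": set(),
    "--rdmaccs": set(),
}


def fingerprint_args(type_flag, options=None):
    """Return the arguments for one fingerprint type (hypothetical helper).

    `options` maps a flag name to an integer, or to a bool for the 0|1 flags.
    """
    if type_flag not in FINGERPRINT_OPTIONS:
        raise ValueError(f"unknown fingerprint type: {type_flag}")
    allowed = FINGERPRINT_OPTIONS[type_flag]
    args = [type_flag]
    for name, value in (options or {}).items():
        if name not in allowed:
            raise ValueError(f"{name} is not listed under {type_flag}")
        # int() turns True/False into the 1/0 expected by the 0|1 flags.
        args += [name, str(int(value))]
    return args


morgan = fingerprint_args("--morgan", {"--radius": 3, "--useChirality": True})
print(["rdkit2fps"] + morgan + ["compounds.sdf"])  # hypothetical input file
```

Keeping the grouping in one table means a typo such as passing `--radius` together with `--pairs` is caught before the command is run.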
This program guesses the input structure format from the filename extension. If the data comes from stdin, or the extension is unknown, use "--in" to specify the input format. The supported format extensions are:
- File Type  Valid FORMATs (use gz if compressed)
- ---------  ------------------------------------
- SMILES     smi, ism, can, smi.gz, ism.gz, can.gz
- SDF        sdf, mol, sd, mdl, sdf.gz, mol.gz, sd.gz, mdl.gz
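The extension lookup described above can be sketched in a few lines of Python. This is an illustration built from the listed extensions, not the program's actual implementation; the names `VALID_FORMATS` and `guess_format` are hypothetical, and matching is case-sensitive as a simplifying assumption.

```python
# Extensions copied from the list of supported formats.
VALID_FORMATS = {
    "SMILES": ["smi", "ism", "can", "smi.gz", "ism.gz", "can.gz"],
    "SDF": ["sdf", "mol", "sd", "mdl", "sdf.gz", "mol.gz", "sd.gz", "mdl.gz"],
}


def guess_format(filename):
    """Return (file_type, format_name) for a recognised extension, else None.

    A None result corresponds to the case where "--in" must be given.
    """
    for file_type, formats in VALID_FORMATS.items():
        for format_name in formats:
            if filename.endswith("." + format_name):
                return file_type, format_name
    return None


# Hypothetical file names, used only to exercise the lookup.
for name in ["ligands.sdf.gz", "screen.smi", "notes.txt"]:
    print(name, guess_format(name))
```

When the lookup returns None, as it does for `notes.txt`, the format has to be supplied explicitly, for example with `--in smi`.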