DESCRIPTION
The Variant Call Format (VCF) is a TAB-delimited format with each data line consisting of the following fields:
1 | CHROM | CHROMosome name |
2 | POS | the left-most POSition of the variant |
3 | ID | unique variant IDentifier |
4 | REF | the REFerence allele |
5 | ALT | the ALTernate allele(s) (comma-separated) |
6 | QUAL | variant/reference QUALity |
7 | FILTER | FILTERs applied |
8 | INFO | INFOrmation related to the variant (semicolon-separated) |
9 | FORMAT | FORMAT of the genotype fields (optional; colon-separated) |
10+ | SAMPLE | SAMPLE genotypes and per-sample information (optional) |
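As an illustration of the field layout above, here is a minimal Python sketch that splits one data line into its columns. The names `VcfRecord` and `parse_vcf_line` are hypothetical and not part of samtools or bcftools. The sketch also assumes that a lone "." in the QUAL column marks a missing value, and it performs no header handling or validation beyond counting columns.

```python
from typing import List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    """Hypothetical container for one VCF data line (illustration only)."""
    chrom: str
    pos: int
    id: str
    ref: str
    alt: List[str]                # ALT alleles, split on commas
    qual: Optional[float]         # None when QUAL is "." (assumed missing marker)
    filter: str
    info: str                     # raw semicolon-separated INFO string
    format: Optional[List[str]]   # FORMAT keys, split on colons; None if absent
    samples: List[str]            # raw per-sample columns (10+)


def parse_vcf_line(line: str) -> VcfRecord:
    """Split a TAB-delimited VCF data line into the fields listed above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    # FORMAT and SAMPLE columns are optional.
    fmt = fields[8].split(":") if len(fields) > 8 else None
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=vid,
        ref=ref,
        alt=alt.split(","),
        qual=None if qual == "." else float(qual),
        filter=flt,
        info=info,
        format=fmt,
        samples=fields[9:],
    )


# Made-up example line, not taken from real data.
example = "chr1\t1000\t.\tA\tG,T\t50\t.\tDP=10;DP4=3,2,4,1\tGT:PL\t0/1:50,0,60"
rec = parse_vcf_line(example)
```

Here `rec.alt` holds the two ALT alleles as a list and `rec.info` keeps the INFO column as an unparsed string, so tag-level parsing can be done separately.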
The following table gives the INFO tags used by samtools and bcftools.
AF1 | double | Max-likelihood estimate of the site allele frequency (AF) of the first ALT allele |
DP | int | Raw read depth (without quality filtering) |
DP4 | int[4] | Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases |
FQ | int | Consensus quality. Positive: sample genotypes different; negative: otherwise |
MQ | int | Root-Mean-Square mapping quality of covering reads |
PC2 | int[2] | Phred-scaled probability of AF in group1 samples being larger (first value) or smaller (second value) than in group2 |
PCHI2 | double | Posterior weighted chi^2 P-value between group1 and group2 samples |
PV4 | double[4] | P-values for strand bias, baseQ bias, mapQ bias and tail distance bias |
QCHI2 | int | Phred-scaled PCHI2 |
RP | int | Number of permutations yielding a smaller PCHI2 |
CLR | int | Phred log ratio of genotype likelihoods with and without the trio/pair constraint |
UGT | string | Most probable genotype configuration without the trio constraint |
CGT | string | Most probable genotype configuration with the trio constraint |
VDB | float | Tests variant positions within reads. Intended for filtering RNA-seq artifacts around splice sites |
RPB | float | Mann-Whitney rank-sum test for tail distance bias |
HWE | float | Hardy-Weinberg equilibrium test (Wigginton et al.) |
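The INFO tags listed above can be read from the semicolon-separated INFO column with a few lines of Python. This is a sketch under stated assumptions, not the parsing done by samtools or bcftools: the helper names `parse_info`, `parse_dp4` and `phred` are hypothetical, tags without an "=" are treated as boolean flags, and `phred` applies the usual Phred transform, -10*log10(p), without attempting to reproduce how the tools round the result to an int.

```python
import math
from typing import Dict, Tuple, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split an INFO string into a tag -> raw value mapping.

    Assumption: a tag without "=" is a flag and is stored as True;
    a lone "." means the INFO column is empty.
    """
    tags: Dict[str, Union[str, bool]] = {}
    if info == ".":
        return tags
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        tags[key] = value if sep else True
    return tags


def parse_dp4(value: str) -> Tuple[int, int, int, int]:
    """Convert a DP4 value into (ref_fwd, ref_rev, alt_fwd, alt_rev)."""
    parts = [int(x) for x in value.split(",")]
    if len(parts) != 4:
        raise ValueError("DP4 must contain exactly four integers")
    return parts[0], parts[1], parts[2], parts[3]


def phred(p: float) -> float:
    """Standard Phred scaling of a probability: -10 * log10(p)."""
    return -10.0 * math.log10(p)


# Made-up INFO string, not taken from real output.
info = parse_info("DP=10;DP4=3,2,4,1;MQ=40")
ref_fwd, ref_rev, alt_fwd, alt_rev = parse_dp4(info["DP4"])
alt_bases = alt_fwd + alt_rev  # high-quality bases supporting ALT
```

Values are kept as strings in the mapping so that each tag can be converted according to its own type from the table (for example `int` for DP, four integers for DP4).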