Variant Normalization
跳到导航
跳到搜索
Introduction
The Variant Call Format (VCF) is a flexible file format specification that allows us to represent many different variant types ranging from SNPs, indels to copy number variations. However, variant representation in VCF is non-unique for variants that have explicitly expressed reference and alternate sequences. A failure to recognize this will frequently result in inaccurate analyses.
On this wiki page, we describe a variant normalization procedure that is well defined for biallelic as well as multiallelic variants. We then provide a formal proof the procedure's correctness.