Variant Normalization

Revision as of 05:06, 16 January 2020 by Disease (talk | contribs); page created.

Introduction

The Variant Call Format (VCF) is a flexible file format specification that can represent many variant types, from SNPs and indels to copy number variations. However, a variant whose reference and alternate sequences are stated explicitly does not have a unique representation in VCF: the same change can be written at different positions and with different allele strings. Failing to account for this frequently leads to inaccurate analyses, for example when the same variant goes unmatched between two call sets.
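
To make the non-uniqueness concrete, here is a minimal Python sketch. The reference sequence, the positions, and the helper apply_variant are all hypothetical and chosen only for illustration; they are not taken from the VCF specification or from any tool. The sketch applies four different (POS, REF, ALT) records to a toy reference and checks that they all yield the same altered sequence, namely the deletion of one CA unit from a CA repeat.

```python
def apply_variant(reference: str, pos: int, ref: str, alt: str) -> str:
    """Replace `ref` with `alt` at 1-based position `pos` of `reference`."""
    start = pos - 1
    if reference[start:start + len(ref)] != ref:
        raise ValueError(f"REF allele {ref!r} does not match reference at {pos}")
    return reference[:start] + alt + reference[start + len(ref):]


# Toy reference (an assumption for illustration): four CA units flanked by G's.
reference = "GGGCACACACAGGG"

# Four (POS, REF, ALT) records that all delete a single CA unit.
representations = [
    (3, "GCA", "G"),
    (4, "CAC", "C"),
    (5, "ACA", "A"),
    (9, "ACAG", "AG"),
]

# Apply each record independently and collect the distinct results.
haplotypes = {apply_variant(reference, pos, ref, alt)
              for pos, ref, alt in representations}

# All four records describe the same underlying change.
print(haplotypes)
```

A comparison that matches records on POS, REF, and ALT alone would treat these four records as four distinct variants, which is the kind of error normalization is meant to prevent.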

On this wiki page, we describe a variant normalization procedure that is well defined for biallelic as well as multiallelic variants. We then provide a formal proof of the procedure's correctness.