metrics package

Submodules

metrics.metrics module

metrics.metrics.count_deletions(mqc, biosample_id)

Description: The number of deletions (indel variants only) in the VCF, restricted to high-quality variants in autosomal regions.

Implementation details: In the NPM-sample-QC reference implementation, deletions are counted among high-quality variants in autosomal regions with bcftools view (bcftools view -H -v indels -f PASS….DEL). Only short deletions (less than 50 bp), as commonly identified by most short-read variant callers, are considered in this metric. Structural variants, which include deletions larger than 50 bp and are typically identified using dedicated SV callers, are not considered.
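The variant classification behind count_deletions, count_insertions, count_snvs and ratio_insertion_deletion can be sketched in plain Python. This is a minimal illustration, not the NPM-sample-QC code. It assumes tab-separated VCF data lines, UCSC-style contig names (chr1 to chr22) and bi-allelic, normalized records (only the first ALT allele is inspected), and it skips symbolic alleles such as <DEL>. The helper names classify and count_variant_types are hypothetical.

```python
from collections import Counter

# Assumption: GRCh38 contigs use UCSC-style names (chr1 ... chr22).
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}
MAX_INDEL_LEN = 50  # indels of 50 bp or more are treated as structural variants


def classify(ref: str, alt: str) -> str:
    """Classify one normalized REF/ALT pair as snv, insertion, deletion or other."""
    if alt.startswith("<") or alt in {".", "*"}:
        return "other"  # symbolic, missing or spanning-deletion alleles
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    size = abs(len(alt) - len(ref))
    if size == 0 or size >= MAX_INDEL_LEN:
        return "other"  # multi-nucleotide substitutions and long events
    return "insertion" if len(alt) > len(ref) else "deletion"


def count_variant_types(vcf_lines) -> Counter:
    """Count variant classes among PASS records on autosomes."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, _pos, _id, ref, alt, _qual, filt = line.rstrip("\n").split("\t")[:7]
        if chrom not in AUTOSOMES or filt != "PASS":
            continue
        first_alt = alt.split(",")[0]  # bi-allelic records assumed
        counts[classify(ref, first_alt)] += 1
    return counts
```

With these counts, the insertion/deletion ratio is counts["insertion"] / counts["deletion"], guarded against a zero deletion count.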

metrics.metrics.count_insertions(mqc, biosample_id)

Description: The number of insertions (indel variants only) in the VCF, restricted to high-quality variants in autosomal regions.

Implementation details: In the NPM-sample-QC reference implementation, insertions are counted among high-quality variants in autosomal regions with bcftools view (bcftools view -H -v indels -f PASS….INS). Only short insertions (less than 50 bp), as commonly identified by most short-read variant callers, are considered in this metric. Structural variants, which include insertions larger than 50 bp and are typically identified using dedicated SV callers, are not considered.

metrics.metrics.count_snvs(mqc, biosample_id)

Description: The number of SNVs in the VCF, restricted to high-quality variants in autosomal regions.

Implementation details: In the NPM-sample-QC reference implementation, SNVs are counted among high-quality variants in autosomal regions with bcftools view (bcftools view -H -v snps -f PASS).

metrics.metrics.cross_contamination_rate(mqc, biosample_id)

Description: Estimated inter-sample contamination rate of short paired-end sequencing high-quality, non-duplicated reads, primary alignments, mapped on the GRCh38 assembly. No minimum mapping quality is imposed. It is critical that duplicated reads and clipped bases are already marked in the alignment files (BAM/CRAM).

Implementation details: Inter-sample DNA contamination of short paired-end sequencing high-quality aligned reads (BAM/CRAM) mapped on the GRCh38 assembly is estimated with VerifyBamID2, using the pre-calculated 1000 Genomes Project reference panel from the VerifyBamID resource and NumPC set to “4” (the number of principal components used in the estimation). The “FREEMIX” value in the resulting “.selfSM” file gives the estimated contamination level.
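Reading the contamination estimate back out of the VerifyBamID2 output can be sketched as below. It assumes the .selfSM file is tab-separated with a header row that contains a FREEMIX column; the column is looked up by name so the other columns do not matter. The function name read_freemix is hypothetical.

```python
import csv
import io


def read_freemix(selfsm_text: str) -> float:
    """Return FREEMIX from the first data row of a .selfSM file's text."""
    # The header row (whose first field starts with "#") supplies the column names.
    rows = csv.DictReader(io.StringIO(selfsm_text), delimiter="\t")
    first_row = next(rows)
    return float(first_row["FREEMIX"])
```

In practice the text would come from reading the .selfSM file produced for the sample.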

metrics.metrics.insert_size_std_deviation(mqc, biosample_id)

Description: The standard deviation of the insert size of short paired-end sequencing high-quality reads, primary alignments, mapped on the GRCh38 assembly. Duplicated reads and clipped bases are included. No minimum mapping quality is imposed.

Implementation details: In the NPM-sample-QC reference implementation, it is computed using samtools stats, reporting the insert_size_standard_deviation field. Duplicated reads are included. No mapping quality filter is applied.
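The SN (summary numbers) section of samtools stats output consists of tab-separated lines holding a label such as "insert size standard deviation:" and its value. A minimal parser that turns it into a dictionary keyed the way the field names are written here (insert_size_standard_deviation, insert_size_average) might look like this. The key normalization and the function name parse_samtools_sn are assumptions, not the reference implementation.

```python
def parse_samtools_sn(stats_text: str) -> dict:
    """Collect the SN section of samtools stats output into a dict of floats."""
    summary = {}
    for line in stats_text.splitlines():
        if not line.startswith("SN\t"):
            continue  # other sections (FFQ, IS, COV, ...) are ignored
        fields = line.split("\t")
        # "insert size standard deviation:" -> "insert_size_standard_deviation"
        key = fields[1].rstrip(":").strip().replace(" ", "_")
        summary[key] = float(fields[2])  # any trailing "# comment" field is dropped
    return summary
```

The same dictionary also holds insert_size_average, the field used by mean_insert_size.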

metrics.metrics.mad_autosome_coverage(mqc, biosample_id)

Description: The median absolute deviation of sequencing coverage derived from short paired-end sequencing high-quality, non-duplicated reads, primary alignments, with a base quality of 20 or greater and a mapping quality of 20 or greater, in the non-gap autosomal regions of the GRCh38 assembly. Overlapping bases are counted only once. It is critical that duplicated reads are already marked in the alignment files (BAM/CRAM).

Implementation details: In the NPM-sample-QC reference implementation, the median absolute deviation of genome-wide sequencing coverage is derived from Picard (2.27.0) CollectWgsMetrics over the non-gap autosomal regions of the GRCh38 assembly, counting non-duplicated reads, non-overlapping bases and primary alignments with a base quality of 20 or greater and a mapping quality of 20 or greater.
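To make the definition concrete, the median absolute deviation can be computed from a coverage histogram that maps each depth to its number of bases, such as the one CollectWgsMetrics writes alongside its metrics. This sketch takes the lower median for even totals; Picard's own median handling may differ, and the helper names are hypothetical.

```python
from collections import Counter


def histogram_median(hist: dict) -> int:
    """Lower median of a histogram that maps a value to its count."""
    total = sum(hist.values())
    target = (total - 1) // 2  # 0-based position of the lower median
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if seen > target:
            return value
    raise ValueError("empty histogram")


def mad_coverage(hist: dict) -> int:
    """Median absolute deviation of per-base depth from a depth -> count histogram."""
    median = histogram_median(hist)
    deviations = Counter()
    for depth, count in hist.items():
        deviations[abs(depth - median)] += count  # |depth - median| per base
    return histogram_median(deviations)
```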

metrics.metrics.mean_autosome_coverage(mqc, biosample_id)

Description: The mean sequencing coverage derived from short paired-end sequencing high-quality, non-duplicated reads, primary alignments, with a base quality of 20 or greater and a mapping quality of 20 or greater, in the non-gap autosomal regions of the GRCh38 assembly. Overlapping bases are counted only once. It is critical that duplicated reads are already marked in the alignment files (BAM/CRAM).

Implementation details: In the NPM-sample-QC reference implementation, the genome-wide mean sequencing coverage is derived from Picard (2.27.0) CollectWgsMetrics over the non-gap autosomal regions of the GRCh38 assembly, counting non-duplicated reads, non-overlapping bases and primary alignments with a base quality of 20 or greater and a mapping quality of 20 or greater.
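Picard metrics files place a header row and a values row directly after a line beginning with "## METRICS CLASS". A small reader for that layout, written for illustration rather than taken from the reference implementation, can pull MEAN_COVERAGE (and likewise MAD_COVERAGE or PCT_15X) out of a CollectWgsMetrics file. The function name read_picard_metrics is hypothetical, and values are returned as strings.

```python
def read_picard_metrics(text: str) -> dict:
    """Return the first metrics row of a Picard metrics file as a dict of strings."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")  # column names, e.g. MEAN_COVERAGE
            values = lines[i + 2].split("\t")  # first (and usually only) data row
            return dict(zip(header, values))
    raise ValueError("no '## METRICS CLASS' section found")
```

For example, float(read_picard_metrics(text)["MEAN_COVERAGE"]) gives the mean coverage as a number.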

metrics.metrics.mean_insert_size(mqc, biosample_id)

Description: The mean insert size of short paired-end sequencing high-quality reads, primary alignments, mapped on the GRCh38 assembly. Duplicated reads and clipped bases are included. No minimum mapping quality is imposed.

Implementation details: In the NPM-sample-QC reference implementation, it is computed using samtools stats, reporting the insert_size_average field. Duplicated reads are included. No mapping quality filter is applied.

metrics.metrics.pct_autosomes_15x(mqc, biosample_id)

Description: The percentage of bases attaining at least 15X sequencing coverage from short paired-end sequencing high-quality, non-duplicated reads, primary alignments, with a base quality of 20 or greater and a mapping quality of 20 or greater, in the non-gap autosomal regions of the GRCh38 assembly. Overlapping bases are counted only once. It is critical that duplicated reads are already marked in the alignment files (BAM/CRAM).

Implementation details: In the NPM-sample-QC reference implementation, the percentage of bases attaining at least 15X genome-wide sequencing coverage is derived from Picard (2.27.0) CollectWgsMetrics over the non-gap autosomal regions of the GRCh38 assembly, counting non-duplicated reads, non-overlapping bases and primary alignments with a base quality of 20 or greater and a mapping quality of 20 or greater.
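The 15X threshold reduces to the share of bases whose depth is at least 15. The sketch below computes it from a hypothetical depth histogram (depth mapped to base count); the function name fraction_at_least is an assumption. Picard reports PCT_15X as a fraction between 0 and 1, so multiply by 100 if a percentage is wanted.

```python
def fraction_at_least(hist: dict, min_depth: int = 15) -> float:
    """Fraction of bases whose depth is >= min_depth, from a depth -> count histogram."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    covered = sum(count for depth, count in hist.items() if depth >= min_depth)
    return covered / total
```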

metrics.metrics.pct_reads_mapped(mqc, biosample_id)

Description: The percentage of short paired-end sequencing high-quality reads, primary alignments, that are mapped on the GRCh38 assembly. Duplicated reads and clipped bases are included. No minimum mapping quality is imposed.

Implementation details: In the NPM-sample-QC reference implementation, it is computed using samtools stats, reporting the percentage of reads mapped on the GRCh38 assembly. Duplicated reads are included. No mapping quality filter is applied.

metrics.metrics.pct_reads_properly_paired(mqc, biosample_id)

Description: The percentage of short paired-end sequencing high-quality reads, primary alignments, that are properly paired and mapped on the GRCh38 assembly. Duplicated reads are included. No minimum mapping quality is imposed.

Implementation details: In the NPM-sample-QC reference implementation, it is computed using samtools stats, reporting the percentage of properly paired reads mapped on the GRCh38 assembly. Duplicated reads are included. No mapping quality filter is applied.
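The two read percentages used by pct_reads_mapped and pct_reads_properly_paired can be sketched from samtools stats summary counts. This assumes the SN values have already been parsed into a dictionary with the hypothetical keys raw_total_sequences, reads_mapped and reads_properly_paired, and it assumes raw total sequences as the denominator; samtools stats also reports a percentage of properly paired reads itself. The function name mapping_percentages is hypothetical.

```python
def mapping_percentages(sn: dict) -> dict:
    """Mapped and properly paired reads as percentages of raw total sequences."""
    total = sn["raw_total_sequences"]  # assumed denominator
    if total == 0:
        raise ValueError("no reads in the summary")
    return {
        "pct_reads_mapped": 100.0 * sn["reads_mapped"] / total,
        "pct_reads_properly_paired": 100.0 * sn["reads_properly_paired"] / total,
    }
```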

metrics.metrics.ratio_heterozygous_homzygous_indel(mqc, biosample_id)

Description: The ratio of heterozygous to homozygous indels in the VCF, restricted to high-quality variants in autosomal regions.

Implementation details: In the NPM-sample-QC reference implementation, the ratio is computed over high-quality indels in autosomal regions with bcftools view (bcftools view -H -v indels -f PASS -g het / bcftools view -H -v indels -f PASS -g hom).
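The genotype classification behind the heterozygous/homozygous ratios can be sketched from VCF GT strings. The sketch assumes the GT values have already been restricted to PASS autosomal indels (or SNVs, for ratio_heterozygous_homzygous_snv), and it counts only homozygous-alternate calls as homozygous; it does not try to reproduce exactly which calls bcftools view -g hom matches. The helper names genotype_class and het_hom_ratio are hypothetical.

```python
from collections import Counter


def genotype_class(gt: str) -> str:
    """Classify a VCF GT string such as '0/1', '1|1' or './.'."""
    alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def het_hom_ratio(genotypes) -> float:
    """Ratio of heterozygous to homozygous-alternate calls."""
    classes = Counter(genotype_class(gt) for gt in genotypes)
    if classes["hom_alt"] == 0:
        raise ZeroDivisionError("no homozygous-alternate genotypes")
    return classes["het"] / classes["hom_alt"]
```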

metrics.metrics.ratio_heterozygous_homzygous_snv(mqc, biosample_id)

Description: The ratio of heterozygous to homozygous SNVs in the VCF, restricted to high-quality variants in autosomal regions.

Implementation details: In the NPM-sample-QC reference implementation, the ratio is computed over high-quality SNVs in autosomal regions with bcftools view (bcftools view -H -v snps -f PASS -g het / bcftools view -H -v snps -f PASS -g hom).

metrics.metrics.ratio_insertion_deletion(mqc, biosample_id)

Description: The ratio of the number of insertions to the number of deletions in the VCF, restricted to high-quality variants in autosomal regions.

Implementation details: In the NPM-sample-QC reference implementation, the ratio of insertions to deletions is computed over high-quality variants in autosomal regions with bcftools view (bcftools view -H -v indels -f PASS….INS / bcftools view -H -v indels -f PASS….DEL). Only short insertions and deletions (less than 50 bp), as commonly identified by most short-read variant callers, are considered in this metric. Structural variants, which include insertions and deletions larger than 50 bp and are typically identified using dedicated SV callers, are not considered.

metrics.metrics.ratio_transitions_transversions(mqc, biosample_id)

Description: The ratio of transitions to transversions among bi-allelic SNVs in the VCF, restricted to high-quality variants in autosomal regions.

Implementation details: In the NPM-sample-QC reference implementation, the ratio is computed over high-quality bi-allelic SNVs in autosomal regions with bcftools stats (bcftools stats -f PASS … TSTV).
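Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) substitutions; every other base change is a transversion. The ratio can be sketched over (REF, ALT) pairs of bi-allelic SNVs as follows. The input shape and the function name ts_tv_ratio are assumptions for illustration, not the bcftools stats implementation.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs) -> float:
    """Transition/transversion ratio for (ref, alt) single-base substitutions."""
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue  # not a single-nucleotide substitution
        if ref not in "ACGT" or alt not in "ACGT":
            continue  # ambiguous bases such as N are skipped
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ZeroDivisionError("no transversions")
    return ts / tv
```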

metrics.metrics.yield_bp_q30(mqc, biosample_id)

Description: The number of bases in short paired-end sequencing high-quality reads, primary alignments, with a base quality score of 30 or greater (Phred scale). Duplicated reads and clipped bases are included. No minimum mapping quality is imposed.

Implementation details: In the NPM-sample-QC reference implementation, it is computed using GATK Picard CollectQualityYieldMetrics, reporting the PF_Q30_BASES field. Only high-quality bases from primary alignments are considered. No filter on duplicated reads, clipped bases or mapping quality is applied.
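Counting Q30 bases amounts to decoding each quality character and keeping those with a Phred score of 30 or more. This sketch assumes Phred+33 encoded quality strings, as in FASTQ or the SAM QUAL column, and does not apply the pass-filter or alignment selection that Picard performs. The function name count_q30_bases is hypothetical.

```python
def count_q30_bases(quality_strings, min_quality: int = 30, offset: int = 33) -> int:
    """Count bases whose Phred quality is >= min_quality (Phred+33 by default)."""
    return sum(
        1
        for qualities in quality_strings
        for symbol in qualities
        if ord(symbol) - offset >= min_quality  # decode ASCII to Phred score
    )
```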

Module contents