What it does
CONTRA is a tool for copy number variation (CNV) detection in targeted resequencing data, such as data from whole-exome capture. CONTRA calls copy number gains and losses for each target region. Its key strategies include the use of base-level log-ratios to remove GC-content bias, correction for the effect of imbalanced library sizes on log-ratios, and the estimation of log-ratio variation via binning and interpolation. It takes standard alignment formats (BAM/SAM) as input and writes output in variant call format (VCF 4.0) for easy integration with other next-generation sequencing analysis packages.
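The idea of base-level log-ratios with a library-size adjustment can be illustrated with a minimal sketch. This is not CONTRA's actual implementation: the function names are hypothetical, a plain scaling by total library size stands in for CONTRA's correction, and applying the depth threshold to both samples is an assumption.

```python
import math

def base_log_ratios(test_depths, control_depths,
                    test_lib_size, control_lib_size, min_depth=10):
    """Per-base log2(test/control), after scaling test depth to the control library size.

    Assumption: bases whose depth is below min_depth in either sample
    are skipped and reported as None.
    """
    scale = control_lib_size / test_lib_size
    ratios = []
    for t, c in zip(test_depths, control_depths):
        if t < min_depth or c < min_depth:
            ratios.append(None)
        else:
            ratios.append(math.log2((t * scale) / c))
    return ratios

def region_log_ratio(ratios):
    """Mean of the usable base-level log-ratios in one target region."""
    kept = [r for r in ratios if r is not None]
    return sum(kept) / len(kept) if kept else None
```

A region whose test depth is twice the control depth at every usable base gets a log-ratio of 1 when the two libraries are the same size, and 0 when the test library is twice as large.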
Required Parameters
-t, --target Target region definition file [BED format]
-s, --test Alignment file for the test sample [BAM/SAM]
-c, --control Alignment file for the control sample
[BAM/SAM/BED – baseline file]
--bed **option must be supplied when the control
is a baseline file.**
-f, --fasta Reference genome [FASTA]
-o, --outFolder Name (and path) of the folder in which to store the
output of the analysis (this new folder will be created;
an error occurs if the folder already exists)
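As a sketch of how the required parameters fit together, the helper below assembles an argument list from the options listed above. The script name `contra.py`, the `python` interpreter, and the function name are assumptions for illustration; only the option names come from this section.

```python
def build_contra_command(target, test, control, fasta, out_folder,
                         control_is_baseline=False, script="contra.py"):
    """Return an argv list for a CONTRA run (script name is an assumption)."""
    cmd = ["python", script,
           "--target", target,
           "--test", test,
           "--control", control]
    if control_is_baseline:
        # A BED baseline file as control requires the --bed option.
        cmd.append("--bed")
    cmd += ["--fasta", fasta, "--outFolder", out_folder]
    return cmd
```

The resulting list can be passed to `subprocess.run`. The output folder must not exist beforehand, since CONTRA creates it.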
Optional Parameters
--numBin Number of bins into which the regions are grouped. Users
can specify multiple experiments with different numbers
of bins (comma separated). [Default: 20]
--minReadDepth The threshold for minimum read depth for each base
(see Step 2 in CONTRA workflow) [Default: 10]
--minNBases The threshold for minimum number of bases for each
target region (see Step 2 in CONTRA workflow)
[Default: 10]
--sam Specify if the test and control samples are in
SAM format. [Default: False] (Samples are assumed
to be in BAM format by default)
--bed If specified, control will be a baseline file in
BED format. [Default: False]
Please refer to the Baseline Script section for
instructions on how to create baseline files from a set
of BAM files. A set of baseline files for different
platforms is also provided on the CONTRA
download page.
--pval The p-value threshold for filtering. Based on Adjusted
P-Values. Only regions that pass this threshold will
be included in the VCF file. [Default: 0.05]
--sampleName The name to be appended to the front of the default output
name. By default, there will be nothing appended.
--nomultimapped The option to remove multi-mapped reads
(using SAMtools with mapping quality > 0).
[default: FALSE]
-p, --plot If specified, plots of log-ratio distribution for each
bin will be included in the output folder [default: FALSE]
--minExon Minimum number of exons in one bin (a bin that contains
fewer exons than this number will be merged with the
adjacent bins) [Default : 2000]
--minControlRdForCall Minimum Control ReadDepth for call [Default: 5]
--minTestRdForCall Minimum Test ReadDepth for call [Default: 0]
--minAvgForCall Minimum average coverage for call [Default: 20]
--maxRegionSize Maximum region size in target region (for breaking
large regions into smaller regions. By default,
maxRegionSize=0 means no breakdown). [Default : 0]
--targetRegionSize Target region size for breakdown (if maxRegionSize
is non-zero) [Default: 200]
-l, --largeDeletion If specified, CONTRA will run large deletion analysis (CBS).
The DNAcopy R library must be installed to run the
analysis. [Default: False]
--smallSegment CBS segment size for calling large variations [Default : 1]
--largeSegment CBS segment size for calling large variations [Default : 25]
--lrCallStart Start of the log-ratio range that will be used to call CNVs
[Default : -0.3]
--lrCallEnd End of the log-ratio range that will be used to call CNVs
[Default : 0.3]
--passSize Size of exons that passed the p-value threshold, compared
to the original exon size [Default: 0.5]
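The interaction between --maxRegionSize and --targetRegionSize described above can be sketched as follows. This is an illustration under stated assumptions, not CONTRA's own code: the function name is hypothetical, and the way the final shorter piece is handled is a guess.

```python
def split_region(chrom, start, end, max_region_size=0, target_region_size=200):
    """Break one target region into pieces of about target_region_size bases.

    max_region_size == 0 means no breakdown, matching the documented default.
    Assumption: any remainder is kept as a final, shorter piece.
    """
    if max_region_size == 0 or (end - start) <= max_region_size:
        return [(chrom, start, end)]
    pieces = []
    pos = start
    while pos < end:
        nxt = min(pos + target_region_size, end)
        pieces.append((chrom, pos, nxt))
        pos = nxt
    return pieces
```

With the defaults, every region is returned unchanged. With a non-zero maximum, only regions longer than that maximum are split.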