Command line version

An Accurate and Ultra-deep Coverage Method for Large-scale SSR Genotyping with SNPs in the SSR and Flanking Region Compatible

AMGT-TS User manual (command line version)

  1. Prerequisite

Running AMGT-TS requires a GNU-like environment. It should run on Linux or macOS; Ubuntu Server 18.04 is recommended.

AMGT-TS requires the following external tools:

bamtools (2.5.0)
blast tool suite (2.6.0+)
bwa (0.7.17-r1188)
fastx_toolkit (0.0.13)
picard (2.15.0-SNAPSHOT)
samtools (1.3.1)
seqtk (1.2)

Install these external tools before continuing with the steps below.
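A quick way to confirm the installation is to ask the shell whether each program is on the PATH. The following is a minimal sketch, not part of AMGT-TS: the function name check_tools is hypothetical, and the executable names (blastn for the blast suite, fastq_to_fasta for fastx_toolkit) are assumptions about which programs those packages install. picard is distributed as a jar file, so it is not checked here.

```shell
#!/bin/bash
# Report which of the given commands are missing from PATH.
# Prints one "missing: NAME" line per absent command and
# returns non-zero if at least one is absent.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumptions; adjust them to your installation.
check_tools bamtools blastn bwa fastq_to_fasta samtools seqtk \
    || echo "Install the missing tools before continuing."
```

The versions listed above can then be compared by hand against what each installed tool reports.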

  2. Get the source code

git clone https://github.com/plantdna/amgt-ts.git

  3. Add the sample data and the loci information files

For a test run, skip this step: two sample data files are already provided in the folder “working/00_fastq”:

artificial-reads-5000-2loci.fq
sample15.fq

and the loci information files are in the folder “ref”:

sites.amplicon.fa
sites.motif.info.stats
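When adding your own samples, it can help to confirm that each file is complete before starting a run. The sketch below counts the reads in a FASTQ file. It assumes unwrapped FASTQ, that is, exactly four lines per read; the function name count_fastq_reads is hypothetical and not part of AMGT-TS.

```shell
#!/bin/bash
# Count the reads in an unwrapped FASTQ file (four lines per read).
# Fails with a message on stderr if the line count is not a multiple of 4,
# which usually indicates a truncated or wrapped file.
count_fastq_reads() {
    awk 'END {
        if (NR % 4 != 0) {
            print "line count is not a multiple of 4" > "/dev/stderr"
            exit 1
        }
        print NR / 4
    }' "$1"
}

# Example (path taken from the bundled test data):
# count_fastq_reads working/00_fastq/sample15.fq
```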

  4. Set up the configuration file

An example configuration file, profiles-maizedna-1.sh, is provided in the “subtools” folder. Some of its lines are:

export TOOLS_DIR=/mnt/diskc/gitlab/s2s/tools

export FORMAT=fq

export PICARD=$TOOLS_DIR/picard.jar

Change these values to match your environment.
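After editing the profile, a short check can catch a wrong path before a full run. This is a sketch under assumptions: check_profile is a hypothetical helper, it inspects only the three variables shown above (TOOLS_DIR, FORMAT, PICARD), and the real profile may define others that it does not verify.

```shell
#!/bin/bash
# Source a profile in a subshell (so the current shell is not modified)
# and verify that the paths it exports actually exist.
check_profile() {
    local profile="$1"
    (
        . "$profile" || exit 1
        [ -d "$TOOLS_DIR" ] || { echo "TOOLS_DIR is not a directory: $TOOLS_DIR"; exit 1; }
        [ -f "$PICARD" ] || { echo "PICARD jar not found: $PICARD"; exit 1; }
        echo "profile OK (FORMAT=$FORMAT)"
    )
}

# Example:
# check_profile subtools/profiles-maizedna-1.sh
```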

  5. Run

Although the tool can be run as described in the README.md file, we recommend using “launch.sh” instead, because it is more convenient: for example, it keeps all running information in a time-stamped log file. Let’s have a look at this script:

#!/bin/bash

# Keep all running information to log file.
TIME_STR=`date +"%Y%m%d%H%M%S"`
LOG_FILE=run_$TIME_STR.log
touch $LOG_FILE
SCRIPT_DIR=/mnt/diskc/gitlab/amgt-ts/
ENV_FILE=$SCRIPT_DIR/subtools/profiles-maizedna-1.sh
#METHOD=broad
METHOD=precise
PROJECT_DIR=/mnt/diskc/gitlab/amgt-ts
$SCRIPT_DIR/amgt-ts.sh --script=$SCRIPT_DIR --environment=$ENV_FILE --method=$METHOD --project=$PROJECT_DIR 2>&1 | tee -a $LOG_FILE

The following 4 lines need to be changed:

SCRIPT_DIR=/mnt/diskc/gitlab/amgt-ts/
ENV_FILE=$SCRIPT_DIR/subtools/profiles-maizedna-1.sh
METHOD=precise
PROJECT_DIR=/mnt/diskc/gitlab/amgt-ts

After modifying the script to match your environment, start the run with:

$ ./launch.sh

The genotype results are written to the folder “working/04_reads”, for example:

$ head -n 3 working/04_reads/sample15.site.stat
#Site Motif Repeat_len Reads_num Total_reads Proportion
s258878 GCT 0 10 2370 0.0042194092827
s258878 GCT 12 1 2370 0.00042194092827


$ head -n 3 working/04_reads/artificial-reads-5000-2loci.site.stat
#Site Motif Repeat_len Reads_num Total_reads Proportion
s258878 GCT 9 1000 1000 1.0
s282049 TGG 15 1000 2000 0.5

The reads of each genotype are saved in the same folder:

$ head -n 6 working/04_reads/sample15/s258878/SSR9.fas
>CRC8E:03524:00917
GATCTGTTTGCCAGCTGGGGCAAGATTTCTTCGGTGCCTGCACCCTTTCTCTTGCGGTTGTTTATGCTATCGCTGCTGCTGATTGTGGGGTTCCTGCGTTCGCCACTGTGACTGTCACTTTGCTGGTGCTGTTCCTGGTTGCATCTGCTTTTCAGTATGTGGGGCTTGAGCTTGTTC
>CRC8E:03306:03509
TAACTGTGCCTTGATCTGTTTGCCAGCTGGGGCAAGATTTCTTCGGTGCCTGCACCCTTTCTCTTGCGGTTGTTTATGCTATCGCTGCTGCTGATTGTGGGGTTCCTGCGTTCGCCACTGTGACTGTCACTTTGCTGGTGCTGTTCCTGGTTGCATCTGCTTTTCAGTATGTGGGGCTTGAGCTTGTTC
>CRC8E:04520:05520
GATCTGTTTGCCAGCTGGGGCAAGATTTCTTCGGTGCCTGCACCCTTTCTCTTGCGGTTGTTTATGCTATCGCTGCTGCTGATTGTGGGGTTCCTGCGTTCGCCACTGTGACTGTCACTTTGCTGGTGCTGTTCCTGGTTGCATCTGCTTTTCAGTATGTGGGGCTTGAGCTTGTTC
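The .site.stat tables shown above contain one row per site and repeat length, and in the sample rows the Proportion column equals Reads_num divided by Total_reads (for example, 10 / 2370 ≈ 0.00422). As a sketch of how such a table can be summarised, the function below prints, for each site, the row with the most supporting reads. The name top_allele_per_site is hypothetical, and this is only a reading aid: it is not the genotype-calling rule used by AMGT-TS, and it reports a single row even where a site is heterozygous.

```shell
#!/bin/bash
# For each site in a .site.stat file, print the row whose Reads_num
# (column 4) is largest. Header lines starting with "#" are skipped.
top_allele_per_site() {
    awk '!/^#/ && NF >= 6 {
        reads = $4 + 0
        if (!($1 in best) || reads > best[$1]) {
            best[$1] = reads
            line[$1] = $0
        }
    }
    END { for (site in line) print line[site] }' "$1" | sort
}

# Example:
# top_allele_per_site working/04_reads/sample15.site.stat
```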

If you encounter a problem, check the log file:

$ head -n 5 run_20210707191020.log
===== Script starts running on Wed Jul 7 19:10:20 CST 2021 =====
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec

Note that the log file name contains the start time of the run, so replace the file name above with the name of the log file in your own tool folder.
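Because the name follows the pattern run_YYYYMMDDHHMMSS.log set in launch.sh, the most recent log can be found by sorting the names. A minimal sketch, with the hypothetical helper name latest_log:

```shell
#!/bin/bash
# Print the newest run_*.log in the given directory (default: current one).
# The timestamps are fixed-width digits, so alphabetical order is also
# chronological order.
latest_log() {
    ls "${1:-.}"/run_*.log 2>/dev/null | sort | tail -n 1
}

# Example: show the first lines of the latest log.
# head -n 5 "$(latest_log)"
```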

If you have any questions, feel free to contact us by email. Thank you for your interest in this tool.
