高清无码影视,亚洲热久久,男女生操逼,人人看操美女,gogo大胆无码无码免费衩频,国产精品卡一卡二在线,啪啪视频入口,色吊丝永久性观看网站在线观看

使用 trimmomatic 去除fastq數(shù)據(jù)的接頭

# 創(chuàng)建 Raw Reads目錄mkdir -p $OUTWD/Raw_reads# 拷貝 Raw Readscp -f ~/wgs_prokaryote/fastq/*fastq.gz $OUTWD/Raw_reads/# 創(chuàng)建 Clean Reads 目錄mkdir -p $OUTWD/cleaned_reads# 查看變量和接頭文件echo $TRIMMOMATIC $CPUS $OUTWD $ADAPTERS ${QUALITY} ${MINLEN}head $ADAPTERS# 用 trimmomatic 循環(huán)、批量去除fastq數(shù)據(jù)的接頭for i in `ls $OUTWD/Raw_reads | sed 's/_/\t/g' | cut -f 1 | sort -u | head -n 4`; do  # 用trimmomatic：過濾  # $TRIMMOMATIC PE -threads $CPUS -phred33 $OUTWD/Raw_reads/$i\_1.fastq.gz $OUTWD/Raw_reads/$i\_2.fastq.gz \  #   $OUTWD/cleaned_reads/$i\.ok_1.fastq.gz /dev/null $OUTWD/cleaned_reads/$i\.ok_2.fastq.gz /dev/null \  #   ILLUMINACLIP:$ADAPTERS:1:30:11 LEADING:${QUALITY} TRAILING:${QUALITY} MINLEN:${MINLEN}  # 用trimmomatic：去接頭  $TRIMMOMATIC PE -threads $CPUS -phred33 $OUTWD/Raw_reads/$i\_1.fastq.gz $OUTWD/Raw_reads/$i\_2.fastq.gz \    $OUTWD/cleaned_reads/$i\.noadapt.R1.fastq.gz /dev/null $OUTWD/cleaned_reads/$i\.noadapt.R2.fastq.gz /dev/null \    ILLUMINACLIP:$ADAPTERS:1:30:11done

使用 prinseq 程序過濾 fastq

# 將4個樣本名寫入文件ls $OUTWD/Raw_reads | sed 's/_/\t/g' | cut -f 1 | sort -u | head -n 4 > $OUTWD/reads-list.tmp
# 解壓fastq$PARALLEL -j $CPUS --gnu gzip -d ::: $OUTWD/cleaned_reads/*gz# -d, --decompress  decompress
# 使用 prinseq 程序過濾 fastqecho $PARALLEL $CPUS $OUTWD $PRINSEQ $MINLEN $QUALITY# 默認的過濾閾值(QUALITY=25, MINLEN=125)可能過于嚴格，導(dǎo)致無法生成滿足條件的、合適大小的fastq文件$PARALLEL -j $CPUS -a $OUTWD/reads-list.tmp \  perl $PRINSEQ -verbose \  -fastq $OUTWD/cleaned_reads/{}.noadapt.R1.fastq -fastq2 $OUTWD/cleaned_reads/{}.noadapt.R2.fastq \  -out_good $OUTWD/cleaned_reads/{}.ok -out_format 3 -out_bad null \  -min_len $MINLEN -min_qual_mean $QUALITY \  -trim_qual_right $QUALITY -trim_qual_window 15 -trim_qual_type mean

prinseq-lite.pl程序的參數(shù)：

-verbose

Prints status and info messages during processing.

***** FILTER OPTIONS *****

-min_len <integer>

Filter sequence shorter than min_len.

-min_qual_mean <integer>

Filter sequence with quality score mean below min_qual_mean.

***** TRIM OPTIONS *****

-trim_qual_right <integer>

Trim sequence by quality score from the 3'-end with this

threshold score.

-trim_qual_window <integer>

The sliding window size used to calculate quality score by type.

To stop at the first base that fails the rule defined, use a

window size of 1. [default: 1]

-trim_qual_type <string>

Type of quality score calculation to use. Allowed options are

min, mean, max and sum. [default: min]

***** INPUT OPTIONS *****

-fastq <file>

Input file in FASTQ format that contains the sequence and

quality data. Use stdin instead of a file name to read from

STDIN (-fasta stdin). This can be useful to process compressed

files using Unix pipes.

-fastq2 <file>

For paired-end data only. Input file in FASTQ format that

contains the sequence and quality data. The sequence identifiers

for two matching paired-end sequences in separate files can be

marked by /1 and /2, or _L and _R, or _left and _right, or must

have the exact same identifier in both input files. The input

sequences must be sorted by their sequence identifiers.

Singletons are allowed in the input files.

***** OUTPUT OPTIONS *****

-out_format <integer>

To change the output format, use one of the following options.

If not defined, the output format will be the same as the input

format.

1 (FASTA only), 2 (FASTA and QUAL), 3 (FASTQ), 4 (FASTQ and

FASTA), or 5 (FASTQ, FASTA and QUAL)

-out_good <string>

By default, the output files are created in the same directory

as the input file containing the sequence data with an

additional "_prinseq_good_XXXX" in their name (where XXXX is

replaced by random characters to prevent overwriting previous

files). To change the output filename and location, specify the

filename using this option. The file extension will be added

automatically (either .fasta, .qual, or .fastq). For paired-end

data, filenames contain additionally "_1", "_1_singletons",

"_2", and "_2_singletons" before the file extension. Use

"-out_good null" to prevent the program from generating the

output file(s) for data passing all filters. Use "-out_good

stdout" to write data passing all filters to STDOUT (only for

FASTA or FASTQ output files).

-out_bad <string>

By default, the output files are created in the same directory

as the input file containing the sequence data with an

additional "_prinseq_bad_XXXX" in their name (where XXXX is

replaced by random characters to prevent overwriting previous

files). To change the output filename and location, specify the

filename using this option. The file extension will be added

automatically (either .fasta, .qual, or .fastq). For paired-end

data, filenames contain additionally "_1" and "_2" before the

file extension. Use "-out_bad null" to prevent the program from

generating the output file(s) for data not passing any filter.

Use "-out_bad stdout" to write data not passing any filter to

STDOUT (only for FASTA or FASTQ output files).

Example: use "file_filtered" to generate the output file

file_filtered.fasta in the current directory

Example: "-out_good stdout -out_bad null" will write data

passing filters to STDOUT and data not passing any filter will

be ignored

程序運行時的提示

If you use programs that use GNU Parallel to process data for an article in a

scientific publication, please cite:

Tange, O. (2020, July 22). GNU Parallel 20200722 ('Privacy Shield').

Zenodo. https://doi.org/10.5281/zenodo.3956817

Estimate size of input data for status report (this might take a while for large files)

done

Parse and process input data

done

Clean up empty files

done

Input and filter stats:

Input sequences (file 1): 1,445,931

Input bases (file 1): 167,706,607

Input mean length (file 1): 115.99

Input sequences (file 2): 1,445,931

Input bases (file 2): 156,139,854

Input mean length (file 2): 107.99

Good sequences (pairs): 1,445,301

Good bases (pairs): 323,719,542

Good mean length (pairs): 223.98

Good sequences (singletons file 1): 544 (0.04%)

Good bases (singletons file 1): 63,066

Good mean length (singletons file 1): 115.93

Good sequences (singletons file 2): 20 (0.00%)

Good bases (singletons file 2): 2,160

Good mean length (singletons file 2): 108.00

Bad sequences (file 1): 86 (0.01%)

Bad bases (file 1): 2,813

Bad mean length (file 1): 32.71

Bad sequences (file 2): 544 (0.04%)

Bad bases (file 2): 57,217

Bad mean length (file 2): 105.18

Sequences filtered by specified parameters:

trim_qual_right: 18

min_len: 549

min_qual_mean: 129

# 刪除中間文件，壓縮結(jié)果文件$PARALLEL -j $CPUS --gnu rm -f ::: $OUTWD/cleaned_reads/*singletons*$PARALLEL -j $CPUS --gnu rm -rf ::: $OUTWD/cleaned_reads/*noadapt*$PARALLEL -j $CPUS --gnu gzip ::: $OUTWD/cleaned_reads/*fastq
# Quality filtering statsfor i in `ls $OUTWD/Raw_reads | sed 's/_/\t/g' | cut -f 1 | sort -u | head -n 4`; do  zcat $OUTWD/cleaned_reads/$i\.ok_*.fastq.gz | awk 'BEGIN { t=0.0;sq=0.0; n=0;} ;NR%4==2 {n++;L=length($0);t+=L;sq+=L*L;}END{m=t/n;printf("total\tavg\n%d\t%f\n",n,m,sq/n-m*m);}' \    > $OUTWD/cleaned_reads/$i\.ok.stats  sed -i 's/\..*//' $OUTWD/cleaned_reads/$i\.ok.statsdone
# Number of reads surviving to log filefor i in `ls $OUTWD/Raw_reads | sed 's/_/\t/g' | cut -f 1 | sort -u | head -n 4`; do   echo -e "Number of reads surviving for sample ${i}: \c"   cut -f 1 $OUTWD/cleaned_reads/${i}.ok.stats | tail -n 1done

Number of reads surviving for sample ERR025837: 3954038

Number of reads surviving for sample ERR025838: 2890602

Number of reads surviving for sample ERR038741: 12890710

Number of reads surviving for sample SRR057610: 13663540

使用 fastqc 程序進一步檢查 fastq 的數(shù)據(jù)質(zhì)量

## Chechking quality with FASTQCfor i in 1 2; do  $PARALLEL -j $CPUS -a $OUTWD/reads-list.tmp $FASTQC $OUTWD/cleaned_reads/{}.ok_${i}.fastq.gz -t 1 -o $OUTWD/cleaned_reads/donerm $OUTWD/cleaned_reads/*zip

組裝

Software for genome assembly chosen: spades (耗時、耗內(nèi)存)

mkdir -p $OUTWD/assemblymkdir -p $OUTWD/genomes
echo $CPUS $SPADES
for i in `ls $OUTWD/Raw_reads | sed 's/_/\t/g' | cut -f 1 | sort -u | head -n 4`; do  python $SPADES --careful -1 $OUTWD/cleaned_reads/$i\.ok_1.fastq.gz -2 $OUTWD/cleaned_reads/$i\.ok_2.fastq.gz \    -o $OUTWD/assembly/$i\_assembly -t $CPUSdone
# Software for genome assembly chosen: megahit
# echo $MEGAHIT $CPUS# for i in `ls $OUTWD/Raw_reads | sed 's/_/\t/g' | cut -f 1 | sort -u | head -n 4`; do#   $MEGAHIT -1 $OUTWD/cleaned_reads/$i\.ok_1.fastq.gz -2 $OUTWD/cleaned_reads/$i\.ok_2.fastq.gz \#     -o $OUTWD/assembly/$i\_assembly -t $CPUS# done
# 對組裝后的基因組重命名: final.contigs.fa --> contigs.fastafor i in `ls $OUTWD/Raw_reads | sed 's/_/\t/g' | cut -f 1 | sort -u | head -n 4`; do  mv $OUTWD/assembly/$i\_assembly/final.contigs.fa $OUTWD/assembly/$i\_assembly/contigs.fastadone

使用 Prinseq 過濾組裝后的基因組，并對后者進行質(zhì)量控制

# 啟動 Prinseq$PARALLEL -j $CPUS -a $OUTWD/reads-list.tmp perl $PRINSEQ -fasta $OUTWD/assembly/{}\_assembly/contigs.fasta \  -min_len $MINCONTIGLEN -out_good $OUTWD/genomes/{} -out_bad null# 壓縮、刪除之前的組裝結(jié)果# parallel -j $CPUS --gnu rm -rf ::: $OUTWD/genomes/1/$PARALLEL -j $CPUS -a $OUTWD/reads-list.tmp --gnu tar cfz $OUTWD/assembly/{}\_assembly.tgz -C $OUTWD/assembly/{}\_assembly$PARALLEL -j $CPUS -a $OUTWD/reads-list.tmp --gnu rm -rf $OUTWD/assembly/{}\_assembly

使用 QUAST 評估、統(tǒng)計組裝結(jié)果

1. Quality assessment for genome assemblies

2. Count the number of contigs in genome

mkdir -p $OUTWD/genome_stats
for i in `ls $OUTWD/Raw_reads | sed 's/_/\t/g' | cut -f 1 | sort -u | head -n 4`; do$QUAST $OUTWD/genomes/${i}.fasta -o $OUTWD/genome_stats/${i}_genome_stats -t $CPUS \  --min-contig $MINCONTIGLEN --no-icarus --silent --no-svdone