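MOCAT 2 (Meta'omic Analysis Toolkit 2) is a toolkit for analysing metagenomic and metatranscriptomic data from environmental samples such as soil, water, and the gut. Its modules cover data preprocessing, sequence alignment, assembly, and functional annotation and analysis.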
Paper: MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit (PLOS ONE)
GitHub: mocat2/mocat2, latest MOCAT2 version (https://github.com/mocat2/mocat2)
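The MOCAT 2 workflow consists of preparing the data, choosing suitable modules and parameters, running the analysis, and interpreting the results. Users pick modules and parameters according to their experimental design and data type, then use the results to guide downstream bioinformatic analysis or further experiments.
MOCAT 2 comes with detailed official documentation and user guides, including an installation guide, tutorials, and parameter descriptions. These are available from the official website or the GitHub page.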
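MOCAT2 provides functions for quality control, sequence alignment, assembly, and annotation. Its basic usage and analysis workflow are described below.
The MOCAT2 source code can be obtained from the official website or the GitHub page and is installed on Linux. Follow the installation guide in the official documentation (MOCAT2 GitHub):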
git clone https://github.com/mocat2/mocat2.git
cd mocat2/stable/2.1.3
perl ./setup.MOCAT2.pl
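The installation commands above rely on git and perl being available. As a minimal sketch, a small pre-check can confirm that before the setup script is started. The function name check_prereqs is hypothetical and not part of MOCAT2. The list of tools is taken only from the commands shown above, so the setup script may need others.

```shell
# Hypothetical helper (not part of MOCAT2): report which of the given
# commands are available on PATH; returns 1 if any of them is missing.
check_prereqs() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# The installation steps above call git and perl, so check for both.
check_prereqs git perl || echo "install the missing tools before running setup.MOCAT2.pl"
```

The check only looks at PATH. It says nothing about Perl modules or external binaries that the setup script may download or expect.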
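Sequencing data after quality control and preprocessing.
summary_statistics.txt: statistics on the quality-control step, such as the number of sequences and quality-score summaries.
mocat_assembly: the assembled contig sequences.
assembly_stats.txt: statistics on assembly quality and performance, such as N50 and the maximum/minimum contig length.
mocat_analysis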
mocat_preprocessing -t 4 -o output_directory --input-files reads_1.fastq,reads_2.fastq
mocat_assembly -t 4 -o output_directory --input-files reads_1.fastq,reads_2.fastq
mocat_analysis -t 4 -o output_directory --input-files assembly.fa
Here, the -t option ...
# Quality control and preprocessing
mocat_preprocessing -t 4 -o preprocessing_output --input-files reads_1.fastq,reads_2.fastq
# Sequence assembly
mocat_assembly -t 4 -o assembly_output --input-files preprocessing_output/clean_reads_1.fastq,preprocessing_output/clean_reads_2.fastq
# Functional annotation and taxonomic analysis
mocat_analysis -t 4 -o analysis_output --input-files assembly_output/contigs.fasta
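Because each step in the example above consumes the output of the previous one, it can help to stop the run as soon as one step fails. The following is a minimal sketch of that idea in plain shell. The run_step function is a hypothetical helper, and the `true` commands are placeholders so the sketch runs without any pipeline tools installed.

```shell
# Hypothetical helper: run one pipeline step, log how it ended, and
# pass the step's exit status back to the caller.
run_step() {
    label=$1
    shift
    echo "[start] $label"
    if "$@"; then
        echo "[done] $label"
    else
        status=$?
        echo "[failed] $label (exit status $status)" >&2
        return "$status"
    fi
}

# Placeholder commands stand in for the real steps; the && chain means
# a later step only starts if the earlier one succeeded.
run_step "preprocessing" true &&
run_step "assembly" true &&
run_step "analysis" true
```

To use it with real commands, replace each `true` with the full command line of the corresponding step, for example `run_step "preprocessing" mocat_preprocessing -t 4 -o preprocessing_output --input-files reads_1.fastq,reads_2.fastq`.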
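MOCAT2 itself is driven by the MOCAT.pl script. Its built-in help, reproduced below, does not list the mocat_preprocessing, mocat_assembly, or mocat_analysis commands used above, so check those against your installation before relying on them:
MOCAT.pl --help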
MOCAT - Metagenomics Analysis Toolkit v2.1.3
by Jens Roat Kultima, Luis Pedro Coelho, Shinichi Sunagawa @ Bork Group, EMBL
Full manual & FAQ: MOCAT.pl -man
How to cite MOCAT: MOCAT.pl -cite
Have you tried the wrapper runMOCAT.sh? Try it!
Usage: MOCAT.pl -sf|sample_file 'FILE' [Pipeline, Statistics, & Additional Options]
Contains the list of folder names (sample names), one per line,
in which the raw sample data is located
Process, Assemble, Revise Assembly, Predict Genes, cluster genes into gene catalog, annotate gene catalog, profile against gene catalog
MOCAT.pl -sf my.samples -rtf
MOCAT.pl -sf my.samples -a
MOCAT.pl -sf my.samples -gp assembly
MOCAT.pl -sf my.samples -make_gene_catalog -assembly_type assembly
MOCAT.pl -sf my.samples -annotate_gene_catalog
MOCAT.pl -sf my.samples -s my.samples.padded -identity 95
MOCAT.pl -sf my.samples -f my.samples.padded -identity 95
MOCAT.pl -sf my.samples -p my.samples.padded -identity 95 -mode functional
Assemble and predict genes: MOCAT.pl -sf my.samples -rtf
(no screen) MOCAT.pl -sf my.samples -a
MOCAT.pl -sf my.samples -gp assembly
fetch marker genes: MOCAT.pl -sf my.samples -fmg assembly
MOCAT.pl -sf my.samples -ss
Assemble and predict genes: MOCAT.pl -sf my.samples -rtf
(DB screen) MOCAT.pl -sf my.samples -s hg19 -screened_files -identity 90
MOCAT.pl -sf my.samples -a -r hg19
MOCAT.pl -sf my.samples -gp assembly -r hg19
MOCAT.pl -sf my.samples -ss
Assemble and predict genes: MOCAT.pl -sf my.samples -rtf
(remove eg. adapters MOCAT.pl -sf my.samples -sff adapters.fa -screened_files
and then DB screen) MOCAT.pl -sf my.samples -bwa hg19 -r adapters.fa -screened_files
MOCAT.pl -sf my.samples -a -r screened.adapters.fa.on.hg19
MOCAT.pl -sf my.samples -gp assembly -r screened.adapters.fa.on.hg19
MOCAT.pl -sf my.samples -ss
Pipeline Options
-r|reads ['reads.processed', 'DATABASE' or 'FASTA FILE']
Required for all pipeline options, except rtf|read_trim_filter
Specify whether processing trim & filtered, or screened reads.
A default value to this setting can also be specified in config file
Optional for all pipeline options, except rtf|read_trim_filter, see full manual
performs trimming and filtering of reads
Performs assembly of reads
Further improves assemblies
-gp|gene_prediction ['assembly', 'assembly.revised']
Predicts protein coding genes on assemblies
-fmg|fetch_mg ['assembly', 'assembly.revised']
Extracts marker genes among the predicted genes
-soap|bwa ['DB1 DB2 ...',s,c,f,r]
Screen, extract and map reads against a reference databse (hg19 is provided) or (s)acftigs,
(c)ontigs, sca(f)folds from an assembly, or scaftigs from a (r)evised assembly.
This mapping step uses SOAPaligner2 (soap) or BWA (bwa).
Additional options:
-screened_files : If set, screened read files are generated, these are reads not matching the DB
-extracted_files : If set, extracted read files are generated, these are reads matching the DB
-use_mem : If set, copies the DB into memory for faster loading
-sff|screen_fastafile 'FASTA FILE'
Same as 's|screen' above, but uses USearch, rather than SOAPaligner2.
-fsoap ['DB1 DB2 ...',s,c,f,r]
Filter screened reads, (s)caftigs, (c)ontigs, sca(f)folds or (r)evised assembly scaftigs
at higher %ID and length cutoff. This step has to be run before calculating profiles if the option soap was used
Additional options:
-shm : If set, faster, but saves data for the filtering step in /dev/shm/<USER>
-psoap|pbwa ['DB1 DB2 ...',s,c,f,r] -m|mode [gene, NCBI, mOTU, functional] -o [OUTPUT FOLDER]
Generate gene, mOTU, NCBI or functional profiles on filtered reads,
(s)caftigs, (c)ontigs, sca(f)folds or (r)evised assembly scaftigs.
If -mode is set to either NCBI or mOTU, it is expected that the
reads have been correctly mapped to the corresponding databases.
Specify psoap if you used the command 'soap' previously, and 'pbwa' if you used 'bwa'.
Additional options:
-no_horizontal : No not calculate horizontal gene & functional coverages
-verbose : Prints extra information about status of profiling steps
-shm : Faster, but saves 2-5 GB of data for the profiling step in /dev/shm/<USER>
-uniq : Specify this flag if you find duplicated row names
(e.g. if you have mapped to a DB where the same reference appears multiple times)
Available modules
These are installed in the folder /nfs/data/Downloads/mocat2/stable/2.1.3/mod
Each module requires a NAME.sh and NAME.cfg file inside the NAME folder
-annotate_gene_catalog [leave empty for using sample file generated catalog or enter full path to catalog; use amino acid sequence file]
Required options:
-blasttype [should be "blastp" normally for amino acid sequences, but can be set to "blastx"]
-make_gene_catalog [samples specifed in sample file will be used ot generate catalog]
Required options:
-assembly_type [asembly or assembly.revised]
Statistics Options
Produces statistics for each lane with raw reads using the FastQC toolkit
Prints a simple view how the processing status of each sample,
and stores this in <sample_file>.status
Additional Options
-cfg|config [file]
Specify another config file than MOCAT.cfg
Only create job scripts, but don't execute them
Overrides any specified temp folders config file
-cpus [integer]
Not recommended, but specifies a fixed number of cores for each job,
please read the full manual using MOCAT.pl -man
-host [hostname]
Runs the jobs on a different host machine
-identity [integer]
Overrides any percentage cutoff setting in cfg file
-length [integer]
Overrides any length cutoff setting in cfg file
-memory XGB
If queuing system is SGE or LSF, it will require XGB of RAM for the job
This can also be set with the respective memory options by adding these
to the param fields in the config file
-config A=b C=d
Overrides setting A from the config file with b, etc
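The help text above describes the sample file passed with -sf as a list of folder names (sample names), one per line. As a minimal sketch, such a file could be generated from the sub-folders of a project directory. The helper name make_sample_file and the folder names sampleA and sampleB are hypothetical. The sketch assumes every sub-folder is a sample, which may not hold if the directory also contains log or temporary folders.

```shell
# Hypothetical helper (not part of MOCAT2): list every sub-folder of a
# project directory in a sample file, one folder name per line.
make_sample_file() {
    project_dir=$1
    sample_file=$2
    : > "$sample_file"                      # start with an empty file
    for dir in "$project_dir"/*/; do
        [ -d "$dir" ] || continue           # skip the unmatched glob pattern
        basename "$dir" >> "$sample_file"
    done
}

# Demo on a throwaway directory with two made-up sample folders.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sampleA" "$demo_dir/sampleB"
make_sample_file "$demo_dir" "$demo_dir/my.samples"
cat "$demo_dir/my.samples"
rm -rf "$demo_dir"
```

The demo prints the two folder names, one per line. In a real project, review the resulting file by hand and then pass it to MOCAT.pl as in the help examples, for instance `MOCAT.pl -sf my.samples -rtf`.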