## CEGMA-EXTENDED-20062015
## BETA VERSION
## Authored by Yuichiro Hara & Shigehiro Kuraku
 
This extended version of CEGMA performs completeness assessment against a customized reference gene set. This makes it practical to routinely use CVG (core vertebrate genes), a new reference gene set for vertebrates. The extended CEGMA is compatible with CEGMA versions 2.4 and 2.5.
 
1. Installation
   a) Install CEGMA.
   Download the installation package from the CEGMA home page: http://korflab.ucdavis.edu/Datasets/cegma/
 
   b) Download and unpack the CVG suite.
   $ wget http://transcriptome.cdb.riken.jp/reptiliomix/resources/CVG_20062015.tar.gz
   $ tar zxf CVG_20062015.tar.gz
 
   c) Copy the Perl scripts and the CVG directory into the directory where the original CEGMA is installed.
   $ cd CVG_20062015/cegma-extension
   (for CEGMA ver 2.4)
   $ cp ver2.4/*.mod.pl /Path/to/cegma_v2.4.010312/bin/
   $ cp -r CVG /Path/to/cegma_v2.4.010312/
   (for CEGMA ver 2.5)
   $ cp ver2.5/*.mod.pl /Path/to/CEGMA_v2.5/bin/
   $ cp -r CVG /Path/to/CEGMA_v2.5/
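
   As a quick sanity check after step c), the sketch below tests whether the extension files ended up where the copy commands above put them. The function name check_cegma_ext_install is hypothetical and not part of this package; it only checks for bin/cegma.mod.pl and the CVG directory under the CEGMA installation directory you pass to it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this package): verify that step c)
# copied the extension into a CEGMA installation directory.
# Usage: check_cegma_ext_install /Path/to/CEGMA_v2.5
check_cegma_ext_install() {
    cegma_dir=$1
    status=0
    # cegma.mod.pl is the extended driver script named in this README.
    if [ ! -f "$cegma_dir/bin/cegma.mod.pl" ]; then
        echo "missing: $cegma_dir/bin/cegma.mod.pl" >&2
        status=1
    fi
    # The CVG directory is copied next to bin/ in step c).
    if [ ! -d "$cegma_dir/CVG" ]; then
        echo "missing: $cegma_dir/CVG" >&2
        status=1
    fi
    return $status
}
```

   The function returns 0 when both items are present and 1 otherwise, printing what is missing to standard error.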
  
 
2. Usage
   After setting up the environment for CEGMA, you can use the extended CEGMA program in the same way as the original CEGMA package.
   The environment settings are described in the README file of the original CEGMA package and in the cegma.conf file in this package.
 
   To run CEGMA with a customized gene set, prepare a directory that contains the HMMER profiles (*.hmm), a peptide FASTA file, and two cutoff files: one for conventional CEGMA (corresponding to profiles_cutoff.tbl) and one for the completeness assessment (corresponding to completeness_cutoff.tbl).
   Then run cegma.mod.pl, the extended version of cegma, and specify the cutoff file for the completeness analysis with the --complete_file option.
   $ cegma.mod.pl --protein /Path/to/cegma-extension/CVGs/chorNOG_aln_core_ag_shark.fa --hmm_profiles /Path/to/cegma-extension/CVGs/hmm_profiles --prot_num 8 --hmm_prefix chorNOG --genome transcripts.fa -o output --ext --cutoff_file /Path/to/cegma-extension/CVGs/data/profile_cutoff.users.tbl --complete_file /Path/to/cegma-extension/CVGs/data/completeness_cutoff.users.tbl -v --interlen 50000 --boundaries 10000
   When running cegma.mod.pl on a vertebrate genome assembly with the CVGs as reference, we recommend setting the options '--interlen 50000 --boundaries 10000' instead of the '--mam' or '--vrt' options that are used with the CEGs.
   When running cegma.mod.pl on a transcriptome assembly, these options are not necessary.
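
   The choice of extra options by assembly type can be sketched as a small shell function. The name cvg_extra_opts and the type labels "genome" and "transcriptome" are assumptions made for illustration; only the option values themselves come from this README.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this package): print the extra
# cegma.mod.pl options recommended in this README for a given assembly type.
cvg_extra_opts() {
    case "$1" in
        genome)
            # Recommended for vertebrate genome assemblies with the CVGs.
            echo "--interlen 50000 --boundaries 10000"
            ;;
        transcriptome)
            # No extra options are needed for transcriptome assemblies.
            :
            ;;
        *)
            echo "unknown assembly type: $1" >&2
            return 1
            ;;
    esac
}

# Example (not executed here; the remaining options are as in the command above):
#   cegma.mod.pl --genome assembly.fa -o output $(cvg_extra_opts genome) ...
```

   The command substitution is left unquoted on purpose in the example so that the two options are split into separate arguments.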
 
   You can also run cegma.mod.pl through the batch.sh file, after modifying it for your analysis.
   $ vi cegma.conf
     [modify the environment variables]
   $ vi batch.sh
     [modify the name of input genome/transcriptome assembly file]
   $ ./batch.sh
  
   You can also use your own core gene set for the completeness assessment.
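
   Before running cegma.mod.pl with a gene set of your own, it may help to confirm that the required inputs listed above are all in place. The sketch below is one way to do that; the function name check_gene_set and its argument order are hypothetical, and it only checks that the files exist and are non-empty, not that their contents are valid.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with this package): check the inputs
# needed to run cegma.mod.pl with a customized gene set.
# Usage: check_gene_set PROTEIN_FASTA HMM_DIR CUTOFF_FILE COMPLETE_FILE
check_gene_set() {
    protein=$1
    hmm_dir=$2
    cutoff=$3
    complete=$4
    status=0
    # The peptide FASTA file and both cutoff files must exist and be non-empty.
    for f in "$protein" "$cutoff" "$complete"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty file: $f" >&2
            status=1
        fi
    done
    # Look for at least one HMMER profile; the glob stays literal if none match.
    found=0
    for h in "$hmm_dir"/*.hmm; do
        if [ -f "$h" ]; then
            found=1
            break
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no *.hmm profiles in: $hmm_dir" >&2
        status=1
    fi
    return $status
}
```

   The four arguments correspond to the values passed to --protein, --hmm_profiles, --cutoff_file and --complete_file in the cegma.mod.pl command.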
 
3. Copyrights
 
The original CEGMA was written by Genis Parra and has been maintained by Keith Bradnam.
The CVG, the new reference gene set for vertebrates, and the modified cegma scripts were prepared by Yuichiro Hara (yuichiro.hara@riken.jp) and Shigehiro Kuraku (shigehiro.kuraku@riken.jp) in the Phyloinformatics Unit, RIKEN CLST. More information on this project can be found at http://www.clst.riken.jp/phylo/reptiliomix.html
 
4. Citation
   Please cite the articles below when you present or publish any data based on the use of CEGMA with this extended function.
  
   - Original CEGMA program
   Genis Parra, Keith Bradnam and Ian Korf. 2007. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes.
   Bioinformatics, 23: 1061-1067
 
   - Completeness assessment based on CEGMA
   Genis Parra, Keith Bradnam, Zemin Ning, Thomas Keane, and Ian Korf. 2009. Assessing the gene space in draft genomes.
   Nucleic Acids Research, 37(1): 289-297
 
   - Extended completeness analysis with this package and use of CVGs
   Yuichiro Hara, Kaori Tatsumi, Michio Yoshida, Eriko Kajikawa, Hiroshi Kiyonari, and Shigehiro Kuraku. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation. Under review.

