Bam file format reference name

Bam file format reference name software

--outFileNamePrefix: the path for the output directory and prefix of all output files.
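As a rough illustration of where --outFileNamePrefix sits in an alignment call, here is a dry-run sketch that only prints a STAR command instead of running it. The helper name star_align_cmd, the FASTQ file name and the prefix are hypothetical placeholders. The use of --quantMode GeneCounts for counting is an assumption, since the text does not name the counting option.

```shell
# Dry-run sketch: print (do not execute) a STAR alignment command.
# star_align_cmd is a hypothetical helper, not part of STAR.
# The body runs in a subshell "( ... )" so its variables stay local.
star_align_cmd() (
  fastq=$1    # input reads (placeholder name)
  prefix=$2   # output directory plus file-name prefix
  printf '%s\n' "STAR --genomeDir index_star_chr6 --readFilesIn $fastq --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --quantMode GeneCounts --outFileNamePrefix $prefix"
)

# Example with placeholder names; the printed command is not run.
star_align_cmd sample.fastq.gz alignments/sample_
```

With a prefix such as alignments/sample_, STAR output files would start with that string, for example alignments/sample_Aligned.sortedByCoord.out.bam for the coordinate-sorted BAM.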


To index the genome with STAR for RNA-seq analysis, the --sjdbOverhang option needs to be specified for detecting possible splicing sites. It tells STAR the maximum possible stretch of sequence that can be found on one side of a splicing site, and it usually equals the minimum read size minus 1. In our case, since the read size is 49 bases, we can accept a maximum of 48 bases on one side and one base on the other side of a splicing site; that is, we set this parameter to 48. This also means that a new STAR index needs to be generated for every different read length to be aligned. Otherwise a drop in aligned reads can be experienced.

--runThreadN allows you to parallelize the job.

NOTE that for small genomes, the parameter --genomeSAindexNbases (default 14) should be scaled down as: min(14, log2(GenomeLength)/2 - 1). Here: min(14, log2(170805979)/2 - 1) =~ 12.67.

Building the STAR index (option --runMode genomeGenerate):

    # go to mapping folder
    cd ~/rnaseq_course/mapping
    # create sub-folder where index will be generated
    mkdir index_star_chr6
    # create the index and store it in ~/rnaseq_course/mapping/index_star_chr6
    $RUN STAR --runMode genomeGenerate --genomeDir index_star_chr6 \
      --genomeFastaFiles ~/rnaseq_course/reference_genome/reference_chr6/Homo_6.fa \
      --sjdbGTFfile ~/rnaseq_course/reference_genome/reference_chr6/Homo_sapiens.GRCh38.88.chr6.gtf \

Once the index is built, do not forget to remove the unzipped FASTA and GTF files! How much (in percentage) disk space is saved when those two files are kept zipped vs unzipped?

Aligning reads to the genome (and counting them at the same time!)

To use STAR for the read alignment (default --runMode option), we have to specify several options, including whether the reads are compressed or not (--readFilesCommand) and the type of output file (--outSAMtype). For the output type, the default is BAM Unsorted: STAR outputs unsorted file(s). The STAR manual says of this output: “The paired ends of an alignment are always adjacent, and multiple alignments of a read are adjacent as well. This ”unsorted” file can be directly used with downstream software such as HTseq, without the need of name sorting.” We nevertheless prefer the option BAM SortedByCoordinate.
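The two index parameters discussed in the text can be checked with a few lines of shell. This is a minimal sketch: the function names sa_index_nbases and sjdb_overhang are made up for illustration, and the inputs are the chromosome 6 length and the read size quoted in the text.

```shell
# Sketch: derive two STAR index parameters from the numbers quoted in the text.
# Both function names are hypothetical helpers, not part of STAR.

# min(14, log2(GenomeLength)/2 - 1), printed with two decimals.
sa_index_nbases() {
  awk -v len="$1" 'BEGIN {
    v = log(len) / log(2) / 2 - 1   # awk only has the natural log, so divide by log(2)
    if (v > 14) v = 14              # cap at the STAR default of 14
    printf "%.2f\n", v
  }'
}

# sjdbOverhang as "minimum read size minus 1".
sjdb_overhang() {
  echo $(( $1 - 1 ))
}

sa_index_nbases 170805979   # chromosome 6 length used in the text -> 12.67
sjdb_overhang 49            # read size used in the text -> 48
```

Since --genomeSAindexNbases takes an integer, the printed value would still need to be converted to an integer before being passed to STAR.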

    # go to reference_genome folder
    cd ~/rnaseq_course/reference_genome/reference_chr6
    # unzip files (keep original zipped file)
    zcat Homo_sapiens.GRCh38.88. > Homo_sapiens.GRCh38.88.chr6.gtf
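The question of how much disk space the zipped files save can be answered by comparing byte counts. The sketch below assumes a zipped file and its unzipped copy are both on disk. The helper percent_saved is hypothetical, and the demo uses a throwaway temporary file rather than the course files.

```shell
# percent_saved ZIPPED UNZIPPED
# Prints the percentage of disk space saved by keeping ZIPPED instead of UNZIPPED.
# Hypothetical helper; the body runs in a subshell so its variables stay local.
percent_saved() (
  z=$(wc -c < "$1")   # size of the zipped file in bytes
  u=$(wc -c < "$2")   # size of the unzipped file in bytes
  awk -v z="$z" -v u="$u" 'BEGIN { printf "%.1f\n", 100 * (u - z) / u }'
)

# Demo on a temporary file (not the course data).
tmp=$(mktemp -d)
seq 1 100000 > "$tmp/demo.txt"
gzip -c "$tmp/demo.txt" > "$tmp/demo.txt.gz"
percent_saved "$tmp/demo.txt.gz" "$tmp/demo.txt"
rm -r "$tmp"
```

For the course data, the same function would be called once for the GTF pair and once for the FASTA pair.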

To make an index for STAR, we need both the genome sequence in FASTA format and the annotation in GTF format. We already downloaded the FASTA and GTF files (in ~/rnaseq_course/reference_genome) needed for the indexing. As STAR is very resource consuming, we will create an index for chromosome 6 only (and hope that it will work!). For the STAR running options, see the STAR Manual.