Author: Michael R. Crusoe
Description: fix spelling errors
--- rsem.orig/rsem-calculate-expression
+++ rsem/rsem-calculate-expression
@@ -666,11 +666,11 @@
 
 =item B<--sampling-for-bam>
 
-When RSEM generates a BAM file, instead of outputing all alignments a read has with their posterior probabilities, one alignment is sampled according to the posterior probabilities. The sampling procedure includes the alignment to the "noise" transcript, which does not appear in the BAM file. Only the sampled alignment has a weight of 1. All other alignments have weight 0. If the "noise" transcript is sampled, all alignments appeared in the BAM file should have weight 0. (Default: off)
+When RSEM generates a BAM file, instead of outputting all alignments a read has with their posterior probabilities, one alignment is sampled according to the posterior probabilities. The sampling procedure includes the alignment to the "noise" transcript, which does not appear in the BAM file. Only the sampled alignment has a weight of 1. All other alignments have weight 0. If the "noise" transcript is sampled, all alignments that appear in the BAM file should have weight 0. (Default: off)
 
 =item B<--seed> <uint32>
 
-Set the seed for the random number generators used in calculating posterior mean estimates and credibility intervals. The seed must be a non-negative 32 bit interger. (Default: off)
+Set the seed for the random number generators used in calculating posterior mean estimates and credibility intervals. The seed must be a non-negative 32 bit integer. (Default: off)
 
 =item B<--single-cell-prior>
 
@@ -776,7 +776,7 @@
 
 =item B<--sort-bam-by-read-name>
 
-Sort BAM file aligned under transcript coordidate by read name. Setting this option on will produce determinstic maximum likelihood estimations from independet runs. Note that sorting will take long time and lots of memory. (Default: off)
+Sort BAM file aligned under transcript coordinate by read name. Setting this option on will produce deterministic maximum likelihood estimations from independent runs. Note that sorting will take a long time and lots of memory. (Default: off)
 
 =item B<--sort-bam-buffer-size> <string>
 
--- rsem.orig/README.md
+++ rsem/README.md
@@ -492,7 +492,7 @@
 obtain an accurate gene-isoform relationship. Instead, RSEM provides a
 script `rsem-generate-ngvector`, which clusters transcripts based on
 measures directly relating to read mappaing ambiguity. First, it
-calcualtes the 'unmappability' of each transcript. The 'unmappability'
+calculates the 'unmappability' of each transcript. The 'unmappability'
 of a transcript is the ratio between the number of k mers with at
 least one perfect match to other transcripts and the total number of k
 mers of this transcript, where k is a parameter. Then, Ng vector is
--- rsem.orig/rsem-generate-ngvector
+++ rsem/rsem-generate-ngvector
@@ -69,7 +69,7 @@
 
 =head1 DESCRIPTION
 
-This program generates the Ng vector required by EBSeq for isoform level differential expression analysis based on reference sequences only. EBSeq can take variance due to read mapping ambiguity into consideration by grouping isoforms with parent gene's number of isoforms. However, for de novo assembled transcriptome, it is hard to obtain an accurate gene-isoform relationship. Instead, this program groups isoforms by using measures on read mappaing ambiguity directly. First, it calcualtes the 'unmappability' of each transcript. The 'unmappability' of a transcript is the ratio between the number of k mers with at least one perfect match to other transcripts and the total number of k mers of this transcript, where k is a parameter. Then, Ng vector is generated by applying Kmeans algorithm to the 'unmappability' values with number of clusters set as 3. 'rsem-generate-ngvector' will make sure the mean 'unmappability' scores for clusters are in ascending order. All transcripts whose lengths are less than k are assigned to cluster 3.   
+This program generates the Ng vector required by EBSeq for isoform level differential expression analysis based on reference sequences only. EBSeq can take variance due to read mapping ambiguity into consideration by grouping isoforms with parent gene's number of isoforms. However, for de novo assembled transcriptome, it is hard to obtain an accurate gene-isoform relationship. Instead, this program groups isoforms by using measures on read mapping ambiguity directly. First, it calculates the 'unmappability' of each transcript. The 'unmappability' of a transcript is the ratio between the number of k mers with at least one perfect match to other transcripts and the total number of k mers of this transcript, where k is a parameter. Then, Ng vector is generated by applying Kmeans algorithm to the 'unmappability' values with number of clusters set as 3. 'rsem-generate-ngvector' will make sure the mean 'unmappability' scores for clusters are in ascending order. All transcripts whose lengths are less than k are assigned to cluster 3.   
 
 If your reference is a de novo assembled transcript set, you should run 'rsem-generate-ngvector' first. Then load the resulting 'output_name.ngvec' into R. For example, you can use
 
