Comparison of differential gene expression with salmon, kallisto and HTSeq-count

Introduction

A single-read 50 bp RNA-seq experiment (NEB, Ribo-Zero) was used to compare differential expression across three strategies: HTSeq-count at gene level (union mode, version 0.6.1p1), salmon at gene level (version 0.4.0, run with --libType SR --unmatedReads --extraSensitive --biasCorrect; the bias-corrected output was not used) and kallisto at gene level (version 0.42.1). The reads were aligned to mm10 using the STAR aligner, and the Ensembl GRCh38 release 78 GTF was used as annotation. For salmon and kallisto, the indices were built from Homo_sapiens.GRCh38.cdna.all.fa.gz and Homo_sapiens.GRCh38.ncrna.fa.gz. The experiment consisted of 4 conditions with 2 replicates each. Gene-level counts for kallisto were derived with a home-made script by summing the estimated counts of all transcripts of each gene. Differential expression was called with DESeq2.
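The home-made script that collapses kallisto transcript estimates to genes is not shown. The following is a minimal Python sketch of the same idea, not the original script. It assumes kallisto's `abundance.tsv` output (tab-separated, with `target_id` and `est_counts` columns) and a transcript-to-gene map taken from the `gene:` field of the Ensembl cDNA/ncRNA FASTA headers. The function names and the toy IDs and counts are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def tx2gene_from_fasta_headers(lines: Iterable[str]) -> Dict[str, str]:
    """Map transcript IDs to gene IDs using Ensembl-style FASTA headers.

    Assumes headers of the form '>ENST... cdna:... gene:ENSG... ...';
    the transcript ID is the first whitespace-separated token.
    """
    tx2gene: Dict[str, str] = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        for field in fields[1:]:
            if field.startswith("gene:"):
                tx2gene[fields[0]] = field[len("gene:"):]
                break
    return tx2gene


def gene_counts_from_kallisto(abundance_lines: Iterable[str],
                              tx2gene: Dict[str, str]) -> Dict[str, float]:
    """Sum kallisto est_counts over all transcripts of each gene.

    Transcripts missing from tx2gene are skipped.
    """
    counts: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(abundance_lines, delimiter="\t"):
        gene = tx2gene.get(row["target_id"])
        if gene is not None:
            counts[gene] += float(row["est_counts"])
    return dict(counts)


# Toy example with made-up IDs and counts (not data from this study).
headers = [
    ">ENST0001 cdna:known chromosome:GRCh38:1:100:200:1 gene:ENSG0001",
    "ACGT",
    ">ENST0002 cdna:known chromosome:GRCh38:1:100:300:1 gene:ENSG0001",
    ">ENST0003 ncrna chromosome:GRCh38:2:50:90:-1 gene:ENSG0002",
]
abundance = [
    "target_id\tlength\teff_length\test_counts\ttpm",
    "ENST0001\t100\t80\t10.5\t1.0",
    "ENST0002\t200\t180\t4.5\t0.5",
    "ENST0003\t40\t30\t2\t0.2",
]
gene_counts = gene_counts_from_kallisto(abundance, tx2gene_from_fasta_headers(headers))
print(gene_counts)  # {'ENSG0001': 15.0, 'ENSG0002': 2.0}
```

DESeq2 expects integer counts, so summed estimates like these generally need rounding before they are imported.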

Only the comparison for one condition is shown; the other conditions gave similar results for HTSeq-count, kallisto and salmon.

The bottom line is that the differences are negligible.

Alignment Statistics

The alignment statistics show a well-stranded, undegraded library preparation.

Scatter Plots

The scatter plots show high correlation between the different methods. The numbers are the Spearman correlations of the plotted values.
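The Spearman correlation on the plots is the Pearson correlation of the ranks of the two value vectors, with tied values assigned their average rank. Below is a minimal pure-Python sketch of that statistic. The function names and the toy count vectors are hypothetical; in practice `scipy.stats.spearmanr` computes the same quantity.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values (1-based), giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("Spearman correlation is undefined for constant input")
    return cov / (sx * sy)


# Toy gene-level values for two methods (made up, not from this study).
htseq_counts = [0, 12, 250, 3000, 45]
kallisto_counts = [1.2, 10.8, 260.0, 2900.5, 50.1]
rho = spearman(htseq_counts, kallisto_counts)
print(round(rho, 6))  # 1.0
```

Because the statistic depends only on ranks, it is the same whether raw counts or log-transformed counts are plotted.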

Comparison of Differential Expression

Summary

60% of all differentially expressed genes are found by all three methods. A further 8% are found by HTSeq-count and kallisto or by salmon and kallisto, and another 8% only by HTSeq-count. Only 3% of the genes are called differentially expressed by kallisto alone and 6% by salmon alone.

If the methods were completely independent and genes called by only one method counted as false positives, I would conclude that HTSeq-count has the highest false-positive rate. The kallisto results encompass both the salmon and the HTSeq-count results, with a very low rate of probably unreliable single-method calls.

This study is only anecdotal evidence, because the ground truth is not known. Ideally, someone would integrate kallisto and salmon into SEQC, where at least some ground truth is known.

PS: The first version of this study used salmon v0.3.0, which gave dramatically different results.
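The percentages in the summary presumably come from classifying each differentially expressed gene by the exact set of methods that called it. Below is a minimal Python sketch of that bookkeeping. It assumes the denominator is the union of the three DE gene lists, which the text does not state explicitly. The function name and the toy gene sets are hypothetical and not data from this study.

```python
from typing import Dict, Set


def overlap_fractions(calls: Dict[str, Set[str]]) -> Dict[frozenset, float]:
    """Fraction of all DE genes (union over methods) in each Venn region.

    Each region is keyed by the frozenset of methods that called the gene,
    e.g. frozenset({'htseq', 'kallisto'}) means called by exactly those two.
    """
    union = set().union(*calls.values())
    regions: Dict[frozenset, int] = {}
    for gene in union:
        key = frozenset(m for m, genes in calls.items() if gene in genes)
        regions[key] = regions.get(key, 0) + 1
    return {key: n / len(union) for key, n in regions.items()}


# Toy DE gene lists (made up): g1..g6 are called by all three methods.
shared = {f"g{i}" for i in range(1, 7)}
calls = {
    "htseq": shared | {"g7", "g9"},
    "salmon": shared | {"g8", "g10"},
    "kallisto": shared | {"g7", "g8"},
}
fractions = overlap_fractions(calls)
for region, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(sorted(region), f"{frac:.0%}")
```

Here the all-three region holds 60% of the genes, and each of the four remaining regions (two pairs and two single-method sets) holds 10%. The single-method regions are the ones the summary treats as candidate false positives.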