Jelle Goeman
Introduction
Aggregation
Methods
Conclusion
Outline
This talk
Methodology still under development
  Not many methods and packages available
Focus on principles
  How should gene set testing be done?
Parallels and differences with microarray gene set testing
  Many parallels, a few differences
In abstract sense
  Gene set testing = testing at higher levels of aggregation
Different ways of aggregating
  Aggregating data
  Aggregating evidence
Aggregating evidence
  With or without taking direction of effect into account
Data aggregation
          exon 1  exon 2  ...  exon k  whole gene
Subj. 1       34      44  ...      16         236
Subj. 2       18      32  ...       5         187
...
Subj. n       26       8  ...       9          74
p-value                                      0.01
Data aggregation
  Simplest way of aggregation
  Assumption: all exons have same differential expression
  Ignores between-exon variation
  Result dominated by exons with high coverage
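As a rough illustration of data aggregation, the sketch below sums exon-level read counts into one count per subject and reports how much of the total each exon contributes. The counts and the function names `aggregate_exons` and `exon_share` are hypothetical and chosen only to show how a single high-coverage exon can dominate the aggregated result.

```python
def aggregate_exons(exon_counts):
    """Sum exon-level counts into one gene-level count per subject."""
    return {subject: sum(counts) for subject, counts in exon_counts.items()}


def exon_share(exon_counts):
    """Fraction of all reads (over all subjects) contributed by each exon."""
    n_exons = len(next(iter(exon_counts.values())))
    per_exon = [sum(c[e] for c in exon_counts.values()) for e in range(n_exons)]
    total = sum(per_exon)
    return [x / total for x in per_exon]


# Hypothetical counts: one row per subject, one column per exon.
exon_counts = {
    "subj_1": [900, 12, 8],
    "subj_2": [850, 30, 5],
    "subj_3": [400, 25, 40],
}

gene_counts = aggregate_exons(exon_counts)  # whole-gene count per subject
shares = exon_share(exon_counts)            # first exon carries most reads
print(gene_counts)
print([round(s, 3) for s in shares])
```

In this made-up example the first exon supplies well over 90% of the reads, so any test on the gene-level counts mostly reflects that exon.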
Testing groups of genes Jelle Goeman
Evidence aggregation
          gene 1  gene 2  ...  gene k  whole pathway
Subj. 1       34      44  ...      16
Subj. 2       18      32  ...       5
...
Subj. n       26       8  ...       9

Evidence aggregation
  More sophisticated way of aggregation
  Genes' differential expression may be very different
  Direction of differential expression is lost
  In result all genes equally important
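To make the loss of direction concrete, the following sketch contrasts a signed sum of per-gene statistics with a sum of squares. Both statistics and the z-scores are illustrative assumptions, not the specific method used in the talk.

```python
def signed_sum(z_scores):
    """Directional aggregate: effects in opposite directions cancel."""
    return sum(z_scores)


def sum_of_squares(z_scores):
    """Non-directional aggregate: each gene adds z**2, so the sign is dropped
    and every gene enters with the same weight."""
    return sum(z * z for z in z_scores)


# Hypothetical per-gene z-scores: two genes up, two genes down.
z_scores = [2.5, -2.5, 3.0, -3.0]

directional = signed_sum(z_scores)          # up and down effects cancel out
non_directional = sum_of_squares(z_scores)  # all four genes contribute
print(directional, non_directional)
```

The signed sum sees no effect in this pathway, while the sum of squares picks up all four genes but can no longer tell which way each one moved.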
Enrichment type methods
  Compare set of genes to other genes
  Keep subjects/samples fixed
  H0: this gene set behaves just like the rest of the genes
Aggregate score methods
  Compare subjects to other (hypothetical) subjects
  Keep set of genes fixed
  H0: this gene set behaves like one that is not differentially expressed
Competitive method
Null hypothesis
  Genes in the gene set are as often differentially expressed as genes outside
Alternative hypothesis
  Genes in the gene set are more often differentially expressed than genes outside
Competitive
  Null and alternative compare to genes outside gene set
Test row/column association with Fisher's exact test
Equivalent variants
  χ² test; hypergeometric test; binomial z-test
Overview
  Khatri and Drăghici, Bioinformatics, 2005
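A minimal sketch of the hypergeometric variant, which gives the same p-value as a one-sided Fisher's exact test on the 2x2 table of "in gene set" against "called differentially expressed". The function name `hypergeom_enrichment_p` and all counts are hypothetical; only the standard library is used.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_de, set_size, de_in_set):
    """Upper-tail probability P(X >= de_in_set), where X is the number of
    differentially expressed (DE) genes among set_size genes drawn without
    replacement from n_genes genes, n_de of which are DE."""
    upper = min(n_de, set_size)
    denom = comb(n_genes, set_size)
    tail = sum(
        comb(n_de, k) * comb(n_genes - n_de, set_size - k)
        for k in range(de_in_set, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 1000 genes, 100 called DE, a gene set of 20 genes
# of which 8 are DE (about 2 would be expected under the null).
p_enriched = hypergeom_enrichment_p(1000, 100, 20, 8)
print(p_enriched)
```

Note that this calculation treats genes as exchangeable and independent.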
Assumptions of enrichment
Assumption: equal probabilities of being called differentially expressed
  Violation: more power for long genes (coverage)
  Consequence: gene sets with long genes called enriched
Assumption: equal probabilities of being called differentially expressed
  Violation: more power for highly expressed genes
  Consequence: gene sets with high expression called enriched
Assumption: independence of genes in gene sets
  Violation: correlation of expression within pathway likely
  Consequence: gene sets with dependence called enriched
[Figure: scatter plot of Proportion DE (y-axis, 0.00 to 0.25) against an x-axis running from 2000 to 14000; x-axis label not recovered]
Permutation testing
True data
  0   x1,1  ...  xp,1
  0   x1,2  ...  xp,2
  0   x1,3  ...  xp,3
  1   x1,4  ...  xp,4
  1   x1,5  ...  xp,5
  1   x1,6  ...  xp,6
Random permutation of sample labels
  1   x1,1  ...  xp,1
  0   x1,2  ...  xp,2
  1   x1,3  ...  xp,3
  1   x1,4  ...  xp,4
  0   x1,5  ...  xp,5
  0   x1,6  ...  xp,6
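The scheme above can be sketched as follows: the sample labels are shuffled while the expression data stay fixed, and the gene set statistic is recomputed for each shuffle. The statistic (sum over genes of the squared difference in group means), the function names, and the data are assumptions made for illustration, not the talk's specific method.

```python
import random


def set_statistic(data, labels):
    """Sum over genes of the squared difference in group means.
    data[i][j] is the expression of gene i in sample j."""
    group0 = [j for j, lab in enumerate(labels) if lab == 0]
    group1 = [j for j, lab in enumerate(labels) if lab == 1]
    total = 0.0
    for gene in data:
        mean0 = sum(gene[j] for j in group0) / len(group0)
        mean1 = sum(gene[j] for j in group1) / len(group1)
        total += (mean1 - mean0) ** 2
    return total


def permutation_p_value(data, labels, n_perm=999, seed=0):
    """Permutation p-value: share of label shuffles whose statistic is at
    least as large as the observed one (the observed labelling counts too)."""
    rng = random.Random(seed)
    observed = set_statistic(data, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute labels, keep the data fixed
        if set_statistic(data, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical gene set of 3 genes measured in 6 samples (3 per group).
labels = [0, 0, 0, 1, 1, 1]
data = [
    [1, 2, 1, 9, 8, 9],
    [3, 2, 3, 7, 8, 9],
    [5, 5, 4, 6, 9, 8],
]
p_value = permutation_p_value(data, labels)
print(p_value)
```

Because whole samples are relabelled, correlations between the genes in the set are preserved in every permutation. With only 6 samples there are 20 distinct labellings, so the p-value cannot become very small however strong the effect is.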
Permutation testing
Advantages
  Almost assumption-free
  Always correct p-values
  No problems taking into account gene correlations
Disadvantages
  Not possible if experimental design too complex
  Computationally very intensive
Often there is no alternative
  If you want reliable p-values
Null hypothesis
  There is no differential expression in the reads of the gene set
Alternative hypothesis
  There is some differential expression in the reads of the gene set
Self-contained
  Null and alternative do not involve genes outside gene set
Multiple testing
Testing more than one gene set
  Multiple testing comes back into play
  Use FDR (Benjamini and Hochberg)
  or permutation-based: Westfall & Young
Be selective if you dare
  The more gene sets tested, the less power per gene set
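For the FDR option, a small sketch of Benjamini-Hochberg adjusted p-values using only the standard library. The function name `benjamini_hochberg` and the gene set p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four gene sets.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(a, 4) for a in adjusted])
```

The smallest raw p-value is multiplied by the number of gene sets tested, which is one way to see why testing more gene sets leaves less power per gene set.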
Work in progress
Field under development
  Few methods available that are specific for RNA-seq
Choice of methodology
  Lessons from microarray data analysis remain valid
  Prefer an aggregate score method
  Rely on permutations if you can afford them