Title: | Minimum-Hypergeometric Test |
---|---|
Description: | Runs a minimum-hypergeometric (mHG) test as described in: Eden, E. (2007). Discovering Motifs in Ranked Lists of DNA Sequences. Haifa. |
Authors: | Kobi Perl |
Maintainer: | Kobi Perl <[email protected]> |
License: | GPL-2 |
Version: | 1.1 |
Built: | 2025-02-14 03:40:36 UTC |
Source: | https://github.com/cran/mHG |
Sometimes when running a hypergeometric test to check for enrichment for a feature in a group versus the background, the separation between the group and the background is done arbitrarily by setting a threshold on some other property.
When the correct threshold is unknown, different thresholds can be tried, and the minimal p-value of the hypergeometric tests can be retrieved.
If the elements can be sorted according to the property, it is possible to perform the hypergeometric tests on groups of increasing size.
The minimum over all the tests is the minimum hypergeometric statistic, or mHG.
The mHG is not itself a p-value, because multiple tests were performed without correcting for this.
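The scan over groups of increasing size can be sketched in base R. This is only an illustration built on stats::phyper, not the package's implementation; the function name mhg_sketch and its return structure are assumptions made for this example.

```r
# Illustrative sketch only: mHG statistic via base R's phyper().
# mhg_sketch is a hypothetical helper, not part of the mHG package.
mhg_sketch <- function(lambdas, n_max = length(lambdas)) {
  N <- length(lambdas)              # total number of elements
  B <- sum(lambdas)                 # number of 1s ("black balls")
  n <- seq_len(n_max)               # group sizes 1, 2, ..., n_max
  b <- cumsum(lambdas)[n]           # number of 1s among the first n elements
  # Hypergeometric tail P(X >= b) for every group size n
  hgt <- phyper(b - 1, B, N - B, n, lower.tail = FALSE)
  i <- which.min(hgt)               # first minimum, i.e. the smallest such n
  list(mHG = hgt[i], n = n[i], b = b[i])
}

res <- mhg_sketch(c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0))
res$mHG  # uncorrected minimum over the tests, not a p-value
```

The returned mHG value is the smallest tail probability seen over all prefixes, so it still needs the multiple-testing correction described above.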
The package provides functions to calculate the statistic (mHG.statistic.calc), to correct it into a p-value (mHG.pval.calc), or to perform the entire test at once (mHG.test).
This is an R implementation of the algorithm described in:
Eden, E. (2007). Discovering Motifs in Ranked Lists of DNA Sequences. Haifa.
Retrieved from http://bioinfo.cs.technion.ac.il/people/zohar/thesis/eran.pdf
Package: | mHG |
Type: | Package |
Version: | 1.0 |
Date: | 2015-05-18 |
License: | GPL-2 |
Depends: | methods |
Kobi Perl <[email protected]>
Eden, E. (2007). Discovering Motifs in Ranked Lists of DNA Sequences. Haifa. Retrieved from http://bioinfo.cs.technion.ac.il/people/zohar/thesis/eran.pdf
mHG.statistic.calc
mHG.pval.calc
mHG.test
N <- 50
B <- 15
lambdas <- numeric(50)
lambdas[sample(N, B)] <- 1
t <- mHG.test(lambdas)
t <- mHG.test(lambdas, n_max = 20)
Calculates the p-value associated with the (minimum-hypergeometric) mHG statistic.
mHG.pval.calc(p, N, B, n_max = N)
p |
the mHG statistic. It is marked as p as it represents an "uncorrected" p-value. |
N |
total number of white and black balls (according to the hypergeometric problem definition). |
B |
number of black balls. |
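n_max |
the algorithm will calculate the p-value under the assumption that only the first n_max partitions (prefixes of the ranked list) are taken into account when determining the minimum. |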
running time,
space.
the p-value of the test.
Kobi Perl
Eden, E. (2007). Discovering Motifs in Ranked Lists of DNA Sequences. Haifa. Retrieved from http://bioinfo.cs.technion.ac.il/people/zohar/thesis/eran.pdf (pages 11-12, 19-20)
N <- 50
B <- 15
lambdas <- numeric(50)
lambdas[sample(N, B)] <- 1
p <- mHG.statistic.calc(lambdas)@mHG
p.corrected <- mHG.pval.calc(p, N, B)
# Could have used mHG.test directly
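For very small inputs, the meaning of the corrected p-value can be checked by brute force: enumerate every placement of the B ones among N positions, compute the statistic for each, and count how often it is at least as extreme as the observed one. The sketch below uses base R only and is not the algorithm used by mHG.pval.calc; hgt_min and exact_pval are hypothetical helper names, and the enumeration is feasible only when choose(N, B) is small.

```r
# Illustrative brute-force check, assuming a small N. Not the package's algorithm.
# hgt_min and exact_pval are hypothetical helpers written for this example.
hgt_min <- function(lambdas, n_max = length(lambdas)) {
  N <- length(lambdas)
  B <- sum(lambdas)
  n <- seq_len(n_max)
  # Minimum hypergeometric tail P(X >= b_n) over the first n_max prefixes
  min(phyper(cumsum(lambdas)[n] - 1, B, N - B, n, lower.tail = FALSE))
}

exact_pval <- function(lambdas, n_max = length(lambdas)) {
  N <- length(lambdas)
  B <- sum(lambdas)
  observed <- hgt_min(lambdas, n_max)
  positions <- combn(N, B)          # each column: one placement of the 1s
  null_stats <- apply(positions, 2, function(idx) {
    v <- numeric(N)
    v[idx] <- 1
    hgt_min(v, n_max)
  })
  # Fraction of equiprobable lists whose statistic is <= the observed one
  # (small relative tolerance guards against floating-point ties)
  mean(null_stats <= observed * (1 + 1e-9))
}

x <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
p_full <- exact_pval(x)
p_top2 <- exact_pval(x, n_max = 2)
```

Restricting n_max changes both the observed statistic and the null distribution, which is why the p-value has to be computed with the same n_max that was used for the statistic.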
Calculates the minimum-hypergeometric (mHG) statistic.
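mHG definition: $\mathrm{mHG}(\lambda) = \min_{1 \le n \le n_{\max}} \mathrm{HGT}(b_n(\lambda); N, B, n)$
Where HGT is the hypergeometric tail: $\mathrm{HGT}(b; N, B, n) = \Pr(X \ge b) = \sum_{i=b}^{\min(n,B)} \binom{B}{i}\binom{N-B}{n-i} / \binom{N}{n}$,
and $b_n(\lambda) = \sum_{i=1}^{n} \lambda_i$.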
mHG.statistic.calc(lambdas, n_max = length(lambdas))
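lambdas |
a {0,1}^N vector, ordered by rank (top of the list first). |
n_max |
the algorithm will only consider the first n_max partitions (prefixes of the list). |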
running time,
space.
Instance of the class mHG.statistic.info
(stores the statistic, and the n and b for which it was obtained).
If several n give the same mHG, the smallest one is chosen.
Kobi Perl
Eden, E. (2007). Discovering Motifs in Ranked Lists of DNA Sequences. Haifa. Retrieved from http://bioinfo.cs.technion.ac.il/people/zohar/thesis/eran.pdf (pages 10-11, 18-19)
N <- 50
B <- 15
lambdas <- numeric(50)
lambdas[sample(N, B)] <- 1
mHG.statistic.info <- mHG.statistic.calc(lambdas)@mHG
"mHG.statistic.info"
Summarizes data about the minimum-hypergeometric (mHG) statistic of a {0,1}^N vector.
Objects can be created by calls of the form new("mHG.statistic.info", ...).
mHG: The actual statistic.
n: The index at which the minimum was obtained.
b: The number of 1s among the first n entries (b_n).
No methods defined with class "mHG.statistic.info" in the signature.
Kobi Perl
Eden, E. (2007). Discovering Motifs in Ranked Lists of DNA Sequences. Haifa. Retrieved from http://bioinfo.cs.technion.ac.il/people/zohar/thesis/eran.pdf (page 10)
showClass("mHG.statistic.info")
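To show how an S4 object with these three slots is created and read, the sketch below defines a stand-in class. The class name mhg.info.sketch and the numeric slot types are assumptions made for illustration; the package's own class definition may differ.

```r
# Illustrative stand-in for an S4 class with slots mHG, n and b.
# The class name and the slot types are assumptions, not the package's definition.
library(methods)

setClass("mhg.info.sketch",
         representation(mHG = "numeric", n = "numeric", b = "numeric"))

# Create an instance with new() and read slots with the @ operator
info <- new("mhg.info.sketch", mHG = 1 / 120, n = 3, b = 3)
info@mHG
info@n
info@b
```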
Performs a minimum-hypergeometric (mHG) test. The null hypothesis is that the provided list was selected at random, with equal probability, from all lists containing N entries, B of which are 1s. The alternative hypothesis is that the 1s tend to appear at the top of the list.
mHG.test(lambdas, n_max = length(lambdas))
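lambdas |
a {0,1}^N vector, ordered by rank (top of the list first). |
n_max |
the algorithm will only consider the first n_max partitions (prefixes of the list). |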
running time,
space.
A list with class "htest" containing the following components:
statistic |
The mHG statistic. |
p.value |
The p-value for the test. |
parameters |
|
n |
The index for which the mHG was obtained (smallest one if several n give the same mHG). |
b |
The number of 1s among the first n entries (b_n). |
Kobi Perl
Eden, E. (2007). Discovering Motifs in Ranked Lists of DNA Sequences. Haifa. Retrieved from http://bioinfo.cs.technion.ac.il/people/zohar/thesis/eran.pdf (pages 10-12, 18-20)
N <- 50
B <- 15
lambdas <- numeric(50)
lambdas[sample(N, B)] <- 1
t <- mHG.test(lambdas)
t <- mHG.test(lambdas, n_max = 20)
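The examples above fill lambdas at random. In practice the vector would typically come from ranking elements by some property and marking which of them carry the feature of interest. The following is a minimal base R sketch of that preparation step; the gene names, scores and feature set are invented for illustration.

```r
# Illustrative data preparation; names and values are made up for this example.
scores <- c(geneA = 3.2, geneB = 0.4, geneC = 2.7, geneD = 1.1, geneE = 2.9)
feature_set <- c("geneA", "geneE", "geneD")   # elements carrying the feature

# Rank elements from the highest score to the lowest
ranked <- names(sort(scores, decreasing = TRUE))

# 1 where the ranked element carries the feature, 0 otherwise
lambdas <- as.numeric(ranked %in% feature_set)
lambdas

# With the mHG package loaded, this vector could then be passed on, e.g.:
# mHG.test(lambdas)
```

Sorting in decreasing order puts the elements expected to be enriched at the top, which matches the alternative hypothesis that the 1s concentrate at the start of the list.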