Title: | Baumgartner Weiss Schindler Test of Equal Distributions |
---|---|
Description: | Performs the 'Baumgartner-Weiss-Schindler' two-sample test of equal probability distributions, <doi:10.2307/2533862>. Also performs similar rank-based tests for equal probability distributions due to Neuhauser <doi:10.1080/10485250108832874> and Murakami <doi:10.1080/00949655.2010.551516>. |
Authors: | Steven E. Pav [aut, cre] |
Maintainer: | Steven E. Pav <[email protected]> |
License: | LGPL-3 |
Version: | 0.2.3 |
Built: | 2024-11-08 04:42:59 UTC |
Source: | https://github.com/shabbychef/BWStest |
Baumgartner Weiss Schindler test.
The Baumgartner Weiss Schindler test is a two sample test of the null that the samples come from the same probability distribution, similar to the Kolmogorv-Smirnov, Wilcoxon, and Cramer-Von Mises tests. It is similar to the Cramer-Von Mises test in that it estimates the square norm of the difference in CDFs of the two samples. However, the Baumgartner Weiss Schindler test weights the integral by the variance of the difference in CDFs, "[emphasizing] the tails of the distributions, which increases the power of the test for a lot of applications."
BWStest is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
Steven E. Pav [email protected]
W. Baumgartner, P. Weiss, H. Schindler, 'A nonparametric test for the general two-sample problem', Biometrics 54, no. 3 (Sep., 1998): pp. 1129-1135. doi:10.2307/2533862
M. Neuhauser, 'Exact tests based on the Baumgartner-Weiss-Schindler Statistic–a survey', Statistical Papers 46, no. 1 (2005): pp. 1-30. doi:10.1007/BF02762032
M. Neuhauser, 'One-sided two-sample and trend tests based on a modified Baumgartner-Weiss-Schindler statistic', J. Nonparametric Statistics 13, no. 5 (2001): pp 729-739. doi:10.1080/10485250108832874
H. Murakami, 'K-sample rank test based on modified Baumgartner statistic and its power comparison', J. Jpn. Comp. Statist. 19, no. 1 (2006): pp. 1-13. doi:10.5183/jjscs1988.19.1
H. Murakami, 'Modified Baumgartner Statistics for the two-sample and multisample problems: a numerical comparison', J. Stat. Comp. and Sim. 82, no. 5 (2012): pp. 711-728. doi:10.1080/00949655.2010.551516
H. Murakami, 'Lepage type statistic based on the modified Baumgartner statistic', Comp. Stat. & Data Analysis 51 (2007): pp 5061-5067. doi:10.1016/j.csda.2006.04.026
Computes the CDF of the Baumgartner-Weiss-Schindler test statistic under the null hypothesis of equal distributions.
bws_cdf(b, maxj = 5L, lower_tail = TRUE)
bws_cdf(b, maxj = 5L, lower_tail = TRUE)
b |
a vector of BWS test statistics. |
maxj |
the maximum value of j to take in the approximate computation of the CDF via equation (2.5). Baumgartner et. al. claim that a value of 3 is sufficient. |
lower_tail |
boolean, when |
Given value , computes the CDF of the BWS statistic under
the null, denoted as
by
Baumgartner et al. The CDF is computed from
equation (2.5) via numerical quadrature.
The expression for the CDF contains the integral
By making the change of variables , this can
be re-expressed as an integral of the form
for some function involving
and
.
This integral can be approximated
via Gaussian quadrature using Chebyshev nodes (of the first kind), which
is the approach we take here.
A vector of the CDF of ,
.
Steven E. Pav [email protected]
W. Baumgartner, P. Weiss, H. Schindler, 'A nonparametric test for the general two-sample problem', Biometrics 54, no. 3 (Sep., 1998): pp. 1129-1135. doi:10.2307/2533862
# do it 500 times set.seed(123) bvals <- replicate(500, bws_stat(rnorm(50),rnorm(50))) pvals <- bws_cdf(bvals) # these should be uniform! plot(ecdf(pvals)) # compare to Table 1 of Baumgartner et al. bvals <- c(1.933,2.493,3.076,3.880,4.500,5.990) tab1v <- c(0.9,0.95,0.975,0.990,0.995,0.999) pvals <- bws_cdf(bvals,lower_tail=TRUE) show(data.frame(B=bvals,BWS_psi=tab1v,our_psi=pvals))
# do it 500 times set.seed(123) bvals <- replicate(500, bws_stat(rnorm(50),rnorm(50))) pvals <- bws_cdf(bvals) # these should be uniform! plot(ecdf(pvals)) # compare to Table 1 of Baumgartner et al. bvals <- c(1.933,2.493,3.076,3.880,4.500,5.990) tab1v <- c(0.9,0.95,0.975,0.990,0.995,0.999) pvals <- bws_cdf(bvals,lower_tail=TRUE) show(data.frame(B=bvals,BWS_psi=tab1v,our_psi=pvals))
Compute the Baumgartner-Weiss-Schindler test statistic.
bws_stat(x, y)
bws_stat(x, y)
x |
a vector. |
y |
a vector. |
Given vectors and
, computes
and
as
described by Baumgartner et al., returning their average,
.
The test statistic approximates the variance-weighted square norm of the
difference in CDFs of the two distributions. For sufficiently large sample
sizes (more than 20, say), under the null the test statistic approaches the asymptotic
value computed in
bws_cdf
.
The test value is an approximation of
where (
) is the number of elements in
(
), and
(
) is the CDF of
(
).
The test statistic is based only on the ranks of the input. If the same
monotonic transform is applied to both vectors, the result should be unchanged.
Moreover, the test is inherently two-sided, so swapping and
should also leave the test statistic unchanged.
The BWS test statistic, .
Steven E. Pav [email protected]
W. Baumgartner, P. Weiss, H. Schindler, 'A nonparametric test for the general two-sample problem', Biometrics 54, no. 3 (Sep., 1998): pp. 1129-1135. doi:10.2307/2533862
set.seed(1234) x <- runif(1000) y <- runif(100) bval <- bws_stat(x,y) # check a monotonic transform: ftrans <- function(x) { log(1 + x) } bval2 <- bws_stat(ftrans(x),ftrans(y)) stopifnot(all.equal(bval,bval2)) # check commutivity bval3 <- bws_stat(y,x) stopifnot(all.equal(bval,bval3))
set.seed(1234) x <- runif(1000) y <- runif(100) bval <- bws_stat(x,y) # check a monotonic transform: ftrans <- function(x) { log(1 + x) } bval2 <- bws_stat(ftrans(x),ftrans(y)) stopifnot(all.equal(bval,bval2)) # check commutivity bval3 <- bws_stat(y,x) stopifnot(all.equal(bval,bval3))
Perform the Baumgartner-Weiss-Schindler hypothesis test.
bws_test( x, y, method = c("default", "BWS", "Neuhauser", "B1", "B2", "B3", "B4", "B5"), alternative = c("two.sided", "greater", "less") )
bws_test( x, y, method = c("default", "BWS", "Neuhauser", "B1", "B2", "B3", "B4", "B5"), alternative = c("two.sided", "greater", "less") )
x |
a vector of the first sample. |
y |
a vector of the second sample. |
method |
a character string specifying the test statistic to use. should be one of the following:
Only Neuhauser's test supports one-sided alternatives. |
alternative |
a character string specifying the alternative hypothesis,
must be one of “two.sided” (default), “greater” or
“less”. You can specify just the initial letter.
“greater” corresponds to testing whether the survival function
of |
Object of class htest
, a list of the test statistic,
the p-value, and the method
noted.
The code will happily compute Murakami's through
for large sample sizes, even though nominal coverage is not achieved.
A warning will be thrown. User assumes all risk relying on results from this
function.
Steven E. Pav [email protected]
W. Baumgartner, P. Weiss, H. Schindler, 'A nonparametric test for the general two-sample problem', Biometrics 54, no. 3 (Sep., 1998): pp. 1129-1135. doi:10.2307/2533862
bws_test
, bws_stat
,
murakami_stat
, murakami_cdf
.
# under the null set.seed(123) x <- rnorm(100) y <- rnorm(100) hval <- bws_test(x,y) # under the alternative set.seed(123) x <- rnorm(100) y <- rnorm(100,mean=1.0) hval <- bws_test(x,y) show(hval) stopifnot(hval$p.value < 0.05) # under the alternative with a one sided test. set.seed(123) x <- rnorm(100) y <- rnorm(100,mean=0.7) hval <- bws_test(x,y,alternative='less') show(hval) stopifnot(hval$p.value < 0.01) hval <- bws_test(x,y,alternative='greater') stopifnot(hval$p.value > 0.99) hval <- bws_test(x,y,alternative='two.sided') stopifnot(hval$p.value < 0.05)
# under the null set.seed(123) x <- rnorm(100) y <- rnorm(100) hval <- bws_test(x,y) # under the alternative set.seed(123) x <- rnorm(100) y <- rnorm(100,mean=1.0) hval <- bws_test(x,y) show(hval) stopifnot(hval$p.value < 0.05) # under the alternative with a one sided test. set.seed(123) x <- rnorm(100) y <- rnorm(100,mean=0.7) hval <- bws_test(x,y,alternative='less') show(hval) stopifnot(hval$p.value < 0.01) hval <- bws_test(x,y,alternative='greater') stopifnot(hval$p.value > 0.99) hval <- bws_test(x,y,alternative='two.sided') stopifnot(hval$p.value < 0.05)
News for package 'BWStest'
Update doi links.
Package maintenance–no new features.
Package maintenance–no new features.
move github figures to location CRAN understands.
package initialization mumbo jumbo, see Rcpp issue 636.
Adding Murakami statistics.
First CRAN release.
Start work
Estimates the CDF of the Murakami test statistics via permutations.
murakami_cdf(B, n1, n2, flavor = 0L, lower_tail = TRUE)
murakami_cdf(B, n1, n2, flavor = 0L, lower_tail = TRUE)
B |
the Murakami test statistic or a vector of the same. |
n1 |
number of elements in the first sample. |
n2 |
number of elements in the second sample. |
flavor |
the 'flavor' of the test statistic. See
|
lower_tail |
boolean, when |
Given the Murakami test statistic for
,
computes the CDF under the null that the two samples come from the same
distribution. The CDF is computed by permutation test and memoization.
a vector of the same size as B
of the CDF under the null.
the CDF is approximately computed by evaluating the permutations up to some reasonably small sample size (currently the cutoff is 9). When larger sample sizes are used, the distribution of the test statistic may not converge. This is apparently seen in flavors 3 through 5.
Steven E. Pav [email protected]
W. Baumgartner, P. Weiss, H. Schindler, 'A nonparametric test for the general two-sample problem', Biometrics 54, no. 3 (Sep., 1998): pp. 1129-1135. doi:10.2307/2533862
M. Neuhauser, 'Exact tests based on the Baumgartner-Weiss-Schindler Statistic–a survey', Statistical Papers 46, no. 1 (2005): pp. 1-30. doi:10.1007/BF02762032
M. Neuhauser, 'One-sided two-sample and trend tests based on a modified Baumgartner-Weiss-Schindler statistic', J. Nonparametric Statistics 13, no. 5 (2001): pp 729-739. doi:10.1080/10485250108832874
H. Murakami, 'K-sample rank test based on modified Baumgartner statistic and its power comparison', J. Jpn. Comp. Statist. 19, no. 1 (2006): pp. 1-13. doi:10.5183/jjscs1988.19.1
H. Murakami, 'Modified Baumgartner Statistics for the two-sample and multisample problems: a numerical comparison', J. Stat. Comp. and Sim. 82, no. 5 (2012): pp. 711-728. doi:10.1080/00949655.2010.551516
H. Murakami, 'Lepage type statistic based on the modified Baumgartner statistic', Comp. Stat. & Data Analysis 51 (2007): pp 5061-5067. doi:10.1016/j.csda.2006.04.026
# basic usage: xv <- seq(0,4,length.out=101) yv <- murakami_cdf(xv, n1=8, n2=6, flavor=1L) plot(xv,yv) zv <- bws_cdf(xv) lines(xv,zv,col='red') # check under the null: flavor <- 1L n1 <- 8 n2 <- 8 set.seed(1234) Bvals <- replicate(2000,murakami_stat(rnorm(n1),rnorm(n2),flavor)) # should be uniform: plot(ecdf(murakami_cdf(Bvals,n1,n2,flavor)))
# basic usage: xv <- seq(0,4,length.out=101) yv <- murakami_cdf(xv, n1=8, n2=6, flavor=1L) plot(xv,yv) zv <- bws_cdf(xv) lines(xv,zv,col='red') # check under the null: flavor <- 1L n1 <- 8 n2 <- 8 set.seed(1234) Bvals <- replicate(2000,murakami_stat(rnorm(n1),rnorm(n2),flavor)) # should be uniform: plot(ecdf(murakami_cdf(Bvals,n1,n2,flavor)))
Compute one of the modified Baumgartner-Weiss-Schindler test statistics proposed by Murakami, or Neuhauser.
murakami_stat(x, y, flavor = 0L) murakami_stat_perms(nx, ny, flavor = 0L)
murakami_stat(x, y, flavor = 0L) murakami_stat_perms(nx, ny, flavor = 0L)
x |
a vector of the first sample. |
y |
a vector of the second sample. |
flavor |
which ‘flavor’ of test statistic. |
nx |
the length of |
ny |
the length of |
Given vectors and
, computes
and
for some
as described by Murakami and by Neuhauser, returning either their
their average or their average distance.
The test statistics approximate the weighted square norm of the
difference in CDFs of the two distributions.
The test statistic is based only on the ranks of the input. If the same monotonic transform is applied to both vectors, the result should be unchanged.
The various ‘flavor’s of test statistic are:
The statistic of Baumgartner-Weiss-Schindler.
Murakami's statistic, from his 2006 paper.
Neuhauser's difference statistic, denoted by Murakami as in his
2012 paper.
Murakami's statistic, from his 2012 paper.
Murakami's statistic, from his 2012 paper.
Murakami's statistic, from his 2012 paper, with a log weighting.
The BWS test statistic, . For
murakami_stat_perms
, a vector of
the test statistics for all permutations of the input.
NA
and NaN
are not yet dealt with!
Steven E. Pav [email protected]
W. Baumgartner, P. Weiss, H. Schindler, 'A nonparametric test for the general two-sample problem', Biometrics 54, no. 3 (Sep., 1998): pp. 1129-1135. doi:10.2307/2533862
M. Neuhauser, 'Exact tests based on the Baumgartner-Weiss-Schindler Statistic–a survey', Statistical Papers 46, no. 1 (2005): pp. 1-30. doi:10.1007/BF02762032
M. Neuhauser, 'One-sided two-sample and trend tests based on a modified Baumgartner-Weiss-Schindler statistic', J. Nonparametric Statistics 13, no. 5 (2001): pp 729-739. doi:10.1080/10485250108832874
H. Murakami, 'K-sample rank test based on modified Baumgartner statistic and its power comparison', J. Jpn. Comp. Statist. 19, no. 1 (2006): pp. 1-13. doi:10.5183/jjscs1988.19.1
H. Murakami, 'Modified Baumgartner Statistics for the two-sample and multisample problems: a numerical comparison', J. Stat. Comp. and Sim. 82, no. 5 (2012): pp. 711-728. doi:10.1080/00949655.2010.551516
H. Murakami, 'Lepage type statistic based on the modified Baumgartner statistic', Comp. Stat. & Data Analysis 51 (2007): pp 5061-5067. doi:10.1016/j.csda.2006.04.026
set.seed(1234) x <- runif(1000) y <- runif(100) bval <- murakami_stat(x,y,1) nx <- 6 ny <- 5 # monte carlo set.seed(1234) repli <- replicate(3000,murakami_stat(rnorm(nx),rnorm(ny),0L)) # under the null, perform the permutation test: allem <- murakami_stat_perms(nx,ny,0L) plot(ecdf(allem)) lines(ecdf(repli),col='red')
set.seed(1234) x <- runif(1000) y <- runif(100) bval <- murakami_stat(x,y,1) nx <- 6 ny <- 5 # monte carlo set.seed(1234) repli <- replicate(3000,murakami_stat(rnorm(nx),rnorm(ny),0L)) # under the null, perform the permutation test: allem <- murakami_stat_perms(nx,ny,0L) plot(ecdf(allem)) lines(ecdf(repli),col='red')