Here's an example of Hotelling's multivariate (MV) T-test with response data and a hypothesized mean vector. Hotelling's MV T-test is the multivariate analogue of the one-sample t-test: it compares the observed sample mean vector with a hypothesized mean vector. As in the univariate case, we assume that the population is multivariate normal, that is:

$$ \vec{x} \sim MVN(\vec{\mu}, \Sigma) $$

where $\vec{\mu}$ is the population mean vector and $\Sigma$ is the covariance matrix, which is square and symmetric.
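For reference, the statistic computed in this post can be written out as follows. The notation is chosen here for illustration: $\bar{x}$ is the sample mean vector, $S$ the sample covariance matrix, $\vec{\mu}_0$ the hypothesized mean vector, $n$ the number of observations, and $p$ the number of variables.

$$ T^2 = n\,(\bar{x} - \vec{\mu}_0)^T S^{-1} (\bar{x} - \vec{\mu}_0), \qquad \frac{n - p}{p\,(n - 1)}\, T^2 \sim F_{p,\; n - p} \quad \text{under } H_0: \vec{\mu} = \vec{\mu}_0 $$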

This test is implemented below.

In [4]:

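## Hotelling's T-squared test for a multivariate normal sample.

# p-value for H0: the population mean vector equals hypothesized_mean_vector.
prob_null <- function(data, hypothesized_mean_vector) {
  n <- nrow(data)  # number of observations
  p <- ncol(data)  # number of variables
  tsqobs <- hotellings_tsq_statistic(data, hypothesized_mean_vector)
  f_dist <- hotellings_f_statistic_trans(tsqobs, n, p)

  # Upper-tail probability of the F distribution.
  return(pf(f_dist['f'], f_dist['df1'], f_dist['df2'], lower.tail = FALSE))
}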

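# Convert the observed T-squared statistic to an F statistic
# with p and n - p degrees of freedom.
hotellings_f_statistic_trans <- function(tsqobs, n, p) {
  f <- ((n - p)/(p*(n - 1))) * tsqobs
  df1 <- p
  df2 <- n - p
  return(c(f = f, df1 = df1, df2 = df2))
}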

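# T-squared: n times the squared distance between the sample mean and the
# hypothesized mean, scaled by the inverse of the sample covariance matrix.
hotellings_tsq_statistic <- function(data, hypothesized_mean_vector) {
  sample_mean <- colMeans(data)
  sample_covar <- cov(data)

  S_inv <- solve(sample_covar)
  n <- nrow(data)
  mu <- hypothesized_mean_vector

  tsqobs <- n*t(sample_mean - mu) %*% S_inv %*% (sample_mean - mu)
  return(tsqobs)
}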

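# Discriminant coefficients: the weights of the linear combination of the
# variables that best separates the sample mean from the hypothesized mean.
find_discriminant <- function(data, hypothesized_mean_vector) {
  sample_mean <- colMeans(data)
  sample_covar <- cov(data)

  discriminant <- solve(sample_covar) %*% (sample_mean - hypothesized_mean_vector)
  return(discriminant)
}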
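One way to sanity-check an implementation like the one above is the univariate case: with a single variable, $T^2$ equals the squared t statistic, so the p-value should match a two-sided one-sample `t.test`. The sketch below is self-contained. The helper `hotelling_p_value` and the hypothesized mean of 30 are illustrative assumptions, and the data are the first column of the response data used later in this post.

```r
# Compact one-sample Hotelling's T-squared p-value (hypothetical helper
# that mirrors the formulas used in this post).
hotelling_p_value <- function(data, mu0) {
  n <- nrow(data)
  p <- ncol(data)
  d <- colMeans(data) - mu0
  tsq <- n * drop(t(d) %*% solve(cov(data)) %*% d)
  f <- (n - p) / (p * (n - 1)) * tsq
  pf(f, p, n - p, lower.tail = FALSE)
}

# A single variable, stored as a one-column matrix.
x <- matrix(c(51, 27, 37, 42, 27, 43, 41, 38, 36, 26, 29), ncol = 1)

p_hotelling <- hotelling_p_value(x, mu0 = 30)
p_t_test <- t.test(x[, 1], mu = 30)$p.value

# The two p-values should agree up to floating-point error.
all.equal(p_hotelling, p_t_test)
```

With more than one variable there is no such direct univariate counterpart, which is why the F transformation is needed.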

In [5]:

# matrix() fills column by column, so these 55 values become an 11 x 5 matrix.
response_data <- matrix(c(51,27,37,42,27,
                          43,41,38,36,26,
                          29,36,20,22,36,
                          18,32,22,21,23,
                          31,20,50,26,41,
                          32,33,43,36,31,
                          27,31,25,35,17,
                          37,34,14,35,25,
                          20,25,32,26,42,
                          27,30,27,29,40,
                          38,16,28,36,25),ncol = 5)
hypothesized_mean <- c(30,25,40,25,30)
response_data

prob_null(response_data, hypothesized_mean)
find_discriminant(response_data, hypothesized_mean)

51	36	50	35	42
27	20	26	17	27
37	22	41	37	30
42	36	32	34	27
27	18	33	14	29
43	32	43	35	40
41	22	36	25	38
38	21	31	20	16
36	23	27	25	28
26	31	31	32	36
29	20	25	26	25

f: 0.00669952528414886

0.5298893
-0.2659554
-0.6946549
0.1483265
0.3206404

Because the p-value (about 0.0067) is less than 0.05, we reject the null hypothesis that these data come from a multivariate normal population with mean vector: $$ \vec{\mu} = \begin{bmatrix} 30 \\ 25 \\ 40 \\ 25 \\ 30 \end{bmatrix} $$

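The discriminant coefficient with the largest magnitude belongs to the third variable (about -0.69), which indicates that the third variable contributes most to the difference between the hypothesized and sample means. Note that raw coefficients depend on each variable's scale, so this comparison is most meaningful when the variables are measured in similar units.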
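The reason the discriminant is informative is that, among all linear combinations of the variables, the one weighted by $S^{-1}(\bar{x} - \vec{\mu}_0)$ gives the largest squared univariate t statistic, and that maximum equals $T^2$. The following sketch checks this numerically on simulated data. The data, the seed, and all variable names are assumptions for illustration only.

```r
set.seed(1)

# Simulated data for illustration only: 11 observations on 5 variables.
X <- matrix(rnorm(55, mean = 30, sd = 8), ncol = 5)
mu0 <- rep(30, 5)
n <- nrow(X)

S <- cov(X)
d <- colMeans(X) - mu0
a <- drop(solve(S) %*% d)                  # discriminant coefficients
tsq <- n * drop(t(d) %*% solve(S) %*% d)   # Hotelling's T-squared

# Squared univariate t statistic of the projection X w, tested against w' mu0.
t_squared_along <- function(w) {
  z <- drop(X %*% w)
  ((mean(z) - sum(w * mu0)) / (sd(z) / sqrt(n)))^2
}

# Along the discriminant direction the squared t statistic equals T-squared.
all.equal(t_squared_along(a), tsq)

# Any other direction gives a squared t statistic that is no larger.
b <- rnorm(5)
t_squared_along(b) <= tsq

# Coefficients rescaled by each variable's standard deviation, which makes
# variables measured on different scales easier to compare.
a_std <- a * sqrt(diag(S))
```

The rescaled coefficients in `a_std` are one common way to compare contributions when the variables are not on a common scale.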

In [6]:

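library(MASS)  # provides mvrnorm()
# Draw 11 observations from a multivariate normal centered on the hypothesized mean.
# The draw is random, so the numbers below change from run to run.
ex <- mvrnorm(11, mu = hypothesized_mean, Sigma = cov(response_data))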
prob_null(ex, hypothesized_mean)
find_discriminant(ex, hypothesized_mean)

f: 0.684532555387565

-0.03199313
-0.07721548
0.09417471
0.09898568
-0.13109563

Now we see that a sample drawn from a multivariate normal distribution with mean equal to the hypothesized mean and Sigma equal to the covariance of the observed data gives a non-significant result, as expected.
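A single draw can be significant by chance, so a slightly stronger check is to repeat the experiment many times: when the null hypothesis is true, roughly 5% of the p-values should fall below 0.05. The sketch below is a rough illustration under stated assumptions: it uses independent standard normal variables rather than the covariance of the observed data, a hypothetical helper `hotelling_p_value`, and an arbitrary seed and number of replications.

```r
set.seed(42)

# Compact one-sample Hotelling's T-squared p-value (hypothetical helper).
hotelling_p_value <- function(data, mu0) {
  n <- nrow(data)
  p <- ncol(data)
  d <- colMeans(data) - mu0
  tsq <- n * drop(t(d) %*% solve(cov(data)) %*% d)
  f <- (n - p) / (p * (n - 1)) * tsq
  pf(f, p, n - p, lower.tail = FALSE)
}

n <- 11          # same sample size as the example data
p <- 5           # same number of variables
mu0 <- rep(0, p) # true mean of the simulated data, so H0 holds

# Simulate 2000 samples under the null and record each p-value.
p_values <- replicate(2000, {
  sample_data <- matrix(rnorm(n * p), ncol = p)
  hotelling_p_value(sample_data, mu0)
})

# Share of samples in which H0 is (wrongly) rejected at the 5% level.
rejection_rate <- mean(p_values < 0.05)
rejection_rate
```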

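Now, let's look at the test for the equality of two mean vectors, assuming equal covariance matrices and equal sample sizes. Here it is carried out as a one-sample test on the row-wise differences between the two samples, with a zero vector as the hypothesized mean.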

In [7]:

differences <- response_data - ex
prob_null(differences, rep(0,5))
find_discriminant(differences, rep(0,5))

f: 0.00818771246890448

0.35457592
-0.09934875
-0.70226005
0.01886289
0.33948696

We see the expected result: the test on the differences is significant, meaning that there is evidence to reject the null hypothesis that the two samples come from distributions with the same mean vector.
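For two independent samples, the usual equal-covariance form of the test pools the two sample covariance matrices instead of differencing rows. A minimal sketch follows. The helper `two_sample_hotelling` and the simulated data are assumptions for illustration and are not part of the analysis above.

```r
# Two-sample Hotelling's T-squared test with a pooled covariance matrix
# (hypothetical helper; assumes both populations share one covariance matrix).
two_sample_hotelling <- function(x1, x2) {
  n1 <- nrow(x1)
  n2 <- nrow(x2)
  p <- ncol(x1)
  d <- colMeans(x1) - colMeans(x2)

  # Pooled estimate of the common covariance matrix.
  s_pooled <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)

  tsq <- (n1 * n2 / (n1 + n2)) * drop(t(d) %*% solve(s_pooled) %*% d)

  # F transformation with p and n1 + n2 - p - 1 degrees of freedom.
  df1 <- p
  df2 <- n1 + n2 - p - 1
  f <- df2 / (p * (n1 + n2 - 2)) * tsq

  c(tsq = tsq, f = f, df1 = df1, df2 = df2,
    p_value = pf(f, df1, df2, lower.tail = FALSE))
}

set.seed(7)
# Simulated samples for illustration only, drawn with the same mean.
x1 <- matrix(rnorm(55, mean = 30, sd = 8), ncol = 5)
x2 <- matrix(rnorm(55, mean = 30, sd = 8), ncol = 5)

result <- two_sample_hotelling(x1, x2)
result["p_value"]
```

Unlike the test on differences, this form does not require the two samples to have the same number of rows, and it does not treat row i of one sample as paired with row i of the other.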

Sean Ammirati - creator of Stats Works. He can be reached on Github, LinkedIn and email.
