T Test in Python: Easily Test Hypotheses in Python

The T test is a statistical method for testing hypotheses about means. It helps you check whether the results of your study are statistically significant. There are four types of T test you can perform, and we are going to cover all four, along with their implementation in Python.

What Is a T Test?

First, let’s understand what the T test, also known as Student’s t-test, is. It is an inferential statistical method that compares means: either the mean of one sample against a hypothesized value, or the means of two samples, taking their variances into account. The T test is used to decide whether to reject a null hypothesis H0, and that decision depends on the p-value. If the p-value is less than or equal to the significance level alpha (most commonly 0.05), we reject the null hypothesis and conclude that the difference is statistically significant. If p > alpha, we do not reject the null hypothesis.


Types Of T Test In Python

There are four types of T test you can perform in Python. They are as follows:

  1. One sample T test
  2. Two sample T test (paired)
  3. Two sample T test (independent)
  4. Welch T test

Let’s look at each of these tests and how to implement them.

One Sample Test

In a one sample T test, we test whether the mean of a sample from a particular group differs from a mean that we know or have hypothesized. For example, we might hypothesize that the mean height of the 25 students in a classroom is 5 feet, and then carry out a T test to check whether the data support that value.

Formula

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Where x̄ is the sample mean, μ is the hypothesized or known mean, s is the sample standard deviation, and n is the sample size.
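To see the formula at work, the sketch below evaluates it directly with NumPy and compares the result with SciPy’s stats.ttest_1samp. It reuses the cholesterol readings and the hypothesized mean of 100 mg/dL from Example 1 below; the variable names are only illustrative.

```python
import numpy as np
from scipy import stats

# LDL readings from Example 1 and the hypothesized mean (mg/dL)
sample = np.array([90, 102, 108, 97, 104, 103, 84, 94, 98, 92, 104, 88, 92, 110])
mu = 100

x_bar = sample.mean()
s = sample.std(ddof=1)  # sample standard deviation, n - 1 in the denominator
n = len(sample)

# t = (x_bar - mu) / (s / sqrt(n))
t_manual = (x_bar - mu) / (s / np.sqrt(n))

# SciPy's one sample test should give the same statistic
t_scipy, p_scipy = stats.ttest_1samp(sample, mu)
print(t_manual, t_scipy)
```

Note the ddof=1: NumPy’s std defaults to ddof=0 (the population formula), which would give a smaller s and therefore a larger |t| than the sample formula above.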

Two Sample Test (paired)

In a paired two sample test, we carry out a T test on two sets of measurements that are matched one-to-one, usually because they come from the same population, group, or units. For example, we split each plot of a crop field in two, apply pesticide on one half only, and compare the yields of the two halves plot by plot.

Formula

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

Where d̄ is the mean of the differences between the paired observations (equal to x̄1 - x̄2), s_d is the standard deviation of those differences, and n is the number of pairs. The test has n - 1 degrees of freedom.
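A minimal sketch of the paired formula, assuming NumPy and SciPy are installed: it computes the differences, applies the formula, and compares the result with stats.ttest_rel. The yields are the paired field data used in the pandas section later in this article.

```python
import numpy as np
from scipy import stats

# paired yields: position i in both arrays refers to the same pair
x = np.array([69, 68, 72, 75, 78, 81, 79, 83, 89, 91])
y = np.array([80, 81, 82, 85, 86, 89, 92, 89, 88, 91])

d = x - y                  # one difference per pair
n = len(d)                 # number of pairs
s_d = d.std(ddof=1)        # standard deviation of the differences

# t = d_bar / (s_d / sqrt(n)), with n - 1 degrees of freedom
t_manual = d.mean() / (s_d / np.sqrt(n))

t_scipy, p_scipy = stats.ttest_rel(x, y)
print(t_manual, t_scipy)
```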

Two Sample Test (unpaired)

On the other hand, in an unpaired (independent) two sample test, we compare the means of two samples taken from different populations or groups. For example, as before, we apply pesticide to crops, but now on a separate field. We take samples from both fields, one with pesticide and one without, calculate the mean yield of each sample, and carry out a T test to see whether the two means differ.

Formula

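$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}, \qquad s_p^2 = \frac{(n_1 - 1) v_1 + (n_2 - 1) v_2}{n_1 + n_2 - 2}$$

Where x̄1 and x̄2 are the sample means, v1 and v2 are the variances of the two samples, n1 and n2 are the sample sizes, and s_p² is the pooled variance. The test has n1 + n2 - 2 degrees of freedom.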
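The pooled formula can be sketched the same way and checked against stats.ttest_ind, whose default equal_var=True performs this pooled-variance test. The yields are the field x and field y values from the Welch and pandas examples later in this article.

```python
import numpy as np
from scipy import stats

x1 = np.array([71, 72, 72, 75, 78, 81, 82, 83, 89, 91])
x2 = np.array([80, 81, 82, 85, 86, 89, 92, 89, 88, 91])

n1, n2 = len(x1), len(x2)
v1, v2 = x1.var(ddof=1), x2.var(ddof=1)  # sample variances

# pooled variance, weighting each sample by its degrees of freedom
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_manual = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

res = stats.ttest_ind(x1, x2, equal_var=True)
print(t_manual, res.statistic)
```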

Welch T test

The Welch test is a variant of Student’s independent two sample T test that does not assume the two samples have equal variances. Instead of pooling the variances, it uses the standard deviation of each sample separately, together with an adjusted number of degrees of freedom.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}}$$

Where x̄1 and x̄2 are the sample means, s1 and s2 are the standard deviations of the two samples, respectively, and N1 and N2 are the sample sizes.
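Welch’s test also changes the degrees of freedom. The sketch below computes the statistic from the formula above, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value from the t distribution, then compares them with stats.ttest_ind(..., equal_var=False). It reuses the field x and y yields from Example 4.

```python
import numpy as np
from scipy import stats

x1 = np.array([71, 72, 72, 75, 78, 81, 82, 83, 89, 91])
x2 = np.array([80, 81, 82, 85, 86, 89, 92, 89, 88, 91])

n1, n2 = len(x1), len(x2)
# squared standard errors of each mean: s^2 / N
sq_se1, sq_se2 = x1.var(ddof=1) / n1, x2.var(ddof=1) / n2

# t = (x1_bar - x2_bar) / sqrt(s1^2/N1 + s2^2/N2)
t_manual = (x1.mean() - x2.mean()) / np.sqrt(sq_se1 + sq_se2)

# Welch-Satterthwaite approximation of the degrees of freedom
df_ws = (sq_se1 + sq_se2) ** 2 / (sq_se1 ** 2 / (n1 - 1) + sq_se2 ** 2 / (n2 - 1))

# two-sided p-value from the t distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df_ws)

res = stats.ttest_ind(x1, x2, equal_var=False)
print(t_manual, res.statistic)
print(df_ws, p_manual, res.pvalue)
```

The degrees of freedom are generally not a whole number, which is why the p-value is read from the t distribution with a fractional df.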

Implementing T Test In Python

Let’s see how to implement the T test in Python. In this article, we use the SciPy package and work through each of the tests discussed above with an example.

Example 1 (Single sample)

First, we implement a one sample T test in Python. Suppose we take blood samples from people who work out and measure their LDL cholesterol levels. We hypothesize that the mean cholesterol level is 100 mg/dL.

  • Null hypothesis: There is no difference between the hypothesized mean and the actual mean.
  • Alternate hypothesis: The actual mean is lower than the hypothesized mean (a one-tailed test).

Steps to implement the test:

  1. Import the stats module from SciPy (from scipy import stats).
  2. Define your alpha value.
  3. Use stats.ttest_1samp, as in the example code, to calculate the t and p values.
  4. Compare the p-value with alpha and check the result.
from scipy import stats

cholesterol = [90, 102, 108, 97, 104, 103, 84, 94, 98, 92, 104, 88, 92, 110]
mu = 100  # hypothesized mean LDL level in mg/dL

# ttest_1samp returns the t statistic and a two-sided p-value
t_value, p_two_sided = stats.ttest_1samp(cholesterol, mu)
# one-tailed (lower tail) p-value: halving is valid here because t is negative
p_value = p_two_sided / 2
print('Test statistic is %f' % t_value)
print('p-value for one tailed test is %f' % p_value)

alpha = 0.05
if p_value <= alpha:
    print('p-value =%f' % p_value, '<=', 'alpha =%.2f' % alpha, 'We reject the null hypothesis H0.')
else:
    print('p-value =%f' % p_value, '>', 'alpha =%.2f' % alpha, 'We do not reject the null hypothesis H0.')

Output:

Test statistic is -1.155881
p-value for one tailed test is 0.134266
p-value =0.134266 > alpha =0.05 We do not reject the null hypothesis H0.
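Halving the two-sided p-value, as the code above does, only gives the correct one-tailed p-value when the t statistic points in the direction being tested (here, t is negative and the question is whether the mean is below 100). SciPy 1.6 and later can run the one-tailed test directly through the alternative parameter; a minimal sketch:

```python
from scipy import stats

cholesterol = [90, 102, 108, 97, 104, 103, 84, 94, 98, 92, 104, 88, 92, 110]

# H1: the true mean is less than 100 (requires SciPy >= 1.6)
res_less = stats.ttest_1samp(cholesterol, 100, alternative='less')
# default two-sided test, for comparison
res_two = stats.ttest_1samp(cholesterol, 100)

print(res_less.pvalue, res_two.pvalue / 2)
```

Both numbers agree here; if t were positive, alternative='less' would instead return a p-value above 0.5.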

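Example 2 (two sample T test, calculated by hand)

Let’s take an example where we use fertilizer on one part of the field, while another part is left as it is. We then compare the mean yields x1 and x2 of the two parts. The two parts contain different numbers of plots, so the observations cannot be paired; this example therefore calculates the pooled-variance two sample T statistic step by step.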

  • Null Hypothesis: There is no difference between the mean of yields from both parts of the field.
  • Alternative Hypothesis: There is a significant difference between the mean of yields from both parts of the field.
import numpy as np
from scipy import stats

x1 = [29, 43, 34, 44, 58, 38, 67, 42, 22, 51, 49, 38, 67, 78, 28, 45, 39, 47]
x2 = [12, 17, 24, 11, 36, 42, 19, 21, 31, 14, 26, 20, 18, 10, 22, 13, 27, 32, 19]

# sample means, sizes and sample variances (ddof=1 divides by n - 1)
x1_bar, x2_bar = np.mean(x1), np.mean(x2)
n1, n2 = len(x1), len(x2)
var_x1, var_x2 = np.var(x1, ddof=1), np.var(x2, ddof=1)
# pooled variance and standard error of the difference between the means
var = (((n1 - 1) * var_x1) + ((n2 - 1) * var_x2)) / (n1 + n2 - 2)
std_error = np.sqrt(var * (1.0 / n1 + 1.0 / n2))
# degrees of freedom of the pooled two sample test
df = n1 + n2 - 2

print("x1:", np.round(x1_bar, 4))
print("x2:", np.round(x2_bar, 4))
print("variance of first sample:", np.round(var_x1, 4))
print("variance of second sample:", np.round(var_x2, 4))
print("pooled sample variance:", var)
print("standard error:", std_error)
# calculate the t statistic
t = abs(x1_bar - x2_bar) / std_error
print('t statistic:', t)
# two-tailed critical value at alpha = 0.05
t_c = stats.t.ppf(q=0.975, df=df)
print("Critical value for t two tailed:", np.round(t_c, 4))

# one-tailed critical value at alpha = 0.05
t_c = stats.t.ppf(q=0.95, df=df)
print("Critical value for t one tailed:", np.round(t_c, 4))

# two-tailed p-value (sf is 1 - cdf, computed more accurately in the tail)
p_two = 2 * stats.t.sf(t, df=df)
print("p-value for two tailed: %.1e" % p_two)

# one-tailed p-value
p_one = stats.t.sf(t, df=df)
print("p-value for one tailed: %.1e" % p_one)

Output:

x1: 45.5
x2: 21.7895
variance of first sample: 212.9706
variance of second sample: 77.5088
pooled sample variance: 143.30451127819546
standard error: 3.9374743728092905
t statistic: 6.0217601616725185
Critical value for t two tailed: 2.0301
Critical value for t one tailed: 1.6896
p-value for two tailed: 7.2e-07
p-value for one tailed: 3.6e-07

P-value is less than the alpha value. Therefore, we reject the null hypothesis and conclude there is a difference between two means of yield in the different parts of the field.

For truly paired samples, the same steps apply to the differences between the paired observations: use the mean and standard deviation of the differences, the number of pairs n, and n - 1 degrees of freedom.
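The hand calculation in Example 2 can be cross-checked with stats.ttest_ind, which performs the same pooled-variance test when equal_var=True and reports the two-sided p-value using n1 + n2 - 2 degrees of freedom. A minimal sketch reusing the Example 2 data:

```python
from scipy import stats

# yields from Example 2
x1 = [29, 43, 34, 44, 58, 38, 67, 42, 22, 51, 49, 38, 67, 78, 28, 45, 39, 47]
x2 = [12, 17, 24, 11, 36, 42, 19, 21, 31, 14, 26, 20, 18, 10, 22, 13, 27, 32, 19]

# equal_var=True: pooled-variance test with n1 + n2 - 2 degrees of freedom
res = stats.ttest_ind(x1, x2, equal_var=True)
print(res.statistic, res.pvalue)
```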

Example 3 (two sample T test unpaired)

This T test is done when we take samples from different populations. Assume that one of the fields of the crop is covered in fertilizers and one is not. We carry out a T test to find if fertilizers make any difference between the two fields.

from scipy import stats as st
from bioinfokit.analys import get_data
df = get_data('testdata1').data
a = df.loc[df['Field'] == 'A', 'yield'].to_numpy()
b = df.loc[df['Field'] == 'B', 'yield'].to_numpy()
st.ttest_ind(a=a, b=b, equal_var=True)

Output:

Ttest_indResult(statistic=-2.407091104196024,pvalue=0.0001678342540326837)

P-value is less than the alpha value. Therefore, we reject the null hypothesis and conclude there is a difference between the two means of yield in the different fields.

Example 4 (Welch T test)

The Welch T test works like Student’s T test but does not assume that the two samples have equal variances. To perform the Welch T test with ttest_ind, set equal_var=False.

import pandas as pd
from scipy.stats import ttest_ind

df = pd.DataFrame({'field': ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
'yield': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91, 80, 81, 82, 85, 86, 89, 92, 89, 88, 91]})

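# split the yields by field
sample1 = df[df['field'] == 'x']
sample2 = df[df['field'] == 'y']

# equal_var=False selects Welch's T test
result = ttest_ind(sample1['yield'], sample2['yield'], equal_var=False)
print('t = %.4f, p-value = %.4f' % (result.statistic, result.pvalue))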

Output:

t = -2.6493, p-value = 0.0185

T Test In Pandas

Next, let’s see how to run T tests on data stored in a pandas DataFrame. We cover both the unpaired and the paired tests, starting with the unpaired two sample T test.

Two Sample T Test (unpaired)

We will understand using an example. Assume the same example that one of the fields of the crop is covered in fertilizers and one is not. We carry out a T test to find if fertilizers make any difference between the two fields.

import pandas as pd
from scipy.stats import ttest_ind

df = pd.DataFrame({'field': ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
'yield': [71, 72, 72, 75, 78, 81, 82, 83, 89, 91, 80, 81, 82, 85, 86, 89, 92, 89, 88, 91]})

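sample1 = df[df['field'] == 'x']
sample2 = df[df['field'] == 'y']

# the default equal_var=True gives Student's pooled-variance T test
result = ttest_ind(sample1['yield'], sample2['yield'])
print('t = %.4f, p-value = %.4f' % (result.statistic, result.pvalue))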

Output:

t = -2.6493, p-value = 0.0163

The p-value is less than 0.05. Therefore, we reject the null hypothesis and conclude that fertilizers make a difference in the final crop yields.

Two Sample T Test (paired)

In this T test, we test samples taken from the same population: for example, one part of a field is covered with fertilizers and another part of the same field is not. We test whether fertilizers make any difference in the final yields of the two parts.

import pandas as pd
from scipy.stats import ttest_rel

df = pd.DataFrame({'field': ['x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y', 'y'],
'yield': [69, 68, 72, 75, 78, 81, 79, 83, 89, 91, 80, 81, 82, 85, 86, 89, 92, 89, 88, 91]})

sample1 = df[df['field'] == 'x']
sample2 = df[df['field'] == 'y']

# ttest_rel pairs observations by position, so both samples must be in matching order
result = ttest_rel(sample1['yield'].to_numpy(), sample2['yield'].to_numpy())
print('t = %.4f, p-value = %.4f' % (result.statistic, result.pvalue))

Output:

t = -5.0395, p-value = 0.0007

We can see the p-value is less than 0.05. Therefore, we reject the null hypothesis and conclude that fertilizers make a difference in the final yields of the two parts of the field.
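Two points about the paired test can be checked with a short sketch using the same yields: ttest_rel is equivalent to a one sample T test of the paired differences against 0, and treating the same numbers as independent samples with ttest_ind gives a larger p-value here, because it ignores the pairing.

```python
import numpy as np
from scipy.stats import ttest_rel, ttest_1samp, ttest_ind

x = np.array([69, 68, 72, 75, 78, 81, 79, 83, 89, 91])
y = np.array([80, 81, 82, 85, 86, 89, 92, 89, 88, 91])

paired = ttest_rel(x, y)
# the same test: H0 says the mean of the differences is 0
on_differences = ttest_1samp(x - y, 0)
# ignores the pairing and treats x and y as independent samples
independent = ttest_ind(x, y)

print(paired.pvalue, on_differences.pvalue, independent.pvalue)
```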

FAQs on T Test in Python

What is the t-test used for?

The T test is used to test a hypothesis by comparing a sample mean with a hypothesized value, or by comparing the means of two samples.

What are the types of t-test in Python?

There are four types of t-test you can perform in Python:
one sample t-test, two sample t-test paired, two sample t-test unpaired, Welch t-test

What is the p-value for the t-test?

The p-value is the probability of getting a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. If the p-value is below the chosen significance level (commonly 0.05), the result is unlikely to be due to chance alone, and we call it statistically significant.
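To make the definition concrete, the sketch below converts the test statistic from Example 1 (t = -1.155881 with 14 observations, so 13 degrees of freedom) into p-values using SciPy’s t distribution. The two-sided p-value counts both tails; the one-tailed value counts only the lower tail.

```python
from scipy import stats

t_stat = -1.155881  # test statistic reported in Example 1
df = 13             # n - 1 for a one sample test with n = 14

p_two = 2 * stats.t.sf(abs(t_stat), df)  # both tails
p_one = stats.t.cdf(t_stat, df)          # lower tail only

alpha = 0.05
print(p_two, p_one, "reject H0" if p_one <= alpha else "do not reject H0")
```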

What is a null hypothesis?

The null hypothesis is the default assumption we test against, typically that there is no difference between the means. If the p-value is below the significance level (commonly 0.05), we reject the null hypothesis in favor of the alternative hypothesis.

Conclusion

In conclusion, the T test in Python helps programmers test their hypotheses quickly. SciPy’s functions return the test statistic and the p-value directly, and the intermediate statistics of the samples (means, variances, standard errors) are easy to compute alongside them.
