T-Test Calculator
Enter the sample mean, hypothesized population mean, sample standard deviation, and sample size to compute the t-statistic and approximate p-value for a one-sample t-test.
The t-test is one of the most widely used statistical tests for comparing means. The one-sample t-test compares a sample mean to a hypothesized population value to see whether they differ significantly. The independent two-sample t-test compares the means of two groups. The paired t-test compares matched pairs, such as before/after measurements on the same subjects. Student's t-distribution accounts for the additional uncertainty that comes from estimating the population SD from a sample.
This calculator performs the one-sample t-test: t = (x̄ - μ₀) / (s/√n), where x̄ is the sample mean, μ₀ the hypothesized population mean, s the sample standard deviation, and n the sample size. The results include the t-statistic, degrees of freedom, standard error, an approximate p-value (from a normal approximation), a decision at α = 0.05, and a 95% confidence interval. Because the t-distribution has heavier tails than the normal, the normal approximation understates the p-value for small samples. For exact p-values, use statistical software with a t-distribution CDF.
Compared with the z-test, which uses a known population standard deviation and the standard normal distribution, the t-test uses the sample standard deviation (s) and the t-distribution. As the sample size grows, the t-distribution approaches the standard normal, so for n > 30 the two tests give very similar results. For smaller samples, the t-distribution's heavier tails account for the additional uncertainty in the SD estimate.
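The calculation described above can be sketched in a few lines of Python. This is a minimal sketch assuming, as this page states, that the p-value comes from a standard normal approximation rather than the exact t-distribution; the function name `one_sample_t_normal_approx` is hypothetical, and `statistics.NormalDist` is from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_t_normal_approx(x_bar, mu0, s, n, alpha=0.05):
    """One-sample t-statistic with p-values from a standard normal approximation."""
    se = s / sqrt(n)            # standard error of the mean
    t = (x_bar - mu0) / se      # t-statistic
    df = n - 1                  # degrees of freedom (reported; not used by the approximation)
    upper_tail = 1.0 - NormalDist().cdf(abs(t))
    p_one = upper_tail          # one-tailed p in the direction of the observed effect
    p_two = 2.0 * upper_tail    # two-tailed p
    decision = "Reject H0" if p_two < alpha else "Fail to reject H0"
    return {"t": t, "df": df, "se": se, "p_two": p_two, "p_one": p_one, "decision": decision}


result = one_sample_t_normal_approx(105, 100, 15, 30)
print(result)
```

For instance, x̄ = 105, μ₀ = 100, s = 15 and n = 30 give t ≈ 1.826 with SE ≈ 2.739 and an approximate two-tailed p of about 0.068, so H0 is not rejected at α = 0.05.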
Inputs
Sample Mean
105
Hypothesized Population Mean
100
Sample Standard Deviation
15
Sample Size
30
Results
T-Statistic
1.8257
Degrees of Freedom
29
P-Value (Two-Tail)
0.0679
P-Value (One-Tail)
0.0339
Standard Error
2.7386
Decision
Fail to reject H0 (p ≥ 0.05)
95% Confidence Interval
99.3858 to 110.6142
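Formula
t = (x̄ - μ₀) / SE, with SE = s/√n and df = n - 1. The 95% confidence interval is x̄ ± t* × SE, where t* is the two-tailed 5% critical value of the t-distribution with n - 1 df.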
How to use this calculator
- Enter sample mean.
- Enter the hypothesized population mean (the value under the null hypothesis).
- Enter sample standard deviation.
- Enter sample size.
- The calculator returns the t-statistic, degrees of freedom, standard error, an approximate p-value, a decision, and a 95% confidence interval.
- For exact p-values, use statistical software with a t-distribution CDF.
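If statistical software is not at hand, an exact two-tailed p-value can be computed from the identity p = I_x(ν/2, 1/2) with x = ν/(ν + t²), where I is the regularized incomplete beta function and ν = df. The sketch below evaluates I with the standard continued-fraction (modified Lentz) method. It is an illustration under those assumptions: all function names are hypothetical, and in practice a library routine such as `scipy.stats.t.sf` is the better choice.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-15):
    """Evaluate the continued fraction for I_x(a, b) with the modified Lentz method."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered term of the fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term of the fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:  # converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the fraction directly where it converges fast, else use the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def t_two_tailed_p(t, df):
    """Exact two-tailed p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


# Example: t of about 1.826 on 29 degrees of freedom
p_two = t_two_tailed_p(1.8257, 29)
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_two / 2:.4f}")
```

For t ≈ 1.826 with df = 29, the exact p-value comes out somewhat larger than the normal approximation's 0.068, as expected from the heavier tails. Halving the two-tailed value gives the one-tailed p when the effect lies in the hypothesized direction.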
Worked examples
Drug effectiveness test
**Scenario:** A new drug claims a 50% pain reduction. A test with 25 patients shows a mean reduction of 55% with an SD of 12%. Is the actual mean different from 50%? **Calculation:** t = (55 - 50) / (12 / √25) = 5 / 2.4 = 2.08. df = 24. Two-tailed p ≈ 0.048. **Result:** p = 0.048 < α = 0.05, just barely. Reject the null hypothesis: there is statistically significant evidence that the drug's actual mean effect differs from the claimed 50% (in this case, it is higher). However, significance this close to the threshold calls for caution; replicate the result before making clinical claims.
Manufacturing tolerance check
**Scenario:** The spec calls for 100 g parts. A sample of 60 parts has a mean of 99.8 g and an SD of 1.2 g. Is the process on target? **Calculation:** t = (99.8 - 100) / (1.2 / √60) = -0.2 / 0.155 = -1.29. df = 59. Two-tailed p ≈ 0.20. **Result:** p = 0.20 > 0.05. Fail to reject the null: there is insufficient evidence that the process is off target. This does not prove the process is on target; a small offset might still be detected with a larger sample.
Customer satisfaction baseline
**Scenario:** Historical mean satisfaction is 7.5. A new survey of 100 customers gives a mean of 7.8 with an SD of 1.5. Has satisfaction improved? **Calculation:** One-tailed test (H₁: μ > 7.5). t = (7.8 - 7.5) / (1.5 / √100) = 0.3 / 0.15 = 2.0. df = 99. One-tailed p ≈ 0.024. **Result:** p = 0.024 < α = 0.05. Reject the null hypothesis: there is statistically significant evidence that customer satisfaction improved over the baseline. Effect size: d = 0.3 / 1.5 = 0.20 (small). The result is statistically significant, but its practical importance is modest.
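The arithmetic in the three examples can be checked with a short script. This sketch assumes Cohen's d for a one-sample test is computed as (x̄ - μ₀) / s, which matches the d = 0.20 reported for the satisfaction example; the helper name `one_sample_summary` is hypothetical.

```python
from math import sqrt


def one_sample_summary(x_bar, mu0, s, n):
    """Return (t, df, Cohen's d) for a one-sample t-test from summary statistics."""
    se = s / sqrt(n)             # standard error of the mean
    t = (x_bar - mu0) / se       # t-statistic
    d = (x_bar - mu0) / s        # Cohen's d: difference in SD units
    return t, n - 1, d


# (sample mean, hypothesized mean, sample SD, sample size) from the worked examples
examples = {
    "drug": (55, 50, 12, 25),
    "manufacturing": (99.8, 100, 1.2, 60),
    "satisfaction": (7.8, 7.5, 1.5, 100),
}

for name, args in examples.items():
    t, df, d = one_sample_summary(*args)
    print(f"{name}: t = {t:.2f}, df = {df}, d = {d:.2f}")
```

Unlike t, which grows with √n for a fixed difference, d does not depend on sample size, which is why it is the better guide to practical importance.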
When to use this calculator
**Use t-test for:**
- **Comparing a sample mean to a known value**: one-sample.
- **Comparing two group means**: independent two-sample.
- **Before/after measurements on the same subjects**: paired.
- **Continuous data**: not for proportions or categorical data.
- **Approximately normal distributions**: especially important for small samples.
**Choosing the right t-test:**
| Scenario | Test |
|---|---|
| Sample vs. hypothesized value | One-sample t |
| Two independent groups | Two-sample t |
| Paired measurements (before/after) | Paired t |
| Two independent samples, unequal variances | Welch's t |
| Three or more groups | ANOVA |
| Two independent groups, non-normal data | Mann-Whitney U |
| One sample or paired, non-normal data | Wilcoxon signed-rank |
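A paired t-test is equivalent to a one-sample t-test on the within-pair differences, tested against a hypothesized mean difference of 0. A minimal sketch with made-up illustrative data (the `before`/`after` values and the function name `paired_t` are hypothetical):

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t-test as a one-sample t-test on the differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)   # sample SD of the differences / sqrt(n)
    return mean(diffs) / se, n - 1


before = [10, 12, 9, 11]
after = [12, 13, 11, 12]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Note that df = n - 1 counts pairs, not individual measurements.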
**Welch's t-test (unequal variances):**
When two independent groups have very different variances, use Welch's t-test:
t = (x̄₁ - x̄₂) / √(s₁²/n₁ + s₂²/n₂)
with degrees of freedom from the Welch-Satterthwaite approximation:
df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ - 1) + (s₂²/n₂)²/(n₂ - 1)]
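Welch's statistic and its degrees of freedom translate directly into code. This is a minimal sketch working from summary statistics (means, SDs, group sizes); the function name `welch_t` is hypothetical, and the p-value would then come from a t-distribution with the (generally non-integer) df it returns.

```python
from math import sqrt


def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = s1 ** 2 / n1   # squared standard error of group 1
    v2 = s2 ** 2 / n2   # squared standard error of group 2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t(10.0, 2.0, 5, 8.0, 6.0, 20)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The df always lies between min(n₁, n₂) - 1 and n₁ + n₂ - 2; with equal sizes and equal SDs it equals n₁ + n₂ - 2, the Student's t value.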
**Robustness:**
The t-test is moderately robust to:
- Mild non-normality, especially with n > 30.
- Unequal variances when the groups are of similar size (otherwise, use Welch's t-test).

The t-test is sensitive to:
- Severe non-normality with small samples.
- Outliers, especially heavy ones (consider robust or rank-based alternatives).
- Unequal variances combined with unequal group sizes (use Welch's t-test).
**Common decisions:**
| If... | Then... |
|---|---|
| p < α | Reject the null hypothesis |
| p ≥ α | Fail to reject the null |
| p just above α | Consider sample size; replicate |
| Large t with large n | Likely significant |
| Small t with small n | Likely not significant |
| Large effect size, p ≥ α | Study may be underpowered; collect more data |
**Common errors:**
- Using a t-test for proportions; use a z-test for proportions instead.
- Using a t-test on non-independent observations (for matched data, use the paired test).
- Skipping the normality check for small samples.
- Reporting t without df and p-value.
- Forgetting the one-tailed vs. two-tailed distinction.
**Power analysis:**
Power is the probability of detecting a true effect when one exists.
- Power increases with a larger sample, a higher (less strict) α, a larger true effect, and lower variability.
- Target 80% power for most research.
- The required sample size depends on the expected effect size, α, and the target power.
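A common back-of-the-envelope sample size for a one-sample (or paired) t-test uses the normal approximation n ≈ ((z₁₋α/₂ + z₁₋β) / d)², where d is the expected Cohen's d and 1 - β is the target power. This is a sketch under that approximation with a hypothetical function name; exact calculations based on the noncentral t-distribution give a slightly larger n.

```python
from math import ceil
from statistics import NormalDist


def approx_sample_size(d, alpha=0.05, power=0.80, two_sided=True):
    """Normal-approximation sample size for a one-sample t-test with effect size d."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)   # quantile for the target power
    return ceil(((z_alpha + z_beta) / abs(d)) ** 2)


print(approx_sample_size(0.5))   # medium effect, alpha = 0.05, 80% power
```

Because n scales with 1/d², halving the expected effect size roughly quadruples the required sample.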
**Modern statistical practice:**
- Report effect sizes (Cohen's d) alongside p-values.
- Use confidence intervals to convey precision.
- Don't equate "significant" with "important."
- Consider replication; one study is rarely conclusive.
**Statistical software:**
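- **Excel**: T.TEST() for two-sample and paired tests; for a one-sample p-value, T.DIST.2T() applied to |t|.
- **R**: t.test() (one-sample via the mu argument).
- **Python (scipy.stats)**: ttest_1samp, ttest_ind (equal_var=False for Welch's), ttest_rel.
- **SPSS**: Analyze > Compare Means menu.
- **SAS**: PROC TTEST.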
**Common confusions:**
- **T-test vs z-test**: use a t-test when the SD is estimated from the sample, and a z-test when the population SD is known.
- **Two-tailed vs one-tailed**: two-tailed is more conservative; use it by default unless a directional hypothesis was specified before seeing the data.
- **Paired vs independent**: depends on the study design (matched observations vs. separate groups).
- **Effect size vs significance**: separate concepts; report both.
Common mistakes to avoid
- Using a t-test for non-numeric data. Use a chi-square test or another appropriate test.
- Skipping the normality check for small samples. Use a Shapiro-Wilk test or visual inspection (e.g., a Q-Q plot).
- Forgetting the one-tailed vs. two-tailed choice. The conservative default is two-tailed; choose a one-tailed test only before seeing the data.
- Using the same data for hypothesis generation and testing. This inflates the Type I error rate.
- Treating "just barely significant" as definitively important. Judge importance by the effect size and confidence interval, not the p-value alone.
- Reporting t without df. Include df = n - 1 for a one-sample test.
- Comparing t-statistics directly across samples of different sizes. For the same effect, t grows with √n, and the p-value for a given t depends on df; compare effect sizes instead.