CalcMountain

T-Test Calculator

Enter the sample mean, population mean, sample standard deviation, and sample size to compute the t-statistic and approximate p-value for a one-sample t-test.

The t-test is one of the most widely used statistical tests for comparing means. The one-sample t-test compares a sample mean to a hypothesized population value to see whether they differ significantly. The independent two-sample t-test compares two group means. The paired t-test compares matched pairs (for example, before/after measurements on the same subjects). Student's t-distribution accounts for the additional uncertainty that comes from estimating the population SD from a sample.

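This calculator performs the one-sample t-test: t = (x̄ - μ₀) / (s/√n), where x̄ is the sample mean, s is the sample standard deviation, n is the sample size, and μ₀ is the hypothesized population mean. The result includes the t-statistic and an approximate p-value (from a normal approximation), which somewhat understates the true p-value for small samples. For exact p-values, use statistical software that provides the t-distribution CDF.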
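As a rough illustration of the formula above, the sketch below computes the t-statistic from summary statistics and a normal-approximation p-value using only the standard library. The function names `one_sample_t` and `normal_approx_p` are made up for this example, and this is not necessarily how the calculator itself is implemented.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return (t, df, standard_error) for a one-sample t-test from summary statistics."""
    if n < 2 or sample_sd <= 0:
        raise ValueError("need n >= 2 and a positive standard deviation")
    standard_error = sample_sd / math.sqrt(n)
    t = (sample_mean - mu0) / standard_error
    return t, n - 1, standard_error

def normal_approx_p(t):
    """Approximate p-values, treating t as a standard normal z-score."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)), the two-tailed normal p-value
    two_tailed = math.erfc(abs(t) / math.sqrt(2))
    # One-tailed value for the tail in the direction of the observed difference
    return two_tailed, two_tailed / 2

t, df, se = one_sample_t(105, 100, 15, 30)
p_two, p_one = normal_approx_p(t)
print(f"t = {t:.4f}, df = {df}, SE = {se:.4f}")
print(f"normal-approx p: two-tailed = {p_two:.4f}, one-tailed = {p_one:.4f}")
```

Because the normal distribution has lighter tails than the t-distribution, this approximation tends to give p-values that are a little too small when n is small.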

Compared to the z-test: the t-test uses the sample standard deviation (s) and the t-distribution, while the z-test assumes the population SD is known. As sample size grows, the t-distribution approaches the standard normal. For n > 30, the two tests give very similar results. For smaller samples (n < 30), the t-distribution's heavier tails account for the additional uncertainty in the SD estimate.
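To see the heavier tails numerically, the following sketch integrates the t-distribution density with Simpson's rule and compares the two-tailed p-value for the same statistic at several degrees of freedom against the normal value. It is a standard-library approximation written for illustration (the helper names are assumptions); statistical software would normally call a built-in t CDF instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area under the density on [-|t|, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    if steps % 2:            # Simpson's rule needs an even number of intervals
        steps += 1
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3            # integral from 0 to |t|
    return 1 - 2 * half_area             # the density is symmetric around 0

def normal_two_tailed_p(z):
    return math.erfc(abs(z) / math.sqrt(2))

for df in (5, 29, 1000):
    print(f"df = {df}: p = {t_two_tailed_p(2.0, df):.4f}")
print(f"normal: p = {normal_two_tailed_p(2.0):.4f}")
```

For a fixed statistic, the p-value shrinks toward the normal value as df increases, which is the "t approaches z" behavior described above.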

Results

T-Statistic

1.8257

Degrees of Freedom

29

P-Value (Two-Tail)

0.0679

P-Value (One-Tail)

0.0339

Standard Error

2.7386

Decision

Fail to reject H0 (p ≥ 0.05)

95% Confidence Interval

99.3858 to 110.6142
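The interval above follows the usual form x̄ ± (critical value) × (standard error). A minimal sketch, assuming the default inputs of the worked example (mean 105, s = 15, n = 30) and an assumed critical value of 2.05, which is close to the tabulated two-tailed value of about 2.045 for df = 29:

```python
import math

def confidence_interval(sample_mean, sample_sd, n, critical_value):
    """Two-sided interval: mean +/- critical_value * standard error."""
    margin = critical_value * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# critical_value=2.05 is an assumption chosen to match the displayed interval
low, high = confidence_interval(105, 15, 30, critical_value=2.05)
print(f"{low:.4f} to {high:.4f}")   # prints "99.3858 to 110.6142"

# A two-sided 95% interval that contains the hypothesized mean corresponds
# to failing to reject H0 at alpha = 0.05 (two-tailed).
contains_mu0 = low <= 100 <= high
print("interval contains 100:", contains_mu0)
```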

Formula

**One-sample t-statistic:**

t = (x̄ - μ₀) / (s / √n)

Where:

- **x̄**: sample mean
- **μ₀**: hypothesized population mean (under H₀)
- **s**: sample standard deviation
- **n**: sample size

**Degrees of freedom:** df = n - 1

**Worked example: sample mean 105, hypothesized 100, s = 15, n = 30**

t = (105 - 100) / (15 / √30) = 5 / 2.74 = 1.83

For df = 29 and a two-tailed test, p ≈ 0.077. At α = 0.05 (two-tailed), this is not significant: |t| = 1.83 falls below the critical value of about 2.045 for df = 29.

**Two-sample independent t-test:**

t = (x̄₁ - x̄₂) / √(s²₁/n₁ + s²₂/n₂)

This is the unpooled form. The equal-variance (pooled) version uses s_p · √(1/n₁ + 1/n₂) in the denominator, where s_p is the pooled SD.

**Paired t-test (matched pairs):**

t = d̄ / (s_d / √n)

Where d̄ is the mean of the differences, s_d is the SD of the differences, and n is the number of pairs.

**Critical t-values for common alpha levels:**

| df | α = 0.05 (one-tail) | α = 0.05 (two-tail) | α = 0.01 (two-tail) |
|---|---|---|---|
| 10 | 1.812 | 2.228 | 3.169 |
| 20 | 1.725 | 2.086 | 2.845 |
| 30 | 1.697 | 2.042 | 2.750 |
| 50 | 1.676 | 2.009 | 2.678 |
| ∞ (z) | 1.645 | 1.960 | 2.576 |

**T-test assumptions:**

1. **Independence**: observations don't influence each other.
2. **Normality**: data approximately normal (more important for small n).
3. **Equal variances** (for two-sample): may need Welch's t-test if unequal.
4. **Random sampling**: representative of population.

**When normality matters:**

- **n > 30**: the t-test is reasonably robust to non-normality (Central Limit Theorem).
- **n < 30**: check normality (histogram, Q-Q plot, Shapiro-Wilk test).
- **Non-normal small samples**: use a non-parametric alternative instead, such as the Wilcoxon signed-rank or sign test (one sample or paired) or the Wilcoxon rank-sum test (two independent samples).

**Steps for one-sample t-test:**

1. Calculate sample mean (x̄), SD (s).
2. State null hypothesis (H₀): μ = μ₀ (specific value).
3. State alternative hypothesis (H₁): μ ≠ μ₀ (two-tailed) or μ > μ₀ or μ < μ₀ (one-tailed).
4. Choose significance level α (usually 0.05).
5. Compute t-statistic.
6. Find critical value or p-value.
7. Compare: |t| > critical value or p < α → reject H₀.
8. Interpret in context.
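The steps for a two-tailed one-sample test at α = 0.05 can be sketched with a small lookup of the critical values listed above. The lookup rule (use the largest tabulated df that does not exceed the actual df) is an assumption made here to keep the decision conservative, and the names are illustrative only.

```python
import math

# Two-tailed critical values at alpha = 0.05, copied from the table above.
CRITICAL_T_05_TWO_TAIL = {10: 2.228, 20: 2.086, 30: 2.042, 50: 2.009}

def critical_value(df):
    """Largest tabulated df not exceeding the actual df (slightly conservative)."""
    usable = [k for k in CRITICAL_T_05_TWO_TAIL if k <= df]
    if not usable:
        raise ValueError("df is below the smallest tabulated value")
    return CRITICAL_T_05_TWO_TAIL[max(usable)]

def one_sample_decision(sample_mean, mu0, sample_sd, n):
    t = (sample_mean - mu0) / (sample_sd / math.sqrt(n))  # step 5: t-statistic
    crit = critical_value(n - 1)                           # step 6: critical value
    reject = abs(t) > crit                                 # step 7: compare
    return t, crit, reject

t, crit, reject = one_sample_decision(105, 100, 15, 30)
print(f"t = {t:.2f}, critical = {crit}, reject H0: {reject}")
```

For the worked example (df = 29), the lookup falls back to the df = 20 row, and |t| = 1.83 stays below the critical value, so H₀ is not rejected.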
**Reporting:**

"A one-sample t-test indicated that the sample mean of 105 (SD = 15, n = 30) was not significantly different from the hypothesized value of 100, t(29) = 1.83, p = 0.077 (two-tailed)."

Include:

- Sample mean and SD.
- Sample size.
- Hypothesized value.
- Test statistic with df.
- p-value.
- Significance decision.
- Tail direction.

**Power and sample size:**

For detecting effect size d at α = 0.05, two-tailed, 80% power:

| Effect size (d) | Required n |
|---|---|
| 0.2 (small) | ~196 |
| 0.5 (medium) | ~32 |
| 0.8 (large) | ~13 |

**Effect size (Cohen's d):**

d = (x̄ - μ₀) / s

- |d| < 0.2: trivial
- 0.2 ≤ |d| < 0.5: small
- 0.5 ≤ |d| < 0.8: medium
- |d| ≥ 0.8: large

Always report effect size alongside the p-value.

**Common variations:**

| Test | Use |
|---|---|
| One-sample t | Sample vs known value |
| Independent two-sample t | Compare two groups |
| Paired t | Before/after or matched pairs |
| Welch's t-test | Two samples with unequal variances |
| Trimmed t-test | Robust to outliers |
| Permutation t-test | Non-parametric alternative |
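The sample-size figures and the Cohen's d formula above can be approximated with the common normal-approximation rule n ≈ ((z₁₋α/₂ + z_power) / d)². Treating the table as based on this rule is an assumption; exact t-based calculations give slightly larger n for small samples. The function names are illustrative.

```python
from statistics import NormalDist

def cohens_d(sample_mean, mu0, sample_sd):
    """Standardized difference between the sample mean and the hypothesized mean."""
    return (sample_mean - mu0) / sample_sd

def approx_required_n(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-tailed one-sample test (unrounded)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ((z_alpha + z_power) / d) ** 2

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n is roughly {approx_required_n(d):.1f}")

print(f"worked example d = {cohens_d(105, 100, 15):.2f}")
```

The unrounded values come out near 196, 31, and 12, in line with the table once rounded up to whole observations.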

How to use this calculator

  1. Enter sample mean.
  2. Enter hypothesized population mean (the value under null hypothesis).
  3. Enter sample standard deviation.
  4. Enter sample size.
  5. Calculator returns t-statistic and approximate p-value.
  6. For exact p-values, use statistical software with t-distribution.

Worked examples

Drug effectiveness test

**Scenario:** New drug claims 50% pain reduction. Test with 25 patients: mean reduction 55%, SD 12%. Is the actual mean different from 50?

**Calculation:** t = (55 - 50) / (12 / √25) = 5/2.4 = 2.08. df = 24. Two-tailed p ≈ 0.048.

**Result:** P = 0.048 < α = 0.05 (just barely). Reject null hypothesis. There is statistically significant evidence the drug's actual effect differs from the claimed 50% (in this case, higher). However, "just barely" significance suggests caution; replicate before clinical claims.

Manufacturing tolerance check

**Scenario:** Spec calls for 100 g parts. Sample of 60 parts: mean 99.8 g, SD 1.2 g. Is the process on target?

**Calculation:** t = (99.8 - 100) / (1.2 / √60) = -0.2 / 0.155 = -1.29. df = 59. Two-tailed p ≈ 0.20.

**Result:** P = 0.20 > 0.05. Fail to reject null. There is insufficient evidence that the process is off target, which is not the same as proof that it is on target: a deviation this small might still be detected with a larger sample.

Customer satisfaction baseline

**Scenario:** Historical mean satisfaction: 7.5. New survey of 100 customers: mean 7.8, SD 1.5. Has satisfaction improved?

**Calculation:** One-tailed test (H₁: μ > 7.5). t = (7.8 - 7.5) / (1.5 / √100) = 0.3 / 0.15 = 2.0. df = 99. One-tailed p ≈ 0.024.

**Result:** P = 0.024 < α = 0.05. Reject null hypothesis. Statistically significant evidence customer satisfaction improved over baseline. Effect size: d = 0.20 (small). Statistically significant but practical importance is modest.
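The arithmetic in the three scenarios above can be checked with a short loop. This sketch only reproduces the t-statistics, degrees of freedom, and Cohen's d from the stated summary statistics; the p-values quoted in the examples would still need a t-distribution CDF from statistical software.

```python
import math

# (label, sample mean, hypothesized mean, sample SD, n) taken from the scenarios
EXAMPLES = [
    ("drug", 55, 50, 12, 25),
    ("manufacturing", 99.8, 100, 1.2, 60),
    ("satisfaction", 7.8, 7.5, 1.5, 100),
]

def summarize(sample_mean, mu0, sample_sd, n):
    """Return (t, df, Cohen's d), with t and d rounded to two decimals."""
    t = (sample_mean - mu0) / (sample_sd / math.sqrt(n))
    d = (sample_mean - mu0) / sample_sd
    return round(t, 2), n - 1, round(d, 2)

results = {label: summarize(*values) for label, *values in EXAMPLES}
for label, (t, df, d) in results.items():
    print(f"{label}: t = {t}, df = {df}, d = {d}")
```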

When to use this calculator

**Use t-test for:**

- **Comparing sample mean to known value**: one-sample.
- **Comparing two group means**: independent two-sample.
- **Before/after measurements on same subjects**: paired.
- **Continuous data**: not for proportions or categorical.
- **Approximately normal distributions**: especially small samples.
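For the paired case, the test reduces to a one-sample t-test on the per-subject differences against zero. A minimal sketch, using made-up before/after values for five subjects (the data and the function name `paired_t` are purely illustrative):

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t-statistic and df, computed from the differences after - before."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]
    d_bar = mean(diffs)
    s_d = stdev(diffs)            # sample SD of the differences (n - 1 denominator)
    n = len(diffs)
    return d_bar / (s_d / math.sqrt(n)), n - 1

# Hypothetical measurements, not taken from any real study
before = [10, 12, 9, 11, 13]
after = [12, 13, 11, 14, 14]

t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

Running an independent two-sample test on the same data would ignore the pairing and usually lose power, which is why the study design decides the test.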

**Choosing the right t-test:**

| Scenario | Test |
|---|---|
| Sample vs hypothesized value | One-sample t |
| Two independent groups | Two-sample t |
| Paired measurements (before/after) | Paired t |
| Two samples, unequal variances | Welch's t |
| Three or more groups | ANOVA |
| Non-normal data | Mann-Whitney U |
| Paired non-normal | Wilcoxon signed-rank |

**Welch's t-test (unequal variances):**

When the two groups have very different variances, use Welch's t-test:

t = (x̄₁ - x̄₂) / √(s²₁/n₁ + s²₂/n₂)

The degrees of freedom come from the Welch-Satterthwaite approximation rather than n₁ + n₂ - 2.
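A sketch of Welch's statistic together with the Welch-Satterthwaite degrees of freedom, df = (v₁ + v₂)² / (v₁²/(n₁ - 1) + v₂²/(n₂ - 1)) with vᵢ = sᵢ²/nᵢ. The summary statistics in the example call are hypothetical.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t-statistic and approximate (usually non-integer) degrees of freedom."""
    v1 = sd1 ** 2 / n1            # squared standard error contributed by group 1
    v2 = sd2 ** 2 / n2            # squared standard error contributed by group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical groups: equal sizes, but group 2 has twice the SD of group 1
t, df = welch_t(20, 4, 16, 17, 8, 16)
print(f"t = {t:.4f}, df = {df:.2f}")
```

The resulting df lies between the smaller group's n - 1 and the pooled n₁ + n₂ - 2, moving toward the lower end as the variances become more unequal.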

**Robustness:**

The t-test is moderately robust to:

- Mild non-normality (especially with n > 30).
- Unequal variances (use Welch's).
- Occasional mild outliers (consider robust alternatives otherwise).

The t-test is sensitive to:

- Severe non-normality with small samples.
- Heavy outliers.
- Heteroskedasticity (changing variance).

**Common decisions:**

| If... | Then... |
|---|---|
| p < α | Reject null hypothesis |
| p ≥ α | Fail to reject null |
| p just above α | Consider sample size, replicate |
| Large t with large n | Likely significant |
| Small t with small n | Likely not significant |
| Large effect size, p > α | Need more data |

**Common errors:**

- Using t-test for proportions. Use z-test for proportions.
- Using t for non-independent observations.
- Skipping normality check for small samples.
- Reporting t without df and p-value.
- Forgetting one-tailed vs two-tailed distinction.

**Power analysis:**

Power = probability of detecting a true effect when one exists.

- Increase power: larger sample, higher α (at the cost of more false positives), larger true effect.
- Target 80% power for most research.
- Sample size calculation: depends on effect size, significance level, and target power.
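How power responds to sample size, α, and effect size can be explored with a normal approximation for a two-tailed one-sample test. This is a rough sketch (the function name is an assumption, and exact power uses the noncentral t-distribution, which gives slightly lower values for small n).

```python
import math
from statistics import NormalDist

def approx_power(d, n, alpha=0.05):
    """Approximate two-tailed power for standardized effect size d and sample size n."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(d) * math.sqrt(n)     # expected value of the test statistic
    # Probability of landing beyond either critical value when the effect is real
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

print(f"d = 0.5, n = 32, alpha = 0.05: {approx_power(0.5, 32):.2f}")
print(f"d = 0.5, n = 64, alpha = 0.05: {approx_power(0.5, 64):.2f}")
print(f"d = 0.5, n = 32, alpha = 0.01: {approx_power(0.5, 32, alpha=0.01):.2f}")
print(f"d = 0.8, n = 32, alpha = 0.05: {approx_power(0.8, 32):.2f}")
```

Holding everything else fixed, a stricter (smaller) α lowers power, while a larger sample or a larger true effect raises it.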

**Modern statistical practice:**

- Report effect sizes (Cohen's d) alongside p-values.
- Use confidence intervals to convey precision.
- Don't equate "significant" with "important."
- Consider replication; one study is rarely conclusive.

**Statistical software:**

- **Excel**: T.TEST() function.
- **R**: t.test().
- **Python (scipy.stats)**: ttest_1samp, ttest_ind, ttest_rel.
- **SPSS**: T-Test menu.
- **SAS**: PROC TTEST.

**Common confusions:**

- **T-test vs z-test**: t-test when SD is estimated from the sample; z-test when population SD is known.
- **Two-tailed vs one-tailed**: two-tailed is more conservative; use it by default unless there is a prior reason for a directional test.
- **Paired vs independent**: depends on study design.
- **Effect size vs significance**: separate concepts; both matter.

Common mistakes to avoid

  • Using t-test for non-numeric data. Use chi-square or other appropriate test.
  • Skipping normality check for small samples. Use Shapiro-Wilk or visual inspection.
  • Forgetting one-tailed vs two-tailed. Conservative default is two-tailed.
  • Using same data for hypothesis generation and testing. Causes inflated Type I error.
  • Treating "just barely significant" as definitively important. Effect size matters more.
  • Reporting t without df. Include df = n - 1 for one-sample.
  • Comparing t-statistics directly across samples of different size. df changes interpretation.
