When should I use a t-test vs a z-test?

Use a t-test when the population standard deviation is unknown and you estimate it with the sample standard deviation. Use a z-test when the population standard deviation is known. With large samples (a common rule of thumb is n > 30), the t-distribution is very close to the normal distribution, so the two tests give nearly the same result.

What does the p-value mean?

The p-value is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. A small p-value (conventionally < 0.05) means the data would be unlikely if the null hypothesis were true, and the result is called statistically significant. It is NOT the probability that the null hypothesis is true.
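One way to see what "assuming the null hypothesis is true" means is to simulate it. The sketch below is an illustration only: it uses a z-test with a known SD of 1 so that it needs nothing beyond the Python standard library, and the sample size, trial count, and seed are arbitrary choices, not values from this page. When the null really is true, roughly 5% of samples should still produce p < 0.05.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Two-tailed p-value for a z-test with known population SD."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


random.seed(0)  # fixed seed so the run is repeatable
trials = 4000
false_positives = 0
for _ in range(trials):
    # The null hypothesis is true here: the data really come from N(0, 1).
    sample = [random.gauss(0, 1) for _ in range(20)]
    if z_test_p_value(sample, mu0=0, sigma=1) < 0.05:
        false_positives += 1

false_positive_rate = false_positives / trials
print(f"Share of samples with p < 0.05 under a true null: {false_positive_rate:.3f}")
```

The printed share should land near 0.05, which is the false-positive rate that α = 0.05 is designed to allow.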

How do I know if my data meets t-test assumptions?

Check independence (observations do not influence one another), approximate normality (especially for small samples), and equal variances (for two-sample tests). For non-normal small samples, use non-parametric alternatives (Mann-Whitney U for independent groups, Wilcoxon signed-rank for paired or one-sample data). For unequal variances, use Welch's t-test.

What's the difference between paired and independent t-test?

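A paired t-test compares matched pairs (before/after on the same subjects, or twin pairs). An independent t-test compares two unrelated groups. When the paired measurements are positively correlated, the paired test is usually more powerful because working on within-pair differences removes between-subject variation. Choose based on study design, not on which test gives the smaller p-value.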
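The difference in power is easiest to see on numbers. The following sketch uses made-up before/after scores for six subjects (hypothetical data, chosen only for illustration) and computes both statistics with the formulas from this page. The independent version uses the unpooled standard error.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical before/after scores for the same six subjects.
before = [72, 85, 64, 90, 78, 69]
after = [75, 88, 66, 94, 80, 73]

# Paired t: work on the per-subject differences, t = d_bar / (s_d / sqrt(n)).
diffs = [a - b for a, b in zip(after, before)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Independent t (unpooled form): treats the two lists as unrelated groups.
se_indep = sqrt(stdev(before) ** 2 / len(before) + stdev(after) ** 2 / len(after))
t_indep = (mean(after) - mean(before)) / se_indep

print(f"paired t = {t_paired:.2f}, independent t = {t_indep:.2f}")
```

Both tests see the same mean change of 3 points, but the paired statistic is far larger because the large subject-to-subject spread cancels out in the differences.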

How do I choose between one-tailed and two-tailed?

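Two-tailed is the conservative default: it tests for a difference in either direction. One-tailed has more power but tests only one direction; use it only with a strong prior reason for that direction, stated before seeing the data. In doubt, choose two-tailed. Note: the one-tailed p-value is half the two-tailed p-value only when the observed effect lies in the predicted direction; otherwise it is 1 minus that half.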
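The relationship between the two p-values can be written as a small helper. This is a minimal sketch that assumes a symmetric test distribution (true for t and z); the function name and the `"greater"`/`"less"` labels are illustrative choices, not part of this calculator.

```python
def one_tailed_p(p_two_tailed, statistic, alternative):
    """Convert a two-tailed p-value to a one-tailed one.

    alternative: "greater" (H1: mu > mu0) or "less" (H1: mu < mu0).
    Assumes a symmetric null distribution such as t or z.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    in_predicted_direction = (statistic > 0) == (alternative == "greater")
    half = p_two_tailed / 2
    return half if in_predicted_direction else 1 - half


# Effect in the predicted direction: the p-value is halved.
print(one_tailed_p(0.078, 1.83, "greater"))
# Effect in the opposite direction: the p-value becomes large.
print(one_tailed_p(0.078, 1.83, "less"))
```

The second call shows why a one-tailed test cannot be chosen after looking at the data: an effect in the unpredicted direction gives essentially no evidence against the null.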

What if my sample is too small?

The t-test is reasonably robust to non-normal data for n > 30. For n < 30, check normality first. If the data are clearly non-normal, use a non-parametric test (for example, Wilcoxon). A larger sample increases the power of any test, and detecting a small effect requires a larger sample.

Should I report only significance?

No. Also report effect size (Cohen's d), confidence interval, and sample size. Statistical significance ≠ practical importance. A large p-value doesn't mean "no effect"; it means there is insufficient evidence to detect one. Full reporting enables proper interpretation.

T-Test Calculator

Enter the sample mean, population mean, sample standard deviation, and sample size to compute the t-statistic and approximate p-value for a one-sample t-test.

The t-test is one of the most widely used statistical tests for comparing means. The one-sample t-test compares a sample mean to a hypothesized population value to see if they differ significantly. The independent two-sample t-test compares two group means. The paired t-test compares matched pairs (before/after measurements on the same subjects). Student's t-distribution accounts for the additional uncertainty that comes from estimating the population SD from a sample.

This calculator performs the one-sample t-test: t = (x̄ - μ₀) / (s/√n), where μ₀ is the hypothesized population mean. The result includes the t-statistic and an approximate p-value (from a normal approximation). For exact p-values, use statistical software with t-distribution CDF.
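As a rough sketch of the computation described above (not the calculator's actual source, which is not shown on this page), the function below applies the one-sample formula and the normal approximation for the p-value. The function and key names are illustrative; the inputs are the worked-example values from the Formula section (mean 105, μ₀ = 100, s = 15, n = 30).

```python
from math import sqrt
from statistics import NormalDist


def one_sample_t(sample_mean, mu0, s, n):
    """One-sample t-statistic with a normal-approximation two-tailed p-value."""
    se = s / sqrt(n)                      # standard error of the mean
    t = (sample_mean - mu0) / se          # t = (x_bar - mu0) / (s / sqrt(n))
    # Normal approximation: treats t as if it were a z-score.
    p_normal = 2 * (1 - NormalDist().cdf(abs(t)))
    return {"t": t, "df": n - 1, "se": se, "p_two_tailed_normal_approx": p_normal}


result = one_sample_t(sample_mean=105, mu0=100, s=15, n=30)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because the normal distribution has thinner tails than the t-distribution, this approximation gives a somewhat smaller p-value than an exact t-based calculation, most noticeably for small n.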

Compared to z-test: t-test uses sample standard deviation (s) and the t-distribution. As sample size grows, t approaches z. For n > 30, the two tests give very similar results. For smaller samples (n < 30), the t-distribution has heavier tails to account for additional uncertainty in the SD estimate.
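To illustrate the heavier tails, the sketch below computes two-tailed p-values for the same statistic at several degrees of freedom and compares them with the normal value. It is an assumption-laden illustration: the t-distribution tail is obtained by integrating the density numerically with Simpson's rule (a stand-in for a library CDF), and the statistic 2.0 and the df values are arbitrary.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 - 2 * integral of the pdf from 0 to |t|.

    The integral uses Simpson's rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


statistic = 2.0
normal_p = 2 * (1 - NormalDist().cdf(statistic))
p_by_df = {df: t_two_tailed_p(statistic, df) for df in (5, 10, 30, 100)}
for df, p in p_by_df.items():
    print(f"df = {df:>3}: p = {p:.4f}")
print(f"normal  : p = {normal_p:.4f}")
```

The p-values shrink toward the normal value as df grows, which is the sense in which t approaches z.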

Inputs

Sample Mean

Population Mean (H0)

Sample Std Dev (s)

Sample Size (n)

Results

T-Statistic

1.8257

Degrees of Freedom

29

P-Value (Two-Tail)

0.0782

P-Value (One-Tail)

0.0391

Standard Error

2.7386

Decision

Fail to reject H0 (p ≥ 0.05)

95% Confidence Interval

99.3858 to 110.6142

Last updated: May 29, 2026

Formula

**One-sample t-statistic:**

t = (x̄ - μ₀) / (s / √n)

Where:

- **x̄**: sample mean
- **μ₀**: hypothesized population mean (under H₀)
- **s**: sample standard deviation
- **n**: sample size

**Degrees of freedom:** df = n - 1

**Worked example: sample mean 105, hypothesized 100, s = 15, n = 30**

t = (105 - 100) / (15 / √30) = 5 / 2.74 = 1.83

For df = 29 and a two-tailed test, p ≈ 0.078. At α = 0.05 (two-tailed), this does not reach significance.

**Two-sample independent t-test:**

t = (x̄₁ - x̄₂) / √(s²₁/n₁ + s²₂/n₂)

This form uses the unpooled standard error (the same statistic as Welch's t-test). The classic equal-variance version pools the two sample variances instead.

**Paired t-test (matched pairs):**

t = d̄ / (sd / √n)

Where d̄ is the mean of the differences and sd is the SD of the differences.

**Critical t-values for common alpha levels:**

| df | α = 0.05 (one-tail) | α = 0.05 (two-tail) | α = 0.01 (two-tail) |
|---|---|---|---|
| 10 | 1.812 | 2.228 | 3.169 |
| 20 | 1.725 | 2.086 | 2.845 |
| 30 | 1.697 | 2.042 | 2.750 |
| 50 | 1.676 | 2.009 | 2.678 |
| ∞ (z) | 1.645 | 1.960 | 2.576 |

**T-test assumptions:**

1. **Independence**: observations don't influence each other.
2. **Normality**: data approximately normal (more important for small n).
3. **Equal variances** (for two-sample): may need Welch's t-test if unequal.
4. **Random sampling**: representative of population.

**When normality matters:**

- **n > 30**: t-test robust to non-normality (Central Limit Theorem).
- **n < 30**: check normality (histogram, Q-Q plot, Shapiro-Wilk test).
- **Non-normal small samples**: use a non-parametric test instead, such as the Wilcoxon signed-rank or sign test (one-sample or paired) or the Wilcoxon rank-sum test (two independent groups).

**Steps for one-sample t-test:**

1. Calculate sample mean (x̄), SD (s).
2. State null hypothesis (H₀): μ = μ₀ (specific value).
3. State alternative hypothesis (H₁): μ ≠ μ₀ (two-tailed) or μ > μ₀ or μ < μ₀ (one-tailed).
4. Choose significance level α (usually 0.05).
5. Compute t-statistic.
6. Find critical value or p-value.
7. Compare: |t| > critical value or p < α → reject H₀.
8. Interpret in context.

**Reporting:**

"A one-sample t-test indicated that the sample mean of 105 (SD = 15, n = 30) was not significantly different from the hypothesized value of 100, t(29) = 1.83, p = 0.078 (two-tailed)."

Include:

- Sample mean and SD.
- Sample size.
- Hypothesized value.
- Test statistic with df.
- p-value.
- Significance decision.
- Tail direction.

**Power and sample size:**

For detecting effect size d with a one-sample test at α = 0.05, two-tailed, 80% power (normal approximation):

| Effect size (d) | Required n |
|---|---|
| 0.2 (small) | ~196 |
| 0.5 (medium) | ~32 |
| 0.8 (large) | ~13 |

**Effect size (Cohen's d):**

d = (x̄ - μ₀) / s

- |d| < 0.2: trivial
- 0.2 ≤ |d| < 0.5: small
- 0.5 ≤ |d| < 0.8: medium
- |d| ≥ 0.8: large

Always report effect size alongside the p-value.

**Common variations:**

| Test | Use |
|---|---|
| One-sample t | Sample vs known value |
| Independent two-sample t | Compare two groups |
| Paired t | Before/after or matched pairs |
| Welch's t-test | Two samples with unequal variances |
| Trimmed t-test | Robust to outliers |
| Permutation t-test | Non-parametric alternative |
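The sample sizes in the power table of this section can be approximated with the standard normal-approximation formula n ≈ ((z₁₋α/₂ + z_power) / d)². The sketch below assumes that formula and a one-sample, two-tailed design; the function name is illustrative. Rounding up gives 197 for d = 0.2, one more than the table's rounded ~196, and an exact t-based calculation would give slightly larger values still.

```python
from math import ceil
from statistics import NormalDist


def required_n(d, alpha=0.05, power=0.80):
    """Approximate one-sample n for a two-tailed test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)           # about 0.84 for 80% power
    return ((z_alpha + z_power) / d) ** 2


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n = {ceil(required_n(d))}")
```

Halving the effect size roughly quadruples the required sample, which is why small effects need much larger studies.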

How to use this calculator

Enter sample mean.
Enter hypothesized population mean (the value under null hypothesis).
Enter sample standard deviation.
Enter sample size.
Calculator returns t-statistic and approximate p-value.
For exact p-values, use statistical software with t-distribution.

Worked examples

Drug effectiveness test

**Scenario:** A new drug is claimed to give a 50% pain reduction. In a test with 25 patients, the mean reduction is 55% with SD 12%. Is the actual mean different from 50%?

**Calculation:** t = (55 - 50) / (12 / √25) = 5 / 2.4 = 2.08. df = 24. Two-tailed p ≈ 0.048.

**Result:** p = 0.048 < α = 0.05 (just barely). Reject the null hypothesis. There is statistically significant evidence that the drug's actual effect differs from the claimed 50% (in this case, it is higher). However, "just barely" significant results call for caution; replicate before making clinical claims.

Manufacturing tolerance check

**Scenario:** The spec calls for 100 g parts. A sample of 60 parts has mean 99.8 g and SD 1.2 g. Is the process on target?

**Calculation:** t = (99.8 - 100) / (1.2 / √60) = -0.2 / 0.155 = -1.29. df = 59. Two-tailed p ≈ 0.20.

**Result:** p = 0.20 > 0.05. Fail to reject the null. There is insufficient evidence that the process is off target, so with the current data you can't conclude the process is wrong. A small shift could still exist and might be detected with a larger sample.

Customer satisfaction baseline

**Scenario:** The historical mean satisfaction is 7.5. A new survey of 100 customers gives mean 7.8 and SD 1.5. Has satisfaction improved?

**Calculation:** One-tailed test (H₁: μ > 7.5). t = (7.8 - 7.5) / (1.5 / √100) = 0.3 / 0.15 = 2.0. df = 99. One-tailed p ≈ 0.024.

**Result:** p = 0.024 < α = 0.05. Reject the null hypothesis. There is statistically significant evidence that customer satisfaction improved over baseline. Effect size: d = 0.20 (small). The result is statistically significant, but its practical importance is modest.
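The arithmetic in the three worked examples can be re-checked from their summary statistics. This is a minimal sketch: the helper and dictionary names are illustrative, and it reproduces only t, df, and Cohen's d, not the p-values, which need a t-distribution CDF.

```python
from math import sqrt


def one_sample_summary(sample_mean, mu0, s, n):
    """Return (t, df, Cohen's d) for a one-sample t-test from summary statistics."""
    t = (sample_mean - mu0) / (s / sqrt(n))
    d = (sample_mean - mu0) / s
    return t, n - 1, d


# (sample mean, hypothesized mean, SD, n) taken from the examples above.
examples = {
    "drug": (55, 50, 12, 25),
    "manufacturing": (99.8, 100, 1.2, 60),
    "satisfaction": (7.8, 7.5, 1.5, 100),
}
summaries = {name: one_sample_summary(*args) for name, args in examples.items()}
for name, (t, df, d) in summaries.items():
    print(f"{name}: t = {t:.2f}, df = {df}, d = {d:.2f}")
```

Printing d next to t makes the contrast visible: the satisfaction example has a clearly significant t but only a small effect size.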

When to use this calculator

**Use t-test for:**

- **Comparing sample mean to known value**: one-sample.
- **Comparing two group means**: independent two-sample.
- **Before/after measurements on same subjects**: paired.
- **Continuous data**: not for proportions or categorical.
- **Approximately normal distributions**: especially small samples.

**Choosing the right t-test:**

| Scenario | Test |
|---|---|
| Sample vs hypothesized value | One-sample t |
| Two independent groups | Two-sample t |
| Paired measurements (before/after) | Paired t |
| Two samples, unequal variances | Welch's t |
| Three or more groups | ANOVA |
| Non-normal data | Mann-Whitney U |
| Paired non-normal | Wilcoxon signed-rank |

**Welch's t-test (unequal variances):**

When two-sample groups have very different variances, use Welch's:

t = (x̄₁ - x̄₂) / √(s²₁/n₁ + s²₂/n₂)

The degrees of freedom are adjusted with the Welch-Satterthwaite approximation, which usually gives a non-integer df.
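The adjustment can be sketched as follows, using the standard Welch-Satterthwaite formula df = (v₁ + v₂)² / (v₁²/(n₁ - 1) + v₂²/(n₂ - 1)) with vᵢ = s²ᵢ/nᵢ. The function name and the example numbers are hypothetical and serve only to show the calculation.

```python
from math import sqrt


def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # per-group variance of the mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical groups with clearly different spreads.
t, df = welch_t(mean1=52.0, s1=2.0, n1=10, mean2=50.0, s2=4.0, n2=15)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The resulting df always falls between the smaller group's n - 1 and the pooled value n₁ + n₂ - 2, moving toward the lower end as the variances become more unequal.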

**Robustness:**

The t-test is moderately robust to:

- Mild non-normality (especially with n > 30).
- Unequal variances (if you use Welch's version).
- Occasional mild outliers (otherwise consider robust alternatives).

The t-test is sensitive to:

- Severe non-normality with small samples.
- Heavy outliers.
- Heteroskedasticity (changing variance).

**Common decisions:**

| If... | Then... |
|---|---|
| p < α | Reject null hypothesis |
| p ≥ α | Fail to reject null |
| p just above α | Consider sample size, replicate |
| Large t with large n | Likely significant |
| Small t with small n | Likely not significant |
| Large effect size, p > α | Need more data |

**Common errors:**

- Using t-test for proportions. Use z-test for proportions.
- Using t for non-independent observations.
- Skipping normality check for small samples.
- Reporting t without df and p-value.
- Forgetting one-tailed vs two-tailed distinction.

**Power analysis:**

Power = probability of detecting a true effect when one exists.

- Increase power: larger sample, higher α (at the cost of more false positives), larger true effect.
- Target 80% power for most research.
- Sample size calculation: depends on effect size, significance level, and desired power.

**Modern statistical practice:**

- Report effect sizes (Cohen's d) alongside p-values.
- Use confidence intervals to convey precision.
- Don't equate "significant" with "important."
- Consider replication; one study is rarely conclusive.

**Statistical software:**

- **Excel**: T.TEST() function.
- **R**: t.test().
- **Python (scipy.stats)**: ttest_1samp, ttest_ind, ttest_rel.
- **SPSS**: T-Test menu.
- **SAS**: PROC TTEST.

**Common confusions:**

- **T-test vs z-test**: t-test when SD estimated from sample; z-test when population SD known.
- **Two-tailed vs one-tailed**: two-tailed is more conservative; default unless prior reason.
- **Paired vs independent**: depends on study design.
- **Effect size vs significance**: separate concepts; both matter.

Common mistakes to avoid

Using t-test for non-numeric data. Use chi-square or other appropriate test.
Skipping normality check for small samples. Use Shapiro-Wilk or visual inspection.
Forgetting one-tailed vs two-tailed. Conservative default is two-tailed.
Using same data for hypothesis generation and testing. Causes inflated Type I error.
Treating "just barely significant" as definitively important. Effect size matters more.
Reporting t without df. Include df = n - 1 for one-sample.
Comparing t-statistics directly across samples of different size. df changes interpretation.
