User Preferences

Content preview.

Arcu felis bibendum ut tristique et egestas quis:

  • Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
  • Duis aute irure dolor in reprehenderit in voluptate
  • Excepteur sint occaecat cupidatat non proident

Keyboard Shortcuts

6.1 - type i and type ii errors.

When conducting a hypothesis test there are two possible decisions: reject the null hypothesis or fail to reject the null hypothesis. You should remember though, hypothesis testing uses data from a sample to make an inference about a population. When conducting a hypothesis test we do not know the population parameters. In most cases, we don't know if our inference is correct or incorrect.

When we reject the null hypothesis there are two possibilities. There could really be a difference in the population, in which case we made a correct decision. Or, it is possible that there is not a difference in the population (i.e., \(H_0\) is true) but our sample was different from the hypothesized value due to random sampling variation. In that case we made an error. This is known as a Type I error.

When we fail to reject the null hypothesis there are also two possibilities. If the null hypothesis is really true, and there is not a difference in the population, then we made the correct decision. If there is a difference in the population, and we failed to reject it, then we made a Type II error.

Rejecting \(H_0\) when \(H_0\) is really true, denoted by \(\alpha\) ("alpha") and commonly set at .05

     \(\alpha=P(Type\;I\;error)\)

Failing to reject \(H_0\) when \(H_0\) is really false, denoted by \(\beta\) ("beta")

     \(\beta=P(Type\;II\;error)\)

Example: Trial Section  

A man goes to trial where he is being tried for the murder of his wife.

We can put it in a hypothesis testing framework. The hypotheses being tested are:

  • \(H_0\) : Not Guilty
  • \(H_a\) : Guilty

Type I error  is committed if we reject \(H_0\) when it is true. In other words, did not kill his wife but was found guilty and is punished for a crime he did not really commit.

Type II error  is committed if we fail to reject \(H_0\) when it is false. In other words, if the man did kill his wife but was found not guilty and was not punished.

Example: Culinary Arts Study Section  

Asparagus

A group of culinary arts students is comparing two methods for preparing asparagus: traditional steaming and a new frying method. They want to know if patrons of their school restaurant prefer their new frying method over the traditional steaming method. A sample of patrons are given asparagus prepared using each method and asked to select their preference. A statistical analysis is performed to determine if more than 50% of participants prefer the new frying method:

  • \(H_{0}: p = .50\)
  • \(H_{a}: p>.50\)

Type I error  occurs if they reject the null hypothesis and conclude that their new frying method is preferred when in reality is it not. This may occur if, by random sampling error, they happen to get a sample that prefers the new frying method more than the overall population does. If this does occur, the consequence is that the students will have an incorrect belief that their new method of frying asparagus is superior to the traditional method of steaming.

Type II error  occurs if they fail to reject the null hypothesis and conclude that their new method is not superior when in reality it is. If this does occur, the consequence is that the students will have an incorrect belief that their new method is not superior to the traditional method when in reality it is.

Have a thesis expert improve your writing

Check your thesis for plagiarism in 10 minutes, generate your apa citations for free.

  • Knowledge Base
  • Type I & Type II Errors | Differences, Examples, Visualizations

Type I & Type II Errors | Differences, Examples, Visualizations

Published on 18 January 2021 by Pritha Bhandari . Revised on 2 February 2023.

In statistics , a Type I error is a false positive conclusion, while a Type II error is a false negative conclusion.

Making a statistical decision always involves uncertainties, so the risks of making these errors are unavoidable in hypothesis testing .

The probability of making a Type I error is the significance level , or alpha (α), while the probability of making a Type II error is beta (β). These risks can be minimized through careful planning in your study design.

  • Type I error (false positive) : the test result says you have coronavirus, but you actually don’t.
  • Type II error (false negative) : the test result says you don’t have coronavirus, but you actually do.

Table of contents

Error in statistical decision-making, type i error, type ii error, trade-off between type i and type ii errors, is a type i or type ii error worse, frequently asked questions about type i and ii errors.

Using hypothesis testing, you can make decisions about whether your data support or refute your research predictions with null and alternative hypotheses .

Hypothesis testing starts with the assumption of no difference between groups or no relationship between variables in the population—this is the null hypothesis . It’s always paired with an alternative hypothesis , which is your research prediction of an actual difference between groups or a true relationship between variables .

In this case:

  • The null hypothesis (H 0 ) is that the new drug has no effect on symptoms of the disease.
  • The alternative hypothesis (H 1 ) is that the drug is effective for alleviating symptoms of the disease.

Then , you decide whether the null hypothesis can be rejected based on your data and the results of a statistical test . Since these decisions are based on probabilities, there is always a risk of making the wrong conclusion.

  • If your results show statistical significance , that means they are very unlikely to occur if the null hypothesis is true. In this case, you would reject your null hypothesis. But sometimes, this may actually be a Type I error.
  • If your findings do not show statistical significance, they have a high chance of occurring if the null hypothesis is true. Therefore, you fail to reject your null hypothesis. But sometimes, this may be a Type II error.

Type I and Type II error in statistics

A Type I error means rejecting the null hypothesis when it’s actually true. It means concluding that results are statistically significant when, in reality, they came about purely by chance or because of unrelated factors.

The risk of committing this error is the significance level (alpha or α) you choose. That’s a value that you set at the beginning of your study to assess the statistical probability of obtaining your results ( p value).

The significance level is usually set at 0.05 or 5%. This means that your results only have a 5% chance of occurring, or less, if the null hypothesis is actually true.

If the p value of your test is lower than the significance level, it means your results are statistically significant and consistent with the alternative hypothesis. If your p value is higher than the significance level, then your results are considered statistically non-significant.

To reduce the Type I error probability, you can simply set a lower significance level.

Type I error rate

The null hypothesis distribution curve below shows the probabilities of obtaining all possible results if the study were repeated with new samples and the null hypothesis were true in the population .

At the tail end, the shaded area represents alpha. It’s also called a critical region in statistics.

If your results fall in the critical region of this curve, they are considered statistically significant and the null hypothesis is rejected. However, this is a false positive conclusion, because the null hypothesis is actually true in this case!

Type I error rate

A Type II error means not rejecting the null hypothesis when it’s actually false. This is not quite the same as “accepting” the null hypothesis, because hypothesis testing can only tell you whether to reject the null hypothesis.

Instead, a Type II error means failing to conclude there was an effect when there actually was. In reality, your study may not have had enough statistical power to detect an effect of a certain size.

Power is the extent to which a test can correctly detect a real effect when there is one. A power level of 80% or higher is usually considered acceptable.

The risk of a Type II error is inversely related to the statistical power of a study. The higher the statistical power, the lower the probability of making a Type II error.

Statistical power is determined by:

  • Size of the effect : Larger effects are more easily detected.
  • Measurement error : Systematic and random errors in recorded data reduce power.
  • Sample size : Larger samples reduce sampling error and increase power.
  • Significance level : Increasing the significance level increases power.

To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the significance level.

Type II error rate

The alternative hypothesis distribution curve below shows the probabilities of obtaining all possible results if the study were repeated with new samples and the alternative hypothesis were true in the population .

The Type II error rate is beta (β), represented by the shaded area on the left side. The remaining area under the curve represents statistical power, which is 1 – β.

Increasing the statistical power of your test directly decreases the risk of making a Type II error.

Type II error rate

The Type I and Type II error rates influence each other. That’s because the significance level (the Type I error rate) affects statistical power, which is inversely related to the Type II error rate.

This means there’s an important tradeoff between Type I and Type II errors:

  • Setting a lower significance level decreases a Type I error risk, but increases a Type II error risk.
  • Increasing the power of a test decreases a Type II error risk, but increases a Type I error risk.

This trade-off is visualized in the graph below. It shows two curves:

  • The null hypothesis distribution shows all possible results you’d obtain if the null hypothesis is true. The correct conclusion for any point on this distribution means not rejecting the null hypothesis.
  • The alternative hypothesis distribution shows all possible results you’d obtain if the alternative hypothesis is true. The correct conclusion for any point on this distribution means rejecting the null hypothesis.

Type I and Type II errors occur where these two distributions overlap. The blue shaded area represents alpha, the Type I error rate, and the green shaded area represents beta, the Type II error rate.

By setting the Type I error rate, you indirectly influence the size of the Type II error rate as well.

Type I and Type II error

It’s important to strike a balance between the risks of making Type I and Type II errors. Reducing the alpha always comes at the cost of increasing beta, and vice versa .

For statisticians, a Type I error is usually worse. In practical terms, however, either type of error could be worse depending on your research context.

A Type I error means mistakenly going against the main statistical assumption of a null hypothesis. This may lead to new policies, practices or treatments that are inadequate or a waste of resources.

In contrast, a Type II error means failing to reject a null hypothesis. It may only result in missed opportunities to innovate, but these can also have important practical consequences.

In statistics, a Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s actually false.

The risk of making a Type I error is the significance level (or alpha) that you choose. That’s a value that you set at the beginning of your study to assess the statistical probability of obtaining your results ( p value ).

To reduce the Type I error probability, you can set a lower significance level.

The risk of making a Type II error is inversely related to the statistical power of a test. Power is the extent to which a test can correctly detect a real effect when there is one.

To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the significance level to increase statistical power.

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test . Significance is usually denoted by a p -value , or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that the data is likely to occur less than 5% of the time under the null hypothesis .

When the p -value falls below the chosen alpha value, then we say the result of the test is statistically significant.

In statistics, power refers to the likelihood of a hypothesis test detecting a true effect if there is one. A statistically powerful test is more likely to reject a false negative (a Type II error).

If you don’t ensure enough power in your study, you may not be able to detect a statistically significant result even when it has practical significance. Your study might not have the ability to answer your research question.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the ‘Cite this Scribbr article’ button to automatically add the citation to our free Reference Generator.

Bhandari, P. (2023, February 02). Type I & Type II Errors | Differences, Examples, Visualizations. Scribbr. Retrieved 12 March 2024, from https://www.scribbr.co.uk/stats/type-i-and-type-ii-error/

Is this article helpful?

Pritha Bhandari

Pritha Bhandari

Hypothesis Testing and Types of Errors

Article info.

  • Statistical Inference
  • Null Hypothesis
  • Bayes Factor
  • Bayesian Inference
  • Sampling and Estimation

Article Versions

  • 14 2021-05-28 15:47:23 2562,2561 14,2562 By arvindpdmn Adding info on base rate and confidence interval.
  • 13 2021-05-28 15:20:55 2561,2560 13,2561 By arvindpdmn Minor edit
  • 12 2021-05-28 15:11:35 2560,2559 12,2560 By arvindpdmn Milestones added. Added question on misconceptions.
  • 11 2021-05-28 13:22:05 2559,2116 11,2559 By arvindpdmn Minor reordering of questions. Merge two questions into one. Added more refs and citations to remove warnings. Other corrections. Milestones are pending.
  • 10 2020-07-15 06:11:06 2116,1611 10,2116 By arvindpdmn Reducing to 8 tags. This is the maximum that will be imposed in the next release. Corrected spelling.
  • Submitting ... You are editing an existing chat message. All Versions 2021-05-28 15:47:23 by arvindpdmn 2021-05-28 15:20:55 by arvindpdmn 2021-05-28 15:11:35 by arvindpdmn 2021-05-28 13:22:05 by arvindpdmn 2020-07-15 06:11:06 by arvindpdmn 2019-09-23 10:56:34 by arvindpdmn 2019-03-13 15:34:32 by arvindpdmn 2018-05-29 05:32:41 by arvindpdmn 2018-05-27 12:14:19 by raam.raam 2018-05-26 11:37:25 by raam.raam 2018-05-21 10:59:24 by arvindpdmn 2018-05-21 10:46:31 by arvindpdmn 2018-05-18 07:27:06 by raam.raam 2018-05-18 06:19:18 by raam.raam All Sections Summary Discussion Sample Code References Milestones Tags See Also Further Reading
  • 2021-05-28 13:12:19 - By raam.raam It is the error rate that is related and not error quantum. The following article puts this nicely. https://www.scribbr.com/statistics/type-i-and-type-ii-errors/#:~:text=Trade%2Doff%20between%20Type%20I%20and%20Type%20II%20errors,-The%20Type%20I&text=This%20means%20there's%20an%20important,a%20Type%20I%20error%20risk.
  • 2021-05-28 12:57:18 - By arvindpdmn I am anyway updating the article to remove warnings. Will edit as part of the update.
  • 2021-05-28 12:36:34 - 1 By raam.raam I will correct the relationship between alpha and beta. They are dependent
  • 2021-05-28 12:05:26 - By raam.raam as a special case when null and alternate hypotheses distributions are overlapping, it may be. But don't think can be generalized. I will look for more references.
  • 2021-05-28 11:30:47 - 1 By arvindpdmn Need to clarify "Errors α and β are independent of each other. Increasing one does not decrease the other.". This ref says otherwise: https://www.afit.edu/stat/statcoe_files/8__The_Logic_of_Statistical_Hypothesis_Testing.pdf

Suppose we want to study income of a population. We study a sample from the population and draw conclusions. The sample should represent the population for our study to be a reliable one.

Null hypothesis \((H_0)\) is that sample represents population. Hypothesis testing provides us with framework to conclude if we have sufficient evidence to either accept or reject null hypothesis.

Population characteristics are either assumed or drawn from third-party sources or judgements by subject matter experts. Population data and sample data are characterised by moments of its distribution (mean, variance, skewness and kurtosis). We test null hypothesis for equality of moments where population characteristic is available and conclude if sample represents population.

For example, given only mean income of population, we validate if mean income of sample is close to population mean to conclude if sample represents the population.

Population and sample parameters. Source: Rolke 2018.

Population mean and population variance are denoted in Greek alphabets \(\mu\) and \(\sigma^2\) respectively, while sample mean and sample variance are denoted in English alphabets \(\bar x\) and \(s^2\) respectively.

Graphical representations of sampling error. Source: Nurse Key 2017, fig. 15-2.

  • If the difference is not significant, we conclude the difference is due to sampling. This is called sampling error and this happens due to chance.
  • If the difference is significant, we conclude the sample does not represent the population. The reason has to be more than chance for difference to be explained.

Hypothesis testing helps us to conclude if the difference is due to sampling error or due to reasons beyond sampling error.

A common assumption is that the observations are independent and come from a random sample. The population distribution must be Normal or the sample size is large enough. If the sample size is large enough, we can invoke the Central Limit Theorem ( CLT ) regardless of the underlying population distribution. Due to CLT , sampling distribution of the sample statistic (such as sample mean) will be approximately a Normal distribution.

A rule of thumb is 30 observations but in some cases even 10 observations may be sufficient to invoke the CLT . Others require at least 50 observations.

When acceptance of \(H_0\) involves boundaries on both sides, we invoke the two-tailed test . For example, if we define \(H_0\) as sample drawn from population with age limits in the range of 25 to 35, then testing of \(H_0\) involves limits on both sides.

Suppose we define the population as greater than age 50, we are interested in rejecting a sample if the age is less than or equal to 50; we are not concerned about any upper limit. Here we invoke the one-tailed test . A one-tailed test could be left-tailed or right-tailed.

  • \(H_0\): \(\mu = \$ 2.62\)
  • \(H_1\) right-tailed: \(\mu > \$ 2.62\)
  • \(H_1\) two-tailed: \(\mu \neq \$ 2.62\)

Matrix showing types of errors in hypothesis testing. Source: howMed 2013.

  • Not accepting that sample represents population when in reality it does. This is called type-I or \(\alpha\) error .
  • Accepting that sample represents population when in reality it does not. This is called type-II or \(\beta\) error .

For instance, granting loan to an applicant with low credit score is \(\alpha\) error. Not granting loan to an applicant with high credit score is (\(\beta\)) error.

The symbols \(\alpha\) and \(\beta\) are used to represent the probability of type-I and type-II errors respectively.

Illustration of p-value in the population distribution. Source: Heard 2015.

The p-value can be interpreted as the probability of getting a result that's same or more extreme when the null hypothesis is true.

The observed sample mean \(\bar x\) is overlaid on population distribution of values with mean \(\mu\) and variance \(\sigma^2\). The proportion of values beyond \(\bar x\) and away from \(\mu\) (either in left tail or in right tail or in both tails) is p-value . If p-value <= \(\alpha\) we reject null hypothesis. The results are said to be statistically significant and not due to chance.

Assuming \(\alpha\)=0.05, p-value > 5%, we conclude the sample is highly likely to be drawn from population with mean \(\mu\) and variance \(\sigma^2\). We accept \((H_0)\). Otherwise, there's insufficient evidence to be part of population and we reject \(H_0\).

We preselect \(\alpha\) based on how much type-I error we're willing to tolerate. \(\alpha\) is called level of significance . The standard for level of significance is 0.05 but in some studies it may be 0.01 or 0.1. In the case of two-tailed tests, it's \(\alpha/2\) on either side.

As sample size increases the margin of error falls. Source: Wikipedia 2018.

Law of Large Numbers suggests larger the sample size, the more accurate the estimate. Accuracy means the variance of estimate will tend towards zero as sample size increases. Sample Size can be determined to suit accepted level of tolerance for deviation.

Confidence interval of sample mean is determined from sample mean offset by variance on either side of the sample mean. If the population variance is known, then we conduct z-test based on Normal distribution. Otherwise, variance has to be estimated and we use t-test based on t-distribution.

The formulae for determining sample size and confidence interval depends on what we to estimate (mean/variance/others), sampling distribution of estimate and standard deviation of estimate's sampling distribution.

Illustrating type-II or beta error. Source: Gordon 2011.

We overlay sample mean's distribution on population distribution, the proportion of overlap of sampling estimate's distribution on population distribution is \(\beta\) error .

Larger the overlap, larger the chance the sample does belong to population with mean \(\mu\) and variance \(\sigma^2\). Incidentally, despite the overlap, p-value may be less than 5%. This happens when sample mean is way off population mean, but the variance of sample mean is such that the overlap is significant.

Understanding alpha and beta errors together. Source: McNeese 2015.

Errors \(\alpha\) and \(\beta\) are dependent on each other. Increasing one decreases the other. Choosing suitable values for these depends on the cost of making these errors. Perhaps it's worse to convict an innocent person (type-I error) than to acquit a guilty person (type-II error), in which case we choose a lower \(\alpha\). But it's possible to decrease both errors but collecting more data.

  • Probability of rejecting the null hypothesis when, in fact, it is false.
  • Probability that a test of significance will pick up on an effect that is present.
  • Probability of avoiding a Type II error.

Low p-value and high power help us decisively conclude sample doesn't belong to population. When we cannot conclude decisively, it's advisable to go for larger samples and multiple samples.

In fact, power is increased by increasing sample size, effect sizes and significance levels. Variance also affects power.

Flowchart showing the classification of 1000 theoretical null hypotheses. Source: Biau et al. 2010, fig. 2.

A common misconception is to consider "p value as the probability that the null hypothesis is true". In fact, p-value is computed under the assumption that the null hypothesis is true. P-value is the probability of observing the values, or more extremes values, if the null hypothesis is true.

Another misconception, sometimes called base rate fallacy , is that under controlled \(\alpha\) and adequate power, statistically significant results correspond to true differences. This is not the case, as shown in the figure. Even with \(\alpha\)=5% and power=80%, 36% of statistically significant p-values will not report the true difference. This is because only 10% of the null hypotheses are false (base rate) and 80% power on these gives only 80 true positives.

P-value doesn't measure the size of the effect, for which confidence interval is a better approach. A drug that gives 25% improvement may not mean much if symptoms are innocuous compared to another drug that gives small improvement from a disease that leads to certain death. Context is therefore important.

Early work on statistical testing. Source: Huberty 1993, table 1.

The field of statistical testing probably starts with John Arbuthnot who applies it to test sex ratios at birth. Subsequently, others in the 18th and 19th centuries use it in other fields. However, modern terminology (null hypothesis, p-value, type-I or type-II errors) is formed only in the 20th century.

Table of P values published by Pearson. Source: Pearson 1900, pp. 175.

Pearson introduces the concept of p-value with the chi-squared test. He gives equations for calculating P and states that it's "the measure of the probability of a complex system of n errors occurring with a frequency as great or greater than that of the observed system."

Ronald A. Fisher develops the concept of p-value and shows how to calculate it in a wide variety of situations. He also notes that a value of 0.05 may be considered as conventional cut-off.

Illustrating both type-I and type-II errors with shaded regions. Source: Neyman and Pearson 1933, fig. 5.

Neyman and Pearson publish On the problem of the most efficient tests of statistical hypotheses . They introduce the notion of alternative hypotheses . They also describe both type-I and type-II errors (although they don't use these terms). They state, "Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behaviour with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong."

Johnson's textbook titled Statistical methods in research is perhaps the first to introduce to students the Neyman-Pearson hypothesis testing at a time when most textbooks follow Fisher's significance testing. Johnson uses the terms "error of the first kind" and "error of the second kind". In time, Fisher's approach is called P-value approach and the Neyman-Pearson approach is called fixed-α approach .

Carver makes the following suggestions: use of the term "statistically significant"; interpret results with respect to the data first and statistical significance second; and pay attention to the size of the effect.

  • Biau, David Jean, Brigitte M. Jolles, and Raphaël Porcher. 2010. "P Value and the Theory of Hypothesis Testing: An Explanation for New Researchers." Clin Orthop Relat Res, vol. 468, no. 3, pp. 885–892, March. Accessed 2021-05-28.
  • Carver, Ronald P. 1993. "The Case against Statistical Significance Testing, Revisited." The Journal of Experimental Education, vol. 61, no. 4, Statistical Significance Testing in Contemporary Practice, pp. 287-292. Accessed 2021-05-28.
  • Frankfort-Nachmias, Chava, Anna Leon-Guerrero, and Georgiann Davis. 2020. "Chapter 8: Testing Hypotheses." In: Social Statistics for a Diverse Society, SAGE Publications. Accessed 2021-05-28.
  • Gordon, Max. 2011. "How to best display graphically type II (beta) error, power and sample size?" August 11. Accessed 2018-05-18.
  • Heard, Stephen B. 2015. "In defence of the P-value" Types of Errors. February 9. Updated 2015-12-04. Accessed 2018-05-18.
  • Huberty, Carl J. 1993. "Historical Origins of Statistical Testing Practices: The Treatment of Fisher versus Neyman-Pearson Views in Textbooks." The Journal of Experimental Education, vol. 61, no. 4, Statistical Significance Testing in Contemporary Practice, pp. 317-333. Accessed 2021-05-28.
  • Kensler, Jennifer. 2013. "The Logic of Statistical Hypothesis Testing: Best Practice." Report, STAT T&E Center of Excellence. Accessed 2021-05-28.
  • Klappa, Peter. 2014. "Sampling error and hypothesis testing." On YouTube, December 10. Accessed 2021-05-28.
  • Lane, David M. 2021. "Section 10.8: Confidence Interval on the Mean." In: Introduction to Statistics, Rice University. Accessed 2021-05-28.
  • McNeese, Bill. 2015. "How Many Samples Do I Need?" SPC For Excel, BPI Consulting, June. Accessed 2018-05-18.
  • McNeese, Bill. 2017. "Interpretation of Alpha and p-Value." SPC for Excel, BPI Consulting, April 6. Updated 2020-04-25. Accessed 2021-05-28.
  • Neyman, J., and E. S. Pearson. 1933. "On the problem of the most efficient tests of statistical hypotheses." Philos Trans R Soc Lond A., vol. 231, issue 694-706, pp. 289–337. doi: 10.1098/rsta.1933.0009. Accessed 2021-05-28.
  • Nurse Key. 2017. "Chapter 15: Sampling." February 17. Accessed 2018-05-18.
  • Pearson, Karl. 1900. "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling." Philosophical Magazine, Series 5 (1876-1900), pp. 157–175. Accessed 2021-05-28.
  • Reinhart, Alex. 2015. "Statistics Done Wrong: The Woefully Complete Guide." No Starch Press.
  • Rolke, Wolfgang A. 2018. "Quantitative Variables." Department of Mathematical Sciences, University of Puerto Rico - Mayaguez. Accessed 2018-05-18.
  • Six-Sigma-Material.com. 2016. "Population & Samples." Six-Sigma-Material.com. Accessed 2018-05-18.
  • Walmsley, Angela and Michael C. Brown. 2017. "What is Power?" Statistics Teacher, American Statistical Association, September 15. Accessed 2021-05-28.
  • Wang, Jing. 2014. "Chapter 4.II: Hypothesis Testing." Applied Statistical Methods II, Univ. of Illinois Chicago. Accessed 2021-05-28.
  • Weigle, David C. 1994. "Historical Origins of Contemporary Statistical Testing Practices: How in the World Did Significance Testing Assume Its Current Place in Contemporary Analytic Practice?" Paper presented at the Annual Meeting of the Southwest Educational Research Association, SanAntonio, TX, January 27. Accessed 2021-05-28.
  • Wikipedia. 2018. "Margin of Error." May 1. Accessed 2018-05-18.
  • Wikipedia. 2021. "Law of large numbers." Wikipedia, March 26. Accessed 2021-05-28.
  • howMed. 2013. "Significance Testing and p value." August 4. Updated 2013-08-08. Accessed 2018-05-18.

Further Reading

  • Foley, Hugh. 2018. "Introduction to Hypothesis Testing." Skidmore College. Accessed 2018-05-18.
  • Buskirk, Trent. 2015. "Sampling Error in Surveys." Accessed 2018-05-18.
  • Zaiontz, Charles. 2014. "Assumptions for Statistical Tests." Real Statistics Using Excel. Accessed 2018-05-18.
  • DeCook, Rhonda. 2018. "Section 9.2: Types of Errors in Hypothesis testing." Stat1010 Notes, Department of Statistics and Actuarial Science, University of Iowa. Accessed 2018-05-18.

Article Stats

Author-wise stats for article edits.

Avatar of user arvindpdmn

  • Browse Articles
  • Community Outreach
  • About Devopedia
  • Author Guidelines
  • FAQ & Help
  • Forgot your password?
  • Create an account

9.2 Outcomes and the Type I and Type II Errors

When you perform a hypothesis test, there are four possible outcomes depending on the actual truth, or falseness, of the null hypothesis H 0 and the decision to reject or not. The outcomes are summarized in the following table:

The four possible outcomes in the table are as follows:

  • The decision is not to reject H 0 when H 0 is true (correct decision).
  • The decision is to reject H 0 when, in fact, H 0 is true (incorrect decision known as a Type I error ).
  • The decision is not to reject H 0 when, in fact, H 0 is false (incorrect decision known as a Type II error ).
  • The decision is to reject H 0 when H 0 is false (correct decision whose probability is called the Power of the Test ).

Each of the errors occurs with a particular probability. The Greek letters α and β represent the probabilities.

α = probability of a Type I error = P (Type I error) = probability of rejecting the null hypothesis when the null hypothesis is true.

β = probability of a Type II error = P (Type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false.

α and β should be as small as possible because they are probabilities of errors. They are rarely zero.

The Power of the Test is 1 – β . Ideally, we want a high power that is as close to one as possible. Increasing the sample size can increase the Power of the Test.

The following are examples of Type I and Type II errors.

Example 9.5

Suppose the null hypothesis, H 0 , is: Frank's rock climbing equipment is safe.

Type I error: Frank does not go rock climbing because he considers that the equipment is not safe, when in fact, the equipment is really safe. Frank is making the mistake of rejecting the null hypothesis, when the equipment is actually safe!

Type II error: Frank goes climbing, thinking that his equipment is safe, but this is a mistake, and he painfully realizes that his equipment is not as safe as it should have been. Frank assumed that the null hypothesis was true, when it was not.

α = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe. β = probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

Suppose the null hypothesis, H 0 , is: the blood cultures contain no traces of pathogen X . State the Type I and Type II errors.

Example 9.6

Suppose the null hypothesis, H 0 , is: a tomato plant is alive when a class visits the school garden.

Type I error: The null hypothesis claims that the tomato plant is alive, and it is true, but the students make the mistake of thinking that the plant is already dead.

Type II error: The tomato plant is already dead (the null hypothesis is false), but the students do not notice it, and believe that the tomato plant is alive.

α = probability that the class thinks the tomato plant is dead when, in fact, it is alive = P (Type I error). β = probability that the class thinks the tomato plant is alive when, in fact, it is dead = P (Type II error).

The error with the greater consequence is the Type I error. (If the class thinks the plant is dead, they will not water it.)

Suppose the null hypothesis, H 0 , is: a patient is not sick. Which type of error has the greater consequence, Type I or Type II?

Example 9.7

It’s a Boy Genetic Labs, a genetics company, claims to be able to increase the likelihood that a pregnancy will result in a boy being born. Statisticians want to test the claim. Suppose that the null hypothesis, H 0 , is: It’s a Boy Genetic Labs has no effect on gender outcome.

Type I error : This error results when a true null hypothesis is rejected. In the context of this scenario, we would state that we believe that It’s a Boy Genetic Labs influences the gender outcome, when in fact it has no effect. The probability of this error occurring is denoted by the Greek letter alpha, α .

Type II error : This error results when we fail to reject a false null hypothesis. In context, we would state that It’s a Boy Genetic Labs does not influence the gender outcome of a pregnancy when, in fact, it does. The probability of this error occurring is denoted by the Greek letter beta, β .

The error with the greater consequence would be the Type I error since couples would use the It’s a Boy Genetic Labs product in hopes of increasing the chances of having a boy.

Red tide is a bloom of poison-producing algae—a few different species of a class of plankton called dinoflagellates. When the weather and water conditions cause these blooms, shellfish such as clams living in the area develop dangerous levels of a paralysis-inducing toxin. In Massachusetts, the Division of Marine Fisheries montors levels of the toxin in shellfish by regular sampling of shellfish along the coastline. If the mean level of toxin in clams exceeds 800 μg (micrograms) of toxin per kilogram of clam meat in any area, clam harvesting is banned there until the bloom is over and levels of toxin in clams subside. Describe both a Type I and a Type II error in this context, and state which error has the greater consequence.

Example 9.8

A certain experimental drug claims a cure rate of at least 75 percent for males with a disease. Describe both the Type I and Type II errors in context. Which error is the more serious?

Type I : A patient believes the cure rate for the drug is less than 75 percent when it actually is at least 75 percent.

Type II : A patient believes the experimental drug has at least a 75 percent cure rate when it has a cure rate that is less than 75 percent.

In this scenario, the Type II error contains the more severe consequence. If a patient believes the drug works at least 75 percent of the time, this most likely will influence the patient’s (and doctor’s) choice about whether to use the drug as a treatment option.

Determine both Type I and Type II errors for the following scenario:

Assume a null hypothesis, H 0 , that states the percentage of adults with jobs is at least 88 percent.

Identify the Type I and Type II errors from these four possible choices.

  • Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when that percentage is actually less than 88 percent
  • Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when the percentage is actually at least 88 percent
  • Reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when the percentage is actually at least 88 percent
  • Reject the null hypothesis that the percentage of adults who have jobs is at least 88 percent when that percentage is actually less than 88 percent

As an Amazon Associate we earn from qualifying purchases.

This book may not be used in the training of large language models or otherwise be ingested into large language models or generative AI offerings without OpenStax's permission.

Want to cite, share, or modify this book? This book uses the Creative Commons Attribution License and you must attribute Texas Education Agency (TEA). The original material is available at: https://www.texasgateway.org/book/tea-statistics . Changes were made to the original material, including updates to art, structure, and other content updates.

Access for free at https://openstax.org/books/statistics/pages/1-introduction
  • Authors: Barbara Illowsky, Susan Dean
  • Publisher/website: OpenStax
  • Book title: Statistics
  • Publication date: Mar 27, 2020
  • Location: Houston, Texas
  • Book URL: https://openstax.org/books/statistics/pages/1-introduction
  • Section URL: https://openstax.org/books/statistics/pages/9-2-outcomes-and-the-type-i-and-type-ii-errors

© Jan 23, 2024 Texas Education Agency (TEA). The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University.

Have a language expert improve your writing

Run a free plagiarism check in 10 minutes, generate accurate citations for free.

  • Knowledge Base

Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis and alternate hypothesis (H o ) and (H a  or H 1 ).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

Step 1: state your null and alternate hypothesis, step 2: collect data, step 3: perform a statistical test, step 4: decide whether to reject or fail to reject your null hypothesis, step 5: present your findings, other interesting articles, frequently asked questions about hypothesis testing.

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H o ) and alternate (H a ) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H 0 : Men are, on average, not taller than women. H a : Men are, on average, taller than women.

Prevent plagiarism. Run a free check.

For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data .

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

Receive feedback on language, structure, and formatting

Professional editors proofread and edit your paper by focusing on:

  • Academic style
  • Vague sentences
  • Style consistency

See an example

types of errors in hypothesis testing with examples

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .

If you want to know more about statistics , methodology , or research bias , make sure to check out some of our other articles with explanations and examples.

  • Normal distribution
  • Descriptive statistics
  • Measures of central tendency
  • Correlation coefficient

Methodology

  • Cluster sampling
  • Stratified sampling
  • Types of interviews
  • Cohort study
  • Thematic analysis

Research bias

  • Implicit bias
  • Cognitive bias
  • Survivorship bias
  • Availability heuristic
  • Nonresponse bias
  • Regression to the mean

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Cite this Scribbr article

If you want to cite this source, you can copy and paste the citation or click the “Cite this Scribbr article” button to automatically add the citation to our free Citation Generator.

Bevans, R. (2023, June 22). Hypothesis Testing | A Step-by-Step Guide with Easy Examples. Scribbr. Retrieved March 14, 2024, from https://www.scribbr.com/statistics/hypothesis-testing/

Is this article helpful?

Rebecca Bevans

Rebecca Bevans

Other students also liked, choosing the right statistical test | types & examples, understanding p values | definition and examples, what is your plagiarism score.

Library homepage

  • school Campus Bookshelves
  • menu_book Bookshelves
  • perm_media Learning Objects
  • login Login
  • how_to_reg Request Instructor Account
  • hub Instructor Commons
  • Download Page (PDF)
  • Download Full Book (PDF)
  • Periodic Table
  • Physics Constants
  • Scientific Calculator
  • Reference & Cite
  • Tools expand_more
  • Readability

selected template will load here

This action is not available.

Statistics LibreTexts

9.2: Outcomes, Type I and Type II Errors

  • Last updated
  • Save as PDF
  • Page ID 16267

When you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or falseness) of the null hypothesis  H 0 and the decision to reject or not.

The outcomes are summarized in the following table:

The four possible outcomes in the table are:

  • The decision is  not to reject H 0 when H 0 is true (correct decision) .
  • The decision is to reject H 0 when H 0 is true (incorrect decision known as a Type I error ).
  • The decision is not to reject H 0 when  H 0 is false (incorrect decision known as a Type II error ).
  • The decision is to reject H 0 when H 0 is false ( correct decision whose probability is called the Power of the Test ).

Each of the errors occurs with a particular probability. The Greek letters  α and β represent the probabilities.

α = probability of a Type I error = P (Type I error) = probability of rejecting the null hypothesis when the null hypothesis is true.

β = probability of a Type II error = P (Type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false.

α and β should be as small as possible because they are probabilities of errors. They are rarely zero.

The Power of the Test is 1 –  β . Since β is probability of making type II error, we want this probability to be small. In other words, we want the value 1 –  β  to be as closed to one as possible. Increasing the sample size can increase the Power of the Test.

Suppose the null hypothesis,  H 0 , is: Frank’s rock climbing equipment is safe.

  • Ty pe I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.
  • Type II error: Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.

α = Probability that Frank thinks his rock climbing equipment may not be safe when it really is safe. β = Probability that Frank thinks his rock climbing equipment may be safe when it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

Suppose the null hypothesis,  H 0 , is: the blood cultures contain no traces of pathogen X . State the Type I and Type II errors.

[practice-area rows=”2″][/practice-area] [reveal-answer q=”313018″]Solution[/reveal-answer] [hidden-answer a=”313018″]

Type I error : The researcher thinks the blood cultures do contain traces of pathogen X, when in fact, they do not. Type II error : The researcher thinks the blood cultures do not contain traces of pathogen X, when in fact, they do.

[/hidden-answer]

Suppose the null hypothesis,  H 0 : The victim of an automobile accident is alive when he arrives at the emergency room of a hospital. State the 4 possible outcomes of performing a hypothesis test.

α = probability that the emergency crew thinks the victim is dead when, in fact, he is really alive = P (Type I error). β = probability that the emergency crew does not know if the victim is alive when, in fact, the victim is dead = P (Type II error).

The error with the greater consequence is the Type I error. (If the emergency crew thinks the victim is dead, they will not treat him.)

Thumbnail for the embedded element "Type 1 errors | Inferential statistics | Probability and Statistics | Khan Academy"

A YouTube element has been excluded from this version of the text. You can view it online here: http://pb.libretexts.org/esm/?p=120

Suppose the null hypothesis, H 0 , is: A patient is not sick. Which type of error has the greater consequence, Type I or Type II?

[practice-area rows=”1=2″][/practice-area] [reveal-answer q=”3933″]Solution[/reveal-answer] [hidden-answer a=”3933″]

Type I Error: The patient will not be thought well when, in fact, he is not sick.

Type II Error: The patient will be thought well when, in fact, he is sick.

The error with the greater consequence is the Type II error: the patient will be thought well when, in fact, he is sick. He will not be able to get treatment.[/hidden-answer]

Boy Genetic Labs claim to be able to increase the likelihood that a pregnancy will result in a boy being born. Statisticians want to test the claim. Suppose that the null hypothesis,  H 0 , is: Boy Genetic Labs has no effect on gender outcome. Which type of error has the greater consequence, Type I or Type II?

H 0 : Boy Genetic Labs has no effect on gender outcome.

H a : Boy Genetic Labs has effect on gender outcome.

  • Type I error: This results when a true null hypothesis is rejected. In the context of this scenario, we would state that we believe that Boy Genetic Labs influences the gender outcome, when in fact it has no effect. The probability of this error occurring is denoted by the Greek letter alpha, α .
  • Type II error: This results when we fail to reject a false null hypothesis. In context, we would state that Boy Genetic Labs does not influence the gender outcome of a pregnancy when, in fact, it does. The probability of this error occurring is denoted by the Greek letter beta, β .

The error of greater consequence would be the Type I error since couples would use the Boy Genetic Labs product in hopes of increasing the chances of having a boy.

“Red tide” is a bloom of poison-producing algae–a few different species of a class of plankton called dinoflagellates. When the weather and water conditions cause these blooms, shellfish such as clams living in the area develop dangerous levels of a paralysis-inducing toxin. In Massachusetts, the Division of Marine Fisheries (DMF) monitors levels of the toxin in shellfish by regular sampling of shellfish along the coastline. If the mean level of toxin in clams exceeds 800 μg (micrograms) of toxin per kg of clam meat in any area, clam harvesting is banned there until the bloom is over and levels of toxin in clams subside. Describe both a Type I and a Type II error in this context, and state which error has the greater consequence.

[reveal-answer q=”432609″]Solution[/reveal-answer] [hidden-answer a=”432609″]

In this scenario, an appropriate null hypothesis would be H 0 : the mean level of toxins is at most 800 μg. ( H 0 : μ 0 ≤ 800 μg ) H a : the mean level of toxins exceeds 800 μg. (H a : μ 0  > 800 μg )

Type I error: The DMF believes that toxin levels are still too high when, in fact, toxin levels are at most 800 μg. The DMF continues the harvesting ban. Type II error: The DMF believes that toxin levels are within acceptable levels (are at least 800 μg) when, in fact, toxin levels are still too high (more than 800 μg). The DMF lifts the harvesting ban.

This error could be the most serious. If the ban is lifted and clams are still toxic, consumers could possibly eat tainted food. In summary, the more dangerous error would be to commit a Type II error, because this error involves the availability of tainted clams for consumption.[/hidden-answer]

A certain experimental drug claims a cure rate of higher than 75% for males with prostate cancer.

Describe both the Type I and Type II errors in context. Which error is the more serious?

H 0 : The cure rate is less than 75%.

H a : The cure rate is higher than 75%.

  • Type I: A cancer patient believes the cure rate for the drug is more than 75% when the cure rate actually is less than 75%.
  • Type II: A cancer patient believes the the cure rate is less than 75% cure rate when the cure rate is actually higher than 75%.

In this scenario, the Type II error contains the more severe consequence. If a patient believes the drug works at least 75% of the time, this most likely will influence the patient’s (and doctor’s) choice about whether to use the drug as a treatment option.

Determine both Type I and Type II errors for the following scenario:

Assume a null hypothesis, H 0 , that states the percentage of adults with jobs is at least 88%.

Identify the Type I and Type II errors from these four statements.

a)Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88% when that percentage is actually less than 88%

b)Not to reject the null hypothesis that the percentage of adults who have jobs is at least 88% when the percentage is actually at least 88%.

c)Reject the null hypothesis that the percentage of adults who have jobs is at least 88% when the percentage is actually at least 88%.

d)Reject the null hypothesis that the percentage of adults who have jobs is at least 88% when that percentage is actually less than 88%.

[reveal-answer q=”76062″]Type I error:[/reveal-answer] [hidden-answer a=”76062″]c [/hidden-answer] [reveal-answer q=”975662″]Type II error:[/reveal-answer] [hidden-answer a=”975662″]b[/hidden-answer]

If H 0 : The percentage of adults with jobs is at least 88%, then H a : The percentage of adults with jobs is less than 88%.

[reveal-answer q=”864484″]Type I error: [/reveal-answer] [hidden-answer a=”864484″]c [/hidden-answer] [reveal-answer q=”126260″]Type II error:[/reveal-answer] [hidden-answer a=”126260″]b[/hidden-answer]

Concept Review

In every hypothesis test, the outcomes are dependent on a correct interpretation of the data. Incorrect calculations or misunderstood summary statistics can yield errors that affect the results. A  Type I error occurs when a true null hypothesis is rejected. A Type II error occurs when a false null hypothesis is not rejected.

The probabilities of these errors are denoted by the Greek letters  α and β , for a Type I and a Type II error respectively. The power of the test, 1 – β , quantifies the likelihood that a test will yield the correct result of a true alternative hypothesis being accepted. A high power is desirable.

Formula Review

  • OpenStax, Statistics, Outcomes and the Type I and Type II Errors. Provided by : OpenStax. Located at : http://cnx.org/contents/[email protected] . License : CC BY: Attribution
  • Introductory Statistics . Authored by : Barbara Illowski, Susan Dean. Provided by : Open Stax. Located at : http://cnx.org/contents/[email protected] . License : CC BY: Attribution . License Terms : Download for free at http://cnx.org/contents/[email protected]
  • Type 1 errors | Inferential statistics | Probability and Statistics | Khan Academy. Authored by : Khan Academy. Located at : https://youtu.be/EowIec7Y8HM . License : All Rights Reserved . License Terms : Standard YouTube License

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings
  • Advanced Search
  • Journal List
  • Ind Psychiatry J
  • v.18(2); Jul-Dec 2009

Hypothesis testing, type I and type II errors

Amitav banerjee.

Department of Community Medicine, D. Y. Patil Medical College, Pune, India

U. B. Chitnis

S. l. jadhav, j. s. bhawalkar, s. chaudhury.

1 Department of Psychiatry, RINPAS, Kanke, Ranchi, India

Hypothesis testing is an important activity of empirical research and evidence-based medicine. A well worked up hypothesis is half the answer to the research question. For this, both knowledge of the subject derived from extensive review of the literature and working knowledge of basic statistical concepts are desirable. The present paper discusses the methods of working up a good hypothesis and statistical concepts of hypothesis testing.

Karl Popper is probably the most influential philosopher of science in the 20 th century (Wulff et al ., 1986). Many scientists, even those who do not usually read books on philosophy, are acquainted with the basic principles of his views on science. The popularity of Popper’s philosophy is due partly to the fact that it has been well explained in simple terms by, among others, the Nobel Prize winner Peter Medawar (Medawar, 1969). Popper makes the very important point that empirical scientists (those who stress on observations only as the starting point of research) put the cart in front of the horse when they claim that science proceeds from observation to theory, since there is no such thing as a pure observation which does not depend on theory. Popper states, “… the belief that we can start with pure observation alone, without anything in the nature of a theory, is absurd: As may be illustrated by the story of the man who dedicated his life to natural science, wrote down everything he could observe, and bequeathed his ‘priceless’ collection of observations to the Royal Society to be used as inductive (empirical) evidence.

STARTING POINT OF RESEARCH: HYPOTHESIS OR OBSERVATION?

The first step in the scientific process is not observation but the generation of a hypothesis which may then be tested critically by observations and experiments. Popper also makes the important claim that the goal of the scientist’s efforts is not the verification but the falsification of the initial hypothesis. It is logically impossible to verify the truth of a general law by repeated observations, but, at least in principle, it is possible to falsify such a law by a single observation. Repeated observations of white swans did not prove that all swans are white, but the observation of a single black swan sufficed to falsify that general statement (Popper, 1976).

CHARACTERISTICS OF A GOOD HYPOTHESIS

A good hypothesis must be based on a good research question. It should be simple, specific and stated in advance (Hulley et al ., 2001).

Hypothesis should be simple

A simple hypothesis contains one predictor and one outcome variable, e.g. positive family history of schizophrenia increases the risk of developing the condition in first-degree relatives. Here the single predictor variable is positive family history of schizophrenia and the outcome variable is schizophrenia. A complex hypothesis contains more than one predictor variable or more than one outcome variable, e.g., a positive family history and stressful life events are associated with an increased incidence of Alzheimer’s disease. Here there are 2 predictor variables, i.e., positive family history and stressful life events, while one outcome variable, i.e., Alzheimer’s disease. Complex hypothesis like this cannot be easily tested with a single statistical test and should always be separated into 2 or more simple hypotheses.

Hypothesis should be specific

A specific hypothesis leaves no ambiguity about the subjects and variables, or about how the test of statistical significance will be applied. It uses concise operational definitions that summarize the nature and source of the subjects and the approach to measuring variables (History of medication with tranquilizers, as measured by review of medical store records and physicians’ prescriptions in the past year, is more common in patients who attempted suicides than in controls hospitalized for other conditions). This is a long-winded sentence, but it explicitly states the nature of predictor and outcome variables, how they will be measured and the research hypothesis. Often these details may be included in the study proposal and may not be stated in the research hypothesis. However, they should be clear in the mind of the investigator while conceptualizing the study.

Hypothesis should be stated in advance

The hypothesis must be stated in writing during the proposal state. This will help to keep the research effort focused on the primary objective and create a stronger basis for interpreting the study’s results as compared to a hypothesis that emerges as a result of inspecting the data. The habit of post hoc hypothesis testing (common among researchers) is nothing but using third-degree methods on the data (data dredging), to yield at least something significant. This leads to overrating the occasional chance associations in the study.

TYPES OF HYPOTHESES

For the purpose of testing statistical significance, hypotheses are classified by the way they describe the expected difference between the study groups.

Null and alternative hypotheses

The null hypothesis states that there is no association between the predictor and outcome variables in the population (There is no difference between tranquilizer habits of patients with attempted suicides and those of age- and sex- matched “control” patients hospitalized for other diagnoses). The null hypothesis is the formal basis for testing statistical significance. By starting with the proposition that there is no association, statistical tests can estimate the probability that an observed association could be due to chance.

The proposition that there is an association — that patients with attempted suicides will report different tranquilizer habits from those of the controls — is called the alternative hypothesis. The alternative hypothesis cannot be tested directly; it is accepted by exclusion if the test of statistical significance rejects the null hypothesis.

One- and two-tailed alternative hypotheses

A one-tailed (or one-sided) hypothesis specifies the direction of the association between the predictor and outcome variables. The prediction that patients of attempted suicides will have a higher rate of use of tranquilizers than control patients is a one-tailed hypothesis. A two-tailed hypothesis states only that an association exists; it does not specify the direction. The prediction that patients with attempted suicides will have a different rate of tranquilizer use — either higher or lower than control patients — is a two-tailed hypothesis. (The word tails refers to the tail ends of the statistical distribution such as the familiar bell-shaped normal curve that is used to test a hypothesis. One tail represents a positive effect or association; the other, a negative effect.) A one-tailed hypothesis has the statistical advantage of permitting a smaller sample size as compared to that permissible by a two-tailed hypothesis. Unfortunately, one-tailed hypotheses are not always appropriate; in fact, some investigators believe that they should never be used. However, they are appropriate when only one direction for the association is important or biologically meaningful. An example is the one-sided hypothesis that a drug has a greater frequency of side effects than a placebo; the possibility that the drug has fewer side effects than the placebo is not worth testing. Whatever strategy is used, it should be stated in advance; otherwise, it would lack statistical rigor. Data dredging after it has been collected and post hoc deciding to change over to one-tailed hypothesis testing to reduce the sample size and P value are indicative of lack of scientific integrity.

STATISTICAL PRINCIPLES OF HYPOTHESIS TESTING

A hypothesis (for example, Tamiflu [oseltamivir], drug of choice in H1N1 influenza, is associated with an increased incidence of acute psychotic manifestations) is either true or false in the real world. Because the investigator cannot study all people who are at risk, he must test the hypothesis in a sample of that target population. No matter how many data a researcher collects, he can never absolutely prove (or disprove) his hypothesis. There will always be a need to draw inferences about phenomena in the population from events observed in the sample (Hulley et al ., 2001). In some ways, the investigator’s problem is similar to that faced by a judge judging a defendant [ Table 1 ]. The absolute truth whether the defendant committed the crime cannot be determined. Instead, the judge begins by presuming innocence — the defendant did not commit the crime. The judge must decide whether there is sufficient evidence to reject the presumed innocence of the defendant; the standard is known as beyond a reasonable doubt. A judge can err, however, by convicting a defendant who is innocent, or by failing to convict one who is actually guilty. In similar fashion, the investigator starts by presuming the null hypothesis, or no association between the predictor and outcome variables in the population. Based on the data collected in his sample, the investigator uses statistical tests to determine whether there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis that there is an association in the population. The standard for these tests is shown as the level of statistical significance.

The analogy between judge’s decisions and statistical tests

TYPE I (ALSO KNOWN AS ‘α’) AND TYPE II (ALSO KNOWN AS ‘β’)ERRORS

Just like a judge’s conclusion, an investigator’s conclusion may be wrong. Sometimes, by chance alone, a sample is not representative of the population. Thus the results in the sample do not reflect reality in the population, and the random error leads to an erroneous inference. A type I error (false-positive) occurs if an investigator rejects a null hypothesis that is actually true in the population; a type II error (false-negative) occurs if the investigator fails to reject a null hypothesis that is actually false in the population. Although type I and type II errors can never be avoided entirely, the investigator can reduce their likelihood by increasing the sample size (the larger the sample, the lesser is the likelihood that it will differ substantially from the population).

False-positive and false-negative results can also occur because of bias (observer, instrument, recall, etc.). (Errors due to bias, however, are not referred to as type I and type II errors.) Such errors are troublesome, since they may be difficult to detect and cannot usually be quantified.

EFFECT SIZE

The likelihood that a study will be able to detect an association between a predictor variable and an outcome variable depends, of course, on the actual magnitude of that association in the target population. If it is large (such as 90% increase in the incidence of psychosis in people who are on Tamiflu), it will be easy to detect in the sample. Conversely, if the size of the association is small (such as 2% increase in psychosis), it will be difficult to detect in the sample. Unfortunately, the investigator often does not know the actual magnitude of the association — one of the purposes of the study is to estimate it. Instead, the investigator must choose the size of the association that he would like to be able to detect in the sample. This quantity is known as the effect size. Selecting an appropriate effect size is the most difficult aspect of sample size planning. Sometimes, the investigator can use data from other studies or pilot tests to make an informed guess about a reasonable effect size. When there are no data with which to estimate it, he can choose the smallest effect size that would be clinically meaningful, for example, a 10% increase in the incidence of psychosis. Of course, from the public health point of view, even a 1% increase in psychosis incidence would be important. Thus the choice of the effect size is always somewhat arbitrary, and considerations of feasibility are often paramount. When the number of available subjects is limited, the investigator may have to work backward to determine whether the effect size that his study will be able to detect with that number of subjects is reasonable.

α,β,AND POWER

After a study is completed, the investigator uses statistical tests to try to reject the null hypothesis in favor of its alternative (much in the same way that a prosecuting attorney tries to convince a judge to reject innocence in favor of guilt). Depending on whether the null hypothesis is true or false in the target population, and assuming that the study is free of bias, 4 situations are possible, as shown in Table 2 below. In 2 of these, the findings in the sample and reality in the population are concordant, and the investigator’s inference will be correct. In the other 2 situations, either a type I (α) or a type II (β) error has been made, and the inference will be incorrect.

Truth in the population versus the results in the study sample: The four possibilities

The investigator establishes the maximum chance of making type I and type II errors in advance of the study. The probability of committing a type I error (rejecting the null hypothesis when it is actually true) is called α (alpha) the other name for this is the level of statistical significance.

If a study of Tamiflu and psychosis is designed with α = 0.05, for example, then the investigator has set 5% as the maximum chance of incorrectly rejecting the null hypothesis (and erroneously inferring that use of Tamiflu and psychosis incidence are associated in the population). This is the level of reasonable doubt that the investigator is willing to accept when he uses statistical tests to analyze the data after the study is completed.

The probability of making a type II error (failing to reject the null hypothesis when it is actually false) is called β (beta). The quantity (1 - β) is called power, the probability of observing an effect in the sample (if one), of a specified effect size or greater exists in the population.

If β is set at 0.10, then the investigator has decided that he is willing to accept a 10% chance of missing an association of a given effect size between Tamiflu and psychosis. This represents a power of 0.90, i.e., a 90% chance of finding an association of that size. For example, suppose that there really would be a 30% increase in psychosis incidence if the entire population took Tamiflu. Then 90 times out of 100, the investigator would observe an effect of that size or larger in his study. This does not mean, however, that the investigator will be absolutely unable to detect a smaller effect; just that he will have less than 90% likelihood of doing so.

Ideally alpha and beta errors would be set at zero, eliminating the possibility of false-positive and false-negative results. In practice they are made as small as possible. Reducing them, however, usually requires increasing the sample size. Sample size planning aims at choosing a sufficient number of subjects to keep alpha and beta at acceptably low levels without making the study unnecessarily expensive or difficult.

Many studies s et al pha at 0.05 and beta at 0.20 (a power of 0.80). These are somewhat arbitrary values, and others are sometimes used; the conventional range for alpha is between 0.01 and 0.10; and for beta, between 0.05 and 0.20. In general the investigator should choose a low value of alpha when the research question makes it particularly important to avoid a type I (false-positive) error, and he should choose a low value of beta when it is especially important to avoid a type II error.

The null hypothesis acts like a punching bag: It is assumed to be true in order to shadowbox it into false with a statistical test. When the data are analyzed, such tests determine the P value, the probability of obtaining the study results by chance if the null hypothesis is true. The null hypothesis is rejected in favor of the alternative hypothesis if the P value is less than alpha, the predetermined level of statistical significance (Daniel, 2000). “Nonsignificant” results — those with P value greater than alpha — do not imply that there is no association in the population; they only mean that the association observed in the sample is small compared with what could have occurred by chance alone. For example, an investigator might find that men with family history of mental illness were twice as likely to develop schizophrenia as those with no family history, but with a P value of 0.09. This means that even if family history and schizophrenia were not associated in the population, there was a 9% chance of finding such an association due to random error in the sample. If the investigator had set the significance level at 0.05, he would have to conclude that the association in the sample was “not statistically significant.” It might be tempting for the investigator to change his mind about the level of statistical significance ex post facto and report the results “showed statistical significance at P < 10”. A better choice would be to report that the “results, although suggestive of an association, did not achieve statistical significance ( P = .09)”. This solution acknowledges that statistical significance is not an “all or none” situation.

Hypothesis testing is the sheet anchor of empirical research and in the rapidly emerging practice of evidence-based medicine. However, empirical research and, ipso facto, hypothesis testing have their limits. The empirical approach to research cannot eliminate uncertainty completely. At the best, it can quantify uncertainty. This uncertainty can be of 2 types: Type I error (falsely rejecting a null hypothesis) and type II error (falsely accepting a null hypothesis). The acceptable magnitudes of type I and type II errors are set in advance and are important for sample size calculations. Another important point to remember is that we cannot ‘prove’ or ‘disprove’ anything by hypothesis testing and statistical tests. We can only knock down or reject the null hypothesis and by default accept the alternative hypothesis. If we fail to reject the null hypothesis, we accept it by default.

Source of Support: Nil

Conflict of Interest: None declared.

  • Daniel W. W. In: Biostatistics. 7th ed. New York: John Wiley and Sons, Inc; 2002. Hypothesis testing; pp. 204–294. [ Google Scholar ]
  • Hulley S. B, Cummings S. R, Browner W. S, Grady D, Hearst N, Newman T. B. 2nd ed. Philadelphia: Lippincott Williams and Wilkins; 2001. Getting ready to estimate sample size: Hypothesis and underlying principles In: Designing Clinical Research-An epidemiologic approach; pp. 51–63. [ Google Scholar ]
  • Medawar P. B. Philadelphia: American Philosophical Society; 1969. Induction and intuition in scientific thought. [ Google Scholar ]
  • Popper K. Unended Quest. An Intellectual Autobiography. Fontana Collins; p. 42. [ Google Scholar ]
  • Wulff H. R, Pedersen S. A, Rosenberg R. Oxford: Blackwell Scientific Publicatons; Empirism and Realism: A philosophical problem. In: Philosophy of Medicine. [ Google Scholar ]
  • Prompt Library
  • DS/AI Trends
  • Stats Tools
  • Interview Questions
  • Generative AI
  • Machine Learning
  • Deep Learning

Type I & Type II Errors in Hypothesis Testing: Examples

types of errors in hypothesis testing with examples

Table of Contents

What is a Type I Error?

When doing hypothesis testing, one ends up incorrectly rejecting the null hypothesis (default state of being) when in reality it holds true. The probability of rejecting a null hypothesis when it actually holds good is called as Type I error . Generally, a higher Type I error triggers eyebrows because this indicates that there is evidence against the default state of being. This essentially means that unexpected outcomes or alternate hypotheses can be true. Thus, it is recommended that one should aim to keep Type I errors as small as possible. Type I error is also called as “ false positive “.

Lets try and understand type I error with the help of person held guilty or otherwise given the fact that he is innocent. The claim made or the hypothesis is that the person has committed a crime or is guilty. The null hypothesis will be that the person is not guilty or innocent. Based on the evidence gathered, the null hypothesis that the person is not guilty gets rejected. This means that the person is held guilty. However, the rejection of null hypothesis is false. This means that the person is held guilty although he/she was not guilty. In other words, the innocent person is convicted. This is an example of Type I error.

In order to achieve the lower Type I error, the hypothesis testing assigns a fairly small value to the significance level. Common values for significance level are 0.05 and 0.01, although, on average scenarios, 0.05 is used. Mathematically speaking, if the significance level is set to be 0.05, it is acceptable/OK to falsely or incorrectly reject the Null Hypothesis for 5% of the time.

Type I Error & House on Fire

Whether house is on fire?

Whether the house is on fire?

Type I Error & Covid-19 Diagnosis

covid-19 Type I Type II Error

What is a Type II Error?

Type ii error & house on fire, type ii error & covid-19 diagnosis.

In the case of Covid-19 example, if the person having a breathing problem fails to reject the Null hypothesis , and does not go for Covid-19 diagnostic tests when he/she should actually have rejected it. This may prove fatal to life in case the person is actually suffering from Covid-19. Type II errors can turn out to be very fatal and expensive.

Type I Error & Type II Error Explained with Diagram

Type I Error Type II Error

Given the diagram above, one could observe the following two scenarios:

  • Type I Error : When one rejects the Null Hypothesis (H0 – Default state of being) given that H0 is true, one commits a Type I error. It can also be termed as false positive.
  • Type II Error : When one fails to reject the Null hypothesis when it is actually false or does not hold good, one commits a Type II error. It can also be termed as a false negative.
  • In other cases when one rejects the Null Hypothesis when it is false or not true, and when fails to reject the Null hypothesis when it is true is the correct decision .

Type I Error & Type II Error: Trade-off

Ideally it is desired that both the Type I and Type II error rates should remain small. But in practice, this is extermely hard to achieve. There typically is a trade-off. The Type I error can be made small by only rejecting H0 if we are quite sure that it doesn’t hold. This would mean a very small value of significance level such as 0.01. However, this will result in an increase in the Type II error. Alternatively, The Type II error can be made small by rejecting H0 in the presence of even modest evidence that it does not hold. This can be obtained by having slightly higher value of significance level ssuch as 0.1. This will, however, cause the Type I error to be large. In practice, we typically view Type I errors as “bad” or “not good” than Type II errors, because the former involves declaring a scientific finding that is not correct. Hence, when the hypothesis testing is performed, What is desired is typically a low Type I error rate — e.g., at most α = 0.05,  while trying to make the Type II error small (or, equivalently, the power large).

Understanding the difference between Type I and Type II errors can help you make more informed decisions about how to use statistics in your research. If you are looking for some resources on how to integrate these concepts into your own work, reach out to us. We would be happy to provide additional training or answer any questions that may arise!

Recent Posts

Ajitesh Kumar

  • OKRs vs KPIs vs KRAs: Differences and Examples - February 21, 2024
  • CEP vs Traditional Database Examples - February 2, 2024
  • Retrieval Augmented Generation (RAG) & LLM: Examples - February 1, 2024

Ajitesh Kumar

Leave a reply cancel reply.

Your email address will not be published. Required fields are marked *

  • Search for:
  • Excellence Awaits: IITs, NITs & IIITs Journey

ChatGPT Prompts (250+)

  • Generate Design Ideas for App
  • Expand Feature Set of App
  • Create a User Journey Map for App
  • Generate Visual Design Ideas for App
  • Generate a List of Competitors for App
  • OKRs vs KPIs vs KRAs: Differences and Examples
  • CEP vs Traditional Database Examples
  • Retrieval Augmented Generation (RAG) & LLM: Examples
  • Attention Mechanism in Transformers: Examples
  • NLP Tokenization in Machine Learning: Python Examples

Data Science / AI Trends

  • • Prepend any arxiv.org link with talk2 to load the paper into a responsive chat application
  • • Custom LLM and AI Agents (RAG) On Structured + Unstructured Data - AI Brain For Your Organization
  • • Guides, papers, lecture, notebooks and resources for prompt engineering
  • • Common tricks to make LLMs efficient and stable
  • • Machine learning in finance

Free Online Tools

  • Create Scatter Plots Online for your Excel Data
  • Histogram / Frequency Distribution Creation Tool
  • Online Pie Chart Maker Tool
  • Z-test vs T-test Decision Tool
  • Independent samples t-test calculator

Recent Comments

Very Nice Explaination. Thankyiu very much,

in your case E respresent Member or Oraganization which include on e or more peers?

Such a informative post. Keep it up

Thank you....for your support. you given a good solution for me.

Hi Ajitesh, I came across this article while doing research for my university project where I am meant Define a…

What Is Hypothesis Testing? Types and Python Code Example

Curiosity has always been a part of human nature. Since the beginning of time, this has been one of the most important tools for birthing civilizations. Still, our curiosity grows — it tests and expands our limits. Humanity has explored the plains of land, water, and air. We've built underwater habitats where we could live for weeks. Our civilization has explored various planets. We've explored land to an unlimited degree.

These things were possible because humans asked questions and searched until they found answers. However, for us to get these answers, a proven method must be used and followed through to validate our results. Historically, philosophers assumed the earth was flat and you would fall off when you reached the edge. While philosophers like Aristotle argued that the earth was spherical based on the formation of the stars, they could not prove it at the time.

This is because they didn't have adequate resources to explore space or mathematically prove Earth's shape. It was a Greek mathematician named Eratosthenes who calculated the earth's circumference with incredible precision. He used scientific methods to show that the Earth was not flat. Since then, other methods have been used to prove the Earth's spherical shape.

When there are questions or statements that are yet to be tested and confirmed based on some scientific method, they are called hypotheses. Basically, we have two types of hypotheses: null and alternate.

A null hypothesis is one's default belief or argument about a subject matter. In the case of the earth's shape, the null hypothesis was that the earth was flat.

An alternate hypothesis is a belief or argument a person might try to establish. Aristotle and Eratosthenes argued that the earth was spherical.

Other examples of a random alternate hypothesis include:

  • The weather may have an impact on a person's mood.
  • More people wear suits on Mondays compared to other days of the week.
  • Children are more likely to be brilliant if both parents are in academia, and so on.

What is Hypothesis Testing?

Hypothesis testing is the act of testing whether a hypothesis or inference is true. When an alternate hypothesis is introduced, we test it against the null hypothesis to know which is correct. Let's use a plant experiment by a 12-year-old student to see how this works.

The hypothesis is that a plant will grow taller when given a certain type of fertilizer. The student takes two samples of the same plant, fertilizes one, and leaves the other unfertilized. He measures the plants' height every few days and records the results in a table.

After a week or two, he compares the final height of both plants to see which grew taller. If the plant given fertilizer grew taller, the hypothesis is established as fact. If not, the hypothesis is not supported. This simple experiment shows how to form a hypothesis, test it experimentally, and analyze the results.

In hypothesis testing, there are two types of error: Type I and Type II.

When we reject the null hypothesis in a case where it is correct, we've committed a Type I error. Type II errors occur when we fail to reject the null hypothesis when it is incorrect.

In our plant experiment above, if the student finds out that both plants' heights are the same at the end of the test period yet opines that fertilizer helps with plant growth, he has committed a Type I error.

However, if the fertilized plant comes out taller and the student records that both plants are the same or that the one without fertilizer grew taller, he has committed a Type II error because he has failed to reject the null hypothesis.

What are the Steps in Hypothesis Testing?

The following steps explain how we can test a hypothesis:

Step #1 - Define the Null and Alternative Hypotheses

Before making any test, we must first define what we are testing and what the default assumption is about the subject. In this article, we'll be testing if the average weight of 10-year-old children is more than 32kg.

Our null hypothesis is that 10 year old children weigh 32 kg on average. Our alternate hypothesis is that the average weight is more than 32kg. Ho denotes a null hypothesis, while H1 denotes an alternate hypothesis.

Step #2 - Choose a Significance Level

The significance level is a threshold for determining if the test is valid. It gives credibility to our hypothesis test to ensure we are not just luck-dependent but have enough evidence to support our claims. We usually set our significance level before conducting our tests. The criterion for determining our significance value is known as p-value.

A lower p-value means that there is stronger evidence against the null hypothesis, and therefore, a greater degree of significance. A p-value of 0.05 is widely accepted to be significant in most fields of science. P-values do not denote the probability of the outcome of the result, they just serve as a benchmark for determining whether our test result is due to chance. For our test, our p-value will be 0.05.

Step #3 - Collect Data and Calculate a Test Statistic

You can obtain your data from online data stores or conduct your research directly. Data can be scraped or researched online. The methodology might depend on the research you are trying to conduct.

We can calculate our test using any of the appropriate hypothesis tests. This can be a T-test, Z-test, Chi-squared, and so on. There are several hypothesis tests, each suiting different purposes and research questions. In this article, we'll use the T-test to run our hypothesis, but I'll explain the Z-test, and chi-squared too.

T-test is used for comparison of two sets of data when we don't know the population standard deviation. It's a parametric test, meaning it makes assumptions about the distribution of the data. These assumptions include that the data is normally distributed and that the variances of the two groups are equal. In a more simple and practical sense, imagine that we have test scores in a class for males and females, but we don't know how different or similar these scores are. We can use a t-test to see if there's a real difference.

The Z-test is used for comparison between two sets of data when the population standard deviation is known. It is also a parametric test, but it makes fewer assumptions about the distribution of data. The z-test assumes that the data is normally distributed, but it does not assume that the variances of the two groups are equal. In our class test example, with the t-test, we can say that if we already know how spread out the scores are in both groups, we can now use the z-test to see if there's a difference in the average scores.

The Chi-squared test is used to compare two or more categorical variables. The chi-squared test is a non-parametric test, meaning it does not make any assumptions about the distribution of data. It can be used to test a variety of hypotheses, including whether two or more groups have equal proportions.

Step #4 - Decide on the Null Hypothesis Based on the Test Statistic and Significance Level

After conducting our test and calculating the test statistic, we can compare its value to the predetermined significance level. If the test statistic falls beyond the significance level, we can decide to reject the null hypothesis, indicating that there is sufficient evidence to support our alternative hypothesis.

On the other contrary, if the test statistic does not exceed the significance level, we fail to reject the null hypothesis, signifying that we do not have enough statistical evidence to conclude in favor of the alternative hypothesis.

Step #5 - Interpret the Results

Depending on the decision made in the previous step, we can interpret the result in the context of our study and the practical implications. For our case study, we can interpret whether we have significant evidence to support our claim that the average weight of 10 year old children is more than 32kg or not.

For our test, we are generating random dummy data for the weight of the children. We'll use a t-test to evaluate whether our hypothesis is correct or not.

For a better understanding, let's look at what each block of code does.

The first block is the import statement, where we import numpy and scipy.stats . Numpy is a Python library used for scientific computing. It has a large library of functions for working with arrays. Scipy is a library for mathematical functions. It has a stat module for performing statistical functions, and that's what we'll be using for our t-test.

The weights of the children were generated at random since we aren't working with an actual dataset. The random module within the Numpy library provides a function for generating random numbers, which is randint .

The randint function takes three arguments. The first (20) is the lower bound of the random numbers to be generated. The second (40) is the upper bound, and the third (100) specifies the number of random integers to generate. That is, we are generating random weight values for 100 children. In real circumstances, these weight samples would have been obtained by taking the weight of the required number of children needed for the test.

Using the code above, we declared our null and alternate hypotheses stating the average weight of a 10-year-old in both cases.

t_stat and p_value are the variables in which we'll store the results of our functions. stats.ttest_1samp is the function that calculates our test. It takes in two variables, the first is the data variable that stores the array of weights for children, and the second (32) is the value against which we'll test the mean of our array of weights or dataset in cases where we are using a real-world dataset.

The code above prints both values for t_stats and p_value .

Lastly, we evaluated our p_value against our significance value, which is 0.05. If our p_value is less than 0.05, we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis. Below is the output of this program. Our null hypothesis was rejected.

In this article, we discussed the importance of hypothesis testing. We highlighted how science has advanced human knowledge and civilization through formulating and testing hypotheses.

We discussed Type I and Type II errors in hypothesis testing and how they underscore the importance of careful consideration and analysis in scientific inquiry. It reinforces the idea that conclusions should be drawn based on thorough statistical analysis rather than assumptions or biases.

We also generated a sample dataset using the relevant Python libraries and used the needed functions to calculate and test our alternate hypothesis.

Thank you for reading! Please follow me on LinkedIn where I also post more data related content.

Technical support engineer with 4 years of experience & 6 months in data analytics. Passionate about data science, programming, & statistics.

If you read this far, thank the author to show them you care. Say Thanks

Learn to code for free. freeCodeCamp's open source curriculum has helped more than 40,000 people get jobs as developers. Get started

Hypothesis Testing

Hypothesis testing is a tool for making statistical inferences about the population data. It is an analysis tool that tests assumptions and determines how likely something is within a given standard of accuracy. Hypothesis testing provides a way to verify whether the results of an experiment are valid.

A null hypothesis and an alternative hypothesis are set up before performing the hypothesis testing. This helps to arrive at a conclusion regarding the sample obtained from the population. In this article, we will learn more about hypothesis testing, its types, steps to perform the testing, and associated examples.

What is Hypothesis Testing in Statistics?

Hypothesis testing uses sample data from the population to draw useful conclusions regarding the population probability distribution . It tests an assumption made about the data using different types of hypothesis testing methodologies. The hypothesis testing results in either rejecting or not rejecting the null hypothesis.

Hypothesis Testing Definition

Hypothesis testing can be defined as a statistical tool that is used to identify if the results of an experiment are meaningful or not. It involves setting up a null hypothesis and an alternative hypothesis. These two hypotheses will always be mutually exclusive. This means that if the null hypothesis is true then the alternative hypothesis is false and vice versa. An example of hypothesis testing is setting up a test to check if a new medicine works on a disease in a more efficient manner.

Null Hypothesis

The null hypothesis is a concise mathematical statement that is used to indicate that there is no difference between two possibilities. In other words, there is no difference between certain characteristics of data. This hypothesis assumes that the outcomes of an experiment are based on chance alone. It is denoted as \(H_{0}\). Hypothesis testing is used to conclude if the null hypothesis can be rejected or not. Suppose an experiment is conducted to check if girls are shorter than boys at the age of 5. The null hypothesis will say that they are the same height.

Alternative Hypothesis

The alternative hypothesis is an alternative to the null hypothesis. It is used to show that the observations of an experiment are due to some real effect. It indicates that there is a statistical significance between two possible outcomes and can be denoted as \(H_{1}\) or \(H_{a}\). For the above-mentioned example, the alternative hypothesis would be that girls are shorter than boys at the age of 5.

Hypothesis Testing P Value

In hypothesis testing, the p value is used to indicate whether the results obtained after conducting a test are statistically significant or not. It also indicates the probability of making an error in rejecting or not rejecting the null hypothesis.This value is always a number between 0 and 1. The p value is compared to an alpha level, \(\alpha\) or significance level. The alpha level can be defined as the acceptable risk of incorrectly rejecting the null hypothesis. The alpha level is usually chosen between 1% to 5%.

Hypothesis Testing Critical region

All sets of values that lead to rejecting the null hypothesis lie in the critical region. Furthermore, the value that separates the critical region from the non-critical region is known as the critical value.

Hypothesis Testing Formula

Depending upon the type of data available and the size, different types of hypothesis testing are used to determine whether the null hypothesis can be rejected or not. The hypothesis testing formula for some important test statistics are given below:

  • z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\). \(\overline{x}\) is the sample mean, \(\mu\) is the population mean, \(\sigma\) is the population standard deviation and n is the size of the sample.
  • t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\). s is the sample standard deviation.
  • \(\chi ^{2} = \sum \frac{(O_{i}-E_{i})^{2}}{E_{i}}\). \(O_{i}\) is the observed value and \(E_{i}\) is the expected value.

We will learn more about these test statistics in the upcoming section.

Types of Hypothesis Testing

Selecting the correct test for performing hypothesis testing can be confusing. These tests are used to determine a test statistic on the basis of which the null hypothesis can either be rejected or not rejected. Some of the important tests used for hypothesis testing are given below.

Hypothesis Testing Z Test

A z test is a way of hypothesis testing that is used for a large sample size (n ≥ 30). It is used to determine whether there is a difference between the population mean and the sample mean when the population standard deviation is known. It can also be used to compare the mean of two samples. It is used to compute the z test statistic. The formulas are given as follows:

  • One sample: z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).
  • Two samples: z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

Hypothesis Testing t Test

The t test is another method of hypothesis testing that is used for a small sample size (n < 30). It is also used to compare the sample mean and population mean. However, the population standard deviation is not known. Instead, the sample standard deviation is known. The mean of two samples can also be compared using the t test.

  • One sample: t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\).
  • Two samples: t = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}}\).

Hypothesis Testing Chi Square

The Chi square test is a hypothesis testing method that is used to check whether the variables in a population are independent or not. It is used when the test statistic is chi-squared distributed.

One Tailed Hypothesis Testing

One tailed hypothesis testing is done when the rejection region is only in one direction. It can also be known as directional hypothesis testing because the effects can be tested in one direction only. This type of testing is further classified into the right tailed test and left tailed test.

Right Tailed Hypothesis Testing

The right tail test is also known as the upper tail test. This test is used to check whether the population parameter is greater than some value. The null and alternative hypotheses for this test are given as follows:

\(H_{0}\): The population parameter is ≤ some value

\(H_{1}\): The population parameter is > some value.

If the test statistic has a greater value than the critical value then the null hypothesis is rejected

Right Tail Hypothesis Testing

Left Tailed Hypothesis Testing

The left tail test is also known as the lower tail test. It is used to check whether the population parameter is less than some value. The hypotheses for this hypothesis testing can be written as follows:

\(H_{0}\): The population parameter is ≥ some value

\(H_{1}\): The population parameter is < some value.

The null hypothesis is rejected if the test statistic has a value lesser than the critical value.

Left Tail Hypothesis Testing

Two Tailed Hypothesis Testing

In this hypothesis testing method, the critical region lies on both sides of the sampling distribution. It is also known as a non - directional hypothesis testing method. The two-tailed test is used when it needs to be determined if the population parameter is assumed to be different than some value. The hypotheses can be set up as follows:

\(H_{0}\): the population parameter = some value

\(H_{1}\): the population parameter ≠ some value

The null hypothesis is rejected if the test statistic has a value that is not equal to the critical value.

Two Tail Hypothesis Testing

Hypothesis Testing Steps

Hypothesis testing can be easily performed in five simple steps. The most important step is to correctly set up the hypotheses and identify the right method for hypothesis testing. The basic steps to perform hypothesis testing are as follows:

  • Step 1: Set up the null hypothesis by correctly identifying whether it is the left-tailed, right-tailed, or two-tailed hypothesis testing.
  • Step 2: Set up the alternative hypothesis.
  • Step 3: Choose the correct significance level, \(\alpha\), and find the critical value.
  • Step 4: Calculate the correct test statistic (z, t or \(\chi\)) and p-value.
  • Step 5: Compare the test statistic with the critical value or compare the p-value with \(\alpha\) to arrive at a conclusion. In other words, decide if the null hypothesis is to be rejected or not.

Hypothesis Testing Example

The best way to solve a problem on hypothesis testing is by applying the 5 steps mentioned in the previous section. Suppose a researcher claims that the mean average weight of men is greater than 100kgs with a standard deviation of 15kgs. 30 men are chosen with an average weight of 112.5 Kgs. Using hypothesis testing, check if there is enough evidence to support the researcher's claim. The confidence interval is given as 95%.

Step 1: This is an example of a right-tailed test. Set up the null hypothesis as \(H_{0}\): \(\mu\) = 100.

Step 2: The alternative hypothesis is given by \(H_{1}\): \(\mu\) > 100.

Step 3: As this is a one-tailed test, \(\alpha\) = 100% - 95% = 5%. This can be used to determine the critical value.

1 - \(\alpha\) = 1 - 0.05 = 0.95

0.95 gives the required area under the curve. Now using a normal distribution table, the area 0.95 is at z = 1.645. A similar process can be followed for a t-test. The only additional requirement is to calculate the degrees of freedom given by n - 1.

Step 4: Calculate the z test statistic. This is because the sample size is 30. Furthermore, the sample and population means are known along with the standard deviation.

z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\).

\(\mu\) = 100, \(\overline{x}\) = 112.5, n = 30, \(\sigma\) = 15

z = \(\frac{112.5-100}{\frac{15}{\sqrt{30}}}\) = 4.56

Step 5: Conclusion. As 4.56 > 1.645 thus, the null hypothesis can be rejected.

Hypothesis Testing and Confidence Intervals

Confidence intervals form an important part of hypothesis testing. This is because the alpha level can be determined from a given confidence interval. Suppose a confidence interval is given as 95%. Subtract the confidence interval from 100%. This gives 100 - 95 = 5% or 0.05. This is the alpha value of a one-tailed hypothesis testing. To obtain the alpha value for a two-tailed hypothesis testing, divide this value by 2. This gives 0.05 / 2 = 0.025.

Related Articles:

  • Probability and Statistics
  • Data Handling

Important Notes on Hypothesis Testing

  • Hypothesis testing is a technique that is used to verify whether the results of an experiment are statistically significant.
  • It involves the setting up of a null hypothesis and an alternate hypothesis.
  • There are three types of tests that can be conducted under hypothesis testing - z test, t test, and chi square test.
  • Hypothesis testing can be classified as right tail, left tail, and two tail tests.

Examples on Hypothesis Testing

  • Example 1: The average weight of a dumbbell in a gym is 90lbs. However, a physical trainer believes that the average weight might be higher. A random sample of 5 dumbbells with an average weight of 110lbs and a standard deviation of 18lbs. Using hypothesis testing check if the physical trainer's claim can be supported for a 95% confidence level. Solution: As the sample size is lesser than 30, the t-test is used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) > 90 \(\overline{x}\) = 110, \(\mu\) = 90, n = 5, s = 18. \(\alpha\) = 0.05 Using the t-distribution table, the critical value is 2.132 t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) t = 2.484 As 2.484 > 2.132, the null hypothesis is rejected. Answer: The average weight of the dumbbells may be greater than 90lbs
  • Example 2: The average score on a test is 80 with a standard deviation of 10. With a new teaching curriculum introduced it is believed that this score will change. On random testing, the score of 38 students, the mean was found to be 88. With a 0.05 significance level, is there any evidence to support this claim? Solution: This is an example of two-tail hypothesis testing. The z test will be used. \(H_{0}\): \(\mu\) = 80, \(H_{1}\): \(\mu\) ≠ 80 \(\overline{x}\) = 88, \(\mu\) = 80, n = 36, \(\sigma\) = 10. \(\alpha\) = 0.05 / 2 = 0.025 The critical value using the normal distribution table is 1.96 z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) z = \(\frac{88-80}{\frac{10}{\sqrt{36}}}\) = 4.8 As 4.8 > 1.96, the null hypothesis is rejected. Answer: There is a difference in the scores after the new curriculum was introduced.
  • Example 3: The average score of a class is 90. However, a teacher believes that the average score might be lower. The scores of 6 students were randomly measured. The mean was 82 with a standard deviation of 18. With a 0.05 significance level use hypothesis testing to check if this claim is true. Solution: The t test will be used. \(H_{0}\): \(\mu\) = 90, \(H_{1}\): \(\mu\) < 90 \(\overline{x}\) = 110, \(\mu\) = 90, n = 6, s = 18 The critical value from the t table is -2.015 t = \(\frac{\overline{x}-\mu}{\frac{s}{\sqrt{n}}}\) t = \(\frac{82-90}{\frac{18}{\sqrt{6}}}\) t = -1.088 As -1.088 > -2.015, we fail to reject the null hypothesis. Answer: There is not enough evidence to support the claim.

go to slide go to slide go to slide

types of errors in hypothesis testing with examples

Book a Free Trial Class

FAQs on Hypothesis Testing

What is hypothesis testing.

Hypothesis testing in statistics is a tool that is used to make inferences about the population data. It is also used to check if the results of an experiment are valid.

What is the z Test in Hypothesis Testing?

The z test in hypothesis testing is used to find the z test statistic for normally distributed data . The z test is used when the standard deviation of the population is known and the sample size is greater than or equal to 30.

What is the t Test in Hypothesis Testing?

The t test in hypothesis testing is used when the data follows a student t distribution . It is used when the sample size is less than 30 and standard deviation of the population is not known.

What is the formula for z test in Hypothesis Testing?

The formula for a one sample z test in hypothesis testing is z = \(\frac{\overline{x}-\mu}{\frac{\sigma}{\sqrt{n}}}\) and for two samples is z = \(\frac{(\overline{x_{1}}-\overline{x_{2}})-(\mu_{1}-\mu_{2})}{\sqrt{\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}}}\).

What is the p Value in Hypothesis Testing?

The p value helps to determine if the test results are statistically significant or not. In hypothesis testing, the null hypothesis can either be rejected or not rejected based on the comparison between the p value and the alpha level.

What is One Tail Hypothesis Testing?

When the rejection region is only on one side of the distribution curve then it is known as one tail hypothesis testing. The right tail test and the left tail test are two types of directional hypothesis testing.

What is the Alpha Level in Two Tail Hypothesis Testing?

To get the alpha level in a two tail hypothesis testing divide \(\alpha\) by 2. This is done as there are two rejection regions in the curve.

types of errors in hypothesis testing with examples

Hypothesis Testing: Understanding the Basics, Types, and Importance

Hypothesis testing is a statistical method used to determine whether a hypothesis about a population parameter is true or not. This technique helps researchers and decision-makers make informed decisions based on evidence rather than guesses. Hypothesis testing is an essential tool in scientific research, social sciences, and business analysis. In this article, we will delve deeper into the basics of hypothesis testing, types of hypotheses, significance level, p-values, and the importance of hypothesis testing.

  • Introduction

What is a hypothesis?

What is hypothesis testing, types of hypotheses, null hypothesis, alternative hypothesis, one-tailed and two-tailed tests, significance level and p-values, avoiding type i and type ii errors, making informed decisions, testing business strategies, a/b testing, formulating the null and alternative hypotheses, selecting the appropriate test, setting the level of significance, calculating the p-value, making a decision, common misconceptions about hypothesis testing, understanding hypothesis testing.

A hypothesis is an assumption or a proposition made about a population parameter. It is a statement that can be tested and either supported or refuted. For example, a hypothesis could be that a new medication reduces the severity of symptoms in patients with a particular disease.

Hypothesis testing is a statistical method that helps to determine whether a hypothesis is true or not. It is a procedure that involves collecting and analyzing data to evaluate the probability of the null hypothesis being true. The null hypothesis is the hypothesis that there is no significant difference between a sample and the population.

In hypothesis testing, there are two types of hypotheses: null and alternative.

The null hypothesis, denoted by H0, is a statement of no effect, no relationship, or no difference between the sample and the population. It is assumed to be true until there is sufficient evidence to reject it. For example, the null hypothesis could be that there is no significant difference in the blood pressure of patients who received the medication and those who received a placebo.

The alternative hypothesis, denoted by H1, is a statement of an effect, relationship, or difference between the sample and the population. It is the opposite of the null hypothesis. For example, the alternative hypothesis could be that the medication reduces the blood pressure of patients compared to those who received a placebo.

There are two types of alternative hypotheses: one-tailed and two-tailed. A one-tailed test is used when there is a directional hypothesis. For example, the hypothesis could be that the medication reduces blood pressure. A two-tailed test is used when there is a non-directional hypothesis. For example, the hypothesis could be that there is a significant difference in blood pressure between patients who received the medication and those who received a placebo.

The significance level, denoted by α, is the probability of rejecting the null hypothesis when it is true. It is set at the beginning of the test, usually at 5% or 1%. The p-value is the probability of obtaining a test statistic as extreme as

or more extreme than the observed one, assuming that the null hypothesis is true. If the p-value is less than the significance level, we reject the null hypothesis.

Importance of Hypothesis Testing

Hypothesis testing helps to avoid Type I and Type II errors. Type I error occurs when we reject the null hypothesis when it is actually true. Type II error occurs when we fail to reject the null hypothesis when it is actually false. By setting a significance level and calculating the p-value, we can control the probability of making these errors.

Hypothesis testing helps researchers and decision-makers make informed decisions based on evidence. For example, a medical researcher can use hypothesis testing to determine the effectiveness of a new drug. A business analyst can use hypothesis testing to evaluate the performance of a marketing campaign. By testing hypotheses, decision-makers can avoid making decisions based on guesses or assumptions.

Hypothesis testing is widely used in business analysis to test strategies and make data-driven decisions. For example, a business owner can use hypothesis testing to determine whether a new product will be profitable. By conducting A/B testing, businesses can compare the performance of two versions of a product and make data-driven decisions.

Examples of Hypothesis Testing

  • A/B testing is a popular technique used in online marketing and web design. It involves comparing two versions of a webpage or an advertisement to determine which one performs better. By conducting A/B testing, businesses can optimize their websites and advertisements to increase conversions and sales.

A t-test is used to compare the means of two samples. It is commonly used in medical research, social sciences, and business analysis. For example, a researcher can use a t-test to determine whether there is a significant difference in the cholesterol levels of patients who received a new drug and those who received a placebo.

Analysis of Variance (ANOVA) is a statistical technique used to compare the means of more than two samples. It is commonly used in medical research, social sciences, and business analysis. For example, a business owner can use ANOVA to determine whether there is a significant difference in the sales performance of three different stores.

Steps in Hypothesis Testing

The first step in hypothesis testing is to formulate the null and alternative hypotheses. The null hypothesis is the hypothesis that there is no significant difference between the sample and the population, while the alternative hypothesis is the opposite.

The second step is to select the appropriate test based on the type of data and the research question. There are different types of tests for different types of data, such as t-test for continuous data and chi-square test for categorical data.

The third step is to set the level of significance, which is usually 5% or 1%. The significance level represents the probability of rejecting the null hypothesis when it is actually true.

The fourth step is to calculate the p-value, which represents the probability of obtaining a test statistic as extreme as or more extreme than the observed one, assuming that the null hypothesis is true.

The final step is to make a decision based on the p-value and the significance level. If the p-value is less than the significance level, we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis.

There are several common misconceptions about hypothesis testing. One of the most common misconceptions is that rejecting the null hypothesis means that the alternative hypothesis is true. However

this is not necessarily the case. Rejecting the null hypothesis only means that there is evidence against it, but it does not prove that the alternative hypothesis is true. Another common misconception is that hypothesis testing can prove causality. However, hypothesis testing can only provide evidence for or against a hypothesis, and causality can only be inferred from a well-designed experiment.

Hypothesis testing is an important statistical technique used to test hypotheses and make informed decisions based on evidence. It helps to avoid Type I and Type II errors, and it is widely used in medical research, social sciences, and business analysis. By following the steps in hypothesis testing and avoiding common misconceptions, researchers and decision-makers can make data-driven decisions and avoid making decisions based on guesses or assumptions.

  • What is the difference between Type I and Type II errors in hypothesis testing?
  • Type I error occurs when we reject the null hypothesis when it is actually true, while Type II error occurs when we fail to reject the null hypothesis when it is actually false.
  • How do you select the appropriate test in hypothesis testing?
  • The appropriate test is selected based on the type of data and the research question. There are different types of tests for different types of data, such as t-test for continuous data and chi-square test for categorical data.
  • Can hypothesis testing prove causality?
  • No, hypothesis testing can only provide evidence for or against a hypothesis, and causality can only be inferred from a well-designed experiment.
  • Why is hypothesis testing important in business analysis?
  • Hypothesis testing is important in business analysis because it helps businesses make data-driven decisions and avoid making decisions based on guesses or assumptions. By testing hypotheses, businesses can evaluate the effectiveness of their strategies and optimize their performance.
  • What is A/B testing?

If you want to learn more about statistical analysis, including central tendency measures, check out our  comprehensive statistical course . Our course provides a hands-on learning experience that covers all the essential statistical concepts and tools, empowering you to analyze complex data with confidence. With practical examples and interactive exercises, you’ll gain the skills you need to succeed in your statistical analysis endeavors. Enroll now and take your statistical knowledge to the next level!

If you’re looking to jumpstart your career as a data analyst, consider enrolling in our comprehensive  Data Analyst Bootcamp with Internship program . Our program provides you with the skills and experience necessary to succeed in today’s data-driven world. You’ll learn the fundamentals of statistical analysis, as well as how to use tools such as SQL, Python, Excel, and PowerBI to analyze and visualize data. But that’s not all – our program also includes a 3-month internship with us where you can showcase your Capstone Project.

2 Responses

This is a great and comprehensive article on hypothesis testing, covering everything from the basics to practical examples. I particularly appreciate the section on common misconceptions, as it’s important to understand what hypothesis testing can and cannot do. Overall, a valuable resource for anyone looking to understand this statistical technique.

Thanks, Ana Carol for your Kind words, Yes these topics are very important to know in Artificial intelligence.

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Save my name, email, and website in this browser for the next time I comment.

  • Math Article
  • Type I And Type Ii Errors

Type I and Type II Errors

Type I and Type II errors are subjected to the result of the null hypothesis. In case of type I or type-1 error, the null hypothesis is rejected though it is true whereas type II or type-2 error, the null hypothesis is not rejected even when the alternative hypothesis is true. Both the error type-i and type-ii are also known as “ false negative ”. A lot of statistical theory rotates around the reduction of one or both of these errors, still, the total elimination of both is explained as a statistical impossibility.

Type I Error

A type I error appears when the null hypothesis (H 0 ) of an experiment is true, but still, it is rejected. It is stating something which is not present or a false hit. A type I error is often called a false positive (an event that shows that a given condition is present when it is absent). In words of community tales, a person may see the bear when there is none (raising a false alarm) where the null hypothesis (H 0 ) contains the statement: “There is no bear”.

The type I error significance level or rate level is the probability of refusing the null hypothesis given that it is true. It is represented by Greek letter α (alpha) and is also known as alpha level. Usually, the significance level or the probability of type i error is set to 0.05 (5%), assuming that it is satisfactory to have a 5% probability of inaccurately rejecting the null hypothesis.

Type II Error

A type II error appears when the null hypothesis is false but mistakenly fails to be refused. It is losing to state what is present and a miss. A type II error is also known as false negative (where a real hit was rejected by the test and is observed as a miss), in an experiment checking for a condition with a final outcome of true or false.

A type II error is assigned when a true alternative hypothesis is not acknowledged. In other words, an examiner may miss discovering the bear when in fact a bear is present (hence fails in raising the alarm). Again, H0, the null hypothesis, consists of the statement that, “There is no bear”, wherein, if a wolf is indeed present, is a type II error on the part of the investigator. Here, the bear either exists or does not exist within given circumstances, the question arises here is if it is correctly identified or not, either missing detecting it when it is present, or identifying it when it is not present.

The rate level of the type II error is represented by the Greek letter β (beta) and linked to the power of a test (which equals 1−β).

Also, read:

Table of Type I and Type II Error

The relationship between truth or false of the null hypothesis and outcomes or result of the test is given in the tabular form:

Type I and Type II Errors Example

Check out some real-life examples to understand the type-i and type-ii error in the null hypothesis.

Example 1 : Let us consider a null hypothesis – A man is not guilty of a crime.

Then in this case:

Example 2: Null hypothesis- A patient’s signs after treatment A, are the same from a placebo.

Leave a Comment Cancel reply

Your Mobile number and Email id will not be published. Required fields are marked *

Request OTP on Voice Call

Post My Comment

types of errors in hypothesis testing with examples

  • Share Share

Register with BYJU'S & Download Free PDFs

Register with byju's & watch live videos.

close

Tutorial Playlist

Statistics tutorial, everything you need to know about the probability density function in statistics, the best guide to understand central limit theorem, an in-depth guide to measures of central tendency : mean, median and mode, the ultimate guide to understand conditional probability.

A Comprehensive Look at Percentile in Statistics

The Best Guide to Understand Bayes Theorem

Everything you need to know about the normal distribution, an in-depth explanation of cumulative distribution function, a complete guide to chi-square test, a complete guide on hypothesis testing in statistics, understanding the fundamentals of arithmetic and geometric progression, the definitive guide to understand spearman’s rank correlation, a comprehensive guide to understand mean squared error, all you need to know about the empirical rule in statistics, the complete guide to skewness and kurtosis, a holistic look at bernoulli distribution.

All You Need to Know About Bias in Statistics

A Complete Guide to Get a Grasp of Time Series Analysis

The Key Differences Between Z-Test Vs. T-Test

The Complete Guide to Understand Pearson's Correlation

A complete guide on the types of statistical studies, everything you need to know about poisson distribution, your best guide to understand correlation vs. regression, the most comprehensive guide for beginners on what is correlation, what is hypothesis testing in statistics types and examples.

Lesson 10 of 24 By Avijeet Biswal

A Complete Guide on Hypothesis Testing in Statistics

Table of Contents

In today’s data-driven world , decisions are based on data all the time. Hypothesis plays a crucial role in that process, whether it may be making business decisions, in the health sector, academia, or in quality improvement. Without hypothesis & hypothesis tests, you risk drawing the wrong conclusions and making bad decisions. In this tutorial, you will look at Hypothesis Testing in Statistics.

What Is Hypothesis Testing in Statistics?

Hypothesis Testing is a type of statistical analysis in which you put your assumptions about a population parameter to the test. It is used to estimate the relationship between 2 statistical variables.

Let's discuss few examples of statistical hypothesis from real-life - 

  • A teacher assumes that 60% of his college's students come from lower-middle-class families.
  • A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective for diabetic patients.

Now that you know about hypothesis testing, look at the two types of hypothesis testing in statistics.

Hypothesis Testing Formula

Z = ( x̅ – μ0 ) / (σ /√n)

  • Here, x̅ is the sample mean,
  • μ0 is the population mean,
  • σ is the standard deviation,
  • n is the sample size.

How Hypothesis Testing Works?

An analyst performs hypothesis testing on a statistical sample to present evidence of the plausibility of the null hypothesis. Measurements and analyses are conducted on a random sample of the population to test a theory. Analysts use a random population sample to test two hypotheses: the null and alternative hypotheses.

The null hypothesis is typically an equality hypothesis between population parameters; for example, a null hypothesis may claim that the population means return equals zero. The alternate hypothesis is essentially the inverse of the null hypothesis (e.g., the population means the return is not equal to zero). As a result, they are mutually exclusive, and only one can be correct. One of the two possibilities, however, will always be correct.

Your Dream Career is Just Around The Corner!

Your Dream Career is Just Around The Corner!

Null Hypothesis and Alternate Hypothesis

The Null Hypothesis is the assumption that the event will not occur. A null hypothesis has no bearing on the study's outcome unless it is rejected.

H0 is the symbol for it, and it is pronounced H-naught.

The Alternate Hypothesis is the logical opposite of the null hypothesis. The acceptance of the alternative hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.

Let's understand this with an example.

A sanitizer manufacturer claims that its product kills 95 percent of germs on average. 

To put this company's claim to the test, create a null and alternate hypothesis.

H0 (Null Hypothesis): Average = 95%.

Alternative Hypothesis (H1): The average is less than 95%.

Another straightforward example to understand this concept is determining whether or not a coin is fair and balanced. The null hypothesis states that the probability of a show of heads is equal to the likelihood of a show of tails. In contrast, the alternate theory states that the probability of a show of heads and tails would be very different.

Become a Data Scientist with Hands-on Training!

Become a Data Scientist with Hands-on Training!

Hypothesis Testing Calculation With Examples

Let's consider a hypothesis test for the average height of women in the United States. Suppose our null hypothesis is that the average height is 5'4". We gather a sample of 100 women and determine that their average height is 5'5". The standard deviation of population is 2.

To calculate the z-score, we would use the following formula:

z = ( x̅ – μ0 ) / (σ /√n)

z = (5'5" - 5'4") / (2" / √100)

z = 0.5 / (0.045)

 We will reject the null hypothesis as the z-score of 11.11 is very large and conclude that there is evidence to suggest that the average height of women in the US is greater than 5'4".

Steps of Hypothesis Testing

Step 1: specify your null and alternate hypotheses.

It is critical to rephrase your original research hypothesis (the prediction that you wish to study) as a null (Ho) and alternative (Ha) hypothesis so that you can test it quantitatively. Your first hypothesis, which predicts a link between variables, is generally your alternate hypothesis. The null hypothesis predicts no link between the variables of interest.

Step 2: Gather Data

For a statistical test to be legitimate, sampling and data collection must be done in a way that is meant to test your hypothesis. You cannot draw statistical conclusions about the population you are interested in if your data is not representative.

Step 3: Conduct a Statistical Test

Other statistical tests are available, but they all compare within-group variance (how to spread out the data inside a category) against between-group variance (how different the categories are from one another). If the between-group variation is big enough that there is little or no overlap between groups, your statistical test will display a low p-value to represent this. This suggests that the disparities between these groups are unlikely to have occurred by accident. Alternatively, if there is a large within-group variance and a low between-group variance, your statistical test will show a high p-value. Any difference you find across groups is most likely attributable to chance. The variety of variables and the level of measurement of your obtained data will influence your statistical test selection.

Step 4: Determine Rejection Of Your Null Hypothesis

Your statistical test results must determine whether your null hypothesis should be rejected or not. In most circumstances, you will base your judgment on the p-value provided by the statistical test. In most circumstances, your preset level of significance for rejecting the null hypothesis will be 0.05 - that is, when there is less than a 5% likelihood that these data would be seen if the null hypothesis were true. In other circumstances, researchers use a lower level of significance, such as 0.01 (1%). This reduces the possibility of wrongly rejecting the null hypothesis.

Step 5: Present Your Results 

The findings of hypothesis testing will be discussed in the results and discussion portions of your research paper, dissertation, or thesis. You should include a concise overview of the data and a summary of the findings of your statistical test in the results section. You can talk about whether your results confirmed your initial hypothesis or not in the conversation. Rejecting or failing to reject the null hypothesis is a formal term used in hypothesis testing. This is likely a must for your statistics assignments.

Types of Hypothesis Testing

To determine whether a discovery or relationship is statistically significant, hypothesis testing uses a z-test. It usually checks to see if two means are the same (the null hypothesis). Only when the population standard deviation is known and the sample size is 30 data points or more, can a z-test be applied.

A statistical test called a t-test is employed to compare the means of two groups. To determine whether two groups differ or if a procedure or treatment affects the population of interest, it is frequently used in hypothesis testing.

Chi-Square 

You utilize a Chi-square test for hypothesis testing concerning whether your data is as predicted. To determine if the expected and observed results are well-fitted, the Chi-square test analyzes the differences between categorical variables from a random sample. The test's fundamental premise is that the observed values in your data should be compared to the predicted values that would be present if the null hypothesis were true.

Hypothesis Testing and Confidence Intervals

Both confidence intervals and hypothesis tests are inferential techniques that depend on approximating the sample distribution. Data from a sample is used to estimate a population parameter using confidence intervals. Data from a sample is used in hypothesis testing to examine a given hypothesis. We must have a postulated parameter to conduct hypothesis testing.

Bootstrap distributions and randomization distributions are created using comparable simulation techniques. The observed sample statistic is the focal point of a bootstrap distribution, whereas the null hypothesis value is the focal point of a randomization distribution.

A variety of feasible population parameter estimates are included in confidence ranges. In this lesson, we created just two-tailed confidence intervals. There is a direct connection between these two-tail confidence intervals and these two-tail hypothesis tests. The results of a two-tailed hypothesis test and two-tailed confidence intervals typically provide the same results. In other words, a hypothesis test at the 0.05 level will virtually always fail to reject the null hypothesis if the 95% confidence interval contains the predicted value. A hypothesis test at the 0.05 level will nearly certainly reject the null hypothesis if the 95% confidence interval does not include the hypothesized parameter.

Simple and Composite Hypothesis Testing

Depending on the population distribution, you can classify the statistical hypothesis into two types.

Simple Hypothesis: A simple hypothesis specifies an exact value for the parameter.

Composite Hypothesis: A composite hypothesis specifies a range of values.

A company is claiming that their average sales for this quarter are 1000 units. This is an example of a simple hypothesis.

Suppose the company claims that the sales are in the range of 900 to 1000 units. Then this is a case of a composite hypothesis.

One-Tailed and Two-Tailed Hypothesis Testing

The One-Tailed test, also called a directional test, considers a critical region of data that would result in the null hypothesis being rejected if the test sample falls into it, inevitably meaning the acceptance of the alternate hypothesis.

In a one-tailed test, the critical distribution area is one-sided, meaning the test sample is either greater or lesser than a specific value.

In two tails, the test sample is checked to be greater or less than a range of values in a Two-Tailed test, implying that the critical distribution area is two-sided.

If the sample falls within this range, the alternate hypothesis will be accepted, and the null hypothesis will be rejected.

Become a Data Scientist With Real-World Experience

Become a Data Scientist With Real-World Experience

Right Tailed Hypothesis Testing

If the larger than (>) sign appears in your hypothesis statement, you are using a right-tailed test, also known as an upper test. Or, to put it another way, the disparity is to the right. For instance, you can contrast the battery life before and after a change in production. Your hypothesis statements can be the following if you want to know if the battery life is longer than the original (let's say 90 hours):

  • The null hypothesis is (H0 <= 90) or less change.
  • A possibility is that battery life has risen (H1) > 90.

The crucial point in this situation is that the alternate hypothesis (H1), not the null hypothesis, decides whether you get a right-tailed test.

Left Tailed Hypothesis Testing

Alternative hypotheses that assert the true value of a parameter is lower than the null hypothesis are tested with a left-tailed test; they are indicated by the asterisk "<".

Suppose H0: mean = 50 and H1: mean not equal to 50

According to the H1, the mean can be greater than or less than 50. This is an example of a Two-tailed test.

In a similar manner, if H0: mean >=50, then H1: mean <50

Here the mean is less than 50. It is called a One-tailed test.

Type 1 and Type 2 Error

A hypothesis test can result in two types of errors.

Type 1 Error: A Type-I error occurs when sample results reject the null hypothesis despite being true.

Type 2 Error: A Type-II error occurs when the null hypothesis is not rejected when it is false, unlike a Type-I error.

Suppose a teacher evaluates the examination paper to decide whether a student passes or fails.

H0: Student has passed

H1: Student has failed

Type I error will be the teacher failing the student [rejects H0] although the student scored the passing marks [H0 was true]. 

Type II error will be the case where the teacher passes the student [do not reject H0] although the student did not score the passing marks [H1 is true].

Level of Significance

The alpha value is a criterion for determining whether a test statistic is statistically significant. In a statistical test, Alpha represents an acceptable probability of a Type I error. Because alpha is a probability, it can be anywhere between 0 and 1. In practice, the most commonly used alpha values are 0.01, 0.05, and 0.1, which represent a 1%, 5%, and 10% chance of a Type I error, respectively (i.e. rejecting the null hypothesis when it is in fact correct).

Future-Proof Your AI/ML Career: Top Dos and Don'ts

Future-Proof Your AI/ML Career: Top Dos and Don'ts

A p-value is a metric that expresses the likelihood that an observed difference could have occurred by chance. As the p-value decreases the statistical significance of the observed difference increases. If the p-value is too low, you reject the null hypothesis.

Here you have taken an example in which you are trying to test whether the new advertising campaign has increased the product's sales. The p-value is the likelihood that the null hypothesis, which states that there is no change in the sales due to the new advertising campaign, is true. If the p-value is .30, then there is a 30% chance that there is no increase or decrease in the product's sales.  If the p-value is 0.03, then there is a 3% probability that there is no increase or decrease in the sales value due to the new advertising campaign. As you can see, the lower the p-value, the chances of the alternate hypothesis being true increases, which means that the new advertising campaign causes an increase or decrease in sales.

Why is Hypothesis Testing Important in Research Methodology?

Hypothesis testing is crucial in research methodology for several reasons:

  • Provides evidence-based conclusions: It allows researchers to make objective conclusions based on empirical data, providing evidence to support or refute their research hypotheses.
  • Supports decision-making: It helps make informed decisions, such as accepting or rejecting a new treatment, implementing policy changes, or adopting new practices.
  • Adds rigor and validity: It adds scientific rigor to research using statistical methods to analyze data, ensuring that conclusions are based on sound statistical evidence.
  • Contributes to the advancement of knowledge: By testing hypotheses, researchers contribute to the growth of knowledge in their respective fields by confirming existing theories or discovering new patterns and relationships.

Limitations of Hypothesis Testing

Hypothesis testing has some limitations that researchers should be aware of:

  • It cannot prove or establish the truth: Hypothesis testing provides evidence to support or reject a hypothesis, but it cannot confirm the absolute truth of the research question.
  • Results are sample-specific: Hypothesis testing is based on analyzing a sample from a population, and the conclusions drawn are specific to that particular sample.
  • Possible errors: During hypothesis testing, there is a chance of committing type I error (rejecting a true null hypothesis) or type II error (failing to reject a false null hypothesis).
  • Assumptions and requirements: Different tests have specific assumptions and requirements that must be met to accurately interpret results.

After reading this tutorial, you would have a much better understanding of hypothesis testing, one of the most important concepts in the field of Data Science . The majority of hypotheses are based on speculation about observed behavior, natural phenomena, or established theories.

If you are interested in statistics of data science and skills needed for such a career, you ought to explore Simplilearn’s Post Graduate Program in Data Science.

If you have any questions regarding this ‘Hypothesis Testing In Statistics’ tutorial, do share them in the comment section. Our subject matter expert will respond to your queries. Happy learning!

1. What is hypothesis testing in statistics with example?

Hypothesis testing is a statistical method used to determine if there is enough evidence in a sample data to draw conclusions about a population. It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha), and then collecting data to assess the evidence. An example: testing if a new drug improves patient recovery (Ha) compared to the standard treatment (H0) based on collected patient data.

2. What is hypothesis testing and its types?

Hypothesis testing is a statistical method used to make inferences about a population based on sample data. It involves formulating two hypotheses: the null hypothesis (H0), which represents the default assumption, and the alternative hypothesis (Ha), which contradicts H0. The goal is to assess the evidence and determine whether there is enough statistical significance to reject the null hypothesis in favor of the alternative hypothesis.

Types of hypothesis testing:

  • One-sample test: Used to compare a sample to a known value or a hypothesized value.
  • Two-sample test: Compares two independent samples to assess if there is a significant difference between their means or distributions.
  • Paired-sample test: Compares two related samples, such as pre-test and post-test data, to evaluate changes within the same subjects over time or under different conditions.
  • Chi-square test: Used to analyze categorical data and determine if there is a significant association between variables.
  • ANOVA (Analysis of Variance): Compares means across multiple groups to check if there is a significant difference between them.

3. What are the steps of hypothesis testing?

The steps of hypothesis testing are as follows:

  • Formulate the hypotheses: State the null hypothesis (H0) and the alternative hypothesis (Ha) based on the research question.
  • Set the significance level: Determine the acceptable level of error (alpha) for making a decision.
  • Collect and analyze data: Gather and process the sample data.
  • Compute test statistic: Calculate the appropriate statistical test to assess the evidence.
  • Make a decision: Compare the test statistic with critical values or p-values and determine whether to reject H0 in favor of Ha or not.
  • Draw conclusions: Interpret the results and communicate the findings in the context of the research question.

4. What are the 2 types of hypothesis testing?

  • One-tailed (or one-sided) test: Tests for the significance of an effect in only one direction, either positive or negative.
  • Two-tailed (or two-sided) test: Tests for the significance of an effect in both directions, allowing for the possibility of a positive or negative effect.

The choice between one-tailed and two-tailed tests depends on the specific research question and the directionality of the expected effect.

5. What are the 3 major types of hypothesis?

The three major types of hypotheses are:

  • Null Hypothesis (H0): Represents the default assumption, stating that there is no significant effect or relationship in the data.
  • Alternative Hypothesis (Ha): Contradicts the null hypothesis and proposes a specific effect or relationship that researchers want to investigate.
  • Nondirectional Hypothesis: An alternative hypothesis that doesn't specify the direction of the effect, leaving it open for both positive and negative possibilities.

Find our Data Analyst Online Bootcamp in top cities:

About the author.

Avijeet Biswal

Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.

Recommended Resources

The Key Differences Between Z-Test Vs. T-Test

Free eBook: Top Programming Languages For A Data Scientist

Normality Test in Minitab: Minitab with Statistics

Normality Test in Minitab: Minitab with Statistics

A Comprehensive Look at Percentile in Statistics

Machine Learning Career Guide: A Playbook to Becoming a Machine Learning Engineer

  • PMP, PMI, PMBOK, CAPM, PgMP, PfMP, ACP, PBA, RMP, SP, and OPM3 are registered marks of the Project Management Institute, Inc.
  • School Guide
  • Mathematics
  • Number System and Arithmetic
  • Trigonometry
  • Probability
  • Mensuration
  • Maths Formulas
  • Class 8 Maths Notes
  • Class 9 Maths Notes
  • Class 10 Maths Notes
  • Class 11 Maths Notes
  • Class 12 Maths Notes
  • Difference Between Variance and Standard Deviation
  • Skewness Formula
  • What is Arithmetic?
  • How are sine, cosine, and tangent calculated?
  • What is the value of (-8/27) -2/3 ?
  • In a triangle XYZ right angled at Y, find the side length of YZ, if XY = 5 cm and ∠C = 30°
  • What is the probability that the sum of the numbers is divisible by 4 when two dice are thrown simultaneously?
  • What is the probability of getting a sum of 9 when two dice are thrown simultaneously?
  • What is the probability of getting a prime number between 1 to 100?
  • Difference between an Arithmetic Sequence and a Geometric Sequence
  • Simplify and express the result in positive power notation: (−4) 5 ÷ (−4) 8
  • How to express a number as a power?
  • How to find the mean of a dataset?
  • How to Find Square Root of a Number?
  • Evaluating Definite Integrals
  • Quotient Rule | Formula, Proof and Examples
  • Derivative of Functions in Parametric Forms
  • Derivatives of Composite Functions

Tests of Significance

Test of significance is a process for comparing observed data with a claim (also called a hypothesis), the truth of which is being assessed in further analysis.

In this article, we will learn about, Significance testing, null hypothesis, test of significance and others in detail.

Table of Content

Significance Testing

Tests of significance in statistics, process of significance testing, types of errors, statistical tests, types of statistical tests, what is p-value testing.

Statistics involves the issue of assessing whether a result obtained from an experiment is important enough or not. In the field of quantitative significance, there are defined tests that may have relevant uses. The designation of tests depends on the type of tests or the tests of significance are more known as the simple significance tests.

These stand up for certain levels of error mislead. Sometimes the trial designer is called upon to predefine the probability of sampling error in the initial stage of the experiment. The population sampling test is regarded as one which does not study the whole, and as such the sampling error always exists. The testing of the significance is an equally important part of the statistical research.

Null Hypothesis

Every test for significance starts with a null hypothesis H 0 . H 0 represents a theory that has been suggested, either because it’s believed to be true or because it’s to be used as a basis for argument, but has not been proved. For example, during a clinical test of a replacement drug, the null hypothesis could be that the new drug is not any better, on average than the present drug. We would write H 0 : there’s no difference between the 2 drugs on average.

In technical terms, it is a probability measurement of a certain statistical test or research in the theory making in a way that the outcome must have occurred by chance instead of the test or experiment being right. The ultimate goal of descriptive statistical research is the revelation of the truth In doing so, the researcher has to make sure that the sample is of good quality, the error is minimal, and the measures are precise. These things are to be completed through several stages. The researcher will need to know whether the experimental outcomes are from a proper study process or just due to chance.

The sample size is the one that primarily specifies the probability that the event could occur without the effect of really performed research. It may be weak or strong depending on a certain statistical significance. Its bearings are put into question. They may or may not make a difference. The presence of a careless researcher can be a start of when a researcher instead of carefully making use of language in the report of his experiment, the significance of the study might be misinterpreted.

In the process of testing for statistical significance, the following steps must be taken:

Step 1: Start by coming up with a research idea or question for your thesis. Step 2: Create a neutral comparison to test against your hypothesis. Step 3: Decide on the level of certainty you need for your results, which affects the type of sign language translators and communication methods you’ll use. Step 4: Choose the appropriate statistical test to analyze your data accurately. Step 5: Understand and explain what your results mean in the context of your research question.

There are basically two types of errors:

Type I Error

Type ii error.

Now let’s learn about these errors in detail.

A type I error is where the researcher finds out that the relationship presumed maxim is a case; however, there is evidence showing it is not a function explained. This type of error leads to a failure of the researcher who says that the H 0 or null hypothesis has to be accepted while in reality, it was supposed to be rejected together with the research hypothesis. Researchers commit an error in the first type when α (alpha) is their probability.

Type II error is the same as the type I error is the case. You begin to suppress your emotions and avoid experiencing any connection when someone thinks that you have no relation even though there does exist among you. In this sort of error, the researcher is expected to see the research hypothesis as true and treat the null hypotheses as false while he may do not and the opposite situation happens. Type II error is identified with β that equals to the possibility to make a type II error which is an error of omission.

One-tailed and two-tailed statistical tests help determine how significant a finding is in a set of data.

When we think that a parameter might change in one specific direction from a baseline, we use a one-tailed test. For example, if we’re testing whether a new drug makes people perform better, we might only care if it improves performance, not if it makes it worse.

On the flip side, a two-tailed test comes into play when changes could go in either direction from the baseline. For instance, if we’re studying the effect of a new teaching method on test scores, we’d want to know if it makes scores better or worse, so we’d use a two-tailed test.

Hypothesis testing can be done via use of either one-tailed or two-tailed statistical test. The purpose of these tests is to obtain the probability with which a parameter from a given data set is statistically significant. These are also called lateral flow and dipstick tests.

  • One-tailed test can be used so that the differences of the parameter estimations within only one side from a given standard can be perceived plausible.
  • Two-tailed test needs to be applied in the case when you consider deviations from both sides of benchmark value as possible in science.

The expression “tail” is used in the terminology in which those tests are referred and the reason for that is that outliers, i.e. observation ended up rejecting the null hypothesis, are the extreme points of the distribution, those areas normally have a small influence or “tail off” similar to the bell shape or normal distribution. One study should make an application either the one-tailed test or two-tailed test according to the judgment of the research hypothesis.

In the case of data information significance, the p-value is an additional and significant term for hypothesis testing. The p-value is a function whose domain is the observed result of sample and range is testing subset of statistical hypothesis which is being used for testing of statistical hypothesis. It must determine what the threshold value is before starting of the test. The significance level holds the name, traditional 1% or 5%, which stands for the level of the significance considered to be of value. One of the parameters of the Savings function is α.

In the condition if the p-value is greater than or equal the α term, inconsistency between our null model and the data exists. As a result the null hypothesis should be rejected and a new hypothesis may be supposed being true, or may be assumed as such one.

Example on Test of Significance

Some examples of test of significance are added below:

Example 1: T-Test for Medical Research – The T Test

For example, a medical study researching the performance of a new drug that comes to the conclusion of a reduced in blood pressure. The researchers predict that the patients taking the new drug will show a frankly larger decrease in blood pressure as opposed to the study participants on a placebo. They collect data from two groups: treat one group with an experimental drug and give all the placebo to the second group. Researchers apply a t-test to the data in order determine the value of two assumed normal populations difference and study whether it statistically significant. The H0 (null hypothesis) could state that there is no significant difference in the blood pressure registered in the two groups of subjects, while the HA1 (alternative hypothesis) should be indicating the positivity of a significant difference. They can check whether or not the outcomes are significantly different by using the t-test, and therefore reduce the possibility of any confusing hypotheses.

Example 2: Chi-Square Analysis in Market Research

Think about the situation where you have to carry out a market research work to ascertain the link between customers satisfaction (comprised of satisfied satisfied or neutral scores) and their product preferences (the three products designated as Product A, Product B, and Product C). A chi-square test was used by the researchers to check whether they had a substantial association with the two categorical variables they were dealing with. The H0 null hypothesis states customer satisfaction and product preferences are unrelated, the contrary to which H1 alternative hypothesis shows the customers’ satisfaction and product preferences are related. Thereby, the researchers will be able to execute the chi-square test on the gathered data and find out if the existed observations among customer satisfaction and product preferences are statistically significant by doing so. This allows us to make conclusions how the satisfaction degree of customers affects the market conception of goods for the target market.

Example 3: ANOVA in Educational Research

Think of a researcher whom is studying if there is any difference between various learning ways and their effect on students’ study achievements. HO represents the null hypothesis which asserts no differences in scores for the groups while the alternative hypothesis (HA) claims at least one group has a different mean. Via use Analysis of Variance ( ANOVA ), a researcher determines whether or not there is any statistically significant difference in performance hence, across the methods of teaching.

Example 4: Regression Analysis in Economics

In an economic study, researchers examine the connection between ads cost and revenue for the group of businesses that have recently disclosed their financial results. The null space proposes that there is no such linear connection between the advertisement spending and purchases. Among the models, the regression analysis used to determine whether the changes in sales are attributed to the changes in advertising to a statistically significant level (the regression line slope is significantly different from zero) is chosen.

Example 5: Paired T-Test in Psychology

A psychologist decides to do a study to find out if a new type of therapy can make someone get rid of anxiety. Patients are evaluated of their level of anxiety prior to initiating the intervention and right after. The null hypothesis claims that there is no noticeable difference in the levels of anxiety from a pre-intervention to a post-intervention setting. Using a paired t-test, a psychologist who collected the anxiety scores of a group before and after the experiment can prove statistically the observed change in these scores.

FAQs on Test of Significance

Define statistical significance test.

Random distribution of observed data implies that there must be a certain cause behind which could then be associated with the data. This outcome is also referred to as the statistical significance. Whatever the explicit field or the profession that rely utterly on numbers and research, like finance, economics, investing, medicine, and biology, statistic is important.

What is the meaning of a test of significance?

Statistical significant tests work in order to determine if the differences found in assessment data is just due to random errors arising from sampling or not. This is a “silent” category of research that ought to be overlooked for it brings on mere incompatibilities.

What is the importance of the Significance test?

In experiments, the significance tests indeed have specific applied value. That is because they help researchers to draw conclusion whether the data supports or not the null hypothesis, and therefore whether the alternative hypothesis is true or not.

How many types of Significance tests are there in statistical mathematics?

In statistics, we have tests like t-test, aZ-test, chi-square test, annoVA test, binomial test, mediana test and others. Greatly decentralized data can be tested with parametric tests.

How does choosing a significance level (α) influence the interpretation of the attributable tests?

The parameter α which stands for the significance level is a function of this threshold, and to fail this test null hypothesis value has to be rejected. Hence, a smaller α value means higher strictness of acceptance threshold and false positives are limited while there could be an increase in false negatives..

Is significance testing limited to parametric methods like comparison of two means or, it can be applied to non-parametric datasets also?

Inference is something useful which can be miscellaneous and can adapt to parametric or non-parametric data. Non-parametric tests, for instance the Mann-Whitney U test and the Wilcoxon signed-rank test, are often applied in operations research, since they do not require that data meet the assumptions of parametric tests.

Please Login to comment...

author

  • Math-Statistics
  • School Learning
  • School Mathematics
  • How to Make Faceless Snapchat Shows Earning $1,000/Day
  • Zepto Pass Hits 1 Million Subscribers in Just One Week
  • China’s "AI Plus": An Initiative To Transform Industries
  • Top 10 Free Apps for Chatting
  • Dev Scripter 2024 - Biggest Technical Writing Event By GeeksforGeeks

Improve your Coding Skills with Practice

 alt=

What kind of Experience do you want to share?

Lab 7 – Hypothesis Testing

Introduction.

This lab is intended to serve as an introduction to hypothesis testing. Where available, we will use built-in functions in R to find and report p-values. In particular, we will investigate the finding of p-values for proportions, differences in proportions, odds ratios, means, and differences of means

Hypothesis Testing

Testing proportions.

While we are able to use our normality assumptions in constructing confidence intervals and p-values for single proportions, we are also able to compute them exactly using the theoretical properties of a true distribution. That is, while counting the number of heads in successive coin flips will become approximately normal as the number of flips increases, we can also utilize the fact that coin flipping follows a binomial distribution . As such, we can use an exact binomial to compute our confidence intervals and p-values rather than relying a normal approximations. This is done with the R function binom.test() . This function takes several arugments which you can learn more about with ?binom.test .

For now, let’s consider again the Johns Hopkins study that found 31 of 39 infants born preterm survived to 6 months. Here, we are wanting to check the hypothesis \(H_0: p_0 = 0.7\) . To do this, we will use binom.test() with the arguments:

  • x – the number of “successes” we observed
  • n – the total number of observations
  • p – our hypothesized proportion

The function will print out a litany of useful information

First, observed the bottom line, “probability of success” which is simply our observed test statistic, \(\hat{p} = 31/39\) . Note, however, that the \(p\) -value found here is not the same as what we found in the slides. This is a consequence of the fact that here we are utilizing assumptions from the binomial distribution rather than our normal approximation.

Just as the \(p\) -values change, so too do our confidence intervals. For example, in our slides Wednesday we found a 95% confidence interval to be (0.668, 0.922), while here they are slightly different, being (0.635, 0.907).

It is worth noting that we can also use binom.test() for the construction of confidence intervals. By default, the interval printed out is 95%, but we could just as well find an 80% confidence interval using the argument conf.level

Question 1: We see above that changing our confidence interval to 80% from 95% changed the interval, but not the resulting p-value. Why is this the case?

Question 2: You might also notice that the confidence interval we found using binom.test() is no longer symmetric around the estimate \(\hat{p}\) . Why is this the case?

Question 3: There is an intimate relationship between p-values and confidence intervals in that the exact same attributes of our sample data are used to construct each. Presented below are two scenarios:

  • We flipped a coin 5 times and it landed on head 4 times
  • We flipped a coin 50 times and it landed on head 40 times

Using the binom.test() function, create 95% confidence intervals and construct a \(p\) -value for the null hypothesis \(p_0 = 0.5\) . Then answer the following:

  • What is our estimate of \(\hat{p}\) for each of these scenarios?
  • What changed in our \(p\) -values and confidence intervals?
  • Which of these scenarios provides more evidence against the null hypothesis? Why?

Testing means

Testing odds ratios, testing things beyond our wildest dreams.

IMAGES

  1. Hypothesis Testing and Types of Errors

    types of errors in hypothesis testing with examples

  2. Type I & Type II Errors

    types of errors in hypothesis testing with examples

  3. PPT

    types of errors in hypothesis testing with examples

  4. Chapter 10 Hypothesis Testing

    types of errors in hypothesis testing with examples

  5. What are Type 1 and Type 2 Errors in A/B Testing and How to Avoid Them

    types of errors in hypothesis testing with examples

  6. PPT

    types of errors in hypothesis testing with examples

VIDEO

  1. Hypothesis Testing

  2. TYPES OF ERRORS IN HYPOTHESIS TESTING LEC147

  3. Hypothesis testing and errors in hypothesis testing

  4. SA14 A Level Statistics Hypothesis testing and types of errors

  5. Hypothesis testing exercise example t distribution

  6. Hypothesis Testing|| 8948156741

COMMENTS

  1. Types I & Type II Errors in Hypothesis Testing

    I have a question about Type I and Type II errors in the realm of equivalence testing using two one sided difference testing (TOST). In a recent 2020 publication that I co-authored with a statistician, we stated that the probability of concluding non-equivalence when that is the truth, (which is the opposite of power, the probability of ...

  2. Type I & Type II Errors

    Have a human editor polish your writing to ensure your arguments are judged on merit, not grammar errors. Get expert writing help

  3. 6.1

    6.1 - Type I and Type II Errors. When conducting a hypothesis test there are two possible decisions: reject the null hypothesis or fail to reject the null hypothesis. You should remember though, hypothesis testing uses data from a sample to make an inference about a population. When conducting a hypothesis test we do not know the population ...

  4. 9.3: Outcomes and the Type I and Type II Errors

    The following are examples of Type I and Type II errors. Example 9.3.1 9.3. 1: Type I vs. Type II errors. Suppose the null hypothesis, H0 H 0, is: Frank's rock climbing equipment is safe. Type I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.

  5. Type I & Type II Errors

    Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test.Significance is usually denoted by a p-value, or probability value.. Statistical significance is arbitrary - it depends on the threshold, or alpha value, chosen by the researcher.

  6. Hypothesis Testing and Types of Errors

    Hypothesis Testing and Types of Errors. Illustrating a sample drawn from a population. Source: Six-Sigma-Material.com. Suppose we want to study income of a population. We study a sample from the population and draw conclusions. The sample should represent the population for our study to be a reliable one. Null hypothesis (H 0) ( H 0) is that ...

  7. 8.2: Type I and II Errors

    You will need to be able to write out a sentence interpreting either the type I or II errors given a set of hypotheses. You also need to know the relationship between \(\alpha\), β, confidence level, and power. Hypothesis tests are not flawless, since we can make a wrong decision in statistical hypothesis tests based on the data.

  8. 9.2 Outcomes and the Type I and Type II Errors

    Introduction; 9.1 Null and Alternative Hypotheses; 9.2 Outcomes and the Type I and Type II Errors; 9.3 Distribution Needed for Hypothesis Testing; 9.4 Rare Events, the Sample, and the Decision and Conclusion; 9.5 Additional Information and Full Hypothesis Test Examples; 9.6 Hypothesis Testing of a Single Mean and Single Proportion; Key Terms; Chapter Review; Formula Review

  9. Hypothesis Testing

    Hypothesis testing example Based on the type of data you collected, ... (Type I error). Hypothesis testing example In your analysis of the difference in average height between men and women, you find that the p-value of 0.002 is below your cutoff of 0.05, so you decide to reject your null hypothesis of no difference.

  10. 9.2: Outcomes, Type I and Type II Errors

    9.2: Outcomes, Type I and Type II Errors. When you perform a hypothesis test, there are four possible outcomes depending on the actual truth (or falseness) of the null hypothesis H0 and the decision to reject or not. The decision is not to reject H0 when H0 is true (correct decision).

  11. Hypothesis Testing along with Type I & Type II Errors explained simply

    Type I and Type II Errors. This type of statistical analysis is prone to errors. In the above example, it might be the case that the 20 students chosen are already very engaged and we wrongly decided the high mean engagement ratio is because of the new feature. The diagram below represents the four different scenarios that can happen.

  12. Clarifying Type I and Type II Errors in Hypothesis Testing

    Type II Errors — False Negatives (Beta) Beta (β) is another type of error, which is the possibility that you have not rejected the null hypothesis when it is actually incorrect. Type II errors are also known as false negatives. Beta is linked to something called Power, which, given that the null hypothesis is actually false, is the ...

  13. Hypothesis testing, type I and type II errors

    An example is the one-sided hypothesis that a drug has a greater frequency of side effects than a placebo; the possibility that the drug has fewer side effects than the placebo is not worth testing. Whatever strategy is used, it should be stated in advance; otherwise, it would lack statistical rigor.

  14. PDF 9.2 Types of Errors in Hypothesis testing

    What type of mistake could we make? 4 We have only two possible outcomes to a hypothesis test… 1) Reject the null (H 0) This occurs when our data provides some support for the alternative hypothesis. 2) Do not reject the null This occurs when our data did not give strong evidence against the null.

  15. Type I & Type II Errors in Hypothesis Testing: Examples

    This article describes Type I and Type II errors made due to incorrect evaluation of the outcome of hypothesis testing, based on a couple of examples such as the person comitting a crime, the house on fire, and Covid-19.You may want to note that it is key to understand type I and type II errors as these concepts will show up when we are evaluating a hypothesis such as those related to machine ...

  16. Introduction to Hypothesis Testing with Examples

    Likelihood ratio. In the likelihood ratio test, we reject the null hypothesis if the ratio is above a certain value i.e, reject the null hypothesis if L(X) > 𝜉, else accept it. 𝜉 is called the critical ratio.. So this is how we can draw a decision boundary: we separate the observations for which the likelihood ratio is greater than the critical ratio from the observations for which it ...

  17. What Is Hypothesis Testing? Types and Python Code Example

    In this article, we discussed the importance of hypothesis testing. We highlighted how science has advanced human knowledge and civilization through formulating and testing hypotheses. We discussed Type I and Type II errors in hypothesis testing and how they underscore the importance of careful consideration and analysis in scientific inquiry.

  18. Hypothesis Testing

    Hypothesis testing is a tool for making statistical inferences about the population data. It is an analysis tool that tests assumptions and determines how likely something is within a given standard of accuracy. Hypothesis testing provides a way to verify whether the results of an experiment are valid. A null hypothesis and an alternative ...

  19. Hypothesis Testing: Understanding the Basics, Types, and Importance

    Hypothesis testing is a statistical method used to determine whether a hypothesis about a population parameter is true or not. This technique helps researchers and decision-makers make informed decisions based on evidence rather than guesses. Hypothesis testing is an essential tool in scientific research, social sciences, and business analysis.

  20. Type I and Type II Error

    Type I and Type II errors are subjected to the result of the null hypothesis. In case of type I or type-1 error, the null hypothesis is rejected though it is true whereas type II or type-2 error, the null hypothesis is not rejected even when the alternative hypothesis is true. ... (where a real hit was rejected by the test and is observed as a ...

  21. What is Hypothesis Testing in Statistics? Types and Examples

    Hypothesis testing is a statistical method used to determine if there is enough evidence in a sample data to draw conclusions about a population. It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha), and then collecting data to assess the evidence.

  22. Understanding Hypothesis Testing

    In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis. ... Real life Hypothesis Testing example. Let's examine ...

  23. Types of Errors in Hypothesis Testing Notes for Competitive Exams

    Researchers need to carefully select the significance level, consider the consequences of each type of error, and design experiments that minimize the risk of both Type 1 and Type 2 errors in hypothesis testing based on the goals and requirements of the study.

  24. Tests of Significance: Explanation and Examples

    Step 2: Create a neutral comparison to test against your hypothesis. Step 3: Decide on the level of certainty you need for your results, which affects the type of sign language translators and communication methods you'll use. Step 4: Choose the appropriate statistical test to analyze your data accurately.

  25. Lab 7

    Here, we are wanting to check the hypothesis H0: p0 = 0.7 H 0: p 0 = 0.7. To do this, we will use binom.test () with the arguments: x - the number of "successes" we observed. n - the total number of observations. p - our hypothesized proportion. The function will print out a litany of useful information.