Science Focus (Issue 31)

Beer, Student, and t-Distribution 啤酒、學生與t分佈

By Helen Wong 王思齊

Does project-based learning improve students’ academic performance? Which candidate is more likely to win in an election? Is a new drug effective in treating a certain disease? While these scenarios may seem unrelated, they share a common thread: we need to collect information from samples to make an inference about a population, whether it is all students, all voters, or all patients. This process is formally known as statistical inference.

Assuming the sampling process is random and unbiased, the quantities calculated from these samples, such as the sample mean or sample variance, will vary from one sample to another. Thus, sample means obtained from different rounds of sampling follow a specific distribution, known as the sampling distribution. Without going into the rigorous mathematical proof, the central limit theorem tells us that the sampling distribution of the sample mean will be approximately normal when the sample size is sufficiently large, even if the underlying population is not normally distributed.

But what happens when our sample size is small and we have no idea about the population standard deviation? Today, we take for granted that in such cases, provided the underlying population is normally distributed, the sample mean standardized by the sample standard deviation follows Student’s t-distribution, thanks to a brewer named William Sealy Gosset (1876–1937) [1–3].

Born in Canterbury, England, Gosset was educated at the University of Oxford, where he earned a first-class degree in chemistry in 1899. Around this time, the Guinness brewery in Dublin recognized the need for rigorous quality control in beer production and began recruiting graduates from Oxford and Cambridge for this purpose. Gosset was among those selected.

As an apprentice brewer, Gosset needed to evaluate how the quality of barley and hops might affect that of the beer. The quality of agricultural products is known to vary throughout the year, depending on factors such as climate and soil conditions. Therefore, Gosset’s goal was to maintain a consistently high quality of beer while also ensuring cost-effectiveness. This meant relying on small samples to draw conclusions that could inform the large-scale brewing process.

By the early 20th century, the central limit theorem had been established, and many were familiar with using the normal distribution for statistical inference with large sample sizes. Gosset conducted experiments by sampling acidity values from beers produced under various conditions, such as using different batches of malted barley, to determine whether there were significant differences in mean acidity between these groups. Through his calculations, Gosset discovered that when the sample size was small, the sampling
