Statistics: The Art and Science of Learning from Data (4th edition), Part 2. Contents include statistical inference (confidence intervals), comparing two groups, multiple regression, nonparametric statistics, comparing groups with analysis of variance methods, and other topics.
Part 3 Inferential Statistics
Chapter 8 Statistical Inference: Confidence Intervals
Chapter 9 Statistical Inference: Significance Tests About Hypotheses
Chapter 10 Comparing Two Groups
Chapter 8 Statistical Inference: Confidence Intervals
The tools from previous chapters, in particular the idea of the normal distribution as a sampling distribution, are used in this chapter to turn information from a sample into an interval of plausible values for an unknown population proportion or population mean.
8.1 Point and Interval Estimates of Population Parameters
8.2 Constructing a Confidence Interval to Estimate a Population Proportion
8.3 Constructing a Confidence Interval to Estimate a Population Mean
8.4 Choosing the Sample Size for a Study
8.5 Using Computers to Make New Estimation Methods Possible
Analyzing Data from the General Social Survey
Picture the Scenario
For more than 30 years, the National Opinion Research Center at the University of Chicago (www.norc.uchicago.edu) has conducted an opinion survey called the General Social Survey (GSS). The survey randomly samples about 2000 adult Americans. In a 90-minute in-person interview, the interviewer asks a long list of questions about opinions and behavior for a wide variety of issues. Other nations have similar surveys. For instance, every five years Statistics Canada conducts its own General Social Survey. Eurobarometer regularly samples about 1000 people in each country in the European Union, and a host of other polling institutions, such as Gallup (gallup.com), the Pew Research Center (pewresearch.org), and Nielsen (nielsen.com), feature results from all sorts of polls on their websites.
Analyzing such data helps researchers learn how people think and behave at a given time and track opinions over time. Activity 1 in Chapter 1 showed how to access the GSS data at sda.berkeley.edu/GSS.
Questions to Explore
Based on data from a recent GSS, how can you make an inference about
• The proportion of Americans who are willing to pay higher prices to protect the environment?
• The proportion of Americans who agree that it is better for everyone involved if the man is the achiever outside the home and the woman takes care of the home and family?
• The mean number of hours that Americans watch TV per day?
We will analyze data from the GSS in examples and exercises throughout this chapter. For instance, in Example 3 we’ll see how to estimate the proportion of Americans who are willing to pay higher prices to protect the environment.
We’ll answer the other two questions in Examples 2 and 5, and in exercises we’ll explore opinions about issues such as whether it should or should not be the government’s responsibility to reduce income differences between the rich and poor, whether a preschool child is likely to suffer if his or her mother works, and how politically conservative or liberal Americans are.
Section 8.1 Point and Interval Estimates of Population Parameters 361
A statistic describes a sample. Examples are the sample mean x̄ and standard deviation s.
A parameter describes a population. Examples are the population mean μ and standard deviation σ.
A sampling distribution specifies the possible values a statistic can take and their probabilities.
We use the proportion to summarize the relative frequency of observations in a category for a categorical variable. The proportion equals the number in the category divided by the sample size. We use the mean as one way to summarize the center of the observations for a quantitative variable.
A sample of about 2000 people (as the GSS takes) is relatively small. For instance, in the United States, a survey of this size gathers data for less than 1 of every 100,000 people. How can we possibly make reliable predictions about the entire population with so few people?
We now have the tools to see how this is done. We’re ready to learn about a powerful use of statistics: statistical inference about population parameters using sample data. Inference methods help us predict how close a sample statistic falls to the population parameter. We can make decisions and predictions about populations even if we have data for relatively few subjects from that population. The previous chapter illustrated that it’s often possible to predict the winner of an election in which millions of people voted, knowing only how a couple of thousand people voted.
You may wonder how statistical inference relates to what you learned earlier about the role of randomization in gathering data (Chapter 4), concepts of probability (Chapter 5), and the normal distribution (Chapter 6) and its use as a sampling distribution (Chapter 7). These topics are important for two primary reasons:
• Statistical inference methods use probability calculations that assume that the data were gathered with a random sample or a randomized experiment.
• The probability calculations refer to a sampling distribution of a statistic, which is often approximately a normal distribution.
In other words, statistical inference uses sampling distributions of statistics calculated from data gathered using randomization, and those sampling distributions are often approximately normal.
There are two types of statistical inference methods: estimation of population parameters and testing hypotheses about the parameter values. This chapter discusses the first type, estimating population parameters. We’ll learn how to estimate population proportions for categorical variables and population means for quantitative variables. For instance, a study dealing with how college students pay for their education might estimate the proportion of college students who work part time and the mean annual income for those who work. The most informative estimation method constructs an interval of numbers, called a confidence interval, within which the unknown parameter value is believed to fall.
8.1 Point and Interval Estimates of Population Parameters
Population parameters have two types of estimates, a point estimate and an interval estimate.
Point Estimate and Interval Estimate
A point estimate is a single number that is our best guess for the parameter.
An interval estimate is an interval of numbers that is believed to contain the actual value of the parameter.
For example, one General Social Survey asked, “Do you believe in hell?” From the sample data, the point estimate for the proportion of adult Americans who would respond yes equals 0.73, more than 7 of 10. The adjective “point” in point estimate refers to using a single number or point as the parameter estimate.
An interval estimate, found with the method introduced in the next section, predicts that the proportion of adult Americans who believe in hell falls between 0.71 and 0.75. Figure 8.1 illustrates this idea.
362 Chapter 8 Statistical Inference: Confidence Intervals
[Figure 8.1 graphic: a point estimate (single number) of 0.73 and an interval estimate (interval of numbers) from 0.71 to 0.75, shown on a scale from 0.7 to 0.8.]
Figure 8.1 A Point Estimate Predicts a Parameter by a Single Number. An interval estimate is an interval of numbers that are believable values for the parameter. Question Why is a point estimate alone not sufficiently informative?
A point estimate by itself is not sufficient because it doesn’t tell us how close the estimate is likely to be to the parameter. An interval estimate is more useful. It tells us that the point estimate of 0.73 is likely to fall within a margin of error of 0.02 of the actual population proportion. By incorporating a margin of error, the interval estimate helps us gauge the accuracy of the point estimate.
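The link between a point estimate, a margin of error, and an interval estimate is simple arithmetic: the interval runs from the point estimate minus the margin of error to the point estimate plus the margin of error. The minimal Python sketch below uses the numbers from the “Do you believe in hell?” example (point estimate 0.73, margin of error 0.02). The function name `interval_estimate` is hypothetical and chosen only for illustration.

```python
def interval_estimate(point_estimate: float, margin_of_error: float) -> tuple[float, float]:
    """Return (lower, upper): the point estimate minus and plus the margin of error."""
    return (point_estimate - margin_of_error, point_estimate + margin_of_error)


lower, upper = interval_estimate(0.73, 0.02)
print(f"Interval estimate: {lower:.2f} to {upper:.2f}")  # Interval estimate: 0.71 to 0.75
```

The same helper applies to any parameter. For example, `interval_estimate(45500, 2000)` gives the salary interval from $43,500 to $47,500 discussed later in this section.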
Point Estimation: Making a Best Guess for a Population Parameter
Once we’ve collected the data, how do we find a point estimate, representing our best guess for a parameter value? The answer is straightforward: we can use an appropriate sample statistic. For example, for a population mean μ, the sample mean x̄ is a point estimate of μ. For the population proportion, the sample proportion is a point estimate.
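As a sketch, the point estimates described above are ordinary sample statistics. The data below are made-up, hypothetical responses, not GSS results. They serve only to show the computation: the sample proportion is the number in the category divided by the sample size, and the sample mean x̄ serves as the point estimate of μ.

```python
from statistics import mean

# Hypothetical yes/no answers (1 = yes, 0 = no); not real survey data.
answers = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
# Hypothetical hours of TV watched per day by ten respondents; not real survey data.
tv_hours = [2, 3, 1, 4, 2, 0, 5, 3, 2, 3]

# Point estimate of the population proportion p: number in the category / sample size.
p_hat = sum(answers) / len(answers)
# Point estimate of the population mean mu: the sample mean x-bar.
x_bar = mean(tv_hours)

print(p_hat)  # 0.7
print(x_bar)  # 2.5
```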
Point estimates are the most common form of inference reported by the mass media. For example, the Gallup organization conducts a monthly survey to estimate the U.S. president’s popularity, and the mass media report the results. In mid-July 2014, this survey reported that 42% of the American public approved of President Obama’s performance in office. This percentage was a point estimate rather than a parameter because Gallup used a sample of about 1500 people rather than the entire population. For simplicity, we’ll usually use the term estimate in place of point estimate when there is no risk of confusing it with an interval estimate.
Properties of Point Estimators
For any particular parameter, there are several possible point estimates. Consider, for instance, estimating the parameter μ of a normal distribution, which describes its center. With sample data from a normal distribution, two possible estimates of μ are the sample mean and, because the distribution is symmetric, the sample median. What makes one point estimator better than another? A good estimator of a parameter has two desirable properties:
Property 1: A good estimator has a sampling distribution that is centered at the parameter it tries to estimate. We define center in this case as the mean of that sampling distribution. An estimator with this property is said to be unbiased. From Section 7.2, we know that under random sampling the mean of the sampling distribution of the sample mean x̄ equals the population mean μ. So, the sample mean x̄ is an unbiased estimator of μ. Figure 8.2 recalls this result.
Similarly, from Section 7.1, we know that under random sampling the mean of the sampling distribution of the sample proportion equals the population proportion p. So, the sample proportion is an unbiased estimator of the population proportion p.
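A simulation can make Property 1 concrete. The sketch below illustrates the idea under assumed conditions; it is not a proof. It draws many random samples from a hypothetical normal population with μ = 50 and σ = 10, and many samples of yes/no outcomes from a hypothetical population with p = 0.6. It then averages the resulting sample means and sample proportions. Because both estimators are unbiased, these averages should land close to μ and p. The population values, sample size, and number of repetitions are all assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is reproducible

MU, SIGMA = 50.0, 10.0  # hypothetical population mean and standard deviation
P = 0.6                 # hypothetical population proportion
N = 25                  # size of each simulated sample
REPS = 10_000           # number of simulated samples

sample_means = []
sample_props = []
for _ in range(REPS):
    # Random sample from the normal population: record its sample mean.
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(sum(sample) / N)
    # Random sample of yes/no outcomes with P(yes) = P: record the sample proportion.
    yes_count = sum(1 for _ in range(N) if random.random() < P)
    sample_props.append(yes_count / N)

# The averages approximate the centers of the two sampling distributions.
center_of_means = mean(sample_means)
center_of_props = mean(sample_props)
print(f"average of sample means:       {center_of_means:.2f} (mu = {MU})")
print(f"average of sample proportions: {center_of_props:.3f} (p = {P})")
```

Individual sample means vary quite a bit from sample to sample. With these settings their standard deviation is σ/√n = 10/5 = 2. Their average, however, settles near μ, which is what “centered at the parameter” means.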
From Chapter 7, the standard deviation of the sampling distribution of the statistic describes the variability in the possible values of the statistic for the given sample size. It also tells us how much the statistic would vary from sample to sample of that size.
[Figure 8.2 graphic: the sampling distribution of x̄ (describes where the sample mean is likely to fall when sampling from this population), centered at μ (population mean).]
Figure 8.2 The Sample Mean x̄ Is an Unbiased Estimator. Its sampling distribution is centered at the parameter it estimates, the population mean μ. Question When is the sampling distribution bell shaped, as it is in this figure?
Property 2: A good estimator has a small standard deviation compared to other estimators. This tells us the estimator tends to fall closer than other estimates to the parameter. For example, for estimating the center μ of a normal distribution, both the sample mean and the sample median are unbiased, but the sample mean has a smaller standard deviation (see margin figure and Exercise 8.123). Therefore, the sample mean is the better estimator for μ.
[Margin figure: The sampling distribution of the sample mean has the smaller standard deviation.]
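Property 2 can be checked the same way. This sketch assumes a hypothetical normal population (μ = 0, σ = 1) and a sample size of 25. It compares the spread of simulated sample means with the spread of simulated sample medians. Both estimators should be centered near μ, but the sample means should vary less from sample to sample. The population, sample size, and number of repetitions are assumptions for illustration only.

```python
import random
from statistics import mean, median, stdev

random.seed(2)  # fixed seed so the run is reproducible

MU, SIGMA = 0.0, 1.0  # hypothetical normal population
N = 25                # size of each simulated sample
REPS = 10_000         # number of simulated samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(sum(sample) / N)
    medians.append(median(sample))

# Both estimators are centered near MU (both are unbiased here) ...
print(f"center: mean {mean(means):.3f}, median {mean(medians):.3f}")
# ... but the sample mean has the smaller standard deviation.
sd_mean = stdev(means)
sd_median = stdev(medians)
print(f"standard deviation: mean {sd_mean:.3f}, median {sd_median:.3f}")
```

The standard deviation of the sample means should come out near σ/√n = 0.2, and the standard deviation of the medians should be noticeably larger. This is why the sample mean is preferred for estimating the center of a normal distribution.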
Interval Estimation: Constructing an Interval That Contains the Parameter (We Hope!)
A recent survey¹ on the starting salary of new college graduates estimated the mean starting salary to be $45,500. Does $45,500 seem plausible to you? Too high? Too low? Any individual point estimate may or may not be close to the parameter it estimates. For the estimate to be useful, we need to know how close it is likely to fall to the actual parameter value. Is the estimate of $45,500 likely to be within $1000 of the actual population mean? Within $5000? Within $10,000? Inference about a parameter should provide not only a point estimate but also an indication of its likely precision.
An interval estimate indicates precision by giving an interval of numbers around the point estimate. The interval is made up of numbers that are the most believable values for the unknown parameter, based on the data observed. For instance, perhaps a survey of new college graduates predicts that the mean starting salary falls somewhere between $43,500 and $47,500, that is, within a margin of error of $2000 of the point estimate of $45,500. An interval estimate is designed to contain the parameter with some chosen probability, such as 0.95. Because interval estimates contain the parameter with a certain degree of confidence, they are referred to as confidence intervals.
A confidence interval is an interval containing the most believable values for a parameter. It is formed by a method that combines a point estimate with a margin of error. The probability that this method produces an interval that contains the parameter is called the confidence level. This is a number chosen to be close to 1, most commonly 0.95.
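A simulation can illustrate what the confidence level means. The idea is to apply an interval-building method to many random samples and count how often the resulting interval contains the parameter. This sketch takes as given the standard large-sample interval for a proportion, sample proportion ± 1.96 × √(p̂(1 − p̂)/n), a method of the kind the next section develops. The population proportion 0.73 and sample size 2000 are assumed values for illustration, not GSS results. With a 0.95 confidence level, roughly 95% of the simulated intervals should contain p.

```python
import math
import random

random.seed(3)  # fixed seed so the run is reproducible

P = 0.73      # assumed population proportion (the parameter)
N = 2000      # assumed sample size
Z = 1.96      # z-score for a 0.95 confidence level
REPS = 1_000  # number of simulated samples

covered = 0
for _ in range(REPS):
    # Draw one random sample and compute its sample proportion.
    p_hat = sum(1 for _ in range(N) if random.random() < P) / N
    # Large-sample margin of error: z times the estimated standard error.
    margin = Z * math.sqrt(p_hat * (1 - p_hat) / N)
    # Count the interval if it contains the true parameter.
    if p_hat - margin <= P <= p_hat + margin:
        covered += 1

coverage = covered / REPS
print(f"proportion of intervals containing p: {coverage:.3f}")
```

The confidence level describes the method, not any single interval. A given interval either contains p or it does not, but over many samples the method succeeds about 95% of the time.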
¹ By National Association of Colleges and Employers, www.naceweb.org.