


Chap 1 数据与统计资料 Data and Statistics

1.2 数据 Data

数据集 Data Sets

个体 Elements  变量 Variables  观测值 Observations

测量尺度 Scales of Measurement:

名义尺度 Nominal Scale  顺序尺度 Ordinal Scale  区间尺度 Interval Scale

比率尺度 Ratio Scale

分类型数据 Categorical Data  数量型数据 Quantitative Data

分类变量 Categorical Variable  数量变量 Quantitative Variable

截面数据 Cross-Sectional Data  时间序列数据 Time Series Data

1.3 数据来源 Data Sources

现有来源 Existing Sources

观测性研究 observational (nonexperimental) studies

实验 experimental studies

时间需求 Time Requirement  获取成本 Cost of Acquisition

数据采集误差 Data Errors

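1.4 描述统计 Descriptive Statistics (statistical methods that summarize data in graphical or numerical form)

1.5 统计推断 Statistical Inference

普查 census (a survey that collects data for the entire population)

抽样调查 sample survey (a survey that collects data for a sample)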

1.6 逻辑分析方法 Analytics

描述性分析 Descriptive analytics 预测性分析 Predictive analytics

规范性分析 Prescriptive analytics (the set of analytical techniques that yield a best course of action)

1.7 大数据和数据挖掘 Big Data and Data Mining

1.9 统计实践的道德准则 Ethical Guidelines for Statistical Practice

Chap 2 描述统计学(一):表格法和图形法

Descriptive Statistics: Tabular and Graphical Displays

品质型数据 qualitative data

数量型数据 quantitative data

2.1 汇总分类型变量的数据 Summarizing Data for a Categorical Variable

频数分布 Frequency Distribution

相对频数分布 Relative Frequency Distribution

相对频数(频率) Relative Frequency

百分数频数分布 Percent Frequency Distribution

条形图 Bar Chart  饼图 Pie Chart

2.2 汇总数量型变量的数据 Summarizing Data for a Quantitative Variable

打点图 Dot Plot  直方图 Histogram

偏度 Skewness

对称的 Symmetric

适度右偏 Moderately Skewed Right  严重右偏 Highly Skewed Right

累积曲线 Ogive  累积频数分布 Cumulative Frequency Distribution

累积相对频数分布 Cumulative Relative Frequency Distribution

累积百分数频数分布 Cumulative Percent Frequency Distribution

茎叶显示 Stem-and-Leaf Display  叶单位 Leaf Unit

2.3 用表格方法汇总两个变量的数据 Summarizing Data for Two Variables Using Tables

交叉分组表 Crosstabulation

辛普森悖论 Simpson’s Paradox

2.4 用图形显示方法汇总两个变量的数据 Summarizing Data for Two Variables Using Graphical Displays

散点图 Scatter Diagram  趋势线 Trendline

复合条形图 Side-by-Side Bar Chart

结构条形图 Stacked Bar Chart


Data Visualization: Best Practices in Creating Effective Graphical Displays

数据仪表盘 Data Dashboards

Chap 3 描述统计学(二):数值方法

Descriptive Statistics: Numerical Measures

样本统计量 sample statistics

总体参数 population parameters

点估计量 point estimator


3.1 位置的度量 Measures of Location

平均数 mean

样本平均数 Sample Mean $\bar{x}$

总体平均数 Population Mean $\mu$

加权平均数 Weighted Mean  中位数 Median

几何平均数 Geometric Mean  众数 Mode

百分位数 Percentile  四分位数 Quartiles

第三四分位数 Third Quartile (75th Percentile)

切尾均值 Trimmed Mean (the arithmetic mean computed after removing the extreme values at both ends)

3.2 变异程度的度量 Measures of Variability

极差 Range  四分位数间距 Interquartile Range (IQR)

方差 Variance  标准差 Standard Deviation

标准差系数 Coefficient of Variation

3.3 分布形态、相对位置的度量以及异常值的检测

Measures of Distribution Shape, Relative Location, and Detecting Outliers

分布形态:偏度 Distribution Shape: Skewness




z-分数 z-Scores (standardized values)

切比雪夫定理 Chebyshev’s Theorem:

At least $(1 - 1/z^2)$ of the data values must be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.
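As a quick numerical illustration of Chebyshev's theorem, the sketch below evaluates the lower bound 1 - 1/z^2 for a few values of z. The function name `chebyshev_lower_bound` is an assumption made for this example, not something from the textbook.

```python
def chebyshev_lower_bound(z: float) -> float:
    """Minimum proportion of data values within z standard deviations of the mean."""
    if z <= 1:
        # The theorem only gives a useful bound for z greater than 1.
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_lower_bound(z):.1%} of the data values")
```

The bound holds for any distribution shape, which is why it is weaker than the empirical rule for bell-shaped data.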

经验法则 Empirical Rule  异常值的检测 Detecting Outliers


五数概括法 Five-Number Summaries  箱线图 Box Plots
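A five-number summary (smallest value, Q1, median, Q3, largest value) and the usual box-plot outlier limits can be sketched as follows. This assumes the quartiles are located at position (n + 1)p, which is what `statistics.quantiles` does with `method="exclusive"`; other quartile conventions give slightly different values. The sample data and function names are made up for illustration.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the (n + 1)p quartile convention."""
    values = sorted(data)
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return values[0], q1, q2, q3, values[-1]


def iqr_outliers(data):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]


sample = [10, 12, 13, 14, 15, 16, 40]  # made-up values
summary = five_number_summary(sample)
outliers = iqr_outliers(sample)
print("five-number summary:", summary)
print("outliers:", outliers)
```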

3.5 两变量间关系的度量 Measures of Association Between Two Variables

协方差 Covariance

相关系数 Correlation Coefficient

Chap 7 抽样和抽样分布 Sampling and Sampling Distributions

抽样总体 sampled population

抽样框 frame

7.1 Electronics Associates 公司的抽样问题 The Electronics Associates Sampling Problem

参数 parameters (numerical characteristics of a population)

7.2 抽样 Sampling

从有限总体中抽样 Sampling from a Finite Population

从无限总体中抽样 Sampling from an Infinite Population

简单随机样本 A simple random sample

无放回抽样 sampling without replacement

有放回抽样 sampling with replacement

7.3 点估计 Point Estimation

样本统计量 sample statistic

点估计量 point estimator

点估计值 point estimate


$\bar{x}$ as Point Estimator of $\mu$

$s$ as Point Estimator of $\sigma$

$\bar{p}$ as Point Estimator of $p$

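7.4 抽样分布 Sampling Distribution

7.5 样本均值的抽样分布 Sampling Distribution of $\bar{x}$

$\bar{x}$ 的数学期望 Expected Value of $\bar{x}$: $E(\bar{x}) = \mu$

无偏估计 unbiased estimator

$\bar{x}$ 的标准差 the standard deviation of $\bar{x}$: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

样本大小 the sample size $n$

总体大小 the population size $N$

有限总体的修正系数 finite population correction factor

样本均值的标准差 the standard deviation of $\bar{x}$ for a finite population: $\sigma_{\bar{x}} = \sqrt{(N-n)/(N-1)}\,(\sigma/\sqrt{n})$

均值的标准误差 standard error of the mean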
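The standard error of the mean, with and without the finite population correction factor, can be sketched as below. The formula sigma / sqrt(n), multiplied by sqrt((N - n) / (N - 1)) for a finite population of size N, is the standard one; the function name and the numbers (sigma = 4000, n = 30, N = 2500) are illustrative assumptions.

```python
import math


def standard_error_of_mean(sigma, n, N=None):
    """Standard deviation of the sample mean for samples of size n.

    If a finite population size N is given, apply the finite population
    correction factor sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        if N < 2 or not 1 <= n <= N:
            raise ValueError("need 1 <= n <= N and N >= 2")
        se *= math.sqrt((N - n) / (N - 1))
    return se


print(f"infinite population: {standard_error_of_mean(4000, 30):.2f}")
print(f"finite population:   {standard_error_of_mean(4000, 30, N=2500):.2f}")
```

When the sample is a small fraction of the population, the correction factor is close to 1 and the two results nearly coincide.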

总体服从正态分布 the population has a normal distribution

中心极限定理 Central Limit Theorem:

In selecting random samples of size $n$ from a population, the sampling distribution of the sample mean $\bar{x}$ can be approximated by a normal distribution as the sample size becomes large.

书上案例 Textbook example: EAI 问题中 $\bar{x}$ 的抽样分布 Sampling Distribution of $\bar{x}$ for SAT Scores


样本容量与 $\bar{x}$ 的抽样分布的关系 Relationship Between the Sample Size and the Sampling Distribution of $\bar{x}$


7.6 样本比率的抽样分布 Sampling Distribution of $\bar{p}$

$\bar{p}$ 的数学期望 Expected Value of $\bar{p}$: $E(\bar{p}) = p$

$\bar{p}$ 的标准差 the standard deviation of $\bar{p}$: $\sigma_{\bar{p}} = \sqrt{p(1-p)/n}$


The sampling distribution of $\bar{p}$ can be approximated by a normal distribution whenever the sample size is large enough to satisfy the two conditions: $np \ge 5$ and $n(1-p) \ge 5$.
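The two large-sample conditions are usually given as np >= 5 and n(1 - p) >= 5. The sketch below assumes that form and also returns the expected value and standard deviation of the sample proportion; the function name and the `threshold` parameter are assumptions for illustration.

```python
import math


def p_bar_sampling_distribution(p, n, threshold=5):
    """Return (expected value, standard error, normal_approximation_ok) for the sample proportion."""
    expected = p
    std_error = math.sqrt(p * (1 - p) / n)
    # Normal approximation is considered adequate when both counts reach the threshold.
    normal_ok = n * p >= threshold and n * (1 - p) >= threshold
    return expected, std_error, normal_ok


print(p_bar_sampling_distribution(0.6, 30))    # both conditions hold
print(p_bar_sampling_distribution(0.02, 100))  # np = 2, so the approximation is doubtful
```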


7.7 点估计的性质 Properties of Point Estimators

无偏性 unbiased

有效性 efficiency

一致性 consistency

7.8 其他抽样方法 Other Sampling Methods

分层抽样 Stratified Random Sampling

整群抽样 Cluster Sampling

系统抽样 Systematic Sampling

方便抽样 Convenience Sampling

判断抽样 Judgment Sampling

概率抽样 probability sampling  非概率抽样 non-probability sampling

Chap 14 简单线性回归 Simple Linear Regression

应变量 dependent variable

自变量 independent variable

14.1 简单线性回归模型 Simple Linear Regression Model

简单线性回归模型 simple linear regression model

简单线性回归方程 simple linear regression equation

估计的简单线性回归方程 the estimated simple linear regression equation

14.2 最小二乘法 Least Squares Method

估计回归方程的截距b0 intercept for the estimated regression equation
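The least squares estimates of the slope b1 and intercept b0 can be computed directly from the deviations about the means: b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2) and b0 = y_bar - b1 * x_bar. The sketch below applies these formulas to a small made-up data set; the function name is an assumption for this example.

```python
def least_squares(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


x = [1, 2, 3, 4, 5]  # made-up values
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```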


14.3 判定系数 Coefficient of Determination (R-sq)

总的平方和 SST total sum of squares

误差平方和 SSE sum of squares due to error

回归平方和 SSR sum of squares due to regression

样本相关系数 sample correlation coefficient
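The three sums of squares are linked by SST = SSR + SSE, the coefficient of determination is SSR / SST, and in simple linear regression the sample correlation coefficient is its square root carrying the sign of the slope. A minimal sketch on made-up data (function and variable names are assumptions):

```python
import math


def sums_of_squares(x, y):
    """Fit y_hat = b0 + b1 * x by least squares and return (SST, SSE, SSR, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # sum of squares due to error
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # sum of squares due to regression
    return sst, sse, ssr, b1


x = [1, 2, 3, 4, 5]  # made-up values
y = [2, 4, 5, 4, 5]
sst, sse, ssr, slope = sums_of_squares(x, y)
r_squared = ssr / sst
r = math.copysign(math.sqrt(r_squared), slope)  # correlation takes the sign of the slope
print(f"SST = {sst:.2f}, SSE = {sse:.2f}, SSR = {ssr:.2f}")
print(f"r^2 = {r_squared:.3f}, r = {r:.3f}")
```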

14.4 模型的假定(误差项的假定) Assumptions About the Error Term $\epsilon$

01 The error term $\epsilon$ is a random variable with a mean of zero ($\epsilon$ 是一个均值为 0 的随机变量).

02 The variance of $\epsilon$, denoted by $\sigma^2$, is the same for all values of the independent variable.

03 The values of $\epsilon$ are independent.

04 The error term $\epsilon$ is a normally distributed random variable.

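14.5 显著性检验 Testing for Significance

误差项 $\epsilon$ 的方差的估计 Estimate of $\sigma^2$, the variance of the error term $\epsilon$: $s^2 = \text{MSE} = \text{SSE}/(n - p - 1)$, the mean square error (with $p = 1$ in simple linear regression, $\text{MSE} = \text{SSE}/(n - 2)$)

t 检验 t Test

F 检验 F Test

假设 Hypotheses  原假设 $H_0$  备择假设 $H_a$

检验统计量 Test Statistic  拒绝规则(拒绝域) Rejection Rule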


01 Determine the hypotheses 确定原假设和备择假设

02 Specify the level of significance 确定显著性水平

03 Select the test statistic 选择检验统计量

04 State the rejection rule 确定拒绝域

05 Compute the value of the test statistic 计算检验统计量的值

06 Determine whether to reject H0 确定是否拒绝原假设
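The six steps above can be traced for the t test of the slope in simple linear regression (H0: beta1 = 0 against Ha: beta1 != 0). The sketch assumes the usual test statistic t = b1 / s_b1 with s_b1 = s / sqrt(sum((x - x_bar)^2)) and n - 2 degrees of freedom. The data are made up, and the critical value 3.182 (two-tailed, alpha = 0.05, 3 degrees of freedom) is read from a standard t table rather than computed.

```python
import math


def slope_t_statistic(x, y):
    """Return (t statistic for H0: beta1 = 0, degrees of freedom)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                      # estimate of sigma^2
    s_b1 = math.sqrt(mse) / math.sqrt(sxx)   # estimated standard deviation of b1
    return b1 / s_b1, n - 2


x = [1, 2, 3, 4, 5]  # made-up values
y = [2, 4, 5, 4, 5]
t_stat, df = slope_t_statistic(x, y)
t_critical = 3.182   # t table value for alpha/2 = 0.025 and 3 degrees of freedom
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With a single independent variable, the F statistic MSR / MSE equals the square of this t statistic, so the two tests lead to the same conclusion.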

β1的置信区间 Confidence Interval for β1

14.6 应用估计的回归方程进行估计和预测 Using the Estimated Regression Equation for Estimation and Prediction

置信区间 A confidence interval is an interval estimate of the mean value of y for a given value of x.

预测区间 A prediction interval is used whenever we want to predict an individual value of y for a new observation corresponding to a given value of x.


14.7 计算机解法 Computer Solution

14.8 残差分析 Residual Analysis

→ 为了确定误差项的假定是否成立 to determine whether the assumptions about the error term $\epsilon$ are appropriate

第 i 次观察的残差 residual for observation i

关于x的残差图 residual plot against x

标准化残差图 standardized residual plot

正态概率图 normal probability plot

正态分数 normal scores

14.9 异常值和有影响的观测值 Outliers and Influential Observations

检测异常值 detecting outliers

杠杆率 leverage

高杠杆率点 high leverage points

Chap 15 多元线性回归 Multiple Regression


多元回归模型 Multiple Regression Model

多元回归方程 Multiple Regression Equation

估计的多元回归方程 Estimated Multiple Regression Equation


15.3 多元判定系数 Multiple Coefficient of Determination

多元判定系数 Multiple Coefficient of Determination

修正多元判定系数 Adjusted Multiple Coefficient of Determination

15.4 模型的假定 Model Assumptions

多元回归中关于误差项 $\epsilon$ 的假定 assumptions about the error term $\epsilon$ in the multiple regression model

15.5 显著性检验 Testing for Significance

F 检验 the test for overall significance

t 检验 a test for individual significance

方差来源 source  平方和 sum of squares

自由度 degrees of freedom  均方 mean square

多重共线性 multicollinearity→the correlation among the independent variables

15.6 利用估计的回归方程进行估计和预测 Using the Estimated Regression Equation for Estimation and Prediction

置信区间 Confidence Interval  预测区间 Prediction Interval

下限 Lower Limit  上限 Upper Limit

15.7 定性自变量 Categorical Independent Variables

虚拟变量 Dummy Variable

解释参数 Interpreting the Parameters

更复杂的定性变量 More Complex Categorical Variables

→If a categorical variable has k levels, k-1 dummy variables are required, with each dummy variable being coded as 0 or 1.
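To make the k - 1 rule concrete, the sketch below codes a categorical variable with k levels into k - 1 dummy variables, treating the first listed level as the baseline (all dummies equal to 0). The function name, the level labels, and the choice of baseline are illustrative assumptions.

```python
def dummy_code(values, levels):
    """Code a categorical variable with k levels as k - 1 dummy variables (0 or 1).

    The first entry of `levels` is the baseline and gets all zeros.
    Returns (names of the dummy variables, list of coded rows).
    """
    baseline, *others = levels
    rows = []
    for value in values:
        if value not in levels:
            raise ValueError(f"unknown level: {value!r}")
        rows.append([1 if value == level else 0 for level in others])
    return others, rows


levels = ["A", "B", "C"]  # k = 3 hypothetical levels, so 2 dummy variables
columns, coded = dummy_code(["A", "B", "C", "B"], levels)
print(columns)
for row in coded:
    print(row)
```

Using k dummy variables instead of k - 1 would make them sum to 1 in every row, which duplicates the intercept term.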

15.8 残差分析 Residual Analysis

第 i 次观测的残差的标准差 standard deviation of residual i

检测异常值 detecting outliers


学生化删除残差 studentized deleted residuals (standardized residuals computed after deleting observation i)

有影响的观测值 Influential Observations


库克距离 Cook’s Distance

库克距离测度 Using Cook’s Distance Measure




03 用杠杆率检测有影响的观测值 Using leverage to identify influential observations: $h_i > 3(p+1)/n$, where $h_i$ is the leverage (杠杆率) of observation $i$
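The leverage rule can be checked numerically in the simplest case. The sketch below assumes a single independent variable (p = 1), where the leverage of observation i is h_i = 1/n + (x_i - x_bar)^2 / sum((x - x_bar)^2); with several independent variables the leverages come from the diagonal of the hat matrix instead, which is not shown here. The data and function names are made up.

```python
def leverages(x):
    """Leverage of each observation in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def high_leverage_points(x, p=1):
    """Indices (0-based) of observations with h_i > 3(p + 1)/n."""
    cutoff = 3 * (p + 1) / len(x)
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]


x = [10, 10, 15, 20, 20, 25, 70]  # made-up values; the last one is far from the rest
for xi, h in zip(x, leverages(x)):
    print(f"x = {xi:3d}  leverage = {h:.3f}")
print("high leverage at index:", high_leverage_points(x))
```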

15.9 Logistic 回归 Logistic Regression

Logistic 回归方程 Logistic Regression Equation

Logistic 回归中的 E(y) 被解释为概率 interpretation of E(y) as a probability in logistic regression

估计的 Logistic 回归方程 the estimated logistic regression equation


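The odds in favor of an event occurring: the probability that the event will occur divided by the probability that it will not occur, $\text{odds} = P(y=1 \mid x_1, \dots, x_p) / P(y=0 \mid x_1, \dots, x_p)$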

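机会比率 odds ratio $= \text{odds}_1/\text{odds}_0$

$\text{odds}_1$: the odds that $y = 1$ when one independent variable in the set has been increased by one unit

$\text{odds}_0$: the odds that $y = 1$ when the values of the set of independent variables are unchanged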
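A small numerical check of the odds ratio: for a logistic regression equation with one independent variable, increasing x by one unit multiplies the odds by e^(b1). The coefficients below are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
import math


def probability(b0, b1, x):
    """Estimated P(y = 1 | x) from the logistic regression equation."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))


def odds(prob):
    """Odds in favor of an event with probability prob."""
    return prob / (1 - prob)


b0, b1 = -2.0, 0.5                        # hypothetical coefficients
odds0 = odds(probability(b0, b1, 3))      # x unchanged
odds1 = odds(probability(b0, b1, 4))      # x increased by one unit
odds_ratio = odds1 / odds0
print(f"odds ratio = {odds_ratio:.4f}, e^b1 = {math.exp(b1):.4f}")
```

The logit, ln(odds), is linear in x, which is why the odds ratio does not depend on the starting value of x.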

对于 Logistic 回归方程的解释 Interpreting the Logistic Regression Equation


对数机会比变换 Logit Transformation

估计的对数机会比 estimated logit

Chap 16 回归分析:建立模型 Regression Analysis: Model Building

16.1 一般线性模型 General Linear Model

模拟曲线关系 Modeling Curvilinear Relationships

标准化残差图 Standardized Residual Plot

具有一个变量的二阶模型 second-order model with one predictor variable

交互作用 Interaction

涉及因变量的变换 Transformations Involving the Dependent Variable

对数变换 logarithmic transformation 倒数变换 reciprocal transformation

内蕴线性的非线性模型 Nonlinear Models That Are Intrinsically Linear

指数模型 The exponential model

16.2 确定什么时候增加或者删除变量 Determining When to Add or Delete Variables


16.3 大型问题的分析 Analysis of a Larger Problem

16.4 变量选择方法 Variable Selection Procedures

逐步回归 Stepwise Regression  前向选择 Forward Selection

后向消元 Backward Elimination  最佳子集回归 Best-Subsets Regression

16.5 实验设计的多元回归方法 Multiple Regression Approach to Experimental Design

16.6 自相关性和杜宾-瓦特森检验 Autocorrelation and the Durbin-Watson Test

自相关性 Autocorrelation  杜宾-瓦特森检验 Durbin-Watson Test

杜宾-瓦特森检验统计量 Durbin-Watson Test Statistic  序列相关 serial correlation
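The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. It ranges from 0 to 4; values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. A minimal sketch on made-up residuals (deciding significance still requires the tabulated bounds, which are not reproduced here):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for residuals listed in time order."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


print(durbin_watson([1, 1, 1, 1]))     # successive residuals identical
print(durbin_watson([1, -1, 1, -1]))   # residuals alternate in sign
```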

Chap 17 时间序列及预测 Time Series Analysis and Forecasting

因果预测方法 causal method  时间序列方法 time series method

截面回归 Cross-sectional regression  时间序列回归 Time-series regression


17.1 时间序列的模式 Time Series Patterns

水平模式 Horizontal Pattern  平稳时间序列 stationary time series

趋势模式 Trend Pattern  季节模式 Seasonal Pattern

季节与趋势模式 Trend and Seasonal Pattern

循环模式 Cyclical Pattern

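17.2 预测精度 Forecast Accuracy

预测误差 forecast error

平均绝对误差 Mean Absolute Error (MAE): the average of the absolute values of the forecast errors

均方误差 Mean Squared Error (MSE): the average of the squared forecast errors (sum of squared forecast errors / degrees of freedom)

平均绝对百分数误差 Mean Absolute Percentage Error (MAPE): the average of the absolute values of the percentage forecast errors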

朴素预测法 Naive forecast
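The three accuracy measures can be computed for the naive forecast, which uses the most recent observation as the forecast for the next period. In this sketch each measure is averaged over the number of forecasts made; the series values and the function name are illustrative assumptions.

```python
def naive_accuracy(series):
    """Return (MAE, MSE, MAPE) for the naive forecast F[t+1] = Y[t]."""
    actuals = series[1:]       # the first period has no forecast
    forecasts = series[:-1]
    errors = [a - f for a, f in zip(actuals, forecasts)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    mape = sum(abs(e) / a * 100 for e, a in zip(errors, actuals)) / n
    return mae, mse, mape


weekly_sales = [17, 21, 19, 23, 18, 16, 20, 18, 22, 20, 15, 22]  # illustrative values
mae, mse, mape = naive_accuracy(weekly_sales)
print(f"MAE = {mae:.2f}, MSE = {mse:.2f}, MAPE = {mape:.2f}%")
```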


移动平均法 moving averages method  加权移动平均法 Weighted Moving Averages

指数平滑预测 Exponential Smoothing Forecast  平滑常数 α smoothing constant
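Moving averages and exponential smoothing can be sketched in a few lines. The moving average forecast for the next period is the mean of the k most recent values, and the exponential smoothing forecast is F[t+1] = alpha * Y[t] + (1 - alpha) * F[t], started here with F[2] = Y[1]. The series, k = 3, and alpha = 0.2 are illustrative choices.

```python
def moving_average_forecasts(series, k):
    """Forecasts for periods k+1 .. n+1, each the mean of the k preceding values."""
    return [sum(series[t - k:t]) / k for t in range(k, len(series) + 1)]


def exponential_smoothing_forecasts(series, alpha):
    """Forecasts for periods 2 .. n+1, starting from F[2] = Y[1]."""
    forecasts = [series[0]]
    for actual in series[1:]:
        forecasts.append(alpha * actual + (1 - alpha) * forecasts[-1])
    return forecasts


weekly_sales = [17, 21, 19, 23, 18, 16, 20, 18, 22, 20, 15, 22]  # illustrative values
ma = moving_average_forecasts(weekly_sales, k=3)
es = exponential_smoothing_forecasts(weekly_sales, alpha=0.2)
print(f"moving average forecast for period 4: {ma[0]:.2f}")
print(f"exponential smoothing forecast for period 4: {es[2]:.2f}")
```

A larger alpha makes the forecast react faster to recent changes, and a smaller alpha smooths more heavily.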


趋势推测法 Trend Projection  线性趋势回归 Linear Trend Regression

非线性趋势回归 Nonlinear Trend Regression

二次趋势方程 quadratic trend equation  指数趋势方程 exponential trend equation


没有趋势的季节性 Seasonality without Trend  季节性和趋势 Seasonality and Trend

17.6 时间序列分解法 Time Series Decomposition

加法分解模型 Additive Decomposition Model

乘法分解模型 Multiplicative Decomposition Model


Step 1 计算季节指数 Calculate the Seasonal Indexes

Step 2 消除季节影响的时间序列 Deseasonalize the Time Series

Step 3 利用消除季节影响的时间序列确定趋势 Use the Deseasonalized Time Series to Identify Trend

Step 4 季节调整 Seasonal Adjustments
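The four steps can be strung together for the multiplicative model. The sketch below is one possible reading, assuming quarterly data, a centered moving average to isolate the seasonal component, seasonal indexes rescaled to average 1, and a least squares linear trend on the deseasonalized series. The synthetic series (a constant level of 100 times a fixed seasonal pattern) and all function names are assumptions made for illustration.

```python
def multiplicative_decomposition(series, period=4):
    """Return (seasonal indexes, deseasonalized series, (b0, b1) of the linear trend)."""
    if period % 2:
        raise ValueError("this sketch handles even periods only")
    n, half = len(series), period // 2

    # Step 1: seasonal indexes from ratios of actual values to centered moving averages.
    ratios = {q: [] for q in range(period)}
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        cma = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        ratios[t % period].append(series[t] / cma)
    raw = [sum(ratios[q]) / len(ratios[q]) for q in range(period)]
    scale = period / sum(raw)                 # rescale so the indexes average 1
    indexes = [value * scale for value in raw]

    # Step 2: deseasonalize by dividing each value by its seasonal index.
    deseasonalized = [y / indexes[t % period] for t, y in enumerate(series)]

    # Step 3: fit a linear trend to the deseasonalized series (t = 1, 2, ..., n).
    times = range(1, n + 1)
    t_bar, d_bar = sum(times) / n, sum(deseasonalized) / n
    b1 = (sum((t - t_bar) * (d - d_bar) for t, d in zip(times, deseasonalized))
          / sum((t - t_bar) ** 2 for t in times))
    b0 = d_bar - b1 * t_bar
    return indexes, deseasonalized, (b0, b1)


def forecast(t, indexes, trend, period=4):
    """Step 4: trend projection for period t (1-based) times its seasonal index."""
    b0, b1 = trend
    return (b0 + b1 * t) * indexes[(t - 1) % period]


pattern = [0.8, 1.2, 1.1, 0.9]                               # synthetic seasonal pattern
quarterly = [100 * pattern[t % 4] for t in range(12)]        # synthetic series, no trend
indexes, deseasonalized, trend = multiplicative_decomposition(quarterly)
print("seasonal indexes:", [round(v, 3) for v in indexes])
print("forecast for period 13:", round(forecast(13, indexes, trend), 2))
```

For the additive model the same steps apply with differences in place of ratios and subtraction in place of division.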
