@mShuaiZhao
2018-01-27T08:25:46.000000Z
Coursera
2018.01
scatterplots
evaluating the relationship
direction
whether the association is positive or negative
shape
the shape of the curve
strength
how strong or weak the relationship is
outliers
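points that lie far from the bulk of the data
naive approach
simply remove the outliers and then process the data
but these outliers may contain some very interesting information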
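A minimal sketch of how direction and strength can be put into numbers, and of why dropping outliers without looking at them is risky. It uses the Pearson correlation coefficient, which only measures linear association; the data and the helper name `pearson_r` are made up for illustration and are not from the course.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y rises with x, so the direction is positive.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
r_clean = pearson_r(xs, ys)

# The same data plus one made-up outlier far from the trend.
r_outlier = pearson_r(xs + [7], ys + [0.0])

print(f"without outlier: r = {r_clean:.3f}")
print(f"with outlier:    r = {r_outlier:.3f}")
```

The sign of r gives the direction and its magnitude the strength. A single outlier can change r a lot, which is a reason to investigate such points rather than silently delete them.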
histogram
In a histogram, data are binned into intervals and the height of each bar represents the number of cases that fall into that interval.
skewness
the degree of tilt (asymmetry) of the distribution?
modality
the form of the distribution: how many peaks it has
unimodal
bimodal
uniform
multimodal
a distribution may be unimodal, with one prominent peak
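The binning behind a histogram can be sketched in a few lines. This is only an illustration: the data, the bin width, and the helper name `bin_counts` are assumptions, and real plotting libraries choose bins in their own ways.

```python
from collections import Counter

def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each interval [left, left + width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return [counts[i] for i in range(n_bins)]

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # made-up values
bin_width = 2
counts = bin_counts(data, start=0, width=bin_width, n_bins=5)

# Text histogram: one '#' per case in the interval.
for i, c in enumerate(counts):
    left = i * bin_width
    print(f"[{left:2d}, {left + bin_width:2d}) {'#' * c}")
```

With these made-up values the bars show a single prominent peak (unimodal) and a tail stretching to the right (right skew).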
dotplot
box plot
intensity map
mean
median
mode
range
range: max - min
variance
deviation: the difference between an observation and the mean
roughly the average squared deviation from the mean
sample variance
population variance
Why divide by n - 1 for the sample variance?
Why do we square the differences?
standard deviation
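A small worked example of the quantities above, assuming the usual definitions: the population variance divides the summed squared deviations by n, the sample variance divides by n - 1. The data are made up; the standard-library `statistics` functions are used only as a cross-check.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
n = len(data)
mean = sum(data) / n

# Raw deviations cancel out, which is one reason they are squared.
sum_of_deviations = sum(x - mean for x in data)

squared_devs = [(x - mean) ** 2 for x in data]
population_variance = sum(squared_devs) / n
sample_variance = sum(squared_devs) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print("sum of deviations:  ", sum_of_deviations)
print("population variance:", population_variance)
print("sample variance:    ", round(sample_variance, 3))
print("sample sd:          ", round(sample_sd, 3))

# Cross-check against the standard library.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_variance, statistics.pvariance(data))
```

Squaring removes the sign so positive and negative deviations do not cancel, and it weighs large deviations more heavily than small ones. Taking the square root afterwards brings the result back to the units of the data.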
variability vs. diversity
Remember, distributions where more observations are clustered around the center, are less variable, versus distributions where more observations are away from the center, are more variable.
diversity refers to difference or variety among the observations;
variability refers to how far the observations deviate from the center.
interquartile range
we define robust statistics as measures on which extreme observations have little effect
mean : consider the spread of the distribution
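To see what "little effect" means in practice, here is a hedged sketch that swaps one observation for an extreme value and compares the summaries. The data are made up, and `statistics.quantiles` is used with its default method, so the quartiles may differ slightly from other software.

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # made-up observations
with_extreme = data[:-1] + [900]     # largest value replaced by an extreme one

for label, values in [("original", data), ("with extreme", with_extreme)]:
    print(
        f"{label:13s}"
        f" mean={statistics.mean(values):7.2f}"
        f" sd={statistics.stdev(values):7.2f}"
        f" median={statistics.median(values):5.1f}"
        f" IQR={iqr(values):5.1f}"
    )
```

The median and IQR stay the same while the mean and standard deviation move a lot, which is why the median and IQR are the usual choice for skewed distributions or data with extreme observations.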
Data transformations are useful tricks for making certain types of data easier to model.
transformations
a transformation is a rescaling of the data using a function
when data are very strongly skewed, we sometimes transform them so they are easier to model
(natural) log transformation
other transformations
goals of transformations
to see the data structure differently
to reduce skew, which assists in modeling
to straighten a nonlinear relationship in a scatterplot
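A sketch of the natural log transformation on strongly right-skewed data. The data are made up (powers of two, chosen so the effect is easy to see), and `skewness` is a hypothetical helper computing the moment-based skewness coefficient, which is one of several common definitions.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 4, 8, 16, 32, 64]          # made-up, strongly right-skewed
logged = [math.log(v) for v in raw]     # natural log transformation

print(f"skewness before: {skewness(raw):.3f}")
print(f"skewness after:  {skewness(logged):.3f}")
```

After the transformation the values are evenly spaced, so the skew essentially disappears. The same idea is what straightens an exponential-looking curve in a scatterplot. The log is only defined for positive values.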
frequency table & bar plot
How are bar plots different from histograms?
pie chart? No
A pie chart? Not a sound choice. It conveys too little information and works poorly when the variable has many levels.
contingency table
relative frequencies
a relative frequency is a count expressed as a proportion of the total.
segmented bar plot
useful for visualizing conditional frequency distributions
compare relative frequencies to explore the relationship between the variables
relative frequency segmented bar plot
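A contingency table and its conditional relative frequencies can be built by hand as follows. The records, the group labels "A" and "B", and the variable names are all made up for illustration.

```python
from collections import Counter

# Made-up records: (group, response).
records = [
    ("A", "yes"), ("A", "yes"), ("A", "no"), ("A", "yes"),
    ("B", "yes"), ("B", "no"), ("B", "no"), ("B", "no"),
]

counts = Counter(records)
groups = sorted({g for g, _ in records})
responses = sorted({r for _, r in records})

# Contingency table: one row per group, one column per response.
table = {g: {r: counts[(g, r)] for r in responses} for g in groups}

# Relative frequencies within each row (conditional on the group).
row_props = {
    g: {r: table[g][r] / sum(table[g].values()) for r in responses}
    for g in groups
}

for g in groups:
    cells = "  ".join(f"{r}: {table[g][r]} ({row_props[g][r]:.0%})" for r in responses)
    print(f"{g}  {cells}")
```

The row proportions are what a relative frequency segmented bar plot displays. If they differ clearly between groups, the two variables appear to be associated.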
mosaicplot
side-by-side box plots
gender discrimination
data
two competing claims
analogous to a court verdict
Failing to prove the alternative does not mean we can affirm that the null hypothesis is true.
recap: hypothesis testing framework
simulation scheme
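use a deck of playing cards to run the simulation.
non-face cards represent the people who were promoted: 35 cards.
face cards represent the people who were not promoted: 13 cards.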
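The card scheme above can be sketched as a simulation. The 35 and 13 come from the notes. The group size of 24 per gender and the observed counts (21 of 24 males and 14 of 24 females promoted) are not stated in these notes; they are assumptions filled in from my recollection of the study, as is the number of repetitions.

```python
import random

PROMOTED, NOT_PROMOTED = 35, 13   # card counts from the notes
GROUP_SIZE = 24                   # assumption: 24 "male" and 24 "female" files
OBSERVED_DIFF = (21 - 14) / GROUP_SIZE   # assumption: observed promotion counts

def simulate_diff(rng):
    """Shuffle the deck, deal two groups, return the difference in promotion rates."""
    deck = [1] * PROMOTED + [0] * NOT_PROMOTED   # 1 = promoted, 0 = not promoted
    rng.shuffle(deck)
    male_promoted = sum(deck[:GROUP_SIZE])
    female_promoted = sum(deck[GROUP_SIZE:])
    return (male_promoted - female_promoted) / GROUP_SIZE

rng = random.Random(0)   # fixed seed so the run is repeatable
diffs = [simulate_diff(rng) for _ in range(10_000)]

# Proportion of shuffles at least as extreme as the observed difference.
p_value = sum(d >= OBSERVED_DIFF for d in diffs) / len(diffs)
print(f"observed difference: {OBSERVED_DIFF:.3f}")
print(f"simulated p-value:   {p_value:.4f}")
```

Shuffling mimics the null hypothesis: promotion is independent of gender, so any difference between the two dealt groups is due to chance alone. If only a small share of shuffles reaches the observed difference, the data give evidence against the null hypothesis.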
summary