Statistic(2) - Hypothesis Test(2)
2021. 1. 30. 10:25ㆍ[AI]/Data Science Fundamentals
<Learned Stuff>
Key Points
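- T-Test Conditions
  - Independence : are the samples paired? (cat vs. cat, or cat vs. dog)
  - Equal variance : do the groups have similar variance?
  - Normality : is the data normally distributed?
- Non-parametric methods (representative example : Chi-square)
  - Categorical data
  - Data with extreme outliers
- ANOVA test
  - Like a t-test, but for comparing many groups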
<New Stuff>
[Chisquare Test]
[One sample Chisquare test]
# H0 : distribution is similar (the counts are evenly distributed)
# H1 : distribution is not similar (the counts are not evenly distributed)
# chi_square_value = sum((obs-exp)^2/(exp))
# obs1, obs2, obs3...
# exp1, exp2, exp3...
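# with `from scipy import stats`:
1 - stats.chi2.cdf(chi_square_value, df=n-1)
# ==> returns pvalue
# df : degrees of freedom (n = number of categories)
stats.chisquare(obs, axis=None)
# ==> returns chi_square_value & pvalue (exp defaults to equal counts in every category)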
# if pvalue < 0.05 ==> reject H0
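The one-sample steps above can be sketched end to end as follows. This is a minimal sketch: the counts in `obs` are made-up illustrative data, and the expected counts assume H0 means "every category is equally likely".

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for four categories (illustrative data only)
obs = np.array([18, 22, 20, 40])

# Assumption: under H0 every category is equally likely,
# so each expected count is the average count (25 here)
exp = np.full(obs.shape, obs.sum() / obs.size)

# Manual statistic: sum((obs - exp)^2 / exp)
chi_square_value = ((obs - exp) ** 2 / exp).sum()

# df = number of categories - 1
df = obs.size - 1
p_manual = 1 - stats.chi2.cdf(chi_square_value, df=df)

# Same test with scipy; expected counts default to equal frequencies
chi2_scipy, p_scipy = stats.chisquare(obs)

print(chi_square_value, p_manual)
print(chi2_scipy, p_scipy)
if p_scipy < 0.05:
    print("reject H0")
```

Both routes should give the same statistic and p-value, which is a quick way to check that the formula and the library call agree.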
[Two sample Chisquare test]
# H0 : the variables are independent (no association)
# H1 : the variables are not independent (some association)
# exp(i,j) = (row(i)_tot)(col(j)_tot)/(tot_sum)
# chi_square_value = sum((obs-exp)^2/exp)
chi2_contingency(obs)
# ==> returns chi_square_value, pvalue, df, and the expected table
# df = (# of rows - 1)(# of cols - 1)
# if pvalue < 0.05 ==> reject H0
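The expected-count formula and the statistic above can be checked against `chi2_contingency` with a small table. This is a sketch with a made-up 2x3 contingency table; the variable names `row_tot` and `col_tot` are just illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 contingency table (illustrative data only)
obs = np.array([[20, 30, 50],
                [30, 30, 40]])

# exp(i,j) = (row(i)_tot)(col(j)_tot) / (tot_sum)
row_tot = obs.sum(axis=1, keepdims=True)  # shape (2, 1)
col_tot = obs.sum(axis=0, keepdims=True)  # shape (1, 3)
exp = row_tot * col_tot / obs.sum()

# chi_square_value = sum((obs - exp)^2 / exp)
chi_square_value = ((obs - exp) ** 2 / exp).sum()

# df = (# of rows - 1)(# of cols - 1)
df = (obs.shape[0] - 1) * (obs.shape[1] - 1)
p_manual = 1 - stats.chi2.cdf(chi_square_value, df=df)

# scipy returns the statistic, p-value, df, and the expected table
chi2_scipy, p_scipy, df_scipy, exp_scipy = stats.chi2_contingency(obs)

print(chi_square_value, p_manual, df)
print(chi2_scipy, p_scipy, df_scipy)
```

One caveat: for a 2x2 table (df = 1), `chi2_contingency` applies Yates' continuity correction by default, so the manual value will only match if you pass `correction=False`.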
[ANOVA]
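- Running t-tests across many groups inflates the alpha (Type I) error ==> use ANOVA instead
- Compares means by using variance
# H0 : there is no difference in means across the N groups
# H1 : there is a difference in means across the N groups (at least one group's mean is different)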
# F = (between variance) / (within variance)
group = [Aarr, Barr, Carr, Darr]
F, pval = stats.f_oneway(*group)  # f_oneway takes each group as a separate argument
# if pval < 0.05 ==> reject H0
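To make the F ratio above concrete, here is a minimal sketch that computes it by hand and compares it with `stats.f_oneway`. The four arrays hold made-up measurements, and the helper names (`ss_between`, `ss_within`, `F_manual`) are only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for four groups (illustrative data only)
Aarr = np.array([5.1, 4.9, 5.0, 5.2, 4.8])
Barr = np.array([5.0, 5.1, 4.9, 5.2, 5.0])
Carr = np.array([6.0, 6.1, 5.9, 6.2, 5.8])
Darr = np.array([5.1, 5.0, 4.9, 5.0, 5.1])
group = [Aarr, Barr, Carr, Darr]

k = len(group)                        # number of groups
n_total = sum(len(g) for g in group)  # total number of observations
grand_mean = np.concatenate(group).mean()

# Between-group variance: spread of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in group)
ms_between = ss_between / (k - 1)

# Within-group variance: spread of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in group)
ms_within = ss_within / (n_total - k)

# F = (between variance) / (within variance)
F_manual = ms_between / ms_within

# scipy expects each group as a separate positional argument
F, pval = stats.f_oneway(*group)

print(F_manual, F, pval)
if pval < 0.05:
    print("reject H0")
```

Rejecting H0 here only says that at least one group mean differs; it does not say which one.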