Linear Models(2) - Multiple Regression
2021. 3. 7. 15:39ㆍ[AI]/Machine Learning
Learned Stuff
Key Points
- split train data / test data
- multiple regression
- evaluation metrics
- $MSE$ (Mean Squared Error)
- $MAE$ (Mean Absolute Error)
- $RMSE$ (Root Mean Squared Error)
- $R^2$ (Coefficient of Determination)
- bias / variance tradeoff
New Stuff
[Split Train Data / Test Data]
code
# Assume a DataFrame named df already exists
train = df.sample(frac=0.75, random_state=1) # randomly sample 75% of df as train data
test = df.drop(train.index) # test data is df minus the train rows
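A minimal sketch of the split above on a small made-up DataFrame (the column names `a`, `b`, `c` and the values are hypothetical). It only checks that the two pieces do not overlap and together cover `df`. Note that `df.drop(train.index)` relies on the index labels being unique.

```python
import pandas as pd

# Hypothetical toy data; any DataFrame with a unique index behaves the same way
df = pd.DataFrame({
    'a': range(8),
    'b': [x * 2 for x in range(8)],
    'c': [x * 3 + 1 for x in range(8)],
})

train = df.sample(frac=0.75, random_state=1)  # 75% of the rows, sampled at random
test = df.drop(train.index)                   # the remaining 25%

print(len(train), len(test))
print(sorted(train.index), sorted(test.index))
```

`sklearn.model_selection.train_test_split(df, test_size=0.25, random_state=1)` gives a comparable 75/25 split, though not necessarily the same rows.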
[Multiple Regression]
code
# Assume two DataFrames, train_data and test_data, already exist
# train_data : data to fit the model on
# test_data : data to predict on
from sklearn.linear_model import LinearRegression
model = LinearRegression()
# Split the data into X_train, X_test, y_train, y_test
features = ['a','b'] # assume 'a' and 'b' are the features
target = ['c'] # assume 'c' is the target
X_train = train_data[features]
X_test = test_data[features]
y_train = train_data[target]
y_test = test_data[target]
# Fit the model
model.fit(X_train, y_train)
# Get the model's slopes and y-intercept
model_slope = model.coef_ # returned as an array / two slopes, one per feature
model_y_intercept = model.intercept_ # returned as an array
# Predict with the fitted model
y_pred = model.predict(X_test) # array
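The steps above can be tried end to end on synthetic data. This is only a sketch: the DataFrame `data` and the relation c = 3a - 2b + 5 are made up for illustration, chosen so that the fitted slopes and intercept are easy to check.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: c = 3*a - 2*b + 5 with no noise,
# so the fit should recover these numbers
rng = np.random.default_rng(0)
data = pd.DataFrame({'a': rng.normal(size=50), 'b': rng.normal(size=50)})
data['c'] = 3 * data['a'] - 2 * data['b'] + 5

train_data = data.iloc[:40]
test_data = data.iloc[40:]

features = ['a', 'b']
target = ['c']

model = LinearRegression()
model.fit(train_data[features], train_data[target])

print(model.coef_)       # shape (1, 2) because the target was selected with a list
print(model.intercept_)  # shape (1,)

y_pred = model.predict(test_data[features])  # shape (10, 1)
```

Selecting the target with a plain string (`train_data['c']`) instead of a list would give a 1-D `coef_` of shape `(2,)` and a scalar `intercept_`.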
[Evaluation Metrics]
- $y$ : actual label values
- $\hat{y}$ : predicted label values
- $\bar{y}$ : mean of the actual label values
1. $MSE$ (Mean Squared Error)
$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{i} - \hat{y_{i}})^{2}$
2. $MAE$ (Mean Absolute Error)
$MAE = \frac{1}{n}\sum_{i=1}^{n}\left | y_{i} - \hat{y_{i}} \right |$
3. $RMSE$ (Root Mean Squared Error)
$RMSE = \sqrt{MSE}$
4. $R^2$ (Coefficient of Determination)
$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{i} - \hat{y_{i}})^{2}}{\sum_{i=1}^{n}(y_{i} - \bar{y})^{2}}$
code
# y : actual label values (array)
# y_pred : predicted label values (array)
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
mse = mean_squared_error(y, y_pred) # mse (float)
mae = mean_absolute_error(y, y_pred) # mae (float)
rmse = mse ** 0.5 # rmse (float)
r2 = r2_score(y, y_pred) # r_squared (float)
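To see what the four metrics compute, here is a small worked example by hand in plain Python. The numbers in `y` and `y_pred` are made up. $R^2$ is computed as 1 minus the residual sum of squares over the total sum of squares, which is the form `r2_score` uses.

```python
import math

y = [3.0, 5.0, 7.0, 9.0]        # made-up actual labels
y_pred = [2.0, 5.0, 9.0, 9.0]   # made-up predictions
n = len(y)

errors = [yi - pi for yi, pi in zip(y, y_pred)]   # [1.0, 0.0, -2.0, 0.0]

mse = sum(e ** 2 for e in errors) / n    # (1 + 0 + 4 + 0) / 4 = 1.25
mae = sum(abs(e) for e in errors) / n    # (1 + 0 + 2 + 0) / 4 = 0.75
rmse = math.sqrt(mse)                    # sqrt(1.25), about 1.118

y_mean = sum(y) / n                              # 6.0
ss_res = sum(e ** 2 for e in errors)             # 5.0
ss_tot = sum((yi - y_mean) ** 2 for yi in y)     # 9 + 1 + 1 + 9 = 20.0
r2 = 1 - ss_res / ss_tot                         # 1 - 5/20 = 0.75

print(mse, mae, rmse, r2)
```

MSE squares each error, so the single error of 2 contributes 4 of the 5 units in the sum and pulls MSE above MAE.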
[Bias / Variance Tradeoff]
bias (train data) | fit | variance (test data) |
---|---|---|
high bias | underfitting | low variance |
low bias | overfitting | high variance |
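A rough way to observe the tradeoff is to fit models of increasing flexibility and compare train and test error. The sketch below is an illustration under assumptions, not part of the original notes: the sine-shaped ground truth, the noise level, the sample sizes, and the polynomial degrees are all arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical ground truth: y = sin(pi * x) plus Gaussian noise
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(np.pi * x[:, 0]) + rng.normal(scale=0.3, size=n)
    return x, y

X_train, y_train = make_data(20)
X_test, y_test = make_data(200)

results = {}
for degree in (1, 3, 9):
    # Higher degree = more flexible model (lower bias, higher variance)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
```

Train MSE can only go down as the degree grows, because each model contains the previous one. Typically the degree 1 fit has a high error on both sets (underfitting), while the degree 9 fit has the lowest train error but a wider gap to its test error (overfitting). The exact numbers depend on the seed.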