Linear Models(2) - Multiple Regression

2021. 3. 7. 15:39 · [AI]/Machine Learning

Learned Stuff

Key Points

  • split train data / test data

 

  • multiple regression

 

  • evaluation metrics
    • $MSE$ (Mean Squared Error)
    • $MAE$ (Mean Absolute Error)
    • $RMSE$ (Root Mean Squared Error)
    • $R^2$ (Coefficient of Determination)

 

  • bias / variance tradeoff

New Stuff

[Split Train Data / Test Data]

code

# Assume a DataFrame named df

train = df.sample(frac=0.75, random_state=1) # use 75% of df as train data
test = df.drop(train.index) # test data is df minus the train data
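
The same split can also be done with scikit-learn's `train_test_split`. A minimal sketch, using a small made-up DataFrame (column names `a` and `b` are invented for illustration):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({'a': range(8), 'b': range(8, 16)})

# test_size=0.25 mirrors the 75/25 split above;
# random_state makes the split reproducible
train, test = train_test_split(df, test_size=0.25, random_state=1)

print(len(train), len(test))  # 6 2
```

Unlike `df.sample` + `df.drop`, `train_test_split` can also split `X` and `y` arrays in one call.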

 

[Multiple Regression]

code

# Assume DataFrames named train_data and test_data
# train_data : data to train on
# test_data : data to predict on

from sklearn.linear_model import LinearRegression

# split the data into X_train, X_test, y_train, y_test
model = LinearRegression()

features = ['a', 'b'] # assume 'a' and 'b' are the features
target = ['c'] # assume 'c' is the target

X_train = train_data[features]
X_test = test_data[features]

y_train = train_data[target]
y_test = test_data[target]

# train the model
model.fit(X_train, y_train)

# get the model's slopes & y-intercept
model_slope = model.coef_ # returned as an array / 2 slopes, one per feature
model_y_intercept = model.intercept_ # returned as an array

# predict with the trained model
y_pred = model.predict(X_test) # array
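
As a sanity check, here is a self-contained sketch on a tiny invented dataset where the target follows c = 2a + 3b + 1 exactly, so the fitted slopes and intercept should recover those values (the column names and coefficients are made up for illustration):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# toy data generated from c = 2a + 3b + 1 with no noise
train_data = pd.DataFrame({'a': [0, 1, 2, 3],
                           'b': [1, 0, 2, 1],
                           'c': [4, 3, 11, 10]})

model = LinearRegression()
model.fit(train_data[['a', 'b']], train_data['c'])

print(model.coef_)       # ≈ [2. 3.]
print(model.intercept_)  # ≈ 1.0

# predict on unseen feature values: 2*4 + 3*4 + 1 = 21
new_X = pd.DataFrame({'a': [4], 'b': [4]})
print(model.predict(new_X))  # ≈ [21.]
```

Note that fitting on `train_data['c']` (a Series) makes `intercept_` a scalar, while fitting on `train_data[['c']]` (a one-column DataFrame, as in the code above) returns it as an array.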

 

[Evaluation Metrics]

  • $y$ : actual label values
  • $\hat{y}$ : predicted label values
  • $\bar{y}$ : mean of the actual label values

 

1. $MSE$ (Mean Squared Error)

$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{i} - \hat{y_{i}})^{2}$

 

2. $MAE$ (Mean Absolute Error)

$MAE = \frac{1}{n}\sum_{i=1}^{n}\left | y_{i} - \hat{y_{i}} \right |$

 

3. $RMSE$ (Root Mean Squared Error)

$RMSE = \sqrt{MSE}$

 

4. $R^2$ (Coefficient of Determination)

$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{i} - \hat{y_{i}})^{2}}{\sum_{i=1}^{n}(y_{i} - \bar{y})^{2}}$

 

code

# y : actual label values (array)
# y_pred : predicted label values (array)

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

mse = mean_squared_error(y, y_pred) # mse (float)
mae = mean_absolute_error(y, y_pred) # mae (float)
rmse = mse ** 0.5 # rmse (float)
r2 = r2_score(y, y_pred) # r_squared (float)
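
The formulas above can be verified by computing each metric by hand and comparing against scikit-learn, using tiny made-up arrays:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y = np.array([3.0, 5.0, 7.0])       # actual labels
y_pred = np.array([2.0, 5.0, 9.0])  # predicted labels

# metrics computed directly from the formulas
mse = np.mean((y - y_pred) ** 2)                                  # (1+0+4)/3
mae = np.mean(np.abs(y - y_pred))                                 # (1+0+2)/3 = 1.0
rmse = mse ** 0.5
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2)  # 1 - 5/8

print(mse, mae, rmse, r2)

# should match scikit-learn's implementations
print(mean_squared_error(y, y_pred), mean_absolute_error(y, y_pred),
      r2_score(y, y_pred))
```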

 

[Bias / Variance Tradeoff]

|              | bias | variance | train data | test data  |
|--------------|------|----------|------------|------------|
| overfitting  | low  | high     | low error  | high error |
| underfitting | high | low      | high error | high error |
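
A minimal sketch of the tradeoff, fitting polynomials of different degrees to invented noisy quadratic data (the degrees 1, 2, and 15 stand in for underfitting, a good fit, and likely overfitting):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=1.0, size=40)  # quadratic + noise

# split into interleaved train / test halves
X_train, X_test = X[::2], X[1::2]
y_train, y_test = y[::2], y[1::2]

train_err, test_err = {}, {}
for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err[degree] = mean_squared_error(y_train, model.predict(X_train))
    test_err[degree] = mean_squared_error(y_test, model.predict(X_test))
    print(degree, round(train_err[degree], 2), round(test_err[degree], 2))
```

Train error keeps falling as the degree grows, but the degree-1 model's test error stays high (high bias), while a very high degree chases the noise in the train data (high variance).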
