[PMLE CERTIFICATE - EXAMTOPIC] DUMPS Q45-Q48

This post summarizes ExamTopics dumps questions Q45-Q48 and related material. (The answers are based on my own study and the discussion threads, so they may differ from the officially suggested answers.)

Q 48.

You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?

Time-Series data split
  • Classification on time-series data; 99% AUC ROC on training data after only a few experiments is suspiciously high. The question asks for the NEXT step, before trying sophisticated algorithms or hyperparameter tuning.
  • ❌ A. Address the model overfitting by using a less complex algorithm.
  • ✅ B. Address data leakage by applying nested cross-validation during model training.
  • ❌ C. Address data leakage by removing features highly correlated with the target value.
  • ❌ D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
  • Overfitting is usually detected by a large gap between training and validation error; here only the training score is given, so data leakage from an improper time-series split is the more likely cause.
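The point behind option B is that time-series data must be split chronologically, so that validation folds always lie in the future relative to their training folds. A minimal sketch of such an expanding-window split in plain Python (the index counts and fold sizes are illustrative, not from the exam):

```python
# Minimal sketch: expanding-window splits for time-series cross-validation.
# Each fold trains only on the past and validates on the future, so no
# look-ahead leakage is possible. Sample counts below are illustrative.

def time_series_split(n_samples, n_folds=3):
    """Yield (train_indices, val_indices) pairs in chronological order."""
    fold_size = n_samples // (n_folds + 1)
    splits = []
    for k in range(1, n_folds + 1):
        train_idx = list(range(0, k * fold_size))                    # all past samples
        val_idx = list(range(k * fold_size, (k + 1) * fold_size))    # next time block
        splits.append((train_idx, val_idx))
    return splits

for train_idx, val_idx in time_series_split(12, n_folds=3):
    # Every training index precedes every validation index.
    assert max(train_idx) < min(val_idx)
    print(f"train={train_idx} val={val_idx}")
```

In practice you would use scikit-learn's `TimeSeriesSplit` (optionally inside nested cross-validation for hyperparameter search), which implements the same expanding-window idea.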

Data Leakage & Training-Serving Skew

To prevent data leakage & training-serving skew:
  • Before using any data, make sure you know what the data means and whether or not you should use it as a feature
  • Check the correlation in the Train tab. High correlations should be flagged for review.
  • Training-serving skew: make sure you only provide input features to the model that are available in the exact same form at serving time.
  • Data Leakage : When you use input features during training that "leak" information about the target that you are trying to predict which is unavailable when the model is actually served.
    • This can be detected when a feature that is highly correlated with the target column is included as one of the input features.
    • EX ) Model to predict whether a customer will sign up for a subscription in the next month and one of the input features is a future subscription payment from that customer. This can lead to strong model performance during testing, but not when deployed in production, since future subscription payment information isn't available at serving time.
  • Training-serving skew : When input features used during training time are different from the ones provided to the model at serving time, causing poor model quality in production.
    • EX 1) Building a model to predict hourly temperatures but training with data that only contains weekly temperatures.
    • EX 2) Always providing a student's grades in the training data when predicting student dropout, but not providing this information at serving time.
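The "feature highly correlated with the target" red flag described above can be checked programmatically before training. A minimal sketch using Pearson correlation on synthetic data, echoing the subscription-payment example (the feature names, values, and the 0.95 threshold are illustrative assumptions):

```python
# Minimal sketch: flag candidate leaky features by their correlation with the
# target. Data and the 0.95 threshold are illustrative, not official guidance.
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Synthetic target: did the customer sign up next month?
target = [0, 1, 0, 1, 1, 0, 1, 0]

features = {
    "account_age_days": [30, 400, 120, 10, 500, 45, 90, 300],  # legitimate feature
    "future_payment":   [0, 1, 0, 1, 1, 0, 1, 0],  # mirrors the target -> leakage
}

for name, values in features.items():
    r = pearson(values, target)
    flag = "POSSIBLE LEAK" if abs(r) > 0.95 else "ok"
    print(f"{name}: r={r:+.2f} {flag}")
```

A feature like `future_payment` correlates almost perfectly with the target because it is not available at serving time; this is exactly the kind of column the Vertex AI Train tab's correlation warning asks you to review and, usually, drop.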