Q 33.
You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?
MINIMIZE computation time & manual intervention for data normalization in BigQuery
- ❌ A. Normalize the data using Google Kubernetes Engine.
- ⭕ B. Translate the normalization algorithm into SQL for use with BigQuery. (see the SQL sketch after this list)
- ❌ C. Use the normalizer_fn argument in TensorFlow's Feature Column API.
- ❌ D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery.
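Option B in practice: a minimal sketch (not from the original post) of pushing the Z-score computation into BigQuery as a single SQL statement, run from Python with the google.cloud.bigquery client. Project, dataset, table, and column names are placeholders.

```python
# Minimal sketch: Z-score normalization computed entirely inside BigQuery.
# `my_project.demand.*` and `sales_qty` are placeholder names.
from google.cloud import bigquery

client = bigquery.Client()

query = """
CREATE OR REPLACE TABLE `my_project.demand.features_normalized` AS
SELECT
  *,
  -- z-score: (x - mean) / stddev, computed over the full column
  SAFE_DIVIDE(sales_qty - AVG(sales_qty) OVER (),
              STDDEV_POP(sales_qty) OVER ()) AS sales_qty_z
FROM `my_project.demand.features_raw`
"""

client.query(query).result()  # runs inside BigQuery; no Dataflow job needed
```

A BigQuery scheduled query can rerun this statement when the weekly training data lands, which removes both the Dataflow preprocessing step and the manual intervention.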
Q 34.
You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics in the same dashboard. What should you do?
Experiment on the model performance of multiple Keras DNN model architectures in the same dashboard.
- ❌ A. Create multiple models using AutoML Tables.
- ❌ B. Automate multiple training runs using Cloud Composer.
- ❌ C. Run multiple training jobs on AI Platform with similar job names.
- ⭕ D. Create an experiment in Kubeflow Pipelines to organize multiple runs. (see the sketch after the notes below)
Kubeflow Pipelines/AI Platform Pipelines
Kubeflow : End-to-end orchestration of machine learning pipelines.
- Allows for easy experimentation and reusability.
- Built on top of Kubernetes ⇒ scaling and portability.
- ❌ To use Kubeflow on GCP, additional work is required to set up and manage the Kubernetes cluster.
→ Google AI Platform Pipelines takes care of setting up a Google Kubernetes Engine cluster and a bucket, and of installing Kubeflow Pipelines.
- Visualize Results in the Pipelines UI | Kubeflow: a user interface (UI) for managing and tracking experiments, jobs, and runs; an end-to-end open-source platform; built-in notebook server service.
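As a rough illustration of option D, the sketch below uses the Kubeflow Pipelines SDK (kfp) to group several architecture runs under one experiment so their metrics appear side by side in the same Pipelines UI dashboard. The host URL, pipeline package, and parameter names are assumed placeholders, not taken from the question.

```python
# Minimal sketch: one KFP experiment that groups runs for several DNN architectures.
import kfp

client = kfp.Client(host="https://<your-pipelines-endpoint>")  # placeholder host

# One experiment = one dashboard grouping for all comparison runs.
experiment = client.create_experiment(name="keras-dnn-architectures")

# Launch one run per candidate architecture; metrics logged by the pipeline
# show up together under this experiment in the Pipelines UI.
for layers in ([64, 32], [128, 64, 32], [256, 128]):
    client.run_pipeline(
        experiment_id=experiment.id,
        job_name=f"dnn-{'-'.join(map(str, layers))}",
        pipeline_package_path="dnn_pipeline.yaml",        # placeholder compiled pipeline
        params={"hidden_units": ",".join(map(str, layers))},
    )
```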
Kubeflow Metadata
- Kubeflow pipeline running on premises; you need to record logs and data about deployed models for audit reasons: use Kubeflow Metadata.
- Allows tracking and managing metadata of machine learning workflows in Kubeflow.
- Metadata: information about runs, models, datasets, and other artifacts.
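A rough sketch, assuming the (legacy) Kubeflow Metadata SDK (`kubeflow-metadata` pip package) and an in-cluster metadata gRPC service; the host, workspace, and artifact fields below are illustrative assumptions from memory, not from the notes above.

```python
# Rough sketch: log a deployed model as a metadata artifact for audit purposes.
from kubeflow.metadata import metadata

store = metadata.Store(grpc_host="metadata-grpc-service.kubeflow", grpc_port=8080)
ws = metadata.Workspace(store=store, name="demand-forecasting",
                        description="audit trail for deployed models")
run = metadata.Run(workspace=ws, name="weekly-retrain-2021-12-10")
execution = metadata.Execution(name="train-and-deploy", workspace=ws, run=run)

# Record the deployed model as an output artifact so it can be audited later.
execution.log_output(metadata.Model(
    name="demand_forecaster",
    uri="gs://my-bucket/models/demand/v7",   # placeholder location
    version="v7"))
```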
TFX vs. Kubeflow
TFX
- Runs on Apache Beam; designed for machine learning deployment pipelines created with TensorFlow.
Kubeflow
- Runs on Kubernetes and offers pipelines for many frameworks: TensorFlow, PyTorch, XGBoost, ...
- Other tools: notebooks and metadata management.
Orchestration tool: Cloud Composer/Apache Airflow, especially for ETL & ELT
- Built on top of Apache Airflow.
- Fully managed service for orchestration.
- Create, schedule, monitor, and manage workflows.
- NOT suitable if low latency is required between tasks.
- Need to specify how many workers you want for a given Composer environment.
- Good fit for building batch orchestration workflows for data engineering (ETL); see the DAG sketch after this list.
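For reference, a minimal Airflow DAG of the kind Cloud Composer schedules for batch ETL; the DAG id and task commands are placeholder examples, not part of the exam material.

```python
# Minimal sketch: a weekly batch ETL DAG that Cloud Composer (managed Airflow) could run.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="weekly_feature_etl",
    start_date=datetime(2021, 12, 1),
    schedule_interval="@weekly",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'extract raw data'")
    transform = BashOperator(task_id="transform", bash_command="echo 'normalize features'")
    load = BashOperator(task_id="load", bash_command="echo 'load to BigQuery'")

    extract >> transform >> load  # simple linear batch workflow
```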
Orchestration tool: Cloud Scheduler, especially for a single service
Orchestration tool: Workflows, especially for microservices
- Serverless: no infrastructure to manage or scale → no need to specify how many workers you need.
- Designed for latency-sensitive use cases: low latency or a high execution count.
Q 36.
You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets. Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to production, the model's accuracy dropped to 66%. How can you make your production model more accurate?
Test Performance vs. Production Performance
- ❌ A. Normalize the data for the training and test datasets as two separate steps. → solution for overfitting
- ⭕ B. Split the training and test data based on time rather than a random split to avoid leakage. (see the sketch after this section)
- ❌ C. Add more data to your test set to ensure that you have a fair distribution and sample for testing. → solution for overfitting
- ❌ D. Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and test sets. → doesn't improve anything at all; splitting then transforming is no different from transforming then splitting if the transform logic is the same.
- Model predicts daily temperatures: time-series data; testing accuracy 97% vs. production accuracy 66% → data leakage?
Target leakage
Target leakage happens when your training data includes predictive information that is not available when you ask for a prediction. Target leakage can cause your model to show excellent evaluation metrics, but perform poorly on real data.
In time-series problems, it's important to split the data temporally so that you are not leaking future information, which would not be available at test time, into the trained model. If you leak it, you artificially increase your accuracy.
For example, suppose you want to know how much ice cream your store will sell tomorrow. You cannot include the target day's temperature in your training data, because you will not know the temperature (it hasn't happened yet). However, you could use the predicted temperature from the previous day, which could be included in the prediction request.
Tabular data preparation best practices > Avoid target leakage
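To make option B concrete, here is a small sketch of a time-based split in pandas, with the normalization statistics fitted on the training window only; the file and column names are assumed placeholders.

```python
# Minimal sketch: chronological train/test split for time-series data.
import pandas as pd

df = pd.read_csv("daily_temperatures.csv", parse_dates=["date"]).sort_values("date")

split_idx = int(len(df) * 0.8)              # earliest 80% of the timeline for training
train = df.iloc[:split_idx].copy()
test = df.iloc[split_idx:].copy()           # most recent 20% for testing

# Fit normalization statistics on the training window only, then reuse them on
# the test window, so no future information leaks into training.
mean, std = train["temperature"].mean(), train["temperature"].std()
train["temperature_z"] = (train["temperature"] - mean) / std
test["temperature_z"] = (test["temperature"] - mean) / std
```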