[PMLE CERTIFICATE - EXAMTOPICS] DUMPS Q17-Q20

EXAMTOPICS DUMPS Q17-Q20: Cloud Storage & DLP with PII, AutoML Data Split, CI/CD using Kubeflow Pipelines, AI Platform Training configuration (scale tier)

Q17.

You are building a real-time prediction engine that streams files which may contain Personally Identifiable Information (PII) to Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the PII is not accessible by unauthorized individuals?

Cloud Storage & DLP with PII
  • ❌ A. Stream all files to Google Cloud, and then write the data to BigQuery. Periodically conduct a bulk scan of the table using the DLP API.
  • ❌ B. Stream all files to Google Cloud, and write batches of the data to BigQuery. While the data is being written to BigQuery, conduct a bulk scan of the data using the DLP API.
  • ❌ C. Create two buckets of data: Sensitive and Non-sensitive. Write all data to the Non-sensitive bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the sensitive data to the Sensitive bucket.
    To ensure that PII is not accessible by unauthorized individuals, you need a Quarantine bucket plus a Sensitive bucket with a limited-access policy.
    Writing all the data to the Non-sensitive bucket first is risky: anyone with access to that bucket can read unscanned PII.
  • D. Create three buckets of data: Quarantine, Sensitive, and Non-sensitive. Write all data to the Quarantine bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-Sensitive bucket.

Cloud DLP API - Data loss prevention for data governance

Data classification and management - Data loss prevention
Cloud DLP is a powerful data inspection, classification, and de-identification platform: it scans data and can create Data Catalog tags to identify sensitive data.
It can be used on existing BigQuery tables, Cloud Storage buckets, or on data streams.
Over one hundred predefined detectors (infoTypes) identify patterns, formats, and checksums.
You can create custom detectors using a dictionary or a regular expression, add 'hotword' rules to increase the accuracy of findings, and set exclusion rules to reduce the number of false positives.
Provides a set of tools to de-identify your data, including masking, tokenization, pseudonymization, date shifting, and more.
Using Cloud DLP leads to better data governance by helping you to classify your data and give the right access to the right people.
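
A minimal inspection sketch, assuming a hypothetical project ID and a couple of predefined infoTypes; it calls the DLP API's content:inspect method on a small inline text sample (in a real pipeline you would scan the Cloud Storage objects themselves):

# Hypothetical project ID; replace with your own.
PROJECT_ID="my-project"

curl -s -X POST \
  "https://dlp.googleapis.com/v2/projects/${PROJECT_ID}/content:inspect" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{
        "item": { "value": "Contact: jane.doe@example.com, +1 415-555-0100" },
        "inspectConfig": {
          "infoTypes": [ { "name": "EMAIL_ADDRESS" }, { "name": "PHONE_NUMBER" } ],
          "includeQuote": true
        }
      }'
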
Quarantine workflow (the basis of answer D):
(1) Ingest: write all incoming data to the [QUARANTINE_BUCKET] (one of the three buckets).
(2) Periodically conduct a bulk scan of that bucket using the DLP API to identify PII, either by scanning the entire dataset or by sampling it; the DLP API can be called from the transform steps of a pipeline, or from standalone scripts or Cloud Functions.
(3) Based on the findings, move the data to either the [SENSITIVE_DATA_BUCKET] or the [NON_SENSITIVE_DATA_BUCKET], and from there into the warehouse. (Bucket setup sketch below.)
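
A minimal setup sketch for the three buckets, assuming hypothetical bucket names, region, and authorized group; the point is that only the Sensitive bucket carries a narrowly scoped access policy:

# Hypothetical bucket names and region.
gsutil mb -l us-central1 gs://pii-quarantine-bucket
gsutil mb -l us-central1 gs://pii-sensitive-bucket
gsutil mb -l us-central1 gs://pii-non-sensitive-bucket

# Limit access to the Sensitive bucket to an authorized group only (group name is an assumption).
gsutil iam ch group:pii-readers@example.com:objectViewer gs://pii-sensitive-bucket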

Q18.

You work for a large hotel chain and have been asked to assist the marketing team in gathering predictions for a targeted marketing strategy. You need to make predictions about user lifetime value (LTV) over the next 20 days so that marketing can be adjusted accordingly. The customer dataset is in BigQuery, and you are preparing the tabular data for training with AutoML Tables. This data has a time signal that is spread across multiple columns. How should you ensure that AutoML fits the best model to your data?

AutoML Data Split
  • ❌ A. Manually combine all columns that contain a time signal into an array. Allow AutoML to interpret this array appropriately. Choose an automatic data split across the training, validation, and testing sets.
    Not an array: the time signal should be represented with the Timestamp data type, not packed into an array.
  • ❌ B. Submit the data for training without performing any manual transformations. Allow AutoML to handle the appropriate transformations. Choose an automatic data split across the training, validation, and testing sets.
    To use the Time column approach for splitting data, you must designate a single appropriate Time column and ensure it has the right data type (Timestamp) and enough distinct values; AutoML cannot infer this when the time signal is spread across multiple columns.
  • C. Submit the data for training without performing any manual transformations, and indicate an appropriate column as the Time column. Allow AutoML to split your data based on the time signal provided, and reserve the more recent data for the validation and testing sets.
  • ⭕ D. Submit the data for training without performing any manual transformations. Use the columns that have a time signal to manually split your data. Ensure that the data in your validation set is from 30 days after the data in your training set and that the data in your testing set is from 30 days after your validation set.
    Providing a time signal: if the time information is not contained in a single column, you can use a manual data split, reserving the most recent data as the test data and the earliest data as the training data (see the sketch below).
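
A sketch of such a manual split, assuming hypothetical dataset, table, and column names, and assuming TRAIN/VALIDATE/TEST as the split-column labels (check the AutoML Tables documentation for the exact values it accepts); it derives one split column from the scattered date parts so that the newest rows land in the test set:

# Assumed dataset/table/column names; run once before importing the table into AutoML Tables.
bq query --use_legacy_sql=false '
CREATE OR REPLACE TABLE mydataset.ltv_training_data AS
SELECT
  *,
  CASE
    WHEN DATE(event_year, event_month, event_day) < DATE "2021-05-01" THEN "TRAIN"
    WHEN DATE(event_year, event_month, event_day) < DATE "2021-05-31" THEN "VALIDATE"
    ELSE "TEST"
  END AS ml_use
FROM mydataset.customer_events'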

Q19.

You have written unit tests for a Kubeflow Pipeline that require custom libraries. You want to automate the execution of unit tests with each new push to your development branch in Cloud Source Repositories. What should you do?

CI/CD using Kubeflow pipeline
  • ❌ A. Write a script that sequentially performs the push to your development branch and executes the unit tests on Cloud Run.
    Cloud Run: serverless platform for containerized applications. A script that you run by hand after each push is not an automated trigger.
  • B. Using Cloud Build, set an automated trigger to execute the unit tests when changes are pushed to your development branch.
    Cloud Build: Google Cloud's CI/CD service; Google recommends using Cloud Build when building and testing Kubeflow Pipelines.
  • ❌ C. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Configure a Pub/Sub trigger for Cloud Run, and execute the unit tests on Cloud Run.
    Cloud Logging: centralized logging. A logging sink to Pub/Sub is an indirect way to detect pushes; Cloud Build triggers react to pushes natively.
  • ❌ D. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Execute the unit tests using a Cloud Function that is triggered when messages are sent to the Pub/Sub topic.
Architecture of CI/CD for ML pipeline using Kubeflow Pipelines
At the heart of this architecture is Cloud Build.
Cloud Build : a service that executes your builds on Google Cloud Platform infrastructure.
✔ Can import source code from Cloud Storage, Cloud Source Repositories, GitHub, or Bitbucket, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives.
Executes your build as a series of build steps, where each build step is run in a Docker container irrespective of the environment; the steps are defined in a build configuration file (cloudbuild.yaml).
For each task, you can use the supported build steps provided by Cloud Build or write your own build steps.
✔ The Cloud Build process (CI/CD for your ML system) can be executed either manually or through automated build triggers. (Sketch below.)
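
A minimal sketch of this setup, assuming a hypothetical repository name, branch name, and test layout (requirements.txt and tests/); the cloudbuild.yaml installs the custom libraries and runs the unit tests, and the trigger fires the build on every push to the development branch:

# Assumed cloudbuild.yaml: install the custom libraries, then run the unit tests.
cat > cloudbuild.yaml <<'EOF'
steps:
  - name: 'python:3.9'
    entrypoint: 'bash'
    args: ['-c', 'pip install -r requirements.txt && pytest tests/']
EOF

# Automated trigger on the development branch of a Cloud Source Repositories repo
# (repository and branch names are assumptions).
gcloud builds triggers create cloud-source-repositories \
  --repo=my-kfp-repo \
  --branch-pattern='^development$' \
  --build-config=cloudbuild.yaml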

Q20.

You are training an LSTM-based model on AI Platform to summarize text using the following job submission script:

gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path $TRAINER_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  --job-dir $JOB_DIR \
  --region $REGION \
  --scale-tier basic \
  -- \
  --epochs 20 \
  --batch_size=32 \
  --learning_rate=0.001

You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What should you do?

AI PLATFORM TRAINING configuration
  • ❌ A. Modify the epochs parameter.
  • B. Modify the scale-tier parameter.
    Epochs, batch size, and learning rate are all hyperparameters that can affect model accuracy.
    Changing the scale tier does not affect model accuracy; it only provisions more compute, which shortens training time.
  • ❌ C. Modify the batch size parameter.
  • ❌ D. Modify the learning rate parameter.
Scale tiers = predefined sets of cluster specifications.
(1) When running a training job on AI Platform Training, you must specify the number and types of machines you need.
(2) To make the process easier, pick a scale tier from the set of predefined cluster specifications, OR choose the custom tier and specify the machine types yourself.
Options for configuring the scale tiers
Using GPUs for training models in the cloud - requesting GPU-enabled machines (see the resubmission sketch below).
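
A resubmission sketch under the assumption that a GPU-backed tier is appropriate for this LSTM; the exact accepted spelling of the tier names should be checked against gcloud's --scale-tier help, and the custom tier is the alternative when you want to specify machine types yourself:

# Same job, higher scale tier (basic-gpu is an assumed choice; standard-1,
# premium-1, or a custom tier are other options depending on model size and budget).
gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path $TRAINER_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  --job-dir $JOB_DIR \
  --region $REGION \
  --scale-tier basic-gpu \
  -- \
  --epochs 20 \
  --batch_size=32 \
  --learning_rate=0.001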