Professional Machine Learning Engineer Sample Questions Walkthrough

[EXAMTOPIC] Notes on the GCP ML Engineer sample questions (Q1-11).

GCP PMLE

While deciding whether to prepare for the Google Cloud Platform machine learning engineer certification (Professional ML Engineer Certification), I worked through the sample questions to gauge the difficulty.

Sample Questions

Source : Professional Machine Learning Engineer Sample Questions

  1. Open the link above and enter some basic personal information in the Google Form.
  2. Answer sample questions Q1-Q11 and submit.
  3. Click "view accuracy" to see the graded results, the rationale behind each correct/incorrect answer, and links to study the relevant topics.

Cloudonair

Certification Prep: Machine Learning Certification

It covers the certification itself, the R&R of a machine learning engineer, and a guide on how to approach some of the sample questions. It is useful, so if you plan to prepare for the exam, I recommend giving it a quick watch.

Q 1.

You work for a textile manufacturer and have been asked to build a model to detect and classify fabric defects. You trained a machine learning model with high recall based on high resolution images taken at the end of the production line. You want quality control inspectors to gain trust in your model. Which technique should you use to understand the rationale of your classifier?

KEY : XAI for image models
  1. "understand the rationale of your classifier" and "gain trust in your model" ; this is exactly the purpose of XAI (Explainable AI)
  2. the model detects and classifies fabric defects from images
    ⇒ two feature attribution methods for image models ; Integrated Gradients (IG) and XRAI (a minimal IG sketch follows the answer options)
  • ❌ A. Use K-fold cross validation to understand how the model performs on different test datasets.

  • B. Use the Integrated Gradients method to efficiently compute feature attributions for each predicted image.

  • ❌ C. Use PCA (Principal Component Analysis) to reduce the original feature set to a smaller set of easily understood features.

  • ❌ D. Use k-means clustering to group similar images together, and calculate the Davies-Bouldin index to evaluate the separation between clusters.
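
In practice you would request these attributions from the managed Explainable AI service, but the core of Integrated Gradients is small enough to sketch by hand. The snippet below is a minimal sketch, assuming a Keras image classifier `model` and a single image tensor of shape `[H, W, C]`; the black baseline and 50 interpolation steps are common defaults, not requirements.

```python
import tensorflow as tf

def integrated_gradients(model, image, target_class, baseline=None, steps=50):
    """Approximate Integrated Gradients attributions for one image."""
    if baseline is None:
        baseline = tf.zeros_like(image)  # black image as the reference point
    # Interpolate between the baseline and the input along a straight path.
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps + 1), [-1, 1, 1, 1])
    interpolated = baseline[tf.newaxis] + alphas * (image - baseline)[tf.newaxis]
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        logits = model(interpolated)
        target = logits[:, target_class]  # score of the class being explained
    grads = tape.gradient(target, interpolated)
    # Trapezoidal approximation of the path integral, scaled by (input - baseline).
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    return (image - baseline) * avg_grads  # per-pixel attribution map
```

Overlaying the high-attribution pixels on the defect image shows inspectors which regions drove the "defective" prediction, which is how the model earns their trust.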

Q 2.

You need to write a generic test to verify whether Dense Neural Network(DNN) models automatically released by your team have a sufficient number of parameters to learn the task for which they were built. What should you do?

KEY : Figure out whether the DNN model has enough parameters (or enough layers) to learn the task
  • ❌ A. Train the model for a few iterations, and check for NaN values.

    • NaN values are usually associated with bad input values
      → this check is about corrupt input data, not about the model's capacity
  • 🚩 B. Train the model for a few iterations, and verify that the loss is constant.

    • A constant loss only tells you the model has converged (or stopped learning)
      → on its own, that does not show whether the model has enough parameters.
  • ❌ C. Train a simple linear model, and determine if the DNN model outperforms it.

    • Comparing your model against a simple baseline such as a linear model is one way to gauge model complexity (typically by training until it starts to overfit ; a hyperparameter tuning approach)
      → Outperforming a linear baseline still doesn't tell you whether the model has a sufficient number of parameters.
  • D. Train the model with no regularization, and verify that the loss function is close to zero.

    • Basically overfit the model to the maximum extent possible and check whether the loss function goes to zero ; a technique sometimes called USEFUL OVERFITTING (a minimal test sketch follows this list)
      → if the loss reaches (close to) zero, the model has sufficient parameters to completely learn whatever the input data contains.
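
A minimal sketch of such a generic test, assuming the release pipeline exposes a `build_model()` factory (a hypothetical name) that constructs the DNN without any regularization or dropout; the feature count, batch size, epoch count, and loss threshold are illustrative choices, not fixed rules.

```python
import numpy as np
import tensorflow as tf

def test_dnn_has_enough_parameters(build_model, num_features=16, num_classes=3):
    """Capacity check: a model with enough parameters should drive the training
    loss on a tiny, fixed batch close to zero when trained without regularization."""
    rng = np.random.default_rng(seed=0)
    x = rng.random((32, num_features)).astype("float32")
    y = rng.integers(0, num_classes, size=32)

    # DNN under test, built WITHOUT regularization/dropout and assumed to output
    # `num_classes` softmax probabilities.
    model = build_model()
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-2),
                  loss="sparse_categorical_crossentropy")
    history = model.fit(x, y, epochs=500, verbose=0)

    final_loss = history.history["loss"][-1]
    assert final_loss < 0.05, (
        f"loss stayed at {final_loss:.3f}; the model may not have enough parameters"
    )
```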

Q 7.

You are an ML engineer at a media company. You want to use machine learning to analyze video content, identify objects, and alert users if there is inappropriate content. Which Google Cloud products should you use to build this project?

KEY : ML pipeline & GCP products
  1. ML pipeline - data ingestion, problem framing (output, actions)
    • identify objects & identify whether the content is inappropriate
  2. suitable GCP products
    • Video Intelligence API vs. AutoML Video Intelligence
  • ❌ A. Pub/Sub, Cloud Function, Cloud Vision API

    • Cloud Vision API : analyzes static images, not video
  • ❌ B. Pub/Sub, Cloud IoT, Dataflow, Cloud Vision API, Cloud Logging

  • C. Pub/Sub, Cloud Function, Video Intelligence API, Cloud Logging

    • Pub/Sub : robust streaming analytics pipeline for real-time data ingestion (no doubt!)
    • Video Intelligence API : analyzes video content
      • output : pre-defined labels
    • As long as the objects and content categories you need are covered by the predefined labels, the pre-built model is the most cost-efficient and effective choice for a baseline product and enables fast prototyping.
    • Cloud Function : a very lightweight compute solution that lets you write simple, single-purpose functions, e.g. to call the API and handle its output.
    • Cloud Logging : addresses the "alert users" requirement, because it integrates with Cloud Monitoring so you can set alerts on log events written from the Video Intelligence API output. (A rough sketch of this flow follows the answer options.)
  • ❌ D. Pub/Sub, Cloud Function, AutoML Video Intelligence, Cloud Logging

    • AutoML Video Intelligence : analyzes video content
      • output : custom labels, i.e. you would have to label videos and train your own model, which is unnecessary here
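
A rough sketch of choice C's flow, assuming a Pub/Sub-triggered Cloud Function (Python, v2 Video Intelligence client) and a message field `video_uri` pointing at the uploaded video in Cloud Storage; the message format, field names, and likelihood threshold are assumptions, not part of the question. Warnings written here can drive a log-based alert in Cloud Monitoring.

```python
import base64
import json
import logging

from google.cloud import videointelligence

def analyze_video(event, context):
    """Hypothetical Pub/Sub-triggered Cloud Function: annotate a video and log flagged frames."""
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    input_uri = message["video_uri"]  # e.g. "gs://media-bucket/uploads/clip.mp4" (assumed field)

    client = videointelligence.VideoIntelligenceServiceClient()
    operation = client.annotate_video(
        request={
            "input_uri": input_uri,
            "features": [
                videointelligence.Feature.LABEL_DETECTION,              # identify objects
                videointelligence.Feature.EXPLICIT_CONTENT_DETECTION,   # flag inappropriate frames
            ],
        }
    )
    annotation = operation.result(timeout=600).annotation_results[0]

    for label in annotation.segment_label_annotations:
        logging.info("Detected label: %s", label.entity.description)

    for frame in annotation.explicit_annotation.frames:
        if frame.pornography_likelihood >= videointelligence.Likelihood.LIKELY:
            # A log-based alert in Cloud Monitoring can notify users from this entry.
            logging.warning("Inappropriate content at %.1fs in %s",
                            frame.time_offset.total_seconds(), input_uri)
```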

Q 8.

You work for a large retailer. You want to use ML to forecast future sales leveraging 10 years of historical sales data. The historical data is stored in Cloud Storage in Avro format. You want to rapidly experiment with all the available data. How should you build and train your model for the sales forecast?

KEY : Time-series forecasting & rapid experimentation
  • Time-series forecasting over a tremendous amount of data stored in Avro format in Cloud Storage
  • RAPIDLY build & train
  • A. Load data into BigQuery and use the ARIMA model type on BigQuery ML.

    • Ways to load data into BigQuery
      1. From Cloud Storage: CSV, JSON, Avro files, Datastore backups
      2. Insert individual records using streaming inserts.
      3. Dataflow can load data directly into BigQuery.
    • BigQuery ML supports: linear regression, logistic regression, k-means, imported TensorFlow models (from a Cloud Storage bucket), matrix factorization, XGBoost (classification or regression), AutoML Tables (classification or regression), deep neural networks (DNN) (classification or regression), and **ARIMA** (a load-and-train sketch follows the answer options).
  • ❌ B. Convert the data into CSV format and create a regression model on AutoML Tables.

    • Converting the data to CSV takes too much time.
    • A regression model on AutoML Tables might achieve higher accuracy than choice A (the ARIMA model in BigQuery ML)
      → but the key here is "speed of development"
      → if the goal were to build the most accurate model, this might be the correct answer.
  • ❌ C. Convert the data into TFRecords and create an RNN model on TensorFlow on AI Platform Notebooks.

    • Takes quite a bit of time.
    • A custom RNN on TensorFlow might achieve higher accuracy than choice A (the ARIMA model in BigQuery ML)
      → but the key here is "speed of development"
      → if the goal were to use as much of the historical data as possible with full modeling flexibility, this might be the correct answer.
  • ❌ D. Convert and refactor the data into CSV format and use the built-in XGBoost algorithm on AI Platform Training.

    • Converting and refactoring the data takes far too much time.
    • Also, XGBoost is not a great model choice for time-series forecasting.
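
A minimal sketch of choice A using the BigQuery Python client: load the Avro files straight from Cloud Storage (BigQuery infers the schema from Avro), then train a time-series model in SQL. The project, dataset, bucket, and the `sale_date`/`total_sales` columns are placeholders, and current BigQuery ML exposes the ARIMA model type as `ARIMA_PLUS`.

```python
from google.cloud import bigquery

client = bigquery.Client()

# 1) Load the historical Avro files from Cloud Storage into a BigQuery table.
load_job = client.load_table_from_uri(
    "gs://my-bucket/sales_history/*.avro",   # placeholder bucket/path
    "my_project.sales.history",              # placeholder destination table
    job_config=bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO),
)
load_job.result()

# 2) Train the forecasting model with BigQuery ML (ARIMA is exposed as ARIMA_PLUS).
client.query("""
    CREATE OR REPLACE MODEL `my_project.sales.sales_forecast`
    OPTIONS(model_type = 'ARIMA_PLUS',
            time_series_timestamp_col = 'sale_date',
            time_series_data_col = 'total_sales') AS
    SELECT sale_date, total_sales
    FROM `my_project.sales.history`
""").result()

# 3) Forecast the next 30 periods directly in SQL.
forecast = client.query(
    "SELECT * FROM ML.FORECAST(MODEL `my_project.sales.sales_forecast`, STRUCT(30 AS horizon))"
).result()
```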

Q 9.

You need to build an object detection model for a small startup company to identify if and where the company’s logo appears in an image. You were given a large repository of images, some with logos and some without. These images are not yet labelled. You need to label these pictures, and then train and deploy the model. What should you do?

KEY : Data labeling & efficient solution
  • How to label the data in this process
    • a large repository of unlabelled images : manual labeling could be very expensive
  • The most efficient way to train & deploy the model
    1. what type of model to use for object detection
    2. serving environment
  • A. Use Google Cloud’s Data Labelling Service to label your data. Use AutoML Object Detection to train and deploy the model.

    • Google Cloud's Data Labelling Service labels the data
      → the platform uses human labelers
      • produces labels with pretty high accuracy
      • tackles both the classification labels and the bounding boxes highlighted in the Q
    • AutoML Object Detection then trains and deploys the model without writing custom model code. (A Vertex AI-style sketch of this flow follows the answer options.)
  • ❌ B. Use Vision API to detect and identify logos in pictures and use it as a label. Use AI Platform to build and train a convolutional neural network.

    • Vision API detects and identifies logos and returns pre-trained labels
      → the logo of a small startup company is unlikely to be among the pre-trained labels.
  • ❌ C. Create two folders: one where the logo appears and one where it doesn’t. Manually place images in each folder. Use AI Platform to build and train a convolutional neural network.

    • Manually sorting images into folders is exactly the expensive manual labeling we want to avoid, and folder labels only cover classification, not where the logo appears.
  • ❌ D. Create two folders: one where the logo appears and one where it doesn’t. Manually place images in each folder. Use AI Platform to build and train a real time object detection model.

    • Same manual-labeling problem as C, and folder labels alone cannot train an object detection model that localizes the logo.
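
The sample question predates the Vertex AI naming, but a rough sketch of choice A with the current Vertex AI Python SDK looks like the snippet below; the project, region, bucket, import file, display names, and training budget are all placeholders, and the labeled bounding boxes are assumed to come out of the labeling task as a JSONL import file.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholder project/region

# 1) Create an image dataset from the labeled import file produced by the labeling task.
dataset = aiplatform.ImageDataset.create(
    display_name="logo-images",
    gcs_source="gs://my-bucket/labels/import.jsonl",  # placeholder import file
    import_schema_uri=aiplatform.schema.dataset.ioformat.image.bounding_box,
)

# 2) Train an AutoML object detection model; no custom model code is needed.
job = aiplatform.AutoMLImageTrainingJob(
    display_name="logo-detector",
    prediction_type="object_detection",
)
model = job.run(dataset=dataset, budget_milli_node_hours=20000)

# 3) Deploy the trained model to an endpoint for online detection requests.
endpoint = model.deploy()
```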