[EXAMTOPIC] AI Platform built-in algorithms

Introduction to built-in algorithms

With built-in algorithms on AI Platform Training, you can run training jobs on your data without writing any code for a training application.

How training with built-in algorithms works

  1. COMPARE THE AVAILABLE built-in algorithms to see whether one fits your dataset and use case.

    • If so, select the best fit.
    • If no built-in algorithm is suitable, you can create a training application to run on AI Platform Training.
  2. PREPROCESSING

    • Format your input data for training with the built-in algorithm.
    • MUST SUBMIT DATA AS A CSV FILE with its HEADER ROW REMOVED.
    • If applicable, follow any additional formatting requirements specific to the built-in algorithm you're using.
  3. CREATE A Cloud Storage bucket if you don't already have one (the bucket where AI Platform Training stores its output).

  4. SELECT OPTIONS FOR CUSTOMIZING TRAINING JOB.

    • Make selections to configure the overall training job → THEN the algorithm specifically → optionally, make additional selections to configure hyperparameter tuning (i.e., training-job settings first, then algorithm-specific settings).

    • (1) For Training job selections
      • a job name
      • the built-in algorithm to use
      • the machine(s) to use
      • the region where the job should run
      • the Cloud Storage bucket location where outputs are stored.
    • (2) For the algorithm-specific selections
      • can enable AI Platform Training to perform automatic preprocessing on your dataset.
      • can also specify arguments such as the learning rate, number of training steps, and batch size.
    • (3) For hyperparameter tuning (OPTIONAL)
      • select a goal metric (e.g., maximizing the model's predictive accuracy or minimizing the training loss).
      • additionally, choose specific hyperparameters to tune and set ranges for their values.
  5. SUBMIT the training job → view logs to monitor its progress and status.

  6. After the training job completes successfully → can deploy the trained model on AI Platform Prediction to set up a prediction server and get predictions on new data.
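The CSV requirement in step 2 can be met with Python's standard `csv` module. A minimal sketch, assuming the first row of your file is a header; `strip_header` and the sample columns are hypothetical, and any algorithm-specific rules (for example, column order) still have to be checked in that algorithm's guide. The resulting file can then be copied to your bucket, for example with `gsutil cp`.

```python
import csv
import io


def strip_header(csv_text: str) -> str:
    """Return the CSV text with its first (header) row removed.

    Assumes the first row is a header; built-in algorithms expect headerless CSV.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    out = io.StringIO()
    # lineterminator="\n" keeps Unix line endings (csv.writer defaults to "\r\n").
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows[1:])  # skip row 0, the header
    return out.getvalue()


raw = "price,sqft,rooms\n250000,1400,3\n410000,2100,4\n"
print(strip_header(raw), end="")
```

Going through `csv.reader`/`csv.writer` instead of simply dropping the first text line keeps quoted fields that contain commas or newlines intact.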

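For step 4 (3), AI Platform Training runs the hyperparameter tuning service for you. The sketch below is only a local stand-in, using plain random search, to show what a goal metric (MAXIMIZE or MINIMIZE) and per-hyperparameter value ranges mean. `random_search`, `fake_loss`, and the `learning_rate` range are hypothetical, and this is not necessarily the search strategy AI Platform applies.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical search space: hyperparameter name -> (min, max) range.
Ranges = Dict[str, Tuple[float, float]]


def random_search(
    objective: Callable[[Dict[str, float]], float],
    ranges: Ranges,
    goal: str = "MAXIMIZE",
    trials: int = 20,
    seed: int = 0,
) -> Tuple[Dict[str, float], float]:
    """Sample each hyperparameter uniformly inside its range; keep the best trial."""
    if goal not in ("MAXIMIZE", "MINIMIZE"):
        raise ValueError("goal must be MAXIMIZE or MINIMIZE")
    if trials < 1:
        raise ValueError("trials must be at least 1")
    rng = random.Random(seed)
    best_params, best_value = None, None
    for _ in range(trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        value = objective(params)  # e.g. accuracy (maximize) or loss (minimize)
        better = (
            best_value is None
            or (goal == "MAXIMIZE" and value > best_value)
            or (goal == "MINIMIZE" and value < best_value)
        )
        if better:
            best_params, best_value = params, value
    return best_params, best_value


def fake_loss(p: Dict[str, float]) -> float:
    """Toy stand-in for a training loss; smallest near learning_rate = 0.01."""
    return (p["learning_rate"] - 0.01) ** 2


params, loss = random_search(fake_loss, {"learning_rate": (0.0001, 0.1)}, goal="MINIMIZE")
print(params, loss)
```

In a real job, each trial is a full training run and the objective is the metric the algorithm reports, which is why the number of trials matters for cost.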
Limitations of AI Platform training job using built-in Algorithms

Distributed training ❌; use a single GPU rather than multiple GPUs

  1. Distributed training is not supported.
  2. Training jobs submitted through the Google Cloud Console use only legacy machine types.
    • can use Compute Engine machine types with training jobs submitted through gcloud or the Google API Client Library for Python.
    • Learn more: machine types for training.
  3. GPUs are supported for some algorithms.
  4. Multi-GPU machines do not yield greater speed with built-in algorithm training. If you're using GPUs, select machines with a single GPU.
  5. TPUs are not supported for tabular built-in algorithm training. ⇒ To train on TPUs, you MUST CREATE A TRAINING APPLICATION.
  6. For more detailed limitations, see the guides for each algorithm.
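The limitations above can be read as a checklist. The `MachineChoice` dataclass and `check_limits` function below are hypothetical local helpers, not part of any Google Cloud SDK; they only encode rules 1, 4 and 5 from the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MachineChoice:
    """Hypothetical description of the machines requested for a built-in algorithm job."""
    worker_count: int = 0          # extra machines beyond the primary one
    gpus_per_machine: int = 0
    uses_tpu: bool = False
    tabular_algorithm: bool = True


def check_limits(choice: MachineChoice) -> List[str]:
    """Return one warning per limitation the choice violates (empty list = OK)."""
    problems = []
    if choice.worker_count > 0:
        problems.append("Distributed training is not supported.")
    if choice.gpus_per_machine > 1:
        problems.append("Multi-GPU machines give no speedup; use a single GPU.")
    if choice.uses_tpu and choice.tabular_algorithm:
        problems.append(
            "TPUs are not supported for tabular built-in algorithms; "
            "write a training application instead."
        )
    return problems


print(check_limits(MachineChoice(gpus_per_machine=4)))
```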

Built-in algorithms help you train models for classification and regression.

1. Linear learner
  • For logistic regression, binary classification, and multiclass classification

  • Implemented based on a TensorFlow Estimator

  • assigns one weight to each input feature → sums the weighted features to predict a numerical target value

  • easy to interpret: compare the feature weights to determine which input features have a significant impact on your predictions

  • Learn more: how large-scale linear models work.
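The linear learner description above can be illustrated in a few lines. A sketch with hand-picked, hypothetical weights (not a trained model) for a sales-forecasting example; comparing raw weights only makes sense when the features are on comparable scales, which `rank_by_impact` simply assumes.

```python
from typing import Dict, List, Tuple


def predict(weights: Dict[str, float], bias: float, features: Dict[str, float]) -> float:
    """Linear model: bias plus the sum of weight * feature value."""
    return bias + sum(weights[name] * value for name, value in features.items())


def rank_by_impact(weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order features by absolute weight, largest first (assumes comparably scaled inputs)."""
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical weights for a sales-forecasting model.
w = {"ad_spend": 3.0, "discount": -1.5, "weekday": 0.2}
print(predict(w, bias=10.0, features={"ad_spend": 2.0, "discount": 1.0, "weekday": 5.0}))
print(rank_by_impact(w))
```

Here `ad_spend` dominates the prediction and `discount` pushes it down, which is exactly the kind of reading the "easy to interpret" bullet refers to.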

2. Wide and deep
  • For recommender systems, search, and ranking problems

  • Implemented based on a TensorFlow Estimator

  • combines a linear model that learns and "memorizes" a wide range of rules with a deep neural network that "generalizes" the rules and applies them correctly to similar features in new, unseen data.

  • Learn more: wide and deep learning
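To make the memorize/generalize split concrete, here is a pure-Python forward pass that adds a wide (linear) logit to a deep (ReLU hidden layer) logit and applies a sigmoid, which is the idea behind `DNNLinearCombinedClassifier`. It is a hypothetical sketch with hand-set weights, not TensorFlow's implementation, and it omits training entirely.

```python
import math
from typing import Dict, List


def wide_logit(crossed: Dict[str, float], weights: Dict[str, float]) -> float:
    """Wide part: linear model over sparse / crossed features ("memorization")."""
    return sum(weights.get(name, 0.0) * value for name, value in crossed.items())


def deep_logit(dense: List[float], hidden_w: List[List[float]], out_w: List[float]) -> float:
    """Deep part: one ReLU hidden layer over dense features ("generalization")."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, dense))) for row in hidden_w]
    return sum(w * h for w, h in zip(out_w, hidden))


def wide_and_deep(crossed, wide_w, dense, hidden_w, out_w, bias=0.0) -> float:
    """Sum both logits and apply a sigmoid, as in a jointly trained binary classifier."""
    z = wide_logit(crossed, wide_w) + deep_logit(dense, hidden_w, out_w) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical recommender example: one crossed feature plus two dense features.
p = wide_and_deep(
    crossed={"user=alice_x_item=book": 1.0},
    wide_w={"user=alice_x_item=book": 1.0},
    dense=[0.5, -0.5],
    hidden_w=[[1.0, 0.0], [0.0, 1.0]],
    out_w=[2.0, 1.0],
)
print(p)
```

Because both parts feed one logit, they are trained jointly in the real model: the wide weights capture specific co-occurrences seen in training, while the deep part can score feature combinations it never saw.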

3. TabNet
  • For classification and regression problems on tabular data.

  • Implemented based on a TensorFlow Estimator

  • Provides FEATURE ATTRIBUTIONS

  • Learn more: TabNet as a new built-in algorithm
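TabNet's feature attributions come from its per-step feature-selection masks. The sketch below follows the aggregate-mask idea from the TabNet paper: each decision step's mask is weighted by that step's contribution (the sum of the ReLU of its output) and the result is normalized. It handles a single sample with hand-written masks; `aggregate_attributions` is a hypothetical helper, not the built-in algorithm's API.

```python
from typing import List


def aggregate_attributions(masks: List[List[float]], step_outputs: List[List[float]]) -> List[float]:
    """Combine per-step feature masks into one normalized importance vector for a sample.

    masks[i][j]: weight that decision step i puts on feature j.
    step_outputs[i]: output vector of decision step i; its ReLU sum is the step's contribution.
    """
    etas = [sum(max(0.0, d) for d in out) for out in step_outputs]
    n_features = len(masks[0])
    raw = [sum(eta * mask[j] for eta, mask in zip(etas, masks)) for j in range(n_features)]
    total = sum(raw)
    # Normalize so the attributions sum to 1 (all zeros if no step contributed).
    return [r / total for r in raw] if total > 0 else [0.0] * n_features


# Two decision steps over three features; both steps contribute equally here.
masks = [[0.5, 0.5, 0.0], [0.0, 0.25, 0.75]]
step_outputs = [[2.0, -1.0], [1.0, 1.0]]
print(aggregate_attributions(masks, step_outputs))  # → [0.25, 0.375, 0.375]
```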

4. XGBoost (eXtreme Gradient Boosting)
  • XGBoost enables efficient supervised learning for classification, regression, and ranking tasks. XGBoost training is based on decision tree ensembles, which combine the results of multiple classification and regression models.

  • Learn more: how XGBoost works
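"Combining the results of multiple models" can be seen in a tiny gradient-boosting loop. This is plain gradient boosting with one-feature decision stumps and squared loss, not XGBoost itself (XGBoost adds regularization, second-order gradient information and many system optimizations); all names here are hypothetical. Each round fits a stump to the current residuals and adds a shrunken copy of its prediction to the ensemble.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(xs: List[float], residuals: List[float]) -> Stump:
    """Choose the split on a single feature that minimizes squared error of the residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]   # never empty: t is in xs
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right) if right else lv
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, (t, lv, rv))
    return best[1]


def predict_stump(stump: Stump, x: float) -> float:
    t, lv, rv = stump
    return lv if x <= t else rv


def boost(xs: List[float], ys: List[float], rounds: int = 50, lr: float = 0.3):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps: List[Stump] = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        s = fit_stump(xs, residuals)
        stumps.append(s)
        preds = [p + lr * predict_stump(s, x) for p, x in zip(preds, xs)]
    return base, lr, stumps


def predict(model, x: float) -> float:
    base, lr, stumps = model
    return base + sum(lr * predict_stump(s, x) for s in stumps)


model = boost([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 5.0, 5.0])
print(predict(model, 1.0), predict(model, 4.0))
```

The learning rate `lr` deliberately makes each tree correct only part of the remaining error, so the final prediction is the sum of many small contributions rather than one large tree.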

5. Image classification

Comparing built-in algorithms

Algorithm | Linear learner | Wide and deep | TabNet | XGBoost | Image classification | Object detection
ML model used | TensorFlow Estimator (LinearClassifier and LinearRegressor) | TensorFlow Estimator (DNNLinearCombinedClassifier, DNNLinearCombinedEstimator, DNNLinearCombinedRegressor) | TensorFlow Estimator | XGBoost | TensorFlow image classification models | TensorFlow Object Detection API
Type of problem | Classification, regression | Classification, regression, ranking | Classification, regression | Classification, regression | Classification | Object detection
Use cases | Sales forecasting | Recommendation systems, search | Advertising click-through rate (CTR) prediction, fraud detection | Advertising click-through rate (CTR) prediction | Classifying images | Detecting objects within complex image scenes
Supported accelerators for training | GPU | GPU | GPU | GPU (only supported by the distributed version of the algorithm) | GPU, TPU | GPU, TPU
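Step 1 (compare the algorithms and pick the best fit) can be sketched as a lookup over the table above. The `PROBLEM_TYPES` and `TPU_SUPPORT` data and the `candidates` helper are hypothetical; the values are copied from the table, except that image classification is labelled "image classification" so it is not mixed up with tabular classification. An empty result corresponds to the case where no built-in algorithm fits and you need your own training application.

```python
from typing import Dict, List, Set

# Problem types per algorithm, taken from the comparison table.
PROBLEM_TYPES: Dict[str, Set[str]] = {
    "Linear learner": {"classification", "regression"},
    "Wide and deep": {"classification", "regression", "ranking"},
    "TabNet": {"classification", "regression"},
    "XGBoost": {"classification", "regression"},
    "Image classification": {"image classification"},
    "Object detection": {"object detection"},
}

# Algorithms whose "Supported accelerators" entry includes TPU.
TPU_SUPPORT: Set[str] = {"Image classification", "Object detection"}


def candidates(problem: str, need_tpu: bool = False) -> List[str]:
    """List built-in algorithms that handle `problem` (and TPUs, if required)."""
    problem = problem.lower()
    return [
        name
        for name, types in PROBLEM_TYPES.items()
        if problem in types and (not need_tpu or name in TPU_SUPPORT)
    ]


print(candidates("ranking"))                        # → ['Wide and deep']
print(candidates("classification", need_tpu=True))  # → [] : write a training application
```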

Source: Introduction to built-in algorithms | AI Platform Training