Introduction to built-in algorithms
With built-in algorithms on AI Platform Training, you can run training jobs on your data without writing any code for a training application.
How training with built-in algorithms works
COMPARE THE AVAILABLE BUILT-IN ALGORITHMS to see whether any of them fits your dataset and use case.
- If one does, select the best fit.
- If no built-in algorithms are suitable, you can create a training application to run on AI Platform Training.
PREPROCESSING
- Format your input data for training with the built-in algorithm.
- MUST SUBMIT DATA AS A CSV FILE with its HEADER ROW REMOVED.
- If applicable, follow any additional formatting requirements specific to the built-in algorithm you're using.
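The header-removal requirement above can be handled with a few lines of standard-library Python. This is a minimal sketch, assuming the data is a local CSV file; the function name `strip_header` and the file paths are illustrative and not part of AI Platform Training.

```python
import csv


def strip_header(src_path, dst_path):
    """Copy a CSV file to dst_path without its first (header) row.

    Returns the dropped header so the column names can be kept for reference.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)      # the first row holds the column names
        writer.writerows(reader)   # the remaining rows are the training data
    return header
```

The resulting file can then be uploaded to the Cloud Storage bucket used for the job.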
CREATE A Cloud Storage bucket (if you don't already have one) where AI Platform Training can store your training output.
SELECT OPTIONS FOR CUSTOMIZING THE TRAINING JOB.
Make selections to configure the overall training job → THEN the algorithm specifically → optionally, make additional selections to configure hyperparameter tuning. In short: configure the training job first, then the settings that are specific to the algorithm.
(1) For Training job selections
- a job name
- the built-in algorithm to use
- the machine(s) to use
- the region where the job should run
- the Cloud Storage location where outputs are stored
(2) For the algorithm-specific selections
- You can enable AI Platform Training to perform automatic preprocessing on your dataset.
- You can also specify arguments such as the learning rate, training steps, and batch size.
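The job-level and algorithm-level selections in (1) and (2) map onto the request body used when a job is submitted through the Google API Client Library for Python. The following is a sketch only: the helper `build_training_job` is hypothetical, and the image URI, bucket paths, and algorithm flags are assumptions to be checked against the guide for the algorithm you choose.

```python
def build_training_job(job_id, image_uri, region, job_dir, algorithm_args,
                       scale_tier="BASIC"):
    """Assemble a request body shaped like an AI Platform Training job."""
    return {
        "jobId": job_id,                              # (1) a job name
        "trainingInput": {
            "masterConfig": {"imageUri": image_uri},  # (1) the built-in algorithm
            "scaleTier": scale_tier,                  # (1) the machine(s) to use
            "region": region,                         # (1) where the job runs
            "jobDir": job_dir,                        # (1) where outputs are stored
            "args": list(algorithm_args),             # (2) algorithm-specific settings
        },
    }


job = build_training_job(
    job_id="linear_learner_demo",
    # Assumed image name; look up the real URI in the algorithm's guide.
    image_uri="gcr.io/cloud-ml-algos/linear_learner_cpu:latest",
    region="us-central1",
    job_dir="gs://your-bucket/linear_learner_demo",   # placeholder bucket
    algorithm_args=[
        # Assumed flag names, for illustration only.
        "--preprocess",
        "--training_data_path=gs://your-bucket/data/train.csv",
        "--model_type=classification",
        "--learning_rate=0.1",
        "--max_steps=1000",
        "--batch_size=250",
    ],
)
```

Keeping the spec as a plain dictionary makes it easy to review or version before it is submitted.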
(3) For hyperparameter tuning (OPTIONAL)
- select a goal metric (e.g., maximizing the model's predictive accuracy or minimizing the training loss)
- Additionally, tune specific hyperparameters and _set ranges for their values_.
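As a rough illustration of the tuning selections in (3), the goal metric and the value ranges can be written as a hyperparameter spec that would sit under `trainingInput["hyperparameters"]` in the job request. The helper name, the metric tag `loss`, and the range values below are assumptions for the example, not recommendations.

```python
def build_hyperparameter_spec(metric_tag, goal, ranges,
                              max_trials=10, max_parallel_trials=2):
    """Build a tuning spec from {name: (min_value, max_value, scale_type)}."""
    if goal not in ("MAXIMIZE", "MINIMIZE"):
        raise ValueError("goal must be MAXIMIZE or MINIMIZE")
    params = []
    for name, (low, high, scale) in ranges.items():
        if low >= high:
            raise ValueError(f"empty range for {name}")
        params.append({
            "parameterName": name,
            "type": "DOUBLE",
            "minValue": low,
            "maxValue": high,
            "scaleType": scale,
        })
    return {
        "goal": goal,                           # direction of the goal metric
        "hyperparameterMetricTag": metric_tag,  # which metric to optimize
        "maxTrials": max_trials,
        "maxParallelTrials": max_parallel_trials,
        "params": params,                       # hyperparameters and their ranges
    }


spec = build_hyperparameter_spec(
    metric_tag="loss",                          # assumed metric name
    goal="MINIMIZE",
    ranges={"learning_rate": (0.0001, 0.1, "UNIT_LOG_SCALE")},
)
```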
SUBMIT the training job → view logs to monitor its progress and status.
Once the training job has completed successfully → you can deploy the trained model on AI Platform Prediction to set up a prediction server and get predictions on new data.
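Submitting and monitoring can also be done from Python. The sketch below assumes `ml` is a client created with `googleapiclient.discovery.build("ml", "v1")` and that `job_spec` is a dictionary containing a `jobId`; the function name `submit_and_wait` and the polling interval are illustrative. It only checks the job state and does not read the logs.

```python
import time

# Job states after which the job will not change any more.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def submit_and_wait(ml, project_id, job_spec, poll_seconds=60):
    """Create a training job, then poll its state until it finishes."""
    parent = f"projects/{project_id}"
    ml.projects().jobs().create(parent=parent, body=job_spec).execute()

    job_name = f"{parent}/jobs/{job_spec['jobId']}"
    while True:
        job = ml.projects().jobs().get(name=job_name).execute()
        if job["state"] in TERMINAL_STATES:
            return job
        time.sleep(poll_seconds)


# Usage (requires the google-api-python-client package and credentials):
#   from googleapiclient import discovery
#   ml = discovery.build("ml", "v1")
#   final = submit_and_wait(ml, "your-project-id", job_spec)
#   print(final["state"])
```

Only a job that ends in the `SUCCEEDED` state produces a model worth deploying.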
Limitations of AI Platform training job using built-in Algorithms
No distributed training ❌; prefer single-GPU over multi-GPU machines
- Distributed training is not supported.
- Training jobs submitted through the Google Cloud Console use only legacy machine types.
- You can use Compute Engine machine types with training jobs submitted through gcloud or the Google API Client Library for Python (see the documentation on machine types for training).
- GPUs are supported for some algorithms
- Multi-GPU machines do not yield greater speed with built-in algorithm training. If you're using GPUs, select machines with a single GPU.
- TPUs are not supported for tabular built-in algorithm training. ⇒ MUST CREATE A TRAINING APPLICATION.
- Further limitations for specific built-in algorithms are noted in the guides for each algorithm.
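To make the machine-type and single-GPU points concrete, here is a sketch of the machine portion of a training request that uses a Compute Engine machine type with exactly one GPU. The helper name is hypothetical, and the machine and accelerator names are example values; whether a given algorithm supports GPUs has to be checked in its guide.

```python
def single_gpu_training_input(machine_type, gpu_type):
    """Return the machine-related fields for a job that uses one GPU."""
    return {
        "scaleTier": "CUSTOM",          # needed to pick a specific machine type
        "masterType": machine_type,     # a Compute Engine machine type
        "masterConfig": {
            "acceleratorConfig": {
                "count": 1,             # multi-GPU machines give no speed-up here
                "type": gpu_type,
            },
        },
    }


machine_fields = single_gpu_training_input("n1-standard-8", "NVIDIA_TESLA_K80")
```

These fields would be merged into the `trainingInput` of a job submitted through gcloud or the Python client library, since Console submissions are limited to legacy machine types.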
Built-in algorithms help you train models for classification and regression.
1. Linear learner
For logistic regression, binary classification, and multiclass classification
Implemented based on a TensorFlow Estimator
Assigns one weight to each input feature → sums the weighted feature values to predict a numerical target value
easy to interpret : can compare the feature weights to determine which input features have significant impacts on your predictions
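The idea behind a linear learner can be worked through by hand. This is a toy sketch with made-up weights and feature names, not the TensorFlow Estimator implementation: it computes the weighted sum, converts it to a probability with the logistic function (as logistic regression does), and ranks features by the size of their weights.

```python
import math


def linear_predict(weights, bias, features):
    """Weighted sum of the features plus a bias term."""
    return bias + sum(w * x for w, x in zip(weights, features))


def logistic(z):
    """Squash a raw score into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


feature_names = ["age", "visits", "tenure"]   # illustrative names
weights = [0.8, -0.5, 0.1]                    # made-up learned weights
bias = 0.2
features = [1.0, 2.0, 3.0]

score = linear_predict(weights, bias, features)   # 0.2 + 0.8 - 1.0 + 0.3
probability = logistic(score)

# Features ordered from largest to smallest absolute weight.
ranked = sorted(zip(feature_names, weights), key=lambda p: abs(p[1]), reverse=True)
```

Comparing weights this way is only meaningful when the features are on comparable scales, which is one reason to normalize inputs first.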
2. Wide and deep
For recommender systems, search, and ranking problems
Implemented based on a TensorFlow Estimator
Combines a linear model that learns and "memorizes" a wide range of rules with a deep neural network that "generalizes" the rules and applies them correctly to similar features in new, unseen data.
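A toy forward pass shows how the two parts are combined: the wide (linear) part and the deep (neural network) part each produce a score, and the scores are added before the final sigmoid. This is a pure-Python sketch with made-up weights and a single hidden layer, not the DNNLinearCombined estimators themselves.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def wide_and_deep_probability(wide_features, wide_weights,
                              deep_features, hidden_w, hidden_b, out_w, out_b):
    # Wide part: a linear model over (typically sparse or crossed) features.
    wide_logit = sum(w * x for w, x in zip(wide_weights, wide_features))
    # Deep part: a small neural network over dense features.
    hidden = relu(dense(deep_features, hidden_w, hidden_b))
    deep_logit = dense(hidden, [out_w], [out_b])[0]
    # The two scores are summed and passed through a sigmoid.
    return 1.0 / (1.0 + math.exp(-(wide_logit + deep_logit)))


p = wide_and_deep_probability(
    wide_features=[1.0, 0.0], wide_weights=[0.5, 2.0],
    deep_features=[1.0, 2.0],
    hidden_w=[[1.0, -1.0], [0.5, 0.5]], hidden_b=[0.0, 0.0],
    out_w=[1.0, 1.0], out_b=-2.0,
)
```

In a trained model both parts are learned jointly, so the wide weights capture specific rules while the hidden layers capture patterns that transfer to unseen feature combinations.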
3. TabNet
For classification and regression problems on tabular data.
Implemented based on a TensorFlow Estimator
Provides FEATURE ATTRIBUTIONS
4. XGBoost (eXtreme Gradient Boosting)
XGBoost enables efficient supervised learning for classification, regression, and ranking tasks. XGBoost training is based on decision tree ensembles, which combine the results of multiple classification and regression models.
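The way a tree ensemble combines results can be sketched with one-split trees (stumps): each tree returns a small value for a row, and the prediction is a base score plus the sum of those values. The thresholds and leaf values below are made up, and this is not XGBoost's own code; in boosting, each new tree is fitted to correct the errors left by the trees before it.

```python
def stump(feature_index, threshold, left_value, right_value):
    """Return a one-split tree as a function of a feature row."""
    def predict(row):
        return left_value if row[feature_index] < threshold else right_value
    return predict


def ensemble_predict(trees, row, base_score=0.5):
    """Additive ensemble: base score plus every tree's contribution."""
    return base_score + sum(tree(row) for tree in trees)


trees = [
    stump(feature_index=0, threshold=10.0, left_value=-0.2, right_value=0.3),
    stump(feature_index=1, threshold=5.0, left_value=0.1, right_value=-0.1),
]
prediction = ensemble_predict(trees, row=[12.0, 3.0])   # 0.5 + 0.3 + 0.1
```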
5. Image classification
The image classification algorithm uses TensorFlow image classification models,
based on a TensorFlow implementation of EfficientNet or ResNet.
6. Object detection
- Uses TensorFlow Object Detection API to identify "multiple objects" within a single image.
Comparing built-in algorithms
Algorithm | Linear learner | Wide and deep | TabNet | XGBoost | Image classification | Object detection |
---|---|---|---|---|---|---|
Model type | TensorFlow Estimator - LinearClassifier and LinearRegressor | TensorFlow Estimator - DNNLinearCombinedClassifier, DNNLinearCombinedEstimator, DNNLinearCombinedRegressor | TensorFlow Estimator | XGBoost (classification, regression) | TensorFlow image classification models | TensorFlow Object Detection API |
Type of problem | Classification, regression | Classification, regression, ranking | Classification, regression | Classification, regression | Classification | Object detection |
Example use cases | Sales forecasting | Recommendation systems, search | Advertising click-through rate (CTR) prediction, fraud detection | Advertising click-through rate (CTR) prediction | Classifying images | Detecting objects within complex image scenes |
Supported accelerators for training | GPU | GPU | GPU | GPU (only supported by the distributed version of the algorithm) | GPU, TPU | GPU, TPU |