Production ML Systems - Tuning Prediction Performance (Batch / Streaming Pipelines)

This note summarizes how to tune prediction performance for each pipeline type: batch vs. streaming.

Prediction/Inference Performance

Performance must be considered at prediction time, not just during training.

  • 3 considerations for inference performance: throughput, latency, cost
    1. Throughput requirements : how many queries per second do you need to process?
    2. Latency requirements : how long can a single query take?
    3. Cost : in terms of infrastructure and maintenance
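Throughput and latency trade off against cost: at a fixed per-query latency, higher throughput means more serving replicas. A back-of-the-envelope sizing sketch (the numbers and the single-threaded-replica assumption are illustrative, not from the course):

```python
import math

def replicas_needed(target_qps: float, latency_s: float) -> int:
    """Estimate serving replicas, assuming one replica handles one
    query at a time, i.e. at most 1/latency_s queries per second."""
    per_replica_qps = 1.0 / latency_s
    return math.ceil(target_qps / per_replica_qps)

# e.g. 500 queries/sec at 40 ms per query -> 20 replicas
print(replicas_needed(500, 0.040))  # → 20
```

More replicas raise infrastructure cost, which is why all three requirements have to be set together rather than optimized one at a time.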
  • 3 Options for Prediction/Inference Implementation

    • Using a deployed model's REST/HTTP API for streaming
    • Using batch prediction jobs on Cloud ML Engine (CMLE)
    • Using direct model prediction in Cloud Dataflow for batch or streaming
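For the first option, a streaming client sends instances to the deployed model's REST endpoint. A minimal sketch of the request a client would build for CMLE / AI Platform online prediction; the project, model, and feature names are placeholders:

```python
import json

# Placeholders — substitute your own project and model.
PROJECT, MODEL = "my-project", "my-model"
url = f"https://ml.googleapis.com/v1/projects/{PROJECT}/models/{MODEL}:predict"

# The online prediction API expects a JSON body of the form
# {"instances": [...]}, one entry per prediction request.
body = json.dumps({"instances": [{"feature_a": 1.0, "feature_b": 3.5}]})

print(url)
print(body)
# A real client would POST `body` to `url` with an OAuth bearer token,
# e.g. obtained via the google-auth library.
```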

Batch data pipeline

Batch = Bounded Dataset

  1. Read Data & Data Processing
    • Read data from some persistent storage
      • Google Cloud Storage (data lake), BigQuery (data warehouse)
    • Processing, carried out by Cloud Dataflow, typically enriches the data with the predictions of an ML model
  2. Inference
    1. Using a TensorFlow SavedModel
      • Load the TF SavedModel from Cloud Storage into the Dataflow job and invoke it directly
    2. Using TF Serving
      • Access TF Serving via an HTTP endpoint as a microservice, from CMLE or from Kubeflow (running on Kubernetes Engine)
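The first option follows the usual Dataflow pattern: load the model once per worker, then reuse it for every element. A sketch of that pattern with a stub model so it runs without TensorFlow or Beam installed; in a real pipeline the class would extend `beam.DoFn` and `setup` would call `tf.saved_model.load` on a `gs://` path (the path and field names below are illustrative):

```python
class PredictDoFn:
    """Stub of a Dataflow DoFn that enriches elements with predictions."""

    def __init__(self, model_path):
        self.model_path = model_path
        self.model = None  # loaded lazily, once per worker

    def setup(self):
        # Real version: self.model = tf.saved_model.load(self.model_path)
        self.model = lambda x: {"score": x["value"] * 2}  # stub predictor

    def process(self, element):
        if self.model is None:
            self.setup()
        # Enrich the input element with the model's prediction.
        yield {**element, **self.model(element)}

fn = PredictDoFn("gs://my-bucket/models/saved_model")  # hypothetical path
print(list(fn.process({"id": 1, "value": 10})))
# → [{'id': 1, 'value': 10, 'score': 20}]
```

Loading in `setup` rather than in `process` is what keeps this fast: the model is deserialized once per worker instead of once per element.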

Prediction Performance for Batch Pipelines

  • Raw processing speed (fastest first)
    Cloud ML Engine (AI Platform) batch prediction > TF SavedModel in Dataflow > TF Serving on Cloud ML Engine
  • Maintainability (easiest first)
    Cloud ML Engine (AI Platform) batch prediction > TF Serving on Cloud ML Engine > TF SavedModel in Dataflow

Laurence said "what's not to love about a fully managed service?" As a fully managed service, CMLE / AI Platform batch prediction comes out on top for both raw processing speed and maintainability.

Using online predictions as a microservice allows for easier upgradability and dependency management than loading the current model version into the Dataflow job. By contrast, the TF SavedModel and TF Serving on CMLE options swap rankings depending on whether raw processing speed or maintainability matters more.

Streaming data pipeline

A streaming pipeline is similar, except that the input dataset is unbounded.