These notes summarize the Training and Serving Architecture part of the Production Machine Learning Systems course.
Designing "Training" architecture - Static, Dynamic
Static Training | Dynamic Training |
---|---|
Trained once, offline | Retrained repeatedly as more data arrives over time |
AI Platform | Cloud Functions, App Engine, Cloud Dataflow |
Simpler to build and test | Harder to engineer : needs more monitoring, model rollback, and data quarantine capabilities |
Easily becomes stale | Stays current by adapting to changes |
Suits a relationship that stays constant over time | Suits a relationship that changes over time |
General architecture for dynamic Training
- Cloud Functions : A new data file appears in Cloud Storage, and then the Cloud Function is launched.
- App Engine : When a user makes a web request, perhaps from a dashboard, to App Engine, an AI Platform training job is launched, and the AI Platform job writes a new model to Cloud Storage.
- Dataflow (streaming topic) : Messages from the streaming topic are aggregated with Dataflow, and the aggregated data is stored in BigQuery. An AI Platform job is launched when new data arrives in BigQuery, and then an updated model is deployed.
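The Cloud Functions variant can be sketched as below. This is a minimal sketch, assuming the function's role is to start a training job on the new file, as the App Engine variant does. The handler follows the `(event, context)` shape of a Cloud Storage-triggered background function, and the request fields (`jobId`, `trainingInput`, `scaleTier`, `packageUris`, `pythonModule`, `args`) follow the AI Platform Training job resource as I understand it, so check them against the current API reference. The package URI, module name, region, and argument names are hypothetical, and actually submitting the job is left out.

```python
import time


def build_training_job(bucket, name, now=None):
    """Build a training-job request body for a newly arrived data file."""
    now = int(time.time()) if now is None else now
    data_uri = f"gs://{bucket}/{name}"
    return {
        # Job IDs must be unique, so a timestamp is appended.
        "jobId": f"retrain_{now}",
        "trainingInput": {
            "scaleTier": "BASIC",
            "region": "us-central1",  # hypothetical region choice
            "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],  # hypothetical
            "pythonModule": "trainer.task",  # hypothetical
            "args": [
                "--train-data", data_uri,
                "--model-dir", f"gs://{bucket}/models/{now}",
            ],
        },
    }


def on_new_file(event, context=None):
    """Entry point shaped like a Cloud Storage-triggered background function."""
    job = build_training_job(event["bucket"], event["name"])
    # Submitting `job` to the training service is deliberately omitted here.
    return job
```

Keeping the request construction in a pure function makes the trigger easy to test locally before it is wired to a real bucket.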
Designing "Serving" architecture - Static, Dynamic, Hybrid
One goal when designing a serving architecture : to minimize average latency
- Optimizing serving performance : instead of relying on faster memory, we use a lookup table of precomputed predictions
Space-time tradeoff : Static serving vs. Dynamic serving
Static Serving | Dynamic Serving |
---|---|
Precompute predictions, store them in a table, and serve by looking them up | Compute the prediction on demand |
Space intensive | Compute intensive |
Higher storage cost | Lower storage cost |
Low, fixed latency | Variable latency |
Lower maintenance | Higher maintenance |
Choosing a serving architecture : Static, Dynamic, Hybrid
Design the architecture with the following two criteria in mind.
Latency, storage, and CPU costs
Peakedness & Cardinality
- Peakedness : how concentrated the distribution of the prediction workload is (the degree to which data values are concentrated around the mean)
- highly peaked - autocomplete
- Cardinality : the number of values (possible predictions) in a set
- high cardinality - predicting CLV (customer lifetime value) on an e-commerce platform
- low cardinality - predicting sales revenue given an organization division number
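Both criteria can be estimated from a log of prediction requests. The sketch below is an illustration under assumptions: it uses the share of traffic covered by the `k` most frequent keys as a stand-in for peakedness, which is a simpler proxy than a statistical measure of concentration around the mean, and the function names and sample log are hypothetical.

```python
from collections import Counter


def cardinality(requests):
    """Number of distinct keys that predictions were requested for."""
    return len(set(requests))


def top_k_share(requests, k):
    """Fraction of all requests covered by the k most frequent keys."""
    if not requests:
        return 0.0
    counts = Counter(requests)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(requests)


# Hypothetical autocomplete log: a few words dominate the traffic.
log = ["the"] * 6 + ["of"] * 3 + ["zebra"]
print(cardinality(log))     # 3
print(top_k_share(log, 1))  # 0.6
print(top_k_share(log, 2))  # 0.9
```

A high top-k share with a small `k` suggests a peaked workload where a precomputed table pays off. A low share together with high cardinality points toward computing predictions on demand.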
Serving style by inference need
- Predict whether email is spam :
Dynamic
Most emails are likely to be different, although they may be very similar if generated programmatically. Depending on the choice of representation, the cardinality might be enormous.
- Android voice to text :
Dynamic or Hybrid
Online, since there’s such a long tail of possible voice clips. But maybe with sufficient signal processing, some key phrases like “okay google” may have precomputed answers.
- Shopping ad conversion rate :
Static
The set of all ads doesn’t change much from day to day. Assuming users are comfortable waiting for a short while after uploading their ads, this could be done statically, and then a batch script could be run at regular intervals throughout the day.
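The hybrid option in the voice-to-text example can be sketched as a lookup table in front of an on-demand model. This is a minimal sketch, not a production design: `HybridServer`, `toy_model`, and the phrase list are hypothetical names, and the toy model only stands in for an expensive inference call.

```python
class HybridServer:
    """Serve frequent keys from a precomputed table, everything else on demand."""

    def __init__(self, predict_fn, precomputed):
        self.predict_fn = predict_fn    # dynamic path: compute on demand
        self.table = dict(precomputed)  # static path: precomputed answers
        self.hits = 0
        self.misses = 0

    def predict(self, key):
        if key in self.table:
            self.hits += 1
            return self.table[key]
        self.misses += 1
        return self.predict_fn(key)


def toy_model(phrase):
    # Stand-in for an expensive model call.
    return phrase.upper()


# Precompute answers for the head of the distribution only.
frequent = ["okay google", "call mom"]
server = HybridServer(toy_model, {p: toy_model(p) for p in frequent})

server.predict("okay google")                           # served from the table
server.predict("what is the tallest mountain in peru")  # computed on demand
```

The hit and miss counters give a rough check on whether the table covers enough of the traffic to justify its storage cost.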
Static serving & Dynamic Serving in AI Platform
Dynamic Serving = Online Prediction job in AI Platform
Static serving = batch prediction job in AI Platform
- Change the call to AI Platform from an online prediction job to a batch prediction job.
- Ensure the model accepts keys as input and passes them through to its output.
- Keys allow you to join your requests to predictions at serving time.
- Write the predictions to a data warehouse such as BigQuery, and create an API to read from it.
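The role of keys can be shown without any cloud service. The sketch below assumes that batch output rows need not come back in input order, which is why each row carries its key. All names (`batch_predict`, `build_lookup`, `toy_model`, the `ad_...` keys) are hypothetical, and a plain dict stands in for the data warehouse table.

```python
def batch_predict(instances, model_fn):
    """Score every instance and carry its key through to the output row."""
    return [
        {"key": inst["key"], "prediction": model_fn(inst["features"])}
        for inst in instances
    ]


def build_lookup(rows):
    """Index prediction rows by key, as a table read by a serving API would be."""
    return {row["key"]: row["prediction"] for row in rows}


def toy_model(features):
    # Stand-in for the real model: the mean of the feature values.
    return sum(features) / len(features)


instances = [
    {"key": "ad_001", "features": [1, 3]},
    {"key": "ad_002", "features": [4, 6]},
]
table = build_lookup(batch_predict(instances, toy_model))
print(table["ad_002"])  # 5.0
```

At serving time a request only needs to supply its key. The prediction is read from the table rather than computed.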