Production ML Systems - Design Training & Serving Architecture

Notes on the Training Architecture and Serving Architecture sections of the Production Machine Learning Systems course.

Design "Training" Architecture - Static, Dynamic

| Static Training | Dynamic Training |
| --- | --- |
| Trained once, offline | Trained repeatedly, adding training data over time as more data arrives |
| AI Platform | Cloud Functions, App Engine, Cloud Dataflow |
| Simpler to build and test | Harder engineering: needs more monitoring, model rollback, and data quarantine capabilities |
| Easy to let become stale | Regularly pushes out updated versions (adapts to change) |
| Assumes a constant feature-label relationship | Handles a changing (non-stationary) relationship |

General architectures for dynamic training

  1. Cloud Functions

    A new data file appears in Cloud Storage, which triggers the Cloud Function; the function then launches an AI Platform training job.

  2. App Engine

    When a user makes a web request (perhaps from a dashboard) to App Engine, an AI Platform training job is launched, and the job writes a new model to Cloud Storage.

  3. Dataflow (streaming Pub/Sub topic)

    Streaming messages are aggregated with Dataflow, and the aggregated data is stored in BigQuery. An AI Platform training job is launched when new data arrives in BigQuery, and an updated model is then deployed.
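Pattern 1 above can be sketched as a Cloud Storage-triggered Cloud Function that submits a training job via the AI Platform Training API (`projects.jobs.create`). The bucket, package, and project names below are hypothetical placeholders, and the actual API call is left as a comment; this is a sketch of the request shape, not a production function.

```python
def make_training_job_body(job_id, data_uri, output_dir):
    """Build a request body for projects.jobs.create (AI Platform Training API)."""
    return {
        "jobId": job_id,
        "trainingInput": {
            "scaleTier": "BASIC",
            # Hypothetical trainer package and region:
            "packageUris": ["gs://my-bucket/trainer/trainer-0.1.tar.gz"],
            "pythonModule": "trainer.task",
            "region": "us-central1",
            "args": ["--data-path", data_uri, "--output-dir", output_dir],
        },
    }

def on_new_file(event, context):
    """Cloud Storage 'finalize' trigger: retrain on the newly arrived file."""
    data_uri = "gs://{}/{}".format(event["bucket"], event["name"])
    body = make_training_job_body("retrain_" + context.event_id,
                                  data_uri, "gs://my-bucket/models/")
    # In a deployed function you would now submit the job, e.g.:
    # ml = googleapiclient.discovery.build("ml", "v1")
    # ml.projects().jobs().create(parent="projects/my-project", body=body).execute()
    return body
```

Keying the job ID off the trigger's event ID keeps retrain jobs unique per arriving file.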

Designing "Serving" architecture - Static, Dynamic, Hybrid

One of the goals when designing a serving architecture is to minimize average latency.

  • Optimizing serving performance : rather than relying on faster memory, we serve predictions from a precomputed lookup table

space-time tradeoff : Static serving vs. Dynamic serving

| Static Serving | Dynamic Serving |
| --- | --- |
| Precompute predictions, store them, and serve by looking them up in a table | Compute the label on demand |
| Space-intensive | Compute-intensive |
| Higher storage cost | Lower storage cost |
| Low, fixed latency | Variable latency |
| Lower maintenance | Higher maintenance |
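The tradeoff in the table can be shown with a toy model (the `model` function and ad IDs below are illustrative stand-ins, not part of the course): static serving pays storage up front to make serving a dictionary lookup, while dynamic serving recomputes per request.

```python
def model(ad_id):
    # Toy stand-in for an expensive model: a fake conversion-rate score.
    return (sum(ord(c) for c in ad_id) % 10) / 100.0

# Static serving: the set of inputs is finite and known in advance, so all
# predictions are precomputed offline and stored (space-intensive).
ALL_AD_IDS = ["ad-1", "ad-2", "ad-3"]
PREDICTION_TABLE = {ad: model(ad) for ad in ALL_AD_IDS}

def serve_static(ad_id):
    return PREDICTION_TABLE[ad_id]   # low, fixed latency; fails on unseen inputs

def serve_dynamic(ad_id):
    return model(ad_id)              # compute-intensive, variable latency
```

Note that `serve_static` raises `KeyError` on an input that was never precomputed; handling unseen inputs is exactly what pushes systems toward dynamic or hybrid serving.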

Choose Serving architecture : Static, Dynamic, Hybrid

Design the architecture with reference to the following two criteria.

  1. Latency, Storage, CPU costs

  2. Peakedness & Cardinality

    • Peakedness : how concentrated the distribution of the prediction workload is (the degree to which data values are concentrated around the mean)
      • highly peaked - autocomplete suggestions
    • Cardinality : the number of distinct values (possible predictions) in the set
      • high cardinality - CLV (customer lifetime value) of an e-commerce platform's users
      • low cardinality - predicting sales revenue given an organization's division number
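Given a log of prediction requests, both criteria can be estimated directly. The sketch below (the metric definitions are my own rough formulations, not from the course) measures cardinality as the number of distinct inputs and peakedness as the fraction of traffic covered by the most frequent inputs.

```python
from collections import Counter

def cardinality(requests):
    """Number of distinct prediction inputs in the logged workload."""
    return len(set(requests))

def peakedness(requests, top_k=1):
    """Fraction of traffic covered by the top_k most frequent inputs."""
    counts = Counter(requests)
    top = sum(n for _, n in counts.most_common(top_k))
    return top / len(requests)

# An autocomplete-style workload: a few prefixes dominate (highly peaked).
workload = ["the", "the", "the", "a", "the", "an", "the", "a"]
```

A workload where `peakedness` is high and `cardinality` is modest is a good candidate for static (or cached/hybrid) serving.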
Serving style by inference needs
  • Predict whether email is spam : Dynamic
    Most emails are likely to be different, although they may be very similar if generated programmatically. Depending on the choice of representation, the cardinality might be enormous.
  • Android voice to text : Dynamic or Hybrid
    online, since there’s such a long tail of possible voice clips. But maybe with sufficient signal processing, some key phrases like “okay google” may have precomputed answers.
  • Shopping ad conversion rate : Static
    The set of all ads doesn’t change much from day to day. Assuming users are comfortable waiting for a short while after uploading their ads, this could be done statically, and then a batch script could be run at regular intervals throughout the day.
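The hybrid option from the voice-to-text example can be sketched as a precomputed head plus a dynamic fallback. Everything here is a toy illustration: the `transcribe` stand-in and the precomputed table are hypothetical, not the actual speech pipeline.

```python
# Static head: a tiny set of highly peaked inputs with precomputed answers,
# e.g. a wake phrase recognized via cheap signal processing.
PRECOMPUTED = {"okay google": "WAKE_WORD"}

def transcribe(clip):
    """Stand-in for the expensive dynamic speech model."""
    return clip.upper()

def serve_hybrid(clip):
    # Look up the peaked head statically; compute the long tail on demand.
    if clip in PRECOMPUTED:
        return PRECOMPUTED[clip]
    return transcribe(clip)
```

The head table stays small because only the most frequent inputs earn a precomputed entry; everything else takes the dynamic path.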

Static serving & Dynamic Serving in AI Platform

  • Dynamic serving = online prediction job in AI Platform

  • Static serving = batch prediction job in AI Platform

    1. Change the call to AI Platform from an online prediction job to a batch prediction job.
    2. Ensure the model accepts and passes through keys as input.
      • keys allow you to join your requests to predictions at serving time.
    3. Write the predictions to a data warehouse.
      • For example, write them to BigQuery and create an API to read from it.
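Step 2 above is the crux: the model must echo a client-supplied key so batch outputs can be joined back to the rows that produced them. A minimal sketch (function names and the join logic are illustrative, not the AI Platform API itself):

```python
def predict_batch(rows, model_fn):
    """rows: iterable of (key, features) pairs.

    The key is passed through untouched alongside each prediction, so
    outputs remain joinable even if the batch job reorders them.
    """
    return [(key, model_fn(features)) for key, features in rows]

def join_predictions(rows, predictions):
    """Join predictions back onto the original requests by key,
    as a warehouse query over the prediction table would."""
    by_key = dict(predictions)
    return [(key, features, by_key[key]) for key, features in rows]
```

At serving time, the lookup side (step 3) reads the stored `(key, prediction)` pairs from the warehouse and performs this join on demand.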