Training performance - I/O bound, CPU bound, Memory bound

Training performance

Tune performance to reduce training time, reduce cost, and increase scale.

Model training performance is bound by three constraints: I/O, CPU, and memory.

Constraint: I/O (input/output) bound
  • Commonly occurs when: large number of inputs; input requires parsing (heterogeneous data); small model; input data sits on a storage system with low throughput
  • Actions to improve performance: store the data efficiently; parallelize reads; consider the batch size (see the input-pipeline sketch below)
Constraint: CPU bound
  • Commonly occurs when: expensive computations; underpowered hardware
  • Actions to improve performance: train on a faster accelerator; upgrade the processor (GPUs); run on TPUs; simplify the model
Constraint: Memory bound
  • Commonly occurs when: large number of inputs; complex model
  • Actions to improve performance: add more memory; use fewer layers; reduce the batch size
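
For the I/O-bound case, the sketch below shows what the three actions look like in a tf.data input pipeline: sharded TFRecord files are read in parallel, parsing is parallelized, and the batch size is an explicit knob. This is a minimal sketch for illustration; the gs://my-bucket/train/*.tfrecord path, the image/label feature names, and the batch size of 256 are assumptions, not values from the source.

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE
BATCH_SIZE = 256  # assumed value; tune it, keeping the memory-bound notes below in mind

def parse_example(serialized):
    # Heterogeneous inputs need parsing; do it once per record, in parallel.
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),  # assumed feature names
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, features)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)  # uint8 -> float32 in [0, 1]
    image = tf.image.resize(image, [224, 224])               # fixed shape so examples can be batched
    return image, parsed["label"]

# "Store efficiently": sharded TFRecord files on high-throughput storage (path is hypothetical).
filenames = tf.io.gfile.glob("gs://my-bucket/train/*.tfrecord")

dataset = (
    tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTOTUNE)  # "parallelize reads"
    .map(parse_example, num_parallel_calls=AUTOTUNE)                 # parallel parsing
    .shuffle(10_000)
    .batch(BATCH_SIZE, drop_remainder=True)                          # "consider batch size"
    .prefetch(AUTOTUNE)  # overlap input preparation with the training step
)
```

Prefetching overlaps the input pipeline with the training step, which is usually the cheapest way to hide any remaining I/O latency.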
  1. I/O (input/output) bound: how fast can you get data into the model in each training step?
  2. CPU bound: how fast can you compute the gradient in each training step?
    Accelerators such as GPUs and TPUs can radically reduce the time required to execute a single training step (see the strategy sketch after this list).
    • GPUs
    • The TPU option on Google Cloud
    • Train a simpler model
      1. Use a less computationally expensive activation function
      2. Train for fewer steps
  3. Memory bound: how many weights can you hold in memory, so that the matrix multiplications can run in-memory on the GPU or TPU?
    • Use a less complex model
    • Reduce the batch size (see the batch-size sketch after this list)
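
For the CPU-bound case, a common pattern on Google Cloud is to let the training job pick the fastest hardware it can find and build the model under the matching tf.distribute strategy. This is a minimal sketch, assuming TensorFlow 2.x and an auto-detectable accelerator (e.g. a Cloud TPU VM or a GPU machine); the small Dense model and its sizes are placeholders, not from the source.

```python
import tensorflow as tf

def pick_strategy():
    """Return a distribution strategy for the fastest hardware found: TPU, then GPU(s), then CPU."""
    try:
        # On Google Cloud (e.g. a Cloud TPU VM) the resolver can auto-detect the TPU;
        # elsewhere it typically raises, and we fall through to the GPU/CPU checks.
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)
    except (ValueError, tf.errors.NotFoundError):
        pass
    if tf.config.list_physical_devices("GPU"):
        return tf.distribute.MirroredStrategy()  # one or more local GPUs
    return tf.distribute.get_strategy()          # default strategy: plain CPU training

strategy = pick_strategy()

with strategy.scope():
    # Variables created inside the scope are placed on the accelerator.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),  # relu is cheaper than e.g. tanh/sigmoid
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```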
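For the memory-bound case, the two knobs are the model size and the batch size. The sketch below only makes them concrete: build_model shrinks the network via num_layers/units, and the loop halves the batch size when TensorFlow reports an out-of-memory error. The make_dataset helper, the 10 output classes, and the default sizes are hypothetical, not from the source.

```python
import tensorflow as tf

def build_model(num_layers=4, units=512):
    """Fewer / smaller layers mean fewer weights to hold in accelerator memory."""
    hidden = [tf.keras.layers.Dense(units, activation="relu") for _ in range(num_layers)]
    return tf.keras.Sequential(hidden + [tf.keras.layers.Dense(10)])

def fit_with_smaller_batches(make_dataset, batch_size=1024, min_batch_size=32):
    """Halve the batch size whenever the accelerator runs out of memory.

    make_dataset(batch_size) is a hypothetical helper that returns a batched
    tf.data.Dataset of (features, labels); it stands in for whatever input
    pipeline the training job actually uses.
    """
    while batch_size >= min_batch_size:
        try:
            model = build_model()
            model.compile(
                optimizer="adam",
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            )
            model.fit(make_dataset(batch_size), epochs=1)
            return model, batch_size
        except tf.errors.ResourceExhaustedError:
            batch_size //= 2  # memory bound: reduce the batch size and try again
    raise RuntimeError("Model did not fit in memory even at the minimum batch size.")
```

In practice an out-of-memory error can leave the accelerator in a bad state, so lowering the configured batch size (or the number of layers) up front is usually preferable; the retry loop is only meant to make the trade-off concrete.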