[PMLE-EXAMTOPIC] Training performance - I/O bound, CPU bound, Memory bound
Training performance
Tuning performance to reduce training time, reduce cost, and increase scale.
Model training performance is bound by three constraints: I/O, CPU, and memory.
| Constraint | Commonly occurs when | Actions to improve performance |
|---|---|---|
| I/O (input/output) bound | Large number of inputs; input requires parsing (heterogeneous data); small models; input data on a storage system with low throughput | Store data efficiently; parallelize reads; consider the batch size |
| CPU bound | Expensive computations; underpowered hardware | Train on a faster accelerator; upgrade the processor; run on GPUs or TPUs; simplify the model |
| Memory bound | Large number of inputs; complex model | Add more memory; use fewer layers; reduce the batch size |
- I/O (input/output) bound: how fast can you get data into the model in each training step?
  - The `tf.data` API builds complex input pipelines from simple, reusable pieces, giving a flexible and efficient input pipeline with better performance.
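The I/O-bound remedies above (store efficiently, parallelize reads, tune batch size) map directly onto `tf.data` pipeline transformations. A minimal sketch, assuming TensorFlow is installed and using a random in-memory dataset as a stand-in for real input records:

```python
import tensorflow as tf

# Hypothetical in-memory data standing in for parsed input records.
features = tf.random.uniform((1024, 8))
labels = tf.random.uniform((1024,), maxval=2, dtype=tf.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .cache()                                    # store parsed records efficiently after the first pass
    .shuffle(buffer_size=1024)
    .map(lambda x, y: (x * 2.0, y),             # stand-in preprocessing/parsing step
         num_parallel_calls=tf.data.AUTOTUNE)   # parallelize reads/parsing across cores
    .batch(64)                                  # batch size is a tuning knob for throughput
    .prefetch(tf.data.AUTOTUNE)                 # overlap the input pipeline with the training step
)

for batch_x, batch_y in dataset.take(1):
    print(batch_x.shape)  # (64, 8)
```

`prefetch` is what hides input latency behind computation; `AUTOTUNE` lets the runtime pick the parallelism and buffer sizes.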
- CPU bound: how fast can you compute the gradient in each training step?
  - Accelerators (GPUs and TPUs) can radically reduce the time required to execute a single training step.
  - Use the TPU option on Google Cloud.
  - Train a simpler model.
  - Use a less computationally expensive activation function.
  - Or just train for fewer steps.
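Why "simplify the model" helps a CPU-bound job can be seen with back-of-envelope arithmetic. A pure-Python sketch, assuming fully connected layers where each layer costs roughly 2 × inputs × outputs multiply-add FLOPs (the layer sizes are made up for illustration):

```python
def dense_flops(layer_sizes):
    """Approximate multiply-add FLOPs for one forward pass through dense layers."""
    return sum(2 * n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

big   = dense_flops([784, 1024, 1024, 10])  # wider model
small = dense_flops([784, 256, 256, 10])    # simplified model

print(big, small)        # 3723264 537600
print(big / small)       # ~6.9x fewer FLOPs per step for the simplified model
```

Shrinking the hidden layers by 4x cuts per-step compute by roughly 7x here, which directly shortens a CPU-bound training step.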
- Memory bound: how many weights can you hold in memory, so that you can do the matrix multiplications in-memory on the GPU or TPU?
  - Use a less complex model (fewer layers).
  - Reduce the batch size.
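A rough footprint model shows why reducing the batch size relieves memory pressure: activations scale with batch size, while weights and gradients do not. A simplified pure-Python sketch (float32 values, made-up model sizes, ignoring optimizer state and framework overhead):

```python
def training_memory_bytes(n_weights, activations_per_example, batch_size,
                          bytes_per_value=4):
    """Rough float32 footprint: weights + gradients + per-batch activations."""
    weights_and_grads = 2 * n_weights * bytes_per_value      # independent of batch size
    activations = batch_size * activations_per_example * bytes_per_value
    return weights_and_grads + activations

# Halving the batch size shrinks only the activation term.
full = training_memory_bytes(10_000_000, 50_000, batch_size=128)
half = training_memory_bytes(10_000_000, 50_000, batch_size=64)

print(full / 2**20, half / 2**20)  # ~100.7 MiB vs ~88.5 MiB
```

This is also why "use fewer layers" appears alongside "reduce batch size": fewer layers shrink both the weight term and the activation term.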