EXAMTOPICS DUMPS Q24-Q28: BINARY CLASSIFICATION - Relationship between SOFTMAX THRESHOLD & PRECISION, Codeless ETL tool, Dealing with Security & Privacy Issues, I/O-bound solutions - tf.data input pipeline
Q 25.
You work for a social media company. You need to detect whether posted images contain cars. Each training example is a member of exactly one class. You have trained an object detection neural network and deployed the model version to AI Platform Prediction for evaluation. Before deployment, you created an evaluation job and attached it to the AI Platform Prediction model version. You notice that the precision is lower than your business requirements allow. How should you adjust the model's final layer softmax threshold to increase precision?
BINARY CLASSIFICATION - Relationship between SOFTMAX THRESHOLD & PRECISION
- ❌ A. Increase the recall.
→ Would probably decrease the precision (a lower threshold admits more false positives).
- ⭕ B. Decrease the recall.
→ Improving precision typically reduces recall, and vice versa; raising the softmax threshold trades recall for precision.
- ❌ C. Increase the number of false positives.
→ Would decrease the precision, since Precision = TP/(TP+FP).
- ❌ D. Decrease the number of false negatives.
→ Would probably increase the recall and reduce precision.
Precision = TP / (TP + FP), Recall = TP / (TP + FN)
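This trade-off can be seen with a small, self-contained sketch; the softmax scores and labels below are made-up illustration values, not outputs of the question's model. Raising the decision threshold removes low-confidence predictions, which tends to drop false positives faster than true positives, so precision rises while recall falls.

```python
import numpy as np

# Illustrative softmax scores and ground-truth labels (assumed values).
scores = np.array([0.95, 0.90, 0.85, 0.70, 0.65, 0.55, 0.45, 0.30])
labels = np.array([1,    1,    1,    0,    1,    0,    0,    0])

def precision_recall(threshold):
    # Predict positive whenever the score clears the threshold.
    preds = scores >= threshold
    tp = np.sum(preds & (labels == 1))
    fp = np.sum(preds & (labels == 0))
    fn = np.sum(~preds & (labels == 1))
    return tp / (tp + fp), tp / (tp + fn)

for t in (0.5, 0.8):
    p, r = precision_recall(t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.5  precision=0.67  recall=1.00
# threshold=0.8  precision=1.00  recall=0.75
```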
Q 26.
You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some members of your team prefer a codeless interface for building Extract, Transform, Load (ETL) processes. Which service should you use?
Codeless ETL tool
- ❌ A. Dataflow
- ❌ B. Dataprep
- ❌ C. Apache Flink
- ⭕ D. Cloud Data Fusion
Low/no-code ETL solutions on GCP: Cloud Data Fusion & Dataprep
**Cloud Data Fusion**
- A fully managed, cloud-native, enterprise data integration service for quickly building & managing data pipelines.
- Runs on top of Hadoop.
- Rougher from a usability perspective and generally more expensive, but it's likely where Google is putting significant investment.

**Dataprep**
- More refined and cost-effective but limited in capability.
- A third-party application offered by Trifacta through GCP.
Q 27.
You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers. What factors should you consider before building the model?
Regulated Data - Dealing with Security & Privacy Issues
- A. Redaction, reproducibility, and explainability
→ Redaction's use case is handling sensitive data (e.g., removing PII).
- B. Traceability, reproducibility, and explainability
- C. Federated learning, reproducibility, and explainability
- ⭕ D. Differential privacy, Federated learning, and explainability
Federated Learning, Differential Privacy, and Traceability for Privacy
**Federated learning** (see: Federated Learning Office Hours - AI Workshop Experiments)
- A distributed machine learning approach that enables ML on decentralized datasets (_decentralized examples residing on devices such as smartphones_).
- **_Helps protect data privacy_** and improves local speed and performance.
- Open-source TensorFlow Federated library.
**Differential privacy** (see: Differential Privacy; Google Developers Blog: How we're helping developers with differential privacy)
- An engagement for industries handling sensitive personal data: you share an ML problem, a proposed architecture, and evaluation metrics, and receive guidance on training with TensorFlow Privacy.
- Users provide: a well-defined machine learning problem, a proposed model architecture, and evaluation metrics. Customers who also have data, or an expected data schema, will likely gain more from this engagement, _but there is no expectation of data sharing._
- Users receive: _advice on how to use TensorFlow Privacy to train the model in a manner that offers differential privacy._
- Goal: to train and deploy models based on sensitive training data (_health records, personal email, personal photos, etc._) without compromising the privacy of the data.
- TensorFlow Privacy is (currently) most effective with: more training data (ideally more than 10^5 or 10^6 examples), smaller models (ideally under 10^6 parameters), and classification/regression rather than generative models. (A minimal DP-SGD sketch follows.)
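A minimal, hypothetical sketch of what DP-SGD training with TensorFlow Privacy can look like. The DPKerasSGDOptimizer and the requirement for a per-example (unreduced) loss follow the library's Keras tutorial; the model architecture, feature size, and hyperparameter values are illustrative assumptions, not a recommended configuration.

```python
# Sketch of differentially private training with TensorFlow Privacy (DP-SGD).
# Model, data shape, and hyperparameter values are illustrative assumptions.
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import DPKerasSGDOptimizer

# Hypothetical binary approve/reject classifier over 100 tabular features.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),  # logits for approve / reject
])

optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=1.0,       # clip each microbatch's gradient norm
    noise_multiplier=1.1,   # Gaussian noise added to the clipped gradients
    num_microbatches=250,   # must evenly divide the batch size
    learning_rate=0.15)

# The loss must stay per-example (no reduction) so gradients can be
# clipped per microbatch before noise is added.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=250, epochs=5)
```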
**Traceability** (traceability information is called data lineage)
- Use the business metadata tool Data Catalog to create tagged data lineage.
- Democratization of data within an organization is essential to help users derive innovative insights for growth. In a big data environment, traceability of where the data in the data warehouse originated and how it flows through the business is critical. This traceability information is called data lineage. Being able to track, manage, and view data lineage helps you simplify tracking data errors, forensics, and data dependency identification.
- In addition, data lineage has become essential for securing business data.
- An organization's data governance practices require tracking all movement of sensitive data, including personally identifiable information (PII). Of key concern is ensuring that metadata stays within the customer's cloud organization or project.
Q 28.
You are training a Resnet model on AI Platform using TPUs to visually categorize types of defects in automobile engines. You capture the training profile using the Cloud TPU profiler plugin and observe that it is highly input-bound. You want to reduce the bottleneck and speed up your model training process. Which modifications should you make to the tf.data dataset? (Choose two.)
I/O-bound solutions - tf.data input pipeline
- ⭕ A. Use the interleave option for reading data.
→ INTERLEAVE for parallelizing data reading.
- B. Reduce the value of the repeat parameter.
- C. Increase the buffer size for the shuffle option.
- ⭕ D. Set the prefetch option equal to the training batch size.
→ PREFETCH for pre-loading the data, reducing input wait time (see the sketch after the options).
- E. Decrease the batch size argument in your transformation.
→ BATCH SIZE is more about a memory-bound problem; decreasing it will not relieve an input-bound bottleneck or speed up training.
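A minimal sketch of options A and D applied to a tf.data pipeline. The bucket path, TFRecord format, parsing function, image size, and batch size are assumptions for illustration, and AUTOTUNE is used for the parallelism and prefetch values rather than the literal batch-size value named in option D.

```python
# Sketch: parallelize reads with interleave and overlap input with training via prefetch.
import tensorflow as tf

BATCH_SIZE = 1024  # assumed global batch size

def parse_example(serialized):
    # Hypothetical feature spec for the defect-classification records.
    features = tf.io.parse_single_example(
        serialized,
        {"image": tf.io.FixedLenFeature([], tf.string),
         "label": tf.io.FixedLenFeature([], tf.int64)})
    image = tf.io.decode_jpeg(features["image"], channels=3)
    return tf.image.resize(image, [224, 224]), features["label"]

files = tf.data.Dataset.list_files("gs://my-bucket/train-*.tfrecord")  # assumed path
dataset = (
    files
    # A. interleave: read several TFRecord files in parallel.
    .interleave(tf.data.TFRecordDataset,
                cycle_length=16,
                num_parallel_calls=tf.data.AUTOTUNE)
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(BATCH_SIZE, drop_remainder=True)
    # D. prefetch: prepare upcoming batches while the TPU trains on the current one.
    .prefetch(tf.data.AUTOTUNE)
)
```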