Sample Questions on Preparing for the Google Cloud Professional Data Engineer Exam | Google Cloud Skills Boost
1️⃣ Building and Operationalizing Data Processing Systems
Q 1.
An application that relies on Cloud SQL to read infrequently changing data is predicted to grow dramatically. How can you increase capacity for more read-only clients?
- ❌ A. Configure high availability on the primary node.
→ High availability does nothing to improve read throughput.
→ Configuring high availability makes the service more available, not faster.
- ❌ B. Establish an external replica in the customer's data center.
→ Doesn't add throughput on the cloud.
→ An external replica is more of a backup/disaster-recovery measure.
- ❌ C. Use backups, so you can restore if there is an outage.
→ Backups help with recovery, not read capacity.
- ⭕ D. Configure read replicas.
→ In this scenario, clients are read-only and the challenge is scale.
→ Read replicas increase capacity for simultaneous reads.
Cloud SQL - Replicate the data
- A read replica is a copy of the primary instance that reflects changes to the primary in almost real time, in normal circumstances.
- You can use a read replica to offload read requests or analytics traffic from the primary instance.
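A minimal sketch of creating a read replica with the Cloud SQL Admin API via google-api-python-client; the project, instance names, region, and machine tier below are placeholder assumptions, not values from the question.

```python
from googleapiclient import discovery

# Build the Cloud SQL Admin API client (uses Application Default Credentials).
sqladmin = discovery.build("sqladmin", "v1beta4")

replica_body = {
    "name": "orders-db-replica-1",        # hypothetical replica name
    "masterInstanceName": "orders-db",    # hypothetical primary instance
    "region": "asia-northeast1",
    "settings": {"tier": "db-custom-2-7680"},
}

# Creating a replica is asynchronous; the insert call returns an operation.
operation = sqladmin.instances().insert(
    project="my-project", body=replica_body
).execute()
print(operation["name"], operation["status"])
```

Read-only clients can then point at the replica's connection name to offload traffic from the primary.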
Q 2.
A BigQuery dataset was located near Tokyo. For efficiency reasons, the company wants the dataset duplicated in Germany.
- ❌ A. Change the dataset from a regional location to a multi-region location, specifying the regions to be included.
→ Datasets are immutable, so the location can't be updated. After creating the dataset, the location cannot be changed.
- ❌ B. Export the data from BigQuery into a bucket in the new location, and import it into a new dataset at the new location.
→ BigQuery writes and reads from nearby buckets, so the new location can't read the old location's data.
- ❌ C. Copy the data from the dataset in the source region to the dataset in the target region using BigQuery commands.
→ BigQuery doesn't provide a location-to-location move or copy command.
- ⭕ D. Export the data from BigQuery into a nearby bucket in Cloud Storage. Copy to a new regional bucket in Cloud Storage. Import into the new dataset in the new location.
→ In this scenario, the dataset is in a regional location.
→ BigQuery imports and exports data to local or multi-regional buckets in the same location, so you need to use Cloud Storage as an intermediary to transfer the data to the new location (see the sketch after this list).
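A minimal sketch of that export → copy → import flow with the google-cloud-bigquery and google-cloud-storage clients; the project, dataset, table, and bucket names, the Avro format, and the specific regions are placeholder assumptions.

```python
from google.cloud import bigquery, storage

bq = bigquery.Client(project="my-project")

# 1. Export the table to a bucket colocated with the source dataset (Tokyo).
bq.extract_table(
    "my-project.sales_tokyo.orders",
    "gs://my-bucket-tokyo/orders/*.avro",
    job_config=bigquery.ExtractJobConfig(destination_format="AVRO"),
    location="asia-northeast1",
).result()

# 2. Copy the exported files to a bucket in the target region (Germany).
gcs = storage.Client(project="my-project")
src_bucket = gcs.bucket("my-bucket-tokyo")
dst_bucket = gcs.bucket("my-bucket-frankfurt")
for blob in src_bucket.list_blobs(prefix="orders/"):
    src_bucket.copy_blob(blob, dst_bucket, blob.name)

# 3. Load the files into a dataset created in the new location.
bq.load_table_from_uri(
    "gs://my-bucket-frankfurt/orders/*.avro",
    "my-project.sales_frankfurt.orders",
    job_config=bigquery.LoadJobConfig(source_format="AVRO"),
    location="europe-west3",
).result()
```

Each step runs where the data lives: the extract job in the source region, the bucket-to-bucket copy inside Cloud Storage, and the load job in the target region.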
BigQuery - Dataset location considerations
Location or region types: there are 2 types of locations.
- region: a specific geographic place, such as London.
- multi-region: a large geographic area, such as the United States, that contains two or more geographic places.
Dataset location
- Specify a location for storing BigQuery data when creating a dataset.
- After you create the dataset, the location cannot be changed.
- But you can copy the dataset to a different location, or manually move (recreate) the dataset in a different location.
- BigQuery processes queries in the same location as the dataset that contains the tables you're querying.
- BigQuery stores your data in the selected location in accordance with the Service Specific Terms.
BigQuery - Location considerations
Dataset locations | BigQuery | Google Cloud
Colocate your Cloud Storage buckets for loading data.
- If your BigQuery dataset is in a multi-regional location, the Cloud Storage bucket containing the data you're loading must be in a regional or multi-regional bucket in the same location.
- For example, if your BigQuery dataset is in the EU, the Cloud Storage bucket must be in a regional or multi-regional bucket in the EU.
- If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location.
- For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
- Exception: If your dataset is in the US multi-regional location, you can load data from a Cloud Storage bucket in any regional or multi-regional location.
Q 3.
Your client wants a transactionally consistent global relational repository. You need to be able to monitor and adjust node count for unpredictable traffic spikes.
- ❌ A. Use Cloud Spanner. Monitor storage usage and increase node count if more than 70% utilized.
→ Storage utilization should not be used as a scaling metric.
- ⭕ B. Use Cloud Spanner. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.
→ Because of the requirement for globally scalable transactions, use Cloud Spanner. CPU utilization is the recommended metric for scaling, per Google best practices (see the monitoring sketch after this list).
- ❌ C. Use Cloud Bigtable. Monitor data stored and increase node count if more than 70% utilized.
- ❌ D. Use Cloud Bigtable. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.
→ Cloud Bigtable is a NoSQL database, not a transactionally consistent relational repository.
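A minimal sketch of the scaling check that answer B describes, using the Cloud Monitoring client to read Spanner CPU utilization; the project id, instance id, metric name, and one-hour lookback are placeholder assumptions, and in practice you would act on the signal with the Spanner Admin API or an autoscaler rather than a print statement.

```python
import time
from google.cloud import monitoring_v3

project = "my-project"
instance_id = "orders-instance"

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)

series = client.list_time_series(
    request={
        "name": f"projects/{project}",
        "filter": (
            'metric.type = "spanner.googleapis.com/instance/cpu/utilization" '
            f'AND resource.labels.instance_id = "{instance_id}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

# CPU utilization is reported as a fraction between 0 and 1.
peak = max((p.value.double_value for ts in series for p in ts.points), default=0.0)
if peak > 0.70:  # threshold from the question; tune to your workload
    print(f"Peak CPU {peak:.0%} over the last hour: add nodes to the instance.")
```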
2️⃣ Operationalizing Machine Learning Models
Q 4.
Quickly and inexpensively develop an application that sorts product reviews by most favorable to least favorable.
- ❌ A. Train an entity classification model with TensorFlow. Deploy the model using AI Platform. Use the entity to sort the reviews.
- ❌ B. Build an application that performs entity analysis using the Natural Language API. Use the entity to sort the reviews.
- ⭕ C. Build an application that performs sentiment analysis using the Natural Language API. Use the score and magnitude to sort the reviews.
→ Quickly and inexpensively: use a pre-trained model whenever possible. Creating models is expensive and time-consuming.
→ Sentiment analysis using the Natural Language API returns the score and magnitude of sentiment (see the sketch after this list).
- ❌ D. Train a sentiment regression model with TensorFlow. Deploy the model using AI Platform. Use the magnitude to sort the reviews.
Q 5.
Maximize speed and minimize cost of deploying a TensorFlow machine-learning model on Google Cloud.
- ⭕ A. Export your trained model to a SavedModel format. Deploy and run your model on AI Platform.
→ Google's recommended practice: use each tool for the purpose for which it was designed and built. Just deploy it (see the sketch after this list).
- ❌ B. Export your trained model to a SavedModel format. Deploy and run your model from a Google Kubernetes Engine cluster.
→ Google Kubernetes Engine isn't the right tool for this circumstance.
- ❌ C. Export 2 copies of your trained model to a SavedModel format. Store artifacts in Cloud Storage. Run 1 version on CPUs and another version on GPUs.
- ❌ D. Export 2 copies of your trained model to a SavedModel format. Store artifacts in AI Platform. Run 1 version on CPUs and another version on GPUs.
→ Running two copies on CPUs and GPUs adds cost and complexity instead of minimizing them.
3️⃣ Security, Policy and Reliability
Q 6.
Groups Analyst1 and Analyst2 should not have access to each other's BigQuery data.
- ❌ A. Place the data in separate tables, and assign appropriate group access.
→ BigQuery does not provide IAM access control on individual tables.
- ❌ B. Analyst1 and Analyst2 must be in separate projects, along with the data.
→ Analyst groups can be in the same project.
- ⭕ C. Place the data in separate datasets, and assign appropriate group access.
→ BigQuery data access is controlled at the dataset level (see the sketch after this list).
- ❌ D. Place the data in separate tables, but encrypt each table with a different group key.
→ Encryption does not determine access.
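A minimal sketch of dataset-level access control with the google-cloud-bigquery client, granting each analyst group READER on its own dataset only; the project, dataset, and group names are placeholder assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

grants = {
    "analyst1_data": "analyst1@example.com",
    "analyst2_data": "analyst2@example.com",
}

for dataset_id, group in grants.items():
    dataset = client.get_dataset(f"my-project.{dataset_id}")
    entries = list(dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(role="READER", entity_type="groupByEmail", entity_id=group)
    )
    dataset.access_entries = entries
    client.update_dataset(dataset, ["access_entries"])
```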
Q 7.
Provide Analyst3 secure access to BigQuery query results, but not the underlying tables or datasets.
- ❌ A. Export the query results to a public Cloud Storage bucket.
→ Not secure.
- ⭕ B. Create a BigQuery Authorized View and assign a project-level user role to Analyst3.
→ The authorized view exposes the query results without granting access to the underlying tables, and a project-level bigquery.user role lets Analyst3 run query jobs (see the notes and sketch below).
- ❌ C. Assign the bigquery.resultsonly.viewer role to Analyst3.
→ The resultsonly viewer role does not exist.
- ❌ D. Create a BigQuery Authorized View and assign an organization-level role to Analyst3.
→ An organization-level role is too broad and violates the principle of least privilege.
Create an authorized view in BigQuery
Assign a project-level IAM role to your data analysts
To query the view, your data analysts need permission to run query jobs. The bigquery.user role includes permissions to run jobs, including query jobs, within the project. If you grant a user or group the bigquery.user role at the project level, the user can create datasets and can run query jobs against tables in those datasets. The bigquery.user role does not give users permission to query data, view table data, or view table schema details for datasets the user did not create.
Assigning your data analysts the project-level bigquery.user role does not give them the ability to view or query table data in the dataset containing the tables queried by the view. The bigquery.user role also does not grant users the ability to update your views. Most individuals (data scientists, business intelligence analysts, data analysts) in an enterprise should be assigned the project-level bigquery.user role.
When you add a group to an IAM role, the email address and domain must be associated with an active Google Account or Google Apps account.
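A minimal sketch of the authorized-view pattern described above, using the google-cloud-bigquery client; the project, dataset, table, and group names (and the pre-existing shared_views dataset) are placeholder assumptions, and the project-level bigquery.user role is still granted separately in IAM.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# 1. Create the view in a separate dataset that Analyst3's group can read.
view = bigquery.Table("my-project.shared_views.daily_results")
view.view_query = """
    SELECT country_code, SUM(views) AS views
    FROM `my-project.private_data.events`
    GROUP BY country_code
"""
view = client.create_table(view)

# 2. Authorize the view against the source dataset, so the view (not the
#    analysts) is what reads the underlying tables.
source = client.get_dataset("my-project.private_data")
entries = list(source.access_entries)
entries.append(
    bigquery.AccessEntry(
        role=None,
        entity_type="view",
        entity_id={
            "projectId": "my-project",
            "datasetId": "shared_views",
            "tableId": "daily_results",
        },
    )
)
source.access_entries = entries
client.update_dataset(source, ["access_entries"])

# 3. Grant Analyst3's group READER on the shared dataset only.
shared = client.get_dataset("my-project.shared_views")
entries = list(shared.access_entries)
entries.append(
    bigquery.AccessEntry(role="READER", entity_type="groupByEmail",
                         entity_id="analyst3@example.com")
)
shared.access_entries = entries
client.update_dataset(shared, ["access_entries"])
```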
Q 8.
Use Data Studio to visualize YouTube titles and aggregated view counts summarized over 30 days and segmented by Country Code in the fewest steps.
- ❌ A. Set up a YouTube data source for your channel data for Data Studio. Set Views as the metric, and set Video Title as a report dimension. Set Country Code as a filter.
→ With Country Code as a filter you cannot produce the summarized, country-segmented report the business requires.
- ⭕ B. Set up a YouTube data source for your channel data for Data Studio. Set Views as the metric, and set Video Title and Country Code as report dimensions.
→ Use the existing YouTube data source.
→ Country Code is a dimension because it's a string and should be displayed as such, that is, showing all countries, instead of filtering.
- ❌ C. Export your YouTube views to Cloud Storage. Set up a Cloud Storage data source for Data Studio. Set Views as the metric, and set Video Title as a report dimension. Set Country Code as a filter.
→ No need to export.
- ❌ D. Export your YouTube views to Cloud Storage. Set up a Cloud Storage data source for Data Studio. Set Views as the metric, and set Video Title and Country Code as report dimensions.
→ No need to export.
Data Studio
- Dashboard: dimensions and metrics
- Manage segments