Sample Questions on Preparing for the Google Cloud Professional Data Engineer Exam | Google Cloud Skills Boost
1️⃣ Building and Operationalizing Data Processing Systems
Q 1.
An application that relies on Cloud SQL to read infrequently changing data is predicted to grow dramatically. How can you increase capacity for more read-only clients?
- ❌ A. Configure high availability on the primary node.
→ High availability does nothing to improve read throughput.
→ Configuring high availability makes the service more available, not faster.
- ❌ B. Establish an external replica in the customer's data center.
→ Doesn't add throughput on the cloud.
→ An external replica is more of a backup/disaster-recovery measure.
- ❌ C. Use backups, so you can restore if there is an outage.
→ Backups help with recovery, not read capacity.
- ⭕ D. Configure read replicas.
→ In this scenario, clients are read-only and the challenge is scale.
→ Read replicas increase capacity for simultaneous reads.
Cloud SQL - Replicate the data
- A read replica is a copy of the primary instance that reflects changes to the primary in almost real time, in normal circumstances.
- You can use a read replica to offload read requests or analytics traffic from the primary instance.
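A minimal sketch of creating a read replica with the Cloud SQL Admin API via google-api-python-client; the project, instance names, region, and machine tier below are placeholder assumptions, not values from the question.

```python
from googleapiclient import discovery

# Build the Cloud SQL Admin API client (uses Application Default Credentials).
sqladmin = discovery.build("sqladmin", "v1beta4")

replica_body = {
    "name": "orders-db-replica-1",        # hypothetical replica name
    "masterInstanceName": "orders-db",    # hypothetical primary instance
    "region": "asia-northeast1",
    "settings": {"tier": "db-custom-2-7680"},
}

# Creating a replica is asynchronous; the insert call returns an operation.
operation = sqladmin.instances().insert(
    project="my-project", body=replica_body
).execute()
print(operation["name"], operation["status"])
```

Read-only clients can then point at the replica's connection name to offload traffic from the primary.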
Q 2.
A BigQuery dataset was located near Tokyo. For efficiency reasons, the company wants the dataset duplicated in Germany.
- ❌ A. Change the dataset from a regional location to a multi-region location, specifying the regions to be included.
→ Datasets are immutable, so the location can't be updated. After creating the dataset, the location cannot be changed.
- ❌ B. Export the data from BigQuery into a bucket in the new location, and import it into a new dataset at the new location.
→ BigQuery writes and reads from nearby buckets, so the new location can't read the old location's data.
- ❌ C. Copy the data from the dataset in the source region to the dataset in the target region using BigQuery commands.
→ BigQuery doesn't provide a location-to-location move or copy command.
- ⭕ D. Export the data from BigQuery into a nearby bucket in Cloud Storage. Copy to a new regional bucket in Cloud Storage. Import into the new dataset in the new location.
→ In this scenario, the dataset is in a regional location.
→ BigQuery imports and exports data to local or multi-regional buckets in the same location, so you need to use Cloud Storage as an intermediary to transfer the data to the new location (see the sketch after this list).
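A minimal sketch of that export → copy → import flow with the google-cloud-bigquery and google-cloud-storage clients; the project, dataset, table, and bucket names, the Avro format, and the specific regions are placeholder assumptions.

```python
from google.cloud import bigquery, storage

bq = bigquery.Client(project="my-project")

# 1. Export the table to a bucket colocated with the source dataset (Tokyo).
bq.extract_table(
    "my-project.sales_tokyo.orders",
    "gs://my-bucket-tokyo/orders/*.avro",
    job_config=bigquery.ExtractJobConfig(destination_format="AVRO"),
    location="asia-northeast1",
).result()

# 2. Copy the exported files to a bucket in the target region (Germany).
gcs = storage.Client(project="my-project")
src_bucket = gcs.bucket("my-bucket-tokyo")
dst_bucket = gcs.bucket("my-bucket-frankfurt")
for blob in src_bucket.list_blobs(prefix="orders/"):
    src_bucket.copy_blob(blob, dst_bucket, blob.name)

# 3. Load the files into a dataset created in the new location.
bq.load_table_from_uri(
    "gs://my-bucket-frankfurt/orders/*.avro",
    "my-project.sales_frankfurt.orders",
    job_config=bigquery.LoadJobConfig(source_format="AVRO"),
    location="europe-west3",
).result()
```

Each step runs where the data lives: the extract job in the source region, the bucket-to-bucket copy inside Cloud Storage, and the load job in the target region.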
BigQuery - Dataset location considerations
Location or region types: there are 2 types of locations.
- region: a specific geographic place, such as London.
- multi-region: a large geographic area, such as the United States, that contains two or more geographic places.
Dataset location
- Specify a location for storing BigQuery data when creating a dataset.
- After you create the dataset, the location cannot be changed.
- But you can copy the dataset to a different location, or manually move (recreate) the dataset in a different location.
- BigQuery processes queries in the same location as the dataset that contains the tables you're querying.
- BigQuery stores your data in the selected location in accordance with the Service Specific Terms.
BigQuery - Location considerations
Dataset locations | BigQuery | Google Cloud
Colocate your Cloud Storage buckets for loading data.
- If your BigQuery dataset is in a multi-regional location, the Cloud Storage bucket containing the data you're loading must be in a regional or multi-regional bucket in the same location.
- For example, if your BigQuery dataset is in the EU, the Cloud Storage bucket must be in a regional or multi-regional bucket in the EU.
- If your dataset is in a regional location, your Cloud Storage bucket must be a regional bucket in the same location.
- For example, if your dataset is in the Tokyo region, your Cloud Storage bucket must be a regional bucket in Tokyo.
- Exception: If your dataset is in the US multi-regional location, you can load data from a Cloud Storage bucket in any regional or multi-regional location.
Q 3.
Your client wants a transactionally consistent global relational repository. You need to be able to monitor and adjust node count for unpredictable traffic spikes.
- ❌ A. Use Cloud Spanner. Monitor storage usage and increase node count if more than 70% utilized.
→ Storage utilization should not be used as a scaling metric.
- ⭕ B. Use Cloud Spanner. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.
→ Because of the requirement for globally scalable transactions, use Cloud Spanner. CPU utilization is the recommended metric for scaling, per Google best practices (see the monitoring sketch after this list).
- ❌ C. Use Cloud Bigtable. Monitor data stored and increase node count if more than 70% utilized.
- ❌ D. Use Cloud Bigtable. Monitor CPU utilization and increase node count if more than 70% utilized for your time span.
→ Cloud Bigtable is a NoSQL database, not a transactionally consistent relational repository.
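A minimal sketch of the scaling check that answer B describes, using the Cloud Monitoring client to read Spanner CPU utilization; the project id, instance id, metric name, and one-hour lookback are placeholder assumptions, and in practice you would act on the signal with the Spanner Admin API or an autoscaler rather than a print statement.

```python
import time
from google.cloud import monitoring_v3

project = "my-project"
instance_id = "orders-instance"

client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
)

series = client.list_time_series(
    request={
        "name": f"projects/{project}",
        "filter": (
            'metric.type = "spanner.googleapis.com/instance/cpu/utilization" '
            f'AND resource.labels.instance_id = "{instance_id}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

# CPU utilization is reported as a fraction between 0 and 1.
peak = max((p.value.double_value for ts in series for p in ts.points), default=0.0)
if peak > 0.70:  # threshold from the question; tune to your workload
    print(f"Peak CPU {peak:.0%} over the last hour: add nodes to the instance.")
```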
2️⃣ Operationalizing Machine Learning Models
Q 4.
Quickly and inexpensively develop an application that sorts product reviews by most favorable to least favorable.
- ❌ A. Train an entity classification model with TensorFlow. Deploy the model using AI Platform. Use the entity to sort the reviews.
- ❌ B. Build an application that performs entity analysis using the Natural Language API. Use the entity to sort the reviews.
- ⭕ C. Build an application that performs sentiment analysis using the Natural Language API. Use the score and magnitude to sort the reviews.
→ Quickly and inexpensively: use a pre-trained model whenever possible. Creating models is expensive and time-consuming.
→ Sentiment analysis using the Natural Language API returns the score and magnitude of sentiment (see the sketch after this list).
- ❌ D. Train a sentiment regression model with TensorFlow. Deploy the model using AI Platform. Use the magnitude to sort the reviews.
Q 5.
Maximize speed and minimize cost of deploying a TensorFlow machine-learning model on Google Cloud.
- ⭕ A. Export your trained model to a SavedModel format. Deploy and run your model on AI Platform.
→ Google's recommended practice: use each tool for the purpose for which it was designed and built. Just deploy it (see the sketch after this list).
- ❌ B. Export your trained model to a SavedModel format. Deploy and run your model from a Google Kubernetes Engine cluster.
→ Google Kubernetes Engine isn't the right tool for this circumstance.
- ❌ C. Export 2 copies of your trained model to a SavedModel format. Store artifacts in Cloud Storage. Run 1 version on CPUs and another version on GPUs.
- ❌ D. Export 2 copies of your trained model to a SavedModel format. Store artifacts in AI Platform. Run 1 version on CPUs and another version on GPUs.
→ Running two copies on CPUs and GPUs adds cost and complexity instead of minimizing them.
3️⃣ Security, Policy and Reliability
Q 6.
Groups Analyst1 and Analyst2 should not have access to each other's BigQuery data.
- ❌ A. Place the data in separate tables, and assign appropriate group access.
→ BigQuery does not provide IAM access control on individual tables.
- ❌ B. Analyst1 and Analyst2 must be in separate projects, along with the data.
→ Analyst groups can be in the same project.
- ⭕ C. Place the data in separate datasets, and assign appropriate group access.
→ BigQuery data access is controlled at the dataset level (see the sketch after this list).
- ❌ D. Place the data in separate tables, but encrypt each table with a different group key.
→ Encryption does not determine access.
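A minimal sketch of dataset-level access control with the google-cloud-bigquery client, granting each analyst group READER on its own dataset only; the project, dataset, and group names are placeholder assumptions.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

grants = {
    "analyst1_data": "analyst1@example.com",
    "analyst2_data": "analyst2@example.com",
}

for dataset_id, group in grants.items():
    dataset = client.get_dataset(f"my-project.{dataset_id}")
    entries = list(dataset.access_entries)
    entries.append(
        bigquery.AccessEntry(role="READER", entity_type="groupByEmail", entity_id=group)
    )
    dataset.access_entries = entries
    client.update_dataset(dataset, ["access_entries"])
```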
Q 7.
Provide Analyst3 secure access to BigQuery query results, but not the underlying tables or datasets.
- ❌ A. Export the query results to a public Cloud Storage bucket.
→ Not secure.
- ⭕ B. Create a BigQuery Authorized View and assign a project-level user role to Analyst3.
→ The authorized view exposes the query results without granting access to the underlying tables, and a project-level bigquery.user role lets Analyst3 run query jobs (see the notes and sketch below).
- ❌ C. Assign the bigquery.resultsonly.viewer role to Analyst3.
→ The resultsonly viewer role does not exist.
- ❌ D. Create a BigQuery Authorized View and assign an organization-level role to Analyst3.
→ An organization-level role is too broad and violates the principle of least privilege.
Create an authorized view in BigQuery
Assign a project-level IAM role to your data analysts
To query the view, your data analysts need permission to run query jobs. The bigquery.user role includes permissions to run jobs, including query jobs, within the project. If you grant a user or group the bigquery.user role at the project level, the user can create datasets and can run query jobs against tables in those datasets. The bigquery.user role does not give users permission to query data, view table data, or view table schema details for datasets the user did not create.
Assigning your data analysts the project-level bigquery.user role does not give them the ability to view or query table data in the dataset containing the tables queried by the view. The bigquery.user role also does not grant users the ability to update your views. Most individuals (data scientists, business intelligence analysts, data analysts) in an enterprise should be assigned the project-level bigquery.user role.
When you add a group to an IAM role, the email address and domain must be associated with an active Google Account or Google Apps account.
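A minimal sketch of the authorized-view pattern described above, using the google-cloud-bigquery client; the project, dataset, table, and group names (and the pre-existing shared_views dataset) are placeholder assumptions, and the project-level bigquery.user role is still granted separately in IAM.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# 1. Create the view in a separate dataset that Analyst3's group can read.
view = bigquery.Table("my-project.shared_views.daily_results")
view.view_query = """
    SELECT country_code, SUM(views) AS views
    FROM `my-project.private_data.events`
    GROUP BY country_code
"""
view = client.create_table(view)

# 2. Authorize the view against the source dataset, so the view (not the
#    analysts) is what reads the underlying tables.
source = client.get_dataset("my-project.private_data")
entries = list(source.access_entries)
entries.append(
    bigquery.AccessEntry(
        role=None,
        entity_type="view",
        entity_id={
            "projectId": "my-project",
            "datasetId": "shared_views",
            "tableId": "daily_results",
        },
    )
)
source.access_entries = entries
client.update_dataset(source, ["access_entries"])

# 3. Grant Analyst3's group READER on the shared dataset only.
shared = client.get_dataset("my-project.shared_views")
entries = list(shared.access_entries)
entries.append(
    bigquery.AccessEntry(role="READER", entity_type="groupByEmail",
                         entity_id="analyst3@example.com")
)
shared.access_entries = entries
client.update_dataset(shared, ["access_entries"])
```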
Q 8.
Use Data Studio to visualize YouTube titles and aggregated view counts summarized over 30 days and segmented by Country Code in the fewest steps.
- ❌ A. Set up a YouTube data source for your channel data for Data Studio. Set Views as the metric, and set Video Title as a report dimension. Set Country Code as a filter.
→ With Country Code as a filter you cannot produce the summarized, country-segmented report the business requires.
- ⭕ B. Set up a YouTube data source for your channel data for Data Studio. Set Views as the metric, and set Video Title and Country Code as report dimensions.
→ Use the existing YouTube data source.
→ Country Code is a dimension because it's a string and should be displayed as such, that is, showing all countries, instead of filtering.
- ❌ C. Export your YouTube views to Cloud Storage. Set up a Cloud Storage data source for Data Studio. Set Views as the metric, and set Video Title as a report dimension. Set Country Code as a filter.
→ No need to export.
- ❌ D. Export your YouTube views to Cloud Storage. Set up a Cloud Storage data source for Data Studio. Set Views as the metric, and set Video Title and Country Code as report dimensions.
→ No need to export.
Data Studio
- Dashboard: dimensions and metrics
- Manage segments