Google Professional Data Engineer Certificate EXAMTOPIC DUMPS Q111-Q115
Q 111.
You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the Data Science team runs a query filtered on a date column and limited to 30-90 days of data, the query scans the entire table. You also noticed that your bill is increasing more quickly than you expected. You want to resolve the issue as cost-effectively as possible while maintaining the ability to conduct SQL queries. What should you do?
- ⭕ A. Re-create the tables using DDL. Partition the tables by a column containing a TIMESTAMP or DATE type.
→ Partitioning reduces both cost and query time: a date-filtered query scans only the matching partitions instead of the whole table (see the DDL sketch after this list).
- ❌ B. Recommend that the Data Science team export the table to a CSV file on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.
→ The most inefficient solution: it duplicates the data and gives up BigQuery's SQL engine.
- ❌ C. Modify your pipeline to maintain the last 30-90 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.
→ Cost-ineffective: a separate table must be maintained for the last 30-90 days of data, so data has to be moved between tables on a daily basis.
- ❌ D. Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.
→ Sharding tables by day works, but Google recommends native partitioning over wildcard queries across many sharded tables for both cost and performance.
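For reference, option A can be done with a single DDL statement. Below is a minimal sketch using the google-cloud-bigquery Python client, assuming a TIMESTAMP column named `event_ts`; the project, dataset, and table names are hypothetical:

```python
from google.cloud import bigquery

# Hypothetical project/dataset/table names for illustration.
client = bigquery.Client(project="my-project")

ddl = """
CREATE TABLE `my-project.analytics.events_partitioned`
PARTITION BY DATE(event_ts)                -- daily partitions from the TIMESTAMP column
OPTIONS (require_partition_filter = TRUE)  -- forces queries to prune partitions
AS
SELECT * FROM `my-project.analytics.events`
"""

client.query(ddl).result()  # waits for the CTAS job to finish
```

With `require_partition_filter` set, the Data Science team's 30-90 day queries cannot accidentally scan the full three-year history.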
Q 112.
You operate a logistics company, and you want to improve event delivery reliability for vehicle-based sensors. You operate small data centers around the world to capture these events, but leased lines that provide connectivity from your event collection infrastructure to your event processing infrastructure are unreliable, with unpredictable latency. You want to address this issue in the most cost-effective way. What should you do?
- ❌ A. Deploy small Kafka clusters in your data centers to buffer events.
→ Adds hardware and operational overhead in every data center, and the unreliable leased lines remain in the delivery path.
- ⭕ B. Have the data acquisition devices publish data to Cloud Pub/Sub.
→ The issue is between the event-collection data centers and the analytics data centers. Publishing raw sensor messages straight to Pub/Sub skips the processing and forwarding through multiple endpoints and lets Pub/Sub handle buffering and retries (see the publisher sketch after the Cloud Interconnect notes below).
- ❌ C. Establish a Cloud Interconnect between all remote data centers and Google.
→ Reliable, but a dedicated or partner link for every small data center is not the most cost-effective option.
- ❌ D. Write a Cloud Dataflow pipeline that aggregates all data in session windows.
→ Windowing changes how events are processed, not how reliably they are delivered.
Pipeline using Cloud Pub/Sub for IoT
Cloud Interconnect
Cloud Interconnect overview | Google Cloud
- Cloud Interconnect provides low latency, high availability connections that enable you to reliably transfer data between your on-premises and Google Cloud Virtual Private Cloud (VPC) networks. Also, Interconnect connections provide internal IP address communication, which means internal IP addresses are directly accessible from both networks.
- Cloud Interconnect offers two options for extending your on-premises network:
- Dedicated Interconnect provides a direct physical connection between your on-premises network and Google's network.
- Partner Interconnect provides connectivity between your on-premises and VPC networks through a supported service provider.
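A minimal sketch of option B using the google-cloud-pubsub Python client; the project, topic, payload, and attribute names are hypothetical, and real devices would likely publish through a small gateway in each data center:

```python
from google.cloud import pubsub_v1

# Hypothetical project and topic names.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "vehicle-events")

def publish_event(payload: bytes, vehicle_id: str) -> str:
    # The client library batches and retries publishes, so an unreliable
    # link surfaces as retries rather than silently lost events.
    future = publisher.publish(topic_path, payload, vehicle_id=vehicle_id)
    return future.result()  # message ID, returned once Pub/Sub has accepted it

message_id = publish_event(b'{"speed": 72}', vehicle_id="truck-042")
print(message_id)
```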
Q 113.
You are a retailer that wants to integrate your online sales capabilities with different in-home assistants, such as Google Home. You need to interpret customer voice commands and issue an order to the backend systems. Which solution should you choose?
- ❌ A. Cloud Speech-to-Text API
- ❌ B. Cloud Natural Language API
- ⭕ C. Dialogflow Enterprise Edition
→ Dialogflow combines speech interpretation with intent and entity extraction, so a single agent can turn a voice command into a structured order for the backend (see the detect-intent sketch below).
- ❌ D. Cloud AutoML Natural Language
Dialogflow
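A hedged sketch of a Dialogflow ES detect-intent call with the google-cloud-dialogflow Python client, assuming the agent and its order-taking intent already exist; all IDs and the sample utterance are hypothetical:

```python
from google.cloud import dialogflow

def detect_order_intent(project_id: str, session_id: str, utterance: str):
    # Hypothetical project/session IDs; assumes a Dialogflow ES agent
    # with an order-taking intent already exists in this project.
    client = dialogflow.SessionsClient()
    session = client.session_path(project_id, session_id)
    query_input = dialogflow.QueryInput(
        text=dialogflow.TextInput(text=utterance, language_code="en-US")
    )
    response = client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    result = response.query_result
    # The matched intent and its extracted parameters are what the
    # backend order system would consume.
    return result.intent.display_name, result.parameters

intent_name, params = detect_order_intent(
    "my-project", "session-123", "order two large lattes"
)
```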
Q 114.
Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?
- ❌ A. Cloud Dataflow
- ⭕ B. Cloud Composer
- ❌ C. Cloud Dataprep
- ❌ D. Cloud Dataproc
Cloud Composer
Fully managed workflow orchestration service that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.
- Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud.
- Can help create workflows that connect data, processing, and services across clouds, giving you a unified data environment.
- NOT suitable if low latency is required between tasks.
- Which GCP service to use - Orchestration : Scheduler, Composer, Workflows
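Cloud Composer runs Apache Airflow, so orchestrating the hybrid pipeline means authoring an Airflow DAG. A minimal sketch with placeholder tasks; the DAG and task IDs are hypothetical, and a real pipeline would swap the echo commands for provider operators (an AWS transfer, a BigQuery load, and so on):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical cross-cloud pipeline: each task would call one provider's
# services via its operator; BashOperator echoes stand in here.
with DAG(
    dag_id="hybrid_pipeline",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    pull_from_other_cloud = BashOperator(
        task_id="pull_from_other_cloud",
        bash_command="echo 'placeholder for an AWS/Azure transfer task'",
    )
    load_into_bigquery = BashOperator(
        task_id="load_into_bigquery",
        bash_command="echo 'placeholder for a BigQuery load task'",
    )
    pull_from_other_cloud >> load_into_bigquery
```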
Q 115.
You use a dataset in BigQuery for analysis. You want to provide third-party companies with access to the same dataset. You need to keep the costs of data sharing low and ensure that the data is current. Which solution should you choose?
- ⭕ A. Create an authorized view on the BigQuery table to control data access, and provide third-party companies with access to that view.
- ❌ B. Use Cloud Scheduler to export the data on a regular basis to Cloud Storage, and provide third-party companies with access to the bucket.
→ More cost: the exported copy is stored twice and is only as current as the last export.
- ❌ C. Create a separate dataset in BigQuery that contains the relevant data to share, and provide third-party companies with access to the new dataset.
→ More cost: the duplicated dataset must be stored and kept in sync.
- ❌ D. Create a Cloud Dataflow job that reads the data in frequent time intervals, and writes it to the relevant BigQuery dataset or Cloud Storage bucket for third-party companies to use.
→ More cost, and no guarantee that the data is current between job runs.
BigQuery - Create an authorized view
- Giving a view access to a dataset is also known as creating an authorized view in BigQuery.
- An authorized view lets you share query results with particular users and groups without giving them access to the underlying tables.
- You can also use the view's SQL query to restrict the columns (fields) the users are able to query.
- Alternative to authorized views: You can also control access to tables and views with access control set at the table level, within the same dataset.
- See how table-level access controls compare to authorized views.
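The authorized-view setup is two steps: create a view in a dataset the third parties can query, then authorize that view on the source dataset. A hedged sketch with the google-cloud-bigquery Python client; every project, dataset, table, and column name is hypothetical. The third parties would separately be granted read access on the `shared_views` dataset only:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical names throughout

# 1. Create the view that exposes only the shareable columns.
client.query("""
CREATE OR REPLACE VIEW `my-project.shared_views.sales_summary` AS
SELECT order_date, region, SUM(amount) AS total
FROM `my-project.private_data.sales`
GROUP BY order_date, region
""").result()

# 2. Authorize the view on the source dataset so it can read the private
#    table without the third parties having any access to that table.
source = client.get_dataset("my-project.private_data")
entries = list(source.access_entries)
entries.append(
    bigquery.AccessEntry(
        role=None,
        entity_type="view",
        entity_id={
            "projectId": "my-project",
            "datasetId": "shared_views",
            "tableId": "sales_summary",
        },
    )
)
source.access_entries = entries
client.update_dataset(source, ["access_entries"])
```

Because the view queries the source table directly, the shared results are always current, and no duplicate copy of the data is stored.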