[PDE CERTIFICATE - EXAMTOPIC] DUMPS Q111-Q115

Google Professional Data Engineer Certificate EXAMTOPIC DUMPS Q111-Q115

Q 111.

You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the Data Science team runs a query filtered on a date column and limited to 30-90 days of data, the query scans the entire table. You also noticed that your bill is increasing more quickly than you expected. You want to resolve the issue as cost-effectively as possible while maintaining the ability to conduct SQL queries. What should you do?

  • A. Re-create the tables using DDL. Partition the tables by a column containing a TIMESTAMP or DATE type.
    PARTITIONING reduces cost and time: queries that filter on the partitioning column scan only the matching partitions instead of the entire table (see the sketch after this list).
  • ❌ B. Recommend that the Data Science team export the table to a CSV file on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.
    The most inefficient solution: it duplicates storage and gives up the required ability to query with SQL.
  • ❌ C. Modify your pipeline to maintain the last 30-90 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.
    Cost-ineffective: a separate table must be maintained for the last 30-90 days of data, so that table has to be rewritten every day as the window moves.
  • ❌ D. Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.
    Date-sharded tables are the legacy approach; native partitioned tables perform better and are cheaper to maintain.
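
A minimal sketch of option A in Python, assuming a hypothetical `my_dataset.events` table with a DATE column named `event_date`; the DDL re-creates the table partitioned by that column, so date-filtered queries scan only the matching partitions:

```python
# Minimal sketch, assuming a hypothetical `my_dataset.events` table
# with a DATE column `event_date`.
from google.cloud import bigquery

client = bigquery.Client()

# Re-create the table partitioned on the date column (option A).
ddl = """
CREATE TABLE `my_dataset.events_partitioned`
PARTITION BY event_date AS
SELECT * FROM `my_dataset.events`
"""
client.query(ddl).result()  # wait for the DDL job to finish

# A date-filtered query now scans only the 90 matching partitions,
# not the full three years of history.
sql = """
SELECT COUNT(*)
FROM `my_dataset.events_partitioned`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
"""
for row in client.query(sql).result():
    print(row)
```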

Q 112.

You operate a logistics company, and you want to improve event delivery reliability for vehicle-based sensors. You operate small data centers around the world to capture these events, but leased lines that provide connectivity from your event collection infrastructure to your event processing infrastructure are unreliable, with unpredictable latency. You want to address this issue in the most cost-effective way. What should you do?

  • ❌ A. Deploy small Kafka clusters in your data centers to buffer events.
    Adds infrastructure to build and operate, and the events still traverse the unreliable leased lines.
  • B. Have the data acquisition devices publish data to Cloud Pub/Sub.
    The unreliable leased lines sit between the event-collection data centers and the event-processing infrastructure.
    Publishing directly to Cloud Pub/Sub bypasses those lines: Pub/Sub is a globally available ingestion endpoint that durably buffers raw sensor messages until they are processed (see the sketch after this list).
  • ❌ C. Establish a Cloud Interconnect between all remote data centers and Google.
    Reliable, but dedicated connectivity for every data center is not the most cost-effective option.
  • ❌ D. Write a Cloud Dataflow pipeline that aggregates all data in session windows.
    Windowing changes how events are aggregated, not how reliably they are delivered.
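
A minimal sketch of option B, assuming a hypothetical project `my-project` and topic `vehicle-events`; the device publishes each sensor event directly to Pub/Sub, which stores it durably until the processing side pulls it from a subscription:

```python
# Minimal sketch, assuming a hypothetical project `my-project`
# and topic `vehicle-events`.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "vehicle-events")

# A device publishes one sensor event; Pub/Sub stores it durably
# until the processing infrastructure pulls it from a subscription.
event = {"vehicle_id": "truck-42", "speed_kmh": 87, "ts": "2024-01-01T12:00:00Z"}
future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
print(future.result())  # message id, returned once Pub/Sub has accepted the event
```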

Pipeline using Cloud Pub/Sub for IoT

Cloud Interconnect

Cloud Interconnect overview | Google Cloud

  • Cloud Interconnect provides low latency, high availability connections that enable you to reliably transfer data between your on-premises and Google Cloud Virtual Private Cloud (VPC) networks. Also, Interconnect connections provide internal IP address communication, which means internal IP addresses are directly accessible from both networks.
  • Cloud Interconnect offers two options for extending your on-premises network:
    • Dedicated Interconnect provides a direct physical connection between your on-premises network and Google's network.
    • Partner Interconnect provides connectivity between your on-premises and VPC networks through a supported service provider.

Q 113.

You are a retailer that wants to integrate your online sales capabilities with different in-home assistants, such as Google Home. You need to interpret customer voice commands and issue an order to the backend systems. Which solution should you choose?

  • ❌ A. Cloud Speech-to-Text API
    Only transcribes audio to text; it does not interpret intent or trigger backend actions.
  • ❌ B. Cloud Natural Language API
    Analyzes text (entities, sentiment, syntax) but does not manage conversations or fulfillment.
  • C. Dialogflow Enterprise Edition
    Dialogflow combines speech recognition with intent detection and fulfillment, so it can interpret customer voice commands and issue orders to the backend systems (see the sketch below).
  • ❌ D. Cloud AutoML Natural Language
    Trains custom text-classification models; it is not a conversational agent.

Dialogflow

Dialogflow Enterprise Edition
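
A minimal sketch of detecting an intent with the Dialogflow API; the project id, session id, and order phrase are hypothetical, and the text input stands in for the voice command an in-home assistant would have transcribed:

```python
# Minimal sketch; project id, session id, and the order phrase are hypothetical.
# The text input stands in for the transcribed voice command.
from google.cloud import dialogflow_v2 as dialogflow

session_client = dialogflow.SessionsClient()
session = session_client.session_path("my-project", "session-123")

query_input = dialogflow.QueryInput(
    text=dialogflow.TextInput(text="order two large pizzas", language_code="en-US")
)
response = session_client.detect_intent(
    request={"session": session, "query_input": query_input}
)

# The matched intent and its parameters drive the backend order.
print(response.query_result.intent.display_name)
print(response.query_result.fulfillment_text)
```

The matched intent and its extracted parameters are what the fulfillment logic would translate into an order against the backend systems.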

Q 114.

Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers. Which cloud-native service should you use to orchestrate the entire pipeline?

  • ❌ A. Cloud Dataflow
    Executes individual data-processing pipelines; it does not orchestrate workflows across cloud providers.
  • B. Cloud Composer
  • ❌ C. Cloud Dataprep
    A tool for visually exploring and cleaning data, not for orchestration.
  • ❌ D. Cloud Dataproc
    Managed Hadoop/Spark for processing, not orchestration.

Cloud Composer

Fully managed workflow orchestration service that empowers you to author, schedule, and monitor pipelines that span across clouds and on-premises data centers.

  • Ease your transition to the cloud or maintain a hybrid data environment by orchestrating workflows that cross between on-premises and the public cloud.
  • Can help create workflows that connect data, processing, and services across clouds, giving you a unified data environment.
  • NOT suitable when low latency is required between tasks.
  • Which GCP service to use - Orchestration: Scheduler, Composer, Workflows (a minimal DAG sketch follows this list).
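
A minimal sketch of a Composer (Airflow) DAG for a hybrid pipeline; the task bodies are placeholders, and the DAG id and schedule are assumptions:

```python
# Minimal sketch of a Composer (Airflow) DAG; the task bodies are
# placeholders and the DAG id/schedule are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def pull_from_other_cloud():
    # Placeholder: fetch data from another provider via its SDK.
    pass


def load_into_bigquery():
    # Placeholder: load the staged data into BigQuery.
    pass


with DAG(
    dag_id="hybrid_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    pull = PythonOperator(task_id="pull_from_other_cloud",
                          python_callable=pull_from_other_cloud)
    load = PythonOperator(task_id="load_into_bigquery",
                          python_callable=load_into_bigquery)
    pull >> load  # Composer schedules, retries, and monitors both steps
```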

Q 115.

You use a dataset in BigQuery for analysis. You want to provide third-party companies with access to the same dataset. You need to keep the costs of data sharing low and ensure that the data is current. Which solution should you choose?

  • A. Create an authorized view on the BigQuery table to control data access, and provide third-party companies with access to that view.
    An authorized view shares live query results without copying the data, so there is no duplicate-storage cost and the shared data is always current.
  • ❌ B. Use Cloud Scheduler to export the data on a regular basis to Cloud Storage, and provide third-party companies with access to the bucket.
    Higher cost: the data is duplicated in Cloud Storage, and the copies are only as current as the export schedule.
  • ❌ C. Create a separate dataset in BigQuery that contains the relevant data to share, and provide third-party companies with access to the new dataset.
    Higher cost: the shared dataset duplicates storage and must be refreshed continually to stay current.
  • ❌ D. Create a Cloud Dataflow job that reads the data in frequent time intervals, and writes it to the relevant BigQuery dataset or Cloud Storage bucket for third-party companies to use.
    Higher cost, and there is no guarantee the data is current between job runs.

BigQuery - Create an authorized view

  • Giving a view access to a dataset is also known as creating an authorized view in BigQuery.
    • An authorized view lets you share query results with particular users and groups without giving them access to the underlying tables.
    • You can also use the view's SQL query to restrict the columns (fields) the users are able to query.
  • Alternative to authorized views: You can also control access to tables and views with access control set at the table level, within the same dataset.
  • See how table-level access controls compare to authorized views (a minimal sketch of creating an authorized view follows).
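
A minimal sketch of option A using the BigQuery Python client, following the documented authorized-view pattern; the project, dataset, table, and column names are hypothetical:

```python
# Minimal sketch, following the documented authorized-view pattern.
# Project, dataset, table, and column names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

# 1. Create the view in a dataset the third parties can query.
view = bigquery.Table("my-project.shared_ds.sales_view")
view.view_query = "SELECT order_id, amount FROM `my-project.source_ds.sales`"
view = client.create_table(view)

# 2. Authorize the view on the source dataset so it can read the
#    underlying table without granting the third parties direct access.
source = client.get_dataset("my-project.source_ds")
entries = list(source.access_entries)
entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
source.access_entries = entries
client.update_dataset(source, ["access_entries"])
```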