
Google Professional Data Engineer Certificate EXAMTOPIC DUMPS Q121-Q125

Q 121.

You currently have a single on-premises Kafka cluster in a data center in the us-east region that is responsible for ingesting messages from IoT devices globally. Because large parts of the globe have poor internet connectivity, messages sometimes batch at the edge and then arrive all at once, causing load spikes on your Kafka cluster. This is becoming difficult to manage and prohibitively expensive. What is the Google-recommended cloud-native architecture for this scenario?

  • ❌ A. Edge TPUs as sensor devices for storing and transmitting the messages.
  • ❌ B. Cloud Dataflow connected to the Kafka cluster to scale the processing of incoming messages.
  • C. An IoT gateway connected to Cloud Pub/Sub, with Cloud Dataflow to read and process the messages from Cloud Pub/Sub.
    The cloud-native alternative to a single Kafka cluster on Google Cloud is Cloud Pub/Sub, which scales automatically based on demand.
  • ❌ D. A Kafka cluster virtualized on Compute Engine in us-east with Cloud Load Balancing to connect to the devices around the world.

Cloud native = Cloud Pub/Sub + Cloud Dataflow (see the sketch below).
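As a minimal sketch of option C (the `my-project` project and `iot-messages` subscription names are hypothetical, assumed to be fed by the IoT gateway), a streaming Apache Beam pipeline reads the messages from Cloud Pub/Sub. Pub/Sub absorbs the bursty edge batches, and running the pipeline on Cloud Dataflow (pass `--runner=DataflowRunner` plus the usual project/region/temp_location flags) lets the processing autoscale:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

# Hypothetical subscription fed by the IoT gateway.
SUBSCRIPTION = "projects/my-project/subscriptions/iot-messages"

# streaming=True because Pub/Sub is an unbounded source.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
        # Fixed 60-second windows group the bursty, late-arriving batches.
        | "Window" >> beam.WindowInto(FixedWindows(60))
        | "Process" >> beam.Map(print)  # placeholder for the real transform
    )
```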

Q 124.

You are designing a cloud-native historical data processing system to meet the following conditions:

  • The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Cloud Dataproc, BigQuery, and Compute Engine.
  • A streaming data pipeline stores new data daily.
  • Performance is not a factor in the solution.
  • The solution design should maximize availability.

How should you design data storage for this solution?

  • ❌ A. Create a Cloud Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.
  • ❌ B. Store the data in BigQuery. Access the data using the BigQuery Connector on Cloud Dataproc and Compute Engine.
    BigQuery does not support the PDF format (unstructured data).
  • ❌ C. Store the data in a regional Cloud Storage bucket. Access the bucket directly using Cloud Dataproc, BigQuery, and Compute Engine.
    To maximize availability: a multi-region or dual-region bucket beats a regional one.
  • D. Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Cloud Dataproc, BigQuery, and Compute Engine.
    A multi-region bucket maximizes availability, and all three tools can read from Cloud Storage directly (see the sketch below).

Cloud Storage classes: Standard, Nearline, Coldline, and Archive (class is independent of the regional/multi-region location choice).

BigQuery supports the following load formats: Avro, CSV, JSON (newline-delimited), ORC, and Parquet. PDF is not among them.
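As a rough sketch of option D (the bucket, dataset, and table names are hypothetical, and the `analytics` dataset is assumed to already exist), the data lands in a multi-region (`US`) bucket for maximum availability, and BigQuery reads the CSV files in place through an external table, while Dataproc and Compute Engine read the same `gs://` paths directly:

```python
from google.cloud import bigquery, storage

# Hypothetical names.
PROJECT = "my-project"
BUCKET_NAME = "historical-analytics-data"

# 1. Multi-region bucket ("US") to maximize availability.
storage_client = storage.Client(project=PROJECT)
bucket = storage_client.create_bucket(BUCKET_NAME, location="US")

# 2. BigQuery external table that reads the CSV files directly
#    from the bucket, without loading them into BigQuery storage.
bq_client = bigquery.Client(project=PROJECT)
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = [f"gs://{BUCKET_NAME}/csv/*.csv"]
external_config.autodetect = True  # infer the schema from the files

table = bigquery.Table(f"{PROJECT}.analytics.events_external")
table.external_data_configuration = external_config
bq_client.create_table(table)
```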

Q 125.

You have a petabyte of analytics data and need to design a storage and processing platform for it. You must be able to perform data warehouse-style analytics on the data in Google Cloud and expose the dataset as files for batch analysis tools in other cloud providers. What should you do?

  • ⚠️ A. Store and process the entire dataset in BigQuery.
    BigQuery covers the warehouse-style analytics, but exposing the dataset as files also requires Cloud Storage.
  • ❌ B. Store and process the entire dataset in Cloud Bigtable.
  • C. Store the full dataset in BigQuery, and store a compressed copy of the data in a Cloud Storage bucket.
    Data warehouse-style analytics: BigQuery.
    Exposing the data as files: Cloud Storage.
  • ❌ D. Store the warm data as files in Cloud Storage, and store the active data in BigQuery. Keep this ratio as 80% warm and 20% active.
    Whether the data is warm or active is not the key point; the full dataset must be exposed as files to batch tools in other cloud providers (see the extract-job sketch below).
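As a minimal sketch of option C (the project, dataset, table, and bucket names are hypothetical), a BigQuery extract job writes a compressed Avro copy of the table to a Cloud Storage bucket, where batch tools in other clouds can read it as plain files. The wildcard in the destination URI lets BigQuery shard the export across multiple files, which is required for tables larger than 1 GB:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Hypothetical source table and export bucket.
source_table = "my-project.analytics.events"
destination_uri = "gs://my-export-bucket/events-*.avro"

# Avro with Snappy compression gives the "compressed copy" in Cloud Storage.
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.AVRO,
    compression=bigquery.Compression.SNAPPY,
)

extract_job = client.extract_table(
    source_table, destination_uri, job_config=job_config
)
extract_job.result()  # block until the export completes
```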

Warm data vs. hot data

What is Warm Data? - Definition from Techopedia

  • Warm data is a term for data that gets analyzed on a fairly frequent basis, but is not constantly in play or in motion.
  • By contrast, hot data is used very frequently and is perceived by administrators to be constantly changing.