SOURCE/REFERENCE: Migration from Kafka to Pub/Sub
Pub/Sub
Pub/Sub is an asynchronous messaging service.
- Pub/Sub decouples services that produce events from services that process events.
- Use Pub/Sub as messaging-oriented middleware or event ingestion and delivery for streaming analytics pipelines.
- A publisher application creates and sends messages to a topic.
- Subscriber applications create a subscription to a topic to receive messages from it.
- A subscription is a named entity that represents an interest in receiving messages on a particular topic.
- Deployed in all Google Cloud regions for high availability and low latency.
- Pub/Sub directs publisher traffic to the nearest Google Cloud data center where data storage is allowed, as defined in the resource location restriction policy.
- Pub/Sub can integrate with many Google Cloud services such as Dataflow, Cloud Storage, and Cloud Run.
- You can configure these services to serve as data sources that publish messages to Pub/Sub, or as data sinks that receive messages from Pub/Sub.
- Pub/Sub exports metrics by using Cloud Monitoring.
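The topic/subscription model above can be sketched with a minimal in-memory stand-in (plain Python, not the actual google-cloud-pubsub client; the `Topic` class is a hypothetical illustration):

```python
from collections import deque

class Topic:
    """In-memory stand-in for a Pub/Sub topic with fan-out to subscriptions."""
    def __init__(self, name):
        self.name = name
        self.subscriptions = {}

    def create_subscription(self, sub_name):
        # Each subscription gets its own queue, i.e. an independent cursor
        # over the topic's messages.
        self.subscriptions[sub_name] = deque()

    def publish(self, message):
        # Every subscription attached to the topic receives a copy,
        # decoupling the publisher from the subscribers.
        for queue in self.subscriptions.values():
            queue.append(message)

    def pull(self, sub_name):
        queue = self.subscriptions[sub_name]
        return queue.popleft() if queue else None

# A publisher sends to the topic; each subscriber app receives the message.
orders = Topic("orders")
orders.create_subscription("billing")
orders.create_subscription("analytics")
orders.publish("order-123")
print(orders.pull("billing"))    # order-123
print(orders.pull("analytics"))  # order-123
```

Note how neither subscriber needs to know the publisher exists; each only holds a subscription to the topic.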
Kafka
Apache Kafka is an open-source, distributed event-streaming platform that enables applications to publish, subscribe to, store, and process streams of events.
- The Kafka server is run as a cluster of machines that client applications interact with to read, write, and process events.
- Use Kafka to decouple applications, send and receive messages, track activities, aggregate log data, and process streams.
- Within the Kafka cluster, some nodes are designated as brokers.
- Brokers receive messages from producers and store them on disk.
- Stored messages are organized by topic and partitioned across several different brokers in the cluster. New events published to a topic are appended to the end of one of the topic's partitions. Consumers can then fetch messages from brokers, which are read from disk and sent to the consumer.
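The append-to-partition behavior described above can be modeled in a few lines (a toy sketch, not the real Kafka client; `KafkaTopicSketch` is a hypothetical name):

```python
import hashlib

class KafkaTopicSketch:
    """Toy model of a Kafka topic: messages are appended to one of N partitions."""
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Messages with the same key always land in the same partition,
        # so order is preserved per key (mirroring Kafka's key-hash
        # partitioning). New events are appended to the partition's end.
        idx = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.partitions)
        self.partitions[idx].append(value)
        return idx

    def consume(self, partition, offset):
        # A consumer reads a specific partition sequentially from an offset.
        return self.partitions[partition][offset:]

topic = KafkaTopicSketch(num_partitions=3)
p = topic.produce("user-42", "login")
topic.produce("user-42", "click")
print(topic.consume(p, 0))  # ['login', 'click'] — ordered within the partition
```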
Apache Kafka vs. Pub/Sub
- Kafka brokers manage multiple ordered partitions of messages. Consumers read messages from a particular partition, whose capacity depends on the machine that hosts that partition. Message ordering is guaranteed within partitions.
- Pub/Sub does not have partitions; consumers instead read from a topic that autoscales according to demand. Message ordering is scoped to topics.
- Pub/Sub scales automatically based on demand.
- You configure each Kafka topic with the number of partitions that you require to handle the expected consumer load.
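The last point matters for capacity planning: in Kafka, each partition is consumed by at most one consumer within a consumer group, so the partition count you choose at topic creation caps your parallelism. A tiny sketch of that constraint (the function name is illustrative):

```python
def max_parallel_consumers(num_partitions, consumers_in_group):
    # In Kafka, a partition is assigned to at most one consumer per group,
    # so effective parallelism is capped by the partition count chosen
    # when the topic was created.
    return min(num_partitions, consumers_in_group)

print(max_parallel_consumers(6, 10))  # 6 — four consumers in the group sit idle
```

Pub/Sub has no such up-front knob: subscribers simply pull from the topic and the service scales with demand.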
EXAMTOPIC Q 117.
You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once and must be ordered within windows of 1 hour. How should you design the solution?
- A. Use Apache Kafka for message ingestion and use Cloud Dataproc for streaming analysis.
- B. Use Apache Kafka for message ingestion and use Cloud Dataflow for streaming analysis.
- C. Use Cloud Pub/Sub for message ingestion and Cloud Dataproc for streaming analysis.
- ⭕ D. Use Cloud Pub/Sub for message ingestion and Cloud Dataflow for streaming analysis.
→ Pub/Sub can integrate with many Google Cloud services such as Dataflow, Cloud Storage, and Cloud Run.
→ You can configure these services to serve as data sources that publish messages to Pub/Sub, or as data sinks that receive messages from Pub/Sub.
→ Pub/Sub scales automatically with load, and Dataflow provides at-least-once streaming processing with windowing, which satisfies the requirement to order messages within 1-hour windows.
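The windowing requirement from the question can be illustrated with a plain-Python sketch of what a Dataflow pipeline does with fixed windows (this is a stand-in, not actual Apache Beam code; names are illustrative):

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # 1-hour fixed windows, as in the exam scenario

def assign_windows(events):
    """Group (timestamp, payload) events into fixed 1-hour windows and
    order each window by timestamp — events may arrive out of order,
    but ordering is only required within a window."""
    windows = defaultdict(list)
    for ts, payload in events:
        windows[ts // WINDOW_SECONDS].append((ts, payload))
    return {w: sorted(evts) for w, evts in windows.items()}

# Out-of-order arrivals still end up ordered inside their own window.
events = [(3650, "b"), (10, "a"), (3600, "c"), (120, "d")]
print(assign_windows(events))
# window 0 holds (10, 'a'), (120, 'd'); window 1 holds (3600, 'c'), (3650, 'b')
```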