Cloud Pub/Sub and Kafka

SOURCE/REFERENCE: Migration from Kafka to Pub/Sub

Pub/Sub

Pub/Sub is an asynchronous messaging service.

  • Pub/Sub decouples services that produce events from services that process events.
  • Use Pub/Sub as messaging-oriented middleware or event ingestion and delivery for streaming analytics pipelines.
    • A publisher application creates and sends messages to a topic.
    • Subscriber applications create a subscription to a topic to receive messages from it.
    • A subscription is a named entity that represents an interest in receiving messages on a particular topic.
  • Deployed in all Google Cloud regions for high availability and low latency.
  • Pub/Sub directs publisher traffic to the nearest Google Cloud data center where data storage is allowed, as defined in the resource location restriction policy.
  • Pub/Sub can integrate with many Google Cloud services, such as Dataflow, Cloud Storage, and Cloud Run.
    • You can configure these services to serve as data sources that can publish messages to Pub/Sub, or as data sinks that can receive messages from Pub/Sub.
  • Pub/Sub exports metrics by using Cloud Monitoring.
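The publisher/topic/subscription relationship above can be sketched with a toy in-memory model. This is a hypothetical illustration of the decoupling, not the real `google-cloud-pubsub` client API; the class and method names are invented for clarity:

```python
from collections import deque

class Topic:
    """A named channel that publisher applications send messages to."""
    def __init__(self, name):
        self.name = name
        self.subscriptions = []

    def publish(self, message):
        # Fan out: every subscription attached to this topic gets its own copy.
        for sub in self.subscriptions:
            sub.backlog.append(message)

class Subscription:
    """A named entity representing an interest in a topic's messages."""
    def __init__(self, name, topic):
        self.name = name
        self.backlog = deque()
        topic.subscriptions.append(self)

    def pull(self):
        # Each subscription consumes its own copy independently; the
        # publisher never needs to know who the subscribers are.
        return self.backlog.popleft() if self.backlog else None

topic = Topic("orders")
analytics = Subscription("analytics", topic)
billing = Subscription("billing", topic)
topic.publish("order-1")
# Both subscriptions receive "order-1" without coupling to the publisher.
```

The key design point mirrored here is that the producer writes to a topic, not to consumers; attaching a new subscription requires no change on the publishing side.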

Kafka

Apache Kafka is an open-source, distributed event-streaming platform that enables applications to publish, subscribe to, store, and process streams of events.

  • The Kafka server is run as a cluster of machines that client applications interact with to read, write, and process events.
  • Use Kafka to decouple applications, send and receive messages, track activities, aggregate log data, and process streams.
  • Within the Kafka cluster, some nodes are designated as brokers.
    • Brokers receive messages from producers and store them on disk.
    • Stored messages are organized by topic and partitioned across several different brokers in the cluster. New events published to a topic are appended to the end of one of the topic's partitions. Consumers can then fetch messages from brokers, which are read from disk and sent to the consumer.
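The broker behavior described above can be sketched as a toy in-memory model: partitions as append-only logs, a key hash to pick a partition, and consumers reading sequentially from an offset. This is an invented simplification for illustration (real brokers persist to disk and replicate), using a deterministic toy hash rather than Kafka's actual partitioner:

```python
class Broker:
    """Stores each partition of a topic as an append-only log (a list here;
    on disk in a real broker)."""
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # A deterministic toy hash routes the same key to the same partition,
        # so new events are appended to the end of that partition's log.
        p = sum(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def fetch(self, partition, offset=0):
        # A consumer tracks its own offset and reads sequentially from there.
        return self.partitions[partition][offset:]

broker = Broker(num_partitions=3)
p = broker.produce("user-42", "login")
broker.produce("user-42", "click")
broker.fetch(p)  # events for "user-42" come back in publish order
```

Because all events for one key land in one partition, ordering is preserved per key within that partition, which is exactly the guarantee Kafka gives.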

Apache Kafka vs. Pub/Sub

  • Kafka brokers manage multiple ordered partitions of messages. Consumers read messages from a particular partition, whose capacity is bounded by the machine that hosts it.
    • Message ordering is guaranteed within a partition.
  • Pub/Sub does not expose partitions; consumers instead read from a subscription to a topic.
    • Message ordering is guaranteed per ordering key (messages published with the same ordering key are delivered in order), not across the whole topic.
    • Pub/Sub scales automatically based on demand.
  • You configure each Kafka topic with the number of partitions that you require to handle the expected consumer load.
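The partition-count sizing above matters because partitions are divided among the consumers in a consumer group; the partition count caps the consumer parallelism. A minimal sketch of a round-robin assignment, a simplified stand-in for Kafka's actual assignors:

```python
def assign_partitions(partitions, consumers):
    """Distribute partitions round-robin across the consumers in a group.
    Simplified: Kafka's real range/round-robin/sticky assignors are richer."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# A topic with 6 partitions shared by 2 consumers: each handles 3 partitions.
# More than 6 consumers would leave some idle -- hence sizing partitions to
# the expected consumer load.
assign_partitions(list(range(6)), ["c1", "c2"])
```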

EXAMTOPIC Q 117.

You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once and must be ordered within windows of 1 hour. How should you design the solution?

  • A. Use Apache Kafka for message ingestion and use Cloud Dataproc for streaming analysis.
  • B. Use Apache Kafka for message ingestion and use Cloud Dataflow for streaming analysis.
  • C. Use Cloud Pub/Sub for message ingestion and Cloud Dataproc for streaming analysis.
  • D. Use Cloud Pub/Sub for message ingestion and Cloud Dataflow for streaming analysis. (correct)
    Pub/Sub autoscales with load and provides at-least-once delivery; Dataflow integrates natively with Pub/Sub and can apply fixed 1-hour windows, ordering messages within each window. Kafka (A, B) requires managing partition capacity yourself, and Dataproc (A, C) is oriented toward Hadoop/Spark workloads rather than managed, autoscaling stream processing.
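The "ordered within windows of 1 hour" requirement can be illustrated with a toy version of what fixed windowing in the Dataflow pipeline would do; this is a hypothetical sketch, not the Apache Beam API:

```python
from collections import defaultdict

def window_and_order(events, window_seconds=3600):
    """Group (timestamp, payload) events into fixed-size windows and
    order messages within each window -- a toy stand-in for Dataflow's
    fixed windowing plus per-window sorting."""
    windows = defaultdict(list)
    for ts, payload in events:
        # Integer division buckets each event into its 1-hour window.
        windows[ts // window_seconds].append((ts, payload))
    # Ordering is only promised *within* a window, so sort each bucket.
    return {w: sorted(evs) for w, evs in sorted(windows.items())}

events = [(3700, "b"), (10, "a"), (3650, "c")]
window_and_order(events)
# window 0 holds "a"; window 1 holds "c" then "b", ordered by timestamp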