Which GCP service to use - Orchestration : Scheduler, Composer, Workflows

Three GCP services for orchestration: Cloud Scheduler, Cloud Composer, Workflows

  • Cloud Scheduler
    • Managed cron job service
    • for schedule driven single-service orchestration
  • Cloud Composer
    • Managed workflow orchestration service
    • for orchestration of your data workloads
  • Cloud Workflows
    • HTTP services orchestration
    • for complex multi-service orchestration
Decision tree
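
The criteria in these notes can be condensed into a small helper. This is only a sketch in Python: `Requirements`, its field names, and `choose_orchestrator` are hypothetical, and the order of the checks is one reasonable reading of the notes, not an official Google decision tree.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist; field names are illustrative only."""
    multiple_services: bool = False          # more than one service to coordinate
    data_engineering_batch: bool = False     # ETL/ELT pipelines, Airflow operators
    low_latency_between_steps: bool = False  # seconds of delay are not acceptable
    needs_loops_or_jumps: bool = False       # control flow that is not a DAG


def choose_orchestrator(req: Requirements) -> str:
    """Suggest a service name based on the criteria listed in these notes."""
    if not req.multiple_services:
        # Schedule-driven, single-service orchestration.
        return "Cloud Scheduler"
    if req.low_latency_between_steps or req.needs_loops_or_jumps:
        # Composer is not suitable for low latency and models tasks as a DAG.
        return "Workflows"
    if req.data_engineering_batch:
        return "Cloud Composer"
    # Assumption: general multi-service HTTP orchestration defaults to Workflows.
    return "Workflows"
```

For example, `choose_orchestrator(Requirements(multiple_services=True, data_engineering_batch=True))` suggests Cloud Composer, while adding `low_latency_between_steps=True` switches the suggestion to Workflows.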

Composer vs. Workflows

Orchestrating multiple services, handling long-running workflows ⇒ Cloud Composer and Workflows
Cloud Composer

Commonly used to orchestrate data transformations as part of ELT or data engineering workflows.

  • Can tolerate a delay of a few seconds between task executions
  • Building a batch orchestration workflow for data engineering (ETL)
  • The collection of tasks can be modeled as a Directed Acyclic Graph (DAG)
  • Benefit from Airflow operators, which are especially strong for data engineering
  • Have an existing investment or experience in Airflow DAGs
  • Benefit from the open source nature of the Apache Airflow project
  • NOT suitable if low latency is required between tasks
  • Need to specify how many workers you need for a given Composer environment
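
To make the DAG point above concrete, here is a minimal sketch in plain Python (not Airflow code) that models a batch pipeline as a dependency graph and derives a valid run order using the standard-library `graphlib` module (Python 3.9+). The task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical batch pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_join"},
}


def execution_order(deps):
    """Return one valid run order; raises graphlib.CycleError if deps is not a DAG."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # both extract tasks come first, load_warehouse comes last
```

A graph with a cycle (for example, two tasks that depend on each other) is rejected, which is why workflows that need to loop back to earlier steps do not fit this model.
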
Workflows

Focused on the orchestration of microservices / HTTP-based services built with Cloud Functions, Cloud Run, SaaS, or other APIs.

  • Designed for latency-sensitive use cases: workloads that need low latency or have a high execution count
  • Orchestrate microservices built with Cloud Functions, Cloud Run, SaaS, or other APIs
  • Serverless: no infrastructure to manage or scale
    • No need to specify how many workers you need
    • Suits spiky traffic patterns that need to scale in a serverless way
  • Require loops and jumps to already executed steps (not a DAG)
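
The "loops and jumps" point can be illustrated with a toy step runner. This is plain Python, not Workflows syntax (Workflows definitions are written in YAML or JSON); `run_steps` and the step names are hypothetical. Each step returns the name of the next step, so a step may jump back to one that has already run, which a DAG cannot express.

```python
def run_steps(steps, start, state, max_steps=100):
    """Run named steps; each step returns the next step's name, or None to stop."""
    current, trace = start, []
    while current is not None:
        if len(trace) >= max_steps:
            raise RuntimeError("step limit reached")  # guard against endless loops
        trace.append(current)
        current = steps[current](state)
    return trace


def submit_job(state):
    state["polls"] = 0
    return "check_status"


def check_status(state):
    state["polls"] += 1
    # Pretend the job finishes on the third poll.
    return "finish" if state["polls"] >= 3 else "wait"


def wait(state):
    return "check_status"  # jump back to an already executed step


def finish(state):
    state["done"] = True
    return None


steps = {
    "submit_job": submit_job,
    "check_status": check_status,
    "wait": wait,
    "finish": finish,
}
state = {}
trace = run_steps(steps, "submit_job", state)
print(trace)
```

Here `check_status` and `wait` run repeatedly until the simulated job completes, a polling pattern typical of orchestrating HTTP services.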

EXAMTOPIC 1

Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports autoscaling and monitoring for online prediction requests. Which platform components should you choose for this system?

  • A. Kubeflow Pipelines and App Engine
  • ⭕ B. Kubeflow Pipelines and AI Platform Prediction
    Kubeflow is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers (probably the most commonly used Kubeflow functionality).
    → AI Platform Prediction is a service that supports autoscaling and online prediction requests.
  • C. Cloud Composer, BigQuery ML, and AI Platform Prediction
  • D. Cloud Composer, AI Platform Training with custom containers, and App Engine
  • Note: Cloud Composer is NOT suitable if low latency is required between tasks.
    • Online prediction requests are a latency-sensitive use case.