Testing the new app version : Canary, A/B, Shadow

[PMLE-EXAMTOPIC] Comparison of testing patterns - Canary, A/B, Shadow

This post covers a specific exam topic from the Google Professional Machine Learning Engineer certification.

Six Strategies for Application Deployment – The New Stack
- Recreate : Version A is terminated then version B is rolled out.
- Ramped (= rolling-update or incremental): Version B is slowly rolled out and replaces version A.
- Blue/Green : Version B is released alongside version A, then the traffic is switched to version B.
- Canary : Version B is released to a subset of users, then proceeds to a full rollout.
- A/B testing : Version B is released to a subset of users under specific conditions.
- Shadow : Version B receives real-world traffic alongside version A and doesn’t impact the response.

Google Kubernetes Engine (GKE)

  • software deployment strategies : recreate, rolling update, and blue/green
  • testing strategies : canary, shadow, and A/B
Canary test pattern
Partially roll out the new version of your application to a subset of users and evaluate its performance against a baseline deployment.
Deploy the new version of your application alongside the production version. Then split and route a percentage of traffic from the production version to the canary version and evaluate the canary's performance.
Recommended evaluation : Compare the canary against an equivalent baseline, not against the live production environment.
Partial rollout can follow various partitioning strategies. For example, if the application has geographically distributed users, you can roll out the new version to a region or a specific location first.
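The traffic split described above can be sketched in a few lines of Python. This is only an illustration, not how GKE or any particular load balancer implements it: the function names (`bucket`, `route`, `canary_is_healthy`), the hash-based bucketing, and the 1% error-rate tolerance are all assumptions made for the example.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def route(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent% of users to the canary version."""
    return "canary" if bucket(user_id) < canary_percent else "production"


def error_rate(errors: int, requests: int) -> float:
    """Fraction of failed requests; 0.0 when no requests were served."""
    return errors / requests if requests else 0.0


def canary_is_healthy(
    canary_errors: int,
    canary_requests: int,
    baseline_errors: int,
    baseline_requests: int,
    tolerance: float = 0.01,  # hypothetical threshold, chosen for illustration
) -> bool:
    """Compare the canary against an equivalent baseline deployment."""
    canary = error_rate(canary_errors, canary_requests)
    baseline = error_rate(baseline_errors, baseline_requests)
    return canary - baseline <= tolerance
```

Hashing the user ID keeps each user on the same version across requests, which makes the canary's metrics easier to interpret than a purely random split per request.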
A/B test
Best used to measure the effectiveness of functionality in an application.
Release the new version of the application to a subset of users defined by specific conditions (e.g., location, browser version, or user agent).
Use it to test a theory or hypothesis.
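Condition-based routing for an A/B test could look roughly like the sketch below. The `Request` type, the `choose_version` function, and the specific condition (mobile users in one country) are hypothetical and exist only to illustrate the idea; a real setup would usually express this as load balancer or service mesh rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    country: str      # e.g. an ISO country code derived from the client IP
    user_agent: str   # raw User-Agent header value


def choose_version(request: Request) -> str:
    """Route requests that match the test condition to version B."""
    # Hypothetical condition: mobile users located in South Korea.
    in_target_region = request.country == "KR"
    on_mobile = "Mobile" in request.user_agent
    return "B" if in_target_region and on_mobile else "A"
```

Unlike a canary split, the subset here is defined by who the user is rather than by a percentage of traffic.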
  • Considerations
    • Complex setup : A/B tests need a representative sample that can be used to provide evidence that one version is better than the other.
      • Pre-calculate the sample size (e.g., by using an A/B test sample size calculator), then run the tests for a reasonable period to reach statistical significance of at least 95%.
    • Validity of results : Several factors can skew the test results, including false positives, biased sampling, or external factors (e.g., seasonality or marketing promotions).
    • Observability : When running multiple A/B tests on overlapping traffic, monitoring and troubleshooting can be difficult.
      • e.g., if you test product page A versus product page B and, at the same time, checkout page C versus checkout page D, distributed tracing becomes important to determine metrics such as the traffic split between versions.
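As a rough idea of what an A/B test sample size calculator does, the sketch below uses the common normal-approximation formula for comparing two conversion rates. It is an assumption that a given calculator uses this exact formula; the 80% power default is also an assumption, while `alpha=0.05` corresponds to the 95% significance level mentioned above.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(
    p_a: float,
    p_b: float,
    alpha: float = 0.05,  # 1 - 0.95 significance level
    power: float = 0.80,  # assumed; not specified in the text above
) -> int:
    """Approximate users needed per variant to detect p_a vs p_b."""
    if p_a == p_b:
        raise ValueError("p_a and p_b must differ to compute a sample size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)


if __name__ == "__main__":
    # Detecting a lift from a 10% to a 12% conversion rate.
    print(sample_size_per_variant(0.10, 0.12))
```

The smaller the difference between the two rates, the larger the required sample, which is why A/B tests on subtle changes need to run for a long time.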
Shadow test
Deploy and run a new version alongside the current version, but in such a way that the new version is hidden from the users.
An incoming request is mirrored and replayed in a test environment. This process can happen either in real time or asynchronously, by replaying a copy of previously captured production traffic against the newly deployed service.
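The mirroring idea can be sketched as a wrapper around two handlers. The names (`handle_with_shadow`, `primary`, `shadow`, `mismatches`) are hypothetical, and the shadow call is made synchronously here for simplicity; in practice the mirrored request would typically be sent asynchronously or replayed later so that it cannot add latency.

```python
from typing import Callable, List, Tuple

Handler = Callable[[dict], dict]


def handle_with_shadow(
    request: dict,
    primary: Handler,
    shadow: Handler,
    mismatches: List[Tuple[dict, dict, object]],
) -> dict:
    """Serve the request from the current version and mirror it to the new one."""
    response = primary(request)
    try:
        # Pass a copy so the shadow version cannot mutate the original request.
        shadow_response = shadow(dict(request))
        if shadow_response != response:
            mismatches.append((request, response, shadow_response))
    except Exception as exc:
        # A failure in the shadow version must never reach the user.
        mismatches.append((request, response, exc))
    # Users only ever see the response from the current version.
    return response
```

The recorded mismatches are what gets evaluated afterwards; the response path to the user is the same with or without the shadow version.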
Pros (+) and cons (-)
  • Canary test
    • (+) Ability to test live production traffic.
    • (+) Fast rollback : redirecting the user traffic to the older version of the application
    • (+) Zero downtime
    • (-) Slow rollout : Each incremental release requires monitoring for a reasonable period and, as a result, might delay the overall release. Canary tests can often take several hours.
  • A/B test
    • (+) Several versions run in parallel.
    • (+) Full control over the traffic distribution.
    • (-) Complex setup : Requires intelligent load balancer.
    • (-) Hard to troubleshoot errors for a given session; Mandatory distributed tracing.
  • Shadow test
    • (+) Zero production impact : do not alter the existing production environment or the user state.
    • (-) Expensive as it requires double the resources.
    • (-) Requires mocking service for certain cases.
    • (-) Cost and operational overhead : complex to set up

Summary

| Testing pattern | Zero downtime | Real production traffic testing | Releasing to users based on conditions | Rollback duration | Impact on hardware and cloud costs |
|---|---|---|---|---|---|
| Recreate | No | No | No | Fast but disruptive because of downtime | No extra setup required |
| Rolling update | Yes | No | No | Slow | Can require extra setup for surge upgrades |
| Blue/green | Yes | No | No | Instant | Need to maintain blue and green environments simultaneously |
| Canary | Yes | Yes | No | Fast | No extra setup required |
| A/B | Yes | Yes | Yes | Fast | No extra setup required |
| Shadow | Yes | Yes | No | Does not apply | Need to maintain parallel environments in order to capture and replay user requests |