Pickle in the Middle: The Vertex AI SDK Vulnerability That Exposes ML Infrastructure's Weakest Link

A bucket-squatting flaw in Google's Vertex AI SDK let attackers hijack model uploads and execute arbitrary code. It's the second predictable-bucket bug in one year — and it reveals how ML toolchains inherit Python's oldest security mistakes.

Here’s a security vulnerability that should make every ML engineer running Python on Google Cloud check their SDK version right now: a flaw in the Vertex AI SDK for Python let an attacker hijack your model uploads and execute arbitrary code inside Google’s serving infrastructure — without ever having access to your project.

Palo Alto Networks Unit 42 discovered and reported the bug through Google’s bug bounty program. They gave it a memorable name: Pickle in the Middle. As of publication, no CVE has been assigned.

The attack doesn’t require sophisticated exploits or zero-day vulnerabilities. It exploits a design decision that seems harmless until you think about what it means in a multi-tenant cloud: predictable bucket names combined with missing ownership verification.

How the Attack Works

The vulnerable model upload workflow in Vertex AI SDK versions 1.139.0 and 1.140.0 used a staging bucket whose name was derived exclusively from the customer’s project ID and region. If you didn’t set an explicit staging_bucket parameter, the SDK generated a name like project-vertex-staging-region automatically.

That’s convenient. It’s also predictable.
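To make the predictability concrete, here is a minimal sketch of the naming scheme as described above. The function name and the exact string format are assumptions for illustration; the SDK's real implementation may differ in detail.

```python
def predicted_staging_bucket(project_id: str, region: str) -> str:
    """Return the default staging bucket name an outsider could guess.

    Assumption: follows the "project-vertex-staging-region" pattern described
    in the article; this is not copied from the SDK source.
    """
    return f"{project_id}-vertex-staging-{region}"


# Hypothetical victim project; no credentials or API calls are involved.
print(predicted_staging_bucket("acme-ml-prod", "us-central1"))
```

Nothing in the inputs is secret, which is the whole problem: anyone who learns the project ID and region can compute the same string.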

Since no two buckets across all of Google Cloud can share the same name, an attacker who can predict your bucket name can preemptively create it in their own project. Here’s the kill chain:

  1. Predict the victim’s staging bucket name from their project ID and region (both often discoverable)
  2. Create a bucket with that exact name in the attacker’s own GCP project
  3. Wait for the victim’s SDK to upload model artifacts — the bucket existence check passes, but the SDK never verifies who owns it
  4. Replace the uploaded model with a poisoned version during a narrow race-condition window before Vertex AI’s service agent retrieves it
  5. Execute arbitrary code when the serving infrastructure loads the poisoned model
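Steps 2 and 3 hinge on the difference between "the bucket exists" and "the bucket is mine". The toy simulation below illustrates that gap. It assumes a plain dictionary standing in for the global bucket namespace, and every function name is hypothetical; it calls no Cloud Storage API and does not reproduce the SDK's actual code.

```python
# Toy model of a global bucket namespace: bucket name -> owning project.
# Assumption: this dict stands in for Cloud Storage; no real API is called.
GLOBAL_BUCKETS: dict[str, str] = {}


def create_bucket(name: str, owner_project: str) -> bool:
    """Claim a bucket name; fails if anyone already holds it."""
    if name in GLOBAL_BUCKETS:
        return False
    GLOBAL_BUCKETS[name] = owner_project
    return True


def staging_ok_existence_only(name: str) -> bool:
    """The flawed check: the bucket exists, so use it."""
    return name in GLOBAL_BUCKETS


def staging_ok_with_ownership(name: str, my_project: str) -> bool:
    """The hardened check: the bucket exists and belongs to the caller."""
    return GLOBAL_BUCKETS.get(name) == my_project


bucket = "victim-proj-vertex-staging-us-central1"
create_bucket(bucket, owner_project="attacker-proj")  # step 2: squat the name

# The existence-only check accepts the squatted bucket.
print(staging_ok_existence_only(bucket))
# The ownership check rejects it.
print(staging_ok_with_ownership(bucket, "victim-proj"))
```

Because the name is globally unique, the victim cannot even create the bucket afterwards; the only safe outcome is for the upload to refuse to proceed.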

Step 5 is where Python’s oldest security ghost walks back into the room.

Why Pickle Is the Problem

Machine learning models in Python are commonly serialized using pickle or Joblib formats. If you’ve ever saved a scikit-learn model with joblib.dump() or loaded a PyTorch checkpoint, you’ve used pickle under the hood.

Pickle deserialization can execute arbitrary Python code through specially crafted objects. This is not a bug in pickle; it is documented behavior. Pickle was designed to serialize and reconstruct arbitrary Python objects. When an object is serialized, its __reduce__ method can return a callable plus arguments, and reconstructing the object means calling that callable at load time. A malicious pickle payload just needs to name the callable it wants run during reconstruction.
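A harmless sketch of that mechanism, using only the standard library. The Payload class is a hypothetical stand-in for an object hidden inside a model file, and the expression it evaluates is deliberately benign.

```python
import pickle


class Payload:
    """Benign stand-in for a malicious object embedded in a model file."""

    def __reduce__(self):
        # Called at dump time. The returned (callable, args) pair is stored in
        # the pickle and invoked at load time.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading runs eval("6 * 7"); no Payload instance is ever rebuilt.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loader never needs the Payload class to exist. Whoever controls the bytes chooses the callable, so swapping the expression for something hostile requires no change on the loading side.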

So the attack chain is: poisoned model file → uploaded to attacker-controlled bucket → retrieved by Vertex AI service agent → deserialized by pickle → arbitrary code execution inside Google’s serving infrastructure.

The researchers called this “Pickle in the Middle” because it’s a man-in-the-middle attack where the weapon is pickle deserialization, not network packet interception.

This Is the Second Time This Year

Here’s the part that should worry ML platform designers: this is the second predictable-bucket-name flaw to surface in Vertex AI this year.

In February 2026, Google patched CVE-2026-2473, a separate bucket-squatting bug in Vertex AI Experiments that also allowed cross-tenant code execution, model theft, and poisoning. Same pattern: predictable names, insufficient verification, cross-tenant impact.

When a platform has the same class of vulnerability twice in six months, it’s not coincidence. It suggests a systemic design pattern where convenience (auto-generated names, implicit resource sharing) is prioritized over security boundaries that matter in a multi-tenant environment.

Unit 42 has also previously traced related attack paths — from deployed AI agents into customer and tenant data — through Vertex AI’s default service-agent permissions. The surface area is expanding faster than the hardening.

What Versions Are Affected

The vulnerable workflow was introduced in SDK versions 1.139.0 and 1.140.0. Google deployed fixes in versions 1.144.0 and 1.148.0.

The ownership check is active from 1.144.0 onward. Because Google shipped fixes in both 1.144.0 and 1.148.0, upgrading to 1.148.0 or later picks up both.

How to Protect Your ML Pipeline

Three actions, in order of priority:

1. Upgrade the SDK everywhere

Check the google-cloud-aiplatform version everywhere it runs. That means:

  • Production serving environments
  • CI/CD pipelines and build jobs
  • Jupyter notebooks and development environments
  • Training scripts on remote compute
  • Any automated model deployment tooling

It’s common for development environments to lag behind production on SDK versions. An attacker doesn’t need to hit your production cluster; any environment that uploads models is a target.
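Since the check has to run in many places, a small script can be easier to drop into notebooks and CI jobs than a manual pip show. The sketch below uses only the standard library. The 1.148.0 floor is an assumption, chosen because it is the later of the two fixed releases named in this article, and the helper names are hypothetical.

```python
from importlib import metadata

# Assumption: 1.148.0 is used as the floor because it is the later of the two
# fixed releases named in this article.
MIN_FIXED = (1, 148, 0)


def parse_version(text: str) -> tuple:
    """Turn '1.140.0' into (1, 140, 0); ignores any non-numeric suffix."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(version_text: str) -> bool:
    """True when the version sorts below the assumed fixed floor."""
    return parse_version(version_text) < MIN_FIXED


def check_installed(package: str = "google-cloud-aiplatform") -> str:
    """Report the installed version of the package in this environment."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package}: not installed"
    status = "UPGRADE" if needs_upgrade(installed) else "ok"
    return f"{package} {installed}: {status}"


print(check_installed())
```

The parser is intentionally simple and treats pre-release suffixes as the base version, so a stricter comparison may be preferable where release candidates are in use.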

2. Set an explicit staging bucket

If you’re not already doing it, specify your own Cloud Storage bucket for staging:

from google.cloud import aiplatform

aiplatform.init(
    project="your-project-id",
    location="us-central1",
    staging_bucket="gs://your-explicitly-controlled-bucket/"
)

When you control the bucket, you control the ownership. No prediction, no squatting.
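One way to keep the default from creeping back in is to fail fast when no explicit bucket is configured. The guard below is a sketch under assumptions: VERTEX_STAGING_BUCKET is a hypothetical environment variable name invented for this example and is not read by the SDK itself.

```python
import os


def require_staging_bucket(env_var: str = "VERTEX_STAGING_BUCKET") -> str:
    """Return an explicit gs:// URI or raise, so the SDK default is never used.

    Assumption: VERTEX_STAGING_BUCKET is a hypothetical variable name chosen
    for this sketch; it is not read by the SDK itself.
    """
    uri = os.environ.get(env_var, "").strip()
    if not uri.startswith("gs://") or len(uri) <= len("gs://"):
        raise RuntimeError(f"Set {env_var} to a gs:// bucket you own")
    return uri
```

The returned value would then be passed as the staging_bucket argument to aiplatform.init, so a misconfigured job stops before any upload happens.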

3. Audit your model supply chain

This vulnerability highlights a broader problem in MLOps: model artifacts moving through infrastructure are treated as data, but they’re actually executable code. Every model file that passes through your pipeline — training outputs, registry entries, deployment packages — can carry arbitrary code if it uses pickle serialization.
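The Python documentation describes one way to narrow what a pickle can reach at load time: subclass pickle.Unpickler and override find_class with an allowlist. The sketch below follows that pattern. The allowlist contents are an assumption for illustration; real model files usually reference many more classes, so this is a demonstration of the idea rather than a drop-in defense.

```python
import builtins
import io
import pickle

# Assumption: a tiny illustrative allowlist; real models need a longer one.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Every global the pickle wants to import passes through here.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    """Benign stand-in for a malicious object."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


# Plain data and allowlisted types load normally.
print(restricted_loads(pickle.dumps({"layers": [1, 2, range(3)]})))

# The payload is refused before eval is ever looked up.
try:
    restricted_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

An allowlist fails closed, which is the property that matters here: anything not explicitly trusted is refused rather than executed.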

Consider:

  • Using safer serialization formats where possible (e.g., Safetensors for model weights)
  • Verifying model file integrity with checksums before loading
  • Isolating model loading in sandboxes or containers with minimal privileges
  • Reviewing who has permissions to create buckets in your GCP organization
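The checksum item in the list above can be sketched with the standard library alone. The file name and helper names are hypothetical, and the sketch assumes the expected digest is recorded at training time and delivered through a channel the attacker cannot write to; a digest stored next to the artifact in the same bucket would offer no protection.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Compare digests before any deserialization is attempted."""
    return hmac.compare_digest(sha256_of(path), expected_sha256.lower())


with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.joblib")

    with open(model_path, "wb") as fh:
        fh.write(b"trusted model bytes")
    recorded = sha256_of(model_path)  # recorded at training time
    ok_before = verify_artifact(model_path, recorded)

    with open(model_path, "wb") as fh:
        fh.write(b"poisoned model bytes")  # simulated swap in the bucket
    ok_after = verify_artifact(model_path, recorded)

print(ok_before, ok_after)
```

The check only helps if it runs before the file is handed to pickle or joblib; verifying after loading is too late.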

The Bigger Picture

The Pickle in the Middle vulnerability sits at the intersection of three problems that the ML engineering community needs to address together:

Python’s serialization model was designed for convenience, not security. Pickle has been known to be unsafe with untrusted data for decades, yet it remains the default for ML model serialization because it’s the most capable format for Python objects.

Cloud ML platforms are building increasingly complex multi-tenant architectures where resources from different customers can interact. The abstraction layers that make these platforms easy to use also obscure the security boundaries that matter.

ML supply chain security is still catching up to traditional software supply chain practices. Model registries, feature stores, and artifact repositories often lack the signing, verification, and access control that container registries and package managers take for granted.

What to Do Today

If you’re using google-cloud-aiplatform for Python:

pip show google-cloud-aiplatform

If the version is below 1.148.0, the later of the two fixed releases, update it:

pip install --upgrade "google-cloud-aiplatform>=1.148.0"

And if you’ve been using the default staging bucket behavior, switch to an explicitly controlled bucket. The convenience of auto-generated names isn’t worth the attack surface it creates.

The Pickle in the Middle vulnerability is a reminder that in ML infrastructure, your models aren’t just data — they’re code. And the pipelines that move them need to be secured accordingly.
