Your data pipeline hums along in Databricks, but your API tasks live on Cloud Run. The moment you try to connect them, you hit a wall of permissions, tokens, and identity sprawl. Sound familiar? This is where Cloud Run Databricks integration either shines or eats your weekend.
Cloud Run runs stateless services that autoscale and play nicely in Google Cloud. Databricks, meanwhile, rules the analytics world with distributed compute built on Spark. Used together, they let you orchestrate scalable data transformations triggered by APIs, events, or cron jobs, all without managing clusters full time. The magic is in making identity and data flow cleanly between the two.
Here’s how it fits together. Cloud Run acts as the execution layer for lightweight workloads: a job trigger, a webhook receiver, or a batch orchestrator. Databricks holds your heavy compute: ML training, ETL, or streaming analytics. You configure Cloud Run to invoke Databricks jobs using OAuth or a service principal, authenticate through an identity provider like Google or Okta, and ensure the Databricks token refreshes automatically. The result is a secure bridge between ephemeral services and long-running data clusters.
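As a sketch, the Cloud Run side of that bridge can be as small as one call to the Databricks Jobs API’s `run-now` endpoint. The workspace host, token, and job ID below are placeholders, assumed to arrive through environment variables (in Cloud Run, typically set at deploy time or mounted from Secret Manager):

```python
import json
import os
import urllib.request


def build_run_now_request(host: str, job_id: int) -> tuple[str, dict]:
    """Build the URL and payload for a Databricks Jobs 2.1 run-now call."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/run-now"
    return url, {"job_id": job_id}


def trigger_job(host: str, token: str, job_id: int) -> int:
    """Kick off a Databricks job run and return its run_id."""
    url, payload = build_run_now_request(host, job_id)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["run_id"]


# Placeholder env vars; names here are our convention, not a Databricks one.
if os.environ.get("DATABRICKS_HOST"):
    run_id = trigger_job(
        os.environ["DATABRICKS_HOST"],   # e.g. https://<workspace>.cloud.databricks.com
        os.environ["DATABRICKS_TOKEN"],
        int(os.environ["DATABRICKS_JOB_ID"]),
    )
    print(f"started run {run_id}")
```

Keeping the trigger this thin is the point: Cloud Run only says “go,” and all the heavy lifting stays inside the Databricks job definition.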
You do not need to micromanage secrets or build custom token logic. Instead, use fine-grained IAM policies. Map Cloud Run’s service identity to a Databricks workspace role that limits access to specific jobs or notebooks. Rotate access tokens on a short TTL schedule, and keep logs in Cloud Logging for audit trails. If something fails, query the run’s status through the Databricks Jobs REST API instead of debugging webhooks blind.
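To make that last point concrete, here is a minimal status check against the Jobs API’s `runs/get` endpoint. The `life_cycle_state` and `result_state` fields are what the API returns; the `classify_run` helper and its three buckets are our own simplification:

```python
import json
import urllib.request


def classify_run(state: dict) -> str:
    """Collapse a Databricks run's state dict into running / success / failed."""
    life = state.get("life_cycle_state")
    if life in ("PENDING", "RUNNING", "TERMINATING"):
        return "running"
    if life == "TERMINATED" and state.get("result_state") == "SUCCESS":
        return "success"
    return "failed"


def get_run_state(host: str, token: str, run_id: int) -> dict:
    """Fetch the state block for one run via the Jobs 2.1 REST API."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/runs/get?run_id={run_id}"
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["state"]
```

A Cloud Run handler can log `classify_run(get_run_state(...))` to Cloud Logging on every invocation, which gives you the audit trail and the failure signal in one place.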
Quick answer: To connect Cloud Run and Databricks, create a Databricks service principal, store its token in Secret Manager, assign that secret to Cloud Run, and have your app call the Databricks Jobs API. That’s the cleanest and most secure pattern for Cloud Run Databricks integration.
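The secret-fetch step of that quick answer looks roughly like this, assuming the `google-cloud-secret-manager` client library and a Cloud Run service identity that has been granted access to the secret. The project and secret names are placeholders:

```python
def secret_version_name(project: str, secret: str, version: str = "latest") -> str:
    """Resource name Secret Manager expects for a secret version."""
    return f"projects/{project}/secrets/{secret}/versions/{version}"


def fetch_databricks_token(project: str, secret: str) -> str:
    """Read the service principal's token out of Secret Manager.

    Requires the google-cloud-secret-manager package, and IAM access
    (roles/secretmanager.secretAccessor) on Cloud Run's service account.
    """
    from google.cloud import secretmanager

    client = secretmanager.SecretManagerServiceClient()
    name = secret_version_name(project, secret)
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")


# The returned token then feeds the Authorization header of your
# Jobs API calls; the app itself never stores it on disk.
```

Alternatively, Cloud Run can mount the secret directly as an environment variable at deploy time, which keeps the application code free of Secret Manager calls entirely.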