It takes only one failed nightly sync to ruin the next morning. Dashboards stall, alerts fire, and someone blames IAM again. If that sounds familiar, it’s probably time to look at how Dataflow and Redshift really flow together.
Dataflow is Google’s managed pipeline service that moves and transforms data at scale. Redshift is AWS’s columnar data warehouse built for analytical queries. They live in different clouds, speak different dialects, and trust different kinds of keys. Yet teams insist on connecting them because when it works, it’s magic.
Combining Dataflow and Redshift gives you cross-cloud analytics without hand-stitched ETL jobs. Dataflow handles the heavy lifting: scaling compute, parallelizing transformations, and managing retries. Redshift does what it does best: serving clean, compressed data for fast queries. The catch is identity. You need the right credentials, tokens, and permissions to keep the flow secure, something only half the docs on the internet bother to clarify.
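In practice, the load step usually means staging files in S3 and telling Redshift to COPY them in, authenticating with temporary credentials. Here is a minimal sketch of that last step; the helper name, table, and bucket are hypothetical, but the COPY authorization parameters (ACCESS_KEY_ID, SECRET_ACCESS_KEY, SESSION_TOKEN) are real Redshift syntax:

```python
def build_copy_statement(table: str, s3_uri: str, creds: dict) -> str:
    """Build a Redshift COPY statement that loads staged Parquet files
    from S3 using temporary STS credentials (hypothetical helper).

    `creds` follows the shape STS returns: AccessKeyId,
    SecretAccessKey, SessionToken.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"ACCESS_KEY_ID '{creds['AccessKeyId']}' "
        f"SECRET_ACCESS_KEY '{creds['SecretAccessKey']}' "
        f"SESSION_TOKEN '{creds['SessionToken']}' "
        "FORMAT AS PARQUET;"
    )


# Example: the pipeline would execute this SQL over its Redshift connection
sql = build_copy_statement(
    "analytics.events",
    "s3://etl-stage/events/",
    {"AccessKeyId": "AKIAEXAMPLE", "SecretAccessKey": "secret", "SessionToken": "token"},
)
```

Because the credentials are short-lived session tokens rather than long-lived keys, a leaked COPY statement in a log is far less damaging.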
The modern workflow looks like this: OAuth to your identity provider, temporary AWS STS tokens mapped to Dataflow’s service account, then a JDBC or Python connector streaming data into Redshift. Each step must line up: roles in IAM, grants in Redshift, and access scopes in Dataflow. If even one is misaligned, you get “AccessDenied” in red and another Slack thread nobody wants to read.
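One common way to do the identity mapping is OIDC federation: the Dataflow worker’s service account mints a Google-signed identity token, and AWS STS exchanges it for temporary credentials via AssumeRoleWithWebIdentity (this assumes an OIDC identity provider and trust policy are already configured in IAM). A sketch, with the role ARN and session name as placeholder values; the STS client is injected so it can be stubbed in tests:

```python
def exchange_oidc_for_aws_creds(sts_client, role_arn: str, oidc_token: str,
                                session_name: str = "dataflow-redshift",
                                ttl_seconds: int = 900) -> dict:
    """Exchange a Google-issued OIDC token for temporary AWS credentials.

    `sts_client` is typically boto3.client("sts"); passing it in keeps
    this helper testable without touching AWS.
    """
    resp = sts_client.assume_role_with_web_identity(
        RoleArn=role_arn,                 # IAM role trusted for the GCP service account
        RoleSessionName=session_name,
        WebIdentityToken=oidc_token,      # identity token minted on the Dataflow worker
        DurationSeconds=ttl_seconds,      # 900 seconds is the STS minimum
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, Expiration
    return resp["Credentials"]
```

The returned session credentials are what the JDBC or Python connector (or the COPY command) uses; nothing long-lived ever ships with the pipeline.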
Here is the 60-second summary most engineers search for: to connect Dataflow to Redshift, create short-lived credentials via AWS STS, store them securely alongside your pipeline metadata, and rotate them automatically after each job. The key is automation that respects both sides’ security models.