Using FFmpeg inside Databricks requires more than dropping in a binary. Databricks Access Control manages who can run commands, access storage, and interact with cluster resources. Without the right configuration, even simple media processing pipelines will fail.
1. Understand the constraints
Databricks clusters do not ship with FFmpeg by default. You must install it at runtime (for example, via a cluster init script) or bake it into a custom container image using Databricks Container Services. At the same time, Databricks Access Control enforces permissions at the workspace, cluster, and table level. If your job or notebook lacks the rights to install packages, read input files, and write results, your FFmpeg calls will fail.
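Because the binary may or may not be present, it is worth probing for FFmpeg before running a pipeline. Below is a minimal sketch (the function names are illustrative, not part of any Databricks API) that checks the PATH and reports the installed version:

```python
import shutil
import subprocess

def ffmpeg_available():
    """Return True if an ffmpeg binary is on the PATH."""
    return shutil.which("ffmpeg") is not None

def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is missing."""
    if not ffmpeg_available():
        return None
    out = subprocess.run(["ffmpeg", "-version"], capture_output=True, text=True)
    return out.stdout.splitlines()[0]
```

Run this at the top of a notebook or job to fail fast with a clear message instead of hitting an opaque "command not found" error mid-pipeline.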
2. Enable cluster-level permissions
Set up cluster policies that allow library installation and script execution. In Databricks, enable cluster access control from the Admin Console's Access Control tab, then grant your user or service principal the "Can Attach To" and "Can Restart" permissions on the target cluster itself. Without these, you cannot fully control the runtime environment.
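These grants can also be applied programmatically. The sketch below uses the Databricks Permissions API (`PATCH /api/2.0/permissions/clusters/{cluster_id}`); the workspace host, token, and principal shown are placeholders you must supply, and the helper names are illustrative:

```python
import json
import urllib.request

def build_cluster_acl(principal, levels=("CAN_ATTACH_TO", "CAN_RESTART")):
    """Build the access_control_list payload for the Permissions API."""
    return {
        "access_control_list": [
            {"user_name": principal, "permission_level": lvl} for lvl in levels
        ]
    }

def grant_cluster_permissions(host, token, cluster_id, principal):
    """PATCH the cluster's ACL, adding attach/restart rights for the principal."""
    url = f"{host}/api/2.0/permissions/clusters/{cluster_id}"
    body = json.dumps(build_cluster_acl(principal)).encode()
    req = urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note that `PATCH` adds to the existing ACL rather than replacing it, which is usually what you want when granting a single service principal access to a shared cluster.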
3. Install FFmpeg in Databricks
Once the cluster is running, install FFmpeg from a notebook cell with `%sh apt-get update && apt-get install -y ffmpeg`. For managed environments with locked-down networking, use a private package repository or store the FFmpeg binary in DBFS. Ensure your Databricks Access Control settings permit executing shell commands from notebooks or jobs.
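To avoid reinstalling on every notebook session, the same command can live in a cluster-scoped init script that runs each time the cluster starts. A minimal sketch (the idempotency check and the fallback path `/dbfs/FileStore/bin/ffmpeg` are assumptions for illustration, not fixed Databricks conventions):

```shell
#!/bin/bash
# Cluster init script: ensure ffmpeg is present before any job runs.
set -euo pipefail

# Skip the install if a previous run (or the base image) already provides ffmpeg.
if command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg already installed: $(ffmpeg -version | head -n 1)"
    exit 0
fi

if apt-get update -y && apt-get install -y ffmpeg; then
    echo "installed from apt: $(ffmpeg -version | head -n 1)"
else
    # Locked-down networking: fall back to a binary staged in DBFS
    # (hypothetical path -- adjust to wherever you uploaded it).
    cp /dbfs/FileStore/bin/ffmpeg /usr/local/bin/ffmpeg
    chmod +x /usr/local/bin/ffmpeg
    echo "installed from DBFS: $(ffmpeg -version | head -n 1)"
fi
```

Attach the script under the cluster's Advanced Options so every node, including those added by autoscaling, gets the binary before your pipeline code executes.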