Big Query (bq cli)
This guide shows how to create a layer of control on top of the bq command line
Big Query Setup
- Setup the Bigquery following the topics bellow
- Create a service account with the following permissions:
- BigQuery User
- BigQuery Data Viewer
- Create a JSON type service account key and store in a safe place
For the sake of this guide, we’re following the gcp guide to interact with bigquery.
In case you already have a testing dataset, you can skip this part
In your local machine, start hoop
Interacting with Datasets
bq queryquery Connection
Add the JSON service account into the
This will allow interacting with the bq command line
- Try to execute a query
hoop exec -v bqquery -i 'SELECT * FROM babynames.names_2014 LIMIT 1'
Interacting with Jobs
- Try to fetch it with
bq head -j <session-id>
The previous command was adding the session id as the job id of executed queries on bigquery, thus we could use the previous session id to fetch it.
hoop exec bqhead -- -j <PREVIOUS_SESSION_ID>
The bq command line has other actions that could be used as patterns with Hoop.
Allow jobs to be async
$ bq query --use_legacy_sql=false --nosynchronous_mode
This allow the pattern of consulting the previous job with
bq head --job_id .... It’s useful for queries that take too long to execute.
Dry run allows showing how much of bytes will be consumed
$ bq query --use_legacy_sql=false 'SELECT * FROM babynames.names_2014 limit 5' --dry_run Query successfully validated. Assuming (...), running this query will process 654791 bytes of data.
A plugin could allow queries that has passed by the
Command Line Quota
bq allows specifying the maximum bytes billed, this option could be used as a retriction mechanism
$ bq query \ --use_legacy_sql=false \ --maximum_bytes_billed 1 \ 'SELECT * FROM babynames.names_2014 limit 100;' BigQuery error in query operation: Error processing job 'myproject:bqjob_r107f63c4f31d3d37_000001860d50d5f8_1': Query exceeded limit for bytes billed: 1. 10485760 or higher required.