Using BigQuery Omni with AWS

Using BigQuery Omni with AWS

Introduction

As businesses increasingly adopt multi-cloud strategies, it's becoming more important for cloud providers to work together seamlessly. In this context, BigQuery Omni, a multi-cloud analytics solution from Google Cloud, offers an interesting option for organizations that use both Google Cloud and AWS. BigQuery Omni allows users to analyze data across multiple clouds, including AWS, without having to move data around or manage complex data pipelines.

Use cases

  • Analytics: With BigQuery Omni, organizations can perform advanced analytics on their data by leveraging its powerful SQL-based querying capabilities. They can easily combine data from AWS with data from other sources to gain deeper insights into their business operations.

  • Data Warehousing: Organizations can use BigQuery Omni with AWS to create a data warehouse solution that spans across their multi-cloud environment. They can store, manage and analyze large volumes of data in a cost-effective and efficient manner.

  • Machine Learning: Organizations can use BigQuery Omni with AWS to train and deploy machine learning models at scale. By leveraging the powerful machine learning capabilities of BigQuery, they can develop predictive models that can be applied to a variety of business scenarios.

Implementation

The first step was to set up a connection between BigQuery and AWS, to do this first I created an IAM policy for BigQuery, I did this by navigating to AWS IAM and selecting "bigquery-omni-connection-policy" and editing the policy by replacing it with this code and saving it.

Next, I added the data to Google Cloud by going to the google cloud console selecting BigQuery then "SQL workspace" then "ADD" and choosing "Connections to external data sources"

and filling out the information. The AWS role id was gotten from the ARN section of the bigquery-omni-connection in the IAM roles tab in the AWS console.

The next step was to Add a Trust Relationship to the AWS role. The trust relationship lets the BigQuery AWS connection assume the role and access the S3 data as specified in the roles policy. This is done in the roles section of AWS IAM console, I selected bigquery-omni-connection and verified the settings were correct which they were

Running Queries

BigQuery Omni does not manage data stored in Amazon S3. To access S3 data, you would have to define an external table.

To do this I created a BigQuery dataset in the same region and gave it the ID "bq_omni_demo".

Then I created a table in the dataset

Creating an external table

BigQuery Omni lets you query the external table like any BigQuery table. Next, I queried the table I just created in a new tab using

SELECT * FROM `S3 bucket name.bq_omni_demo.bq-omni-table`

and got this result

Exporting

BigQuery Omni also let you export the result of a query made against an external table to Amazon S3. BigQuery Omni writes to the specified Amazon S3 location irrespective of any existing content, it could overwrite existing data or mix the query result with existing data

To do this I created and ran the following statement

EXPORT DATA WITH CONNECTION `aws-us-east-1.bq-omni-aws-connector`
OPTIONS(uri="s3://S3 bucket name/exports/*", format="CSV")
AS SELECT * FROM `S3 bucket name.bq_omni_demo.bq-omni-table`

which successfully created a file in my s3 bucket

Conclusion

In conclusion, BigQuery Omni offers an excellent multi-cloud analytics solution for organizations that use both Google Cloud and AWS. It allows users to analyze data across multiple clouds without having to move data around or manage complex data pipelines, enabling them to gain deeper insights into their business operations. Additionally, organizations can create a data warehouse solution that spans across their multi-cloud environment, store and manage large volumes of data in a cost-effective and efficient manner, and train and deploy machine learning models at scale.

While implementing BigQuery Omni, I learned how to set up a connection between BigQuery and AWS and how to create an external table to query S3 data. I also learned how to export data to Amazon S3 from BigQuery Omni. Overall, this experience has given me a better understanding of how multi-cloud analytics solutions work and how they can benefit organizations. In the future, I believe I am now capable of implementing BigQuery Omni for clients who use both Google Cloud and AWS to help them improve their analytics capabilities and gain deeper insights into their business operations.