❄️ Latest: Snowflake customers — Learn how to connect Snowflake to everything! ❄️

follow or visit us on
Learning

Replicate tables from PostgreSQL to Snowflake

Glenn Gillen
Glenn Gillen
VP of Product, GTM
LearningReplicate tables from PostgreSQL to Snowflake

Your applications are already writing data to PostgreSQL and now you need to get that data, and any changes to it, into your Snowflake Data Cloud. I can show you how to avoid any complicated batch ETL process! In less than 15 minutes we'll use PostgreSQL's built-in replication features to update our data in real-time with Snowflake. And, we'll do it using a private point-to-point connection that doesn't require you to manage IP allow lists, open firewall ports, or setup services like PrivateLink.

Introducing the Snowflake Pull to PostgreSQL Connector!

🐘 + ❄️

Connecting to your PostgreSQL database to Snowflake can be a real pain depending on your network topology. It would be convenient if both systems were accessible via a public address, but that's a significant increase in risk for a systems that handle a lot of important data. Managing IP allow lists and updating firewall ingress rules improves security but can be cumbersome to manage. Alternatives like PrivateLink are better, but they too can be cumbersome to setup and require your systems to be on the same public cloud and in the same region.

In this post I'm going to show you how to securely connect Snowflake to your private PostgreSQL database, in just a few minutes. We will:

  • Setup a PostgreSQL database in AWS with logical replication enabled
  • Connect PostgreSQL to Snowflake with a private encrypted connection
  • Configure Snowflake to receive and store the data from PostgreSQL

Snowflake push to PostgreSQL

Amazon Relational Database Service (RDS) for PostgreSQL

We're going to provision an Amazon RDS PostgreSQL Database so we can see an end-to-end experience of data moving from PostgreSQL to Snowflake. If you have an existing PostgreSQL database you're able to use you can skip this step.

Create a PostgreSQL database

Within your AWS Console search for RDS in the search field at the top and select the matching result. Visit the Databases screen, and then click Create Database.

The Standard Create option provides a good set of defaults for creating a RDS database, so unless you've previous knowledge or experience to know you might want something different I'd suggest choosing "PostgreSQL" and confirming the details and then clicking Create database at the bottom of the screen.

Once you've started the database creation it may take about 15 minutes for provisioning to complete and for your database to be available.

Create a database

Enable logical replication on PostgreSQL

To enable logical replication on your PostgreSQL database, you'll need to modify some configuration parameters. In the RDS console:

  1. Go to the "Configuration" tab of your RDS instance
  2. Click on the parameter group linked to your instance
  3. Modify the following parameters:
    • Set rds.logical_replication to 1
  4. Save changes and reboot your RDS instance

Create a table

Create a table in your PostgreSQL database to store the data, that we will then replicate to Snowflake. In addition, you may also need to set the replica identity to FULL for the table.


_10
CREATE TABLE public.customers
_10
(
_10
id INTEGER,
_10
customer_name VARCHAR
_10
);
_10
_10
ALTER TABLE customers REPLICA IDENTITY FULL;

Connect PostgreSQL to Snowflake

We've created a PostgreSQL database with logical replication enabled. It's now time to connect it to Snowflake! The next stage is going to complete the picture below, creating a point-to-point connection between the two systems — without the need to expose any systems to the public internet!

PostgreSQL connecting to Snowflake

Get the app

The Snowflake Pull from PostgreSQL Connector by Ockam is available in the Snowflake Marketplace.

Select a warehouse

The first screen you're presented with will ask you to select the warehouse to use, you can also choose to change the application name.

Grant account privileges

Click the Grant button to the right of this screen. The app will then be automatically granted permissions to create a warehouse and create a compute pool.

Activate app

Once the you've granted the permissions, an Activate button will appear. Click it and the activation process will begin.

Launch app

After the app activates you'll see a page that summarizes the privileges that the application now has. There's nothing we need to review or update on these screens, so proceed by clicking the Launch app button.

Get the appSelect a warehouseGrant account privilegesActivate appLaunch app

Launch Ockam node for Amazon RDS

The Ockam Node for Amazon RDS is a streamlined way to provision a managed Ockam Node within your private AWS VPC.

To deploy the node that will allow Snowflake to reach your Amazon RDS PostgreSQL database visit the Ockam Node for Amazon RDS PostgreSQL listing in the AWS Marketplace, and click the Continue to Subscribe button, and then Continue to Configuration.

On the configuration page choose the region that your Amazon RDS cluster is running in, and then click Continue to Launch followed by Launch.

Enter stack details

The initial Create Stack screen pre-fills the fields with the correct information for your node, so you can press Next to proceed.

Enter node configuration

This screen has important details to you need to fill in:

  • Stack name: Give this stack a recognisable name, you'll see this in various locations in the AWS Console. It'll also make it easier to clean these resources up later if you wish to remove them.
  • VPC ID: The ID of the Virtual Private Cloud network to deploy the node in. Make sure it's the same VPC where you've deployed your RDS instance.
  • Subnet ID: Choose same subnet where your RDS instance is deployed.
  • Enrollment ticket: Copy the contents of the postgres.ticket file we created earlier and paste it in here.
  • RDS PostgreSQL Database Endpoint: In the Connectivity & security section for your Amazon RDS database you will find Endpoint details. Copy the Endpoint value for the Private RDS database that's in the same subnet you chose above.
  • JSON Node Configuration: Copy the contents of the postgres.json file we created earlier and paste it in here.

Once the stack has completed, you will need to modify the security group on your RDS instance to allow access on port 5432 from the instance that was just created.

We've now completed the highlighted part of the diagram below, and our Amazon RDS PostgreSQL node is waiting for another node to connect.

Amazon RDS PostgreSQL setup

Ockam node for Amazon RDS PostgreSQLOckam node for Amazon RDS - create stack screenOckam node for Amazon RDS PostgreSQL - node configuration screen

Create a table in Snowflake

Create a table in Snowflake to store the data that will be replicated from PostgreSQL.


_10
CREATE OR REPLACE TABLE customers (
_10
id INTEGER,
_10
customer_name VARCHAR
_10
);

Configure connection details

Click "Get started" to open the Snowflake setup screen.

Take the contents of the file snowflake.ticket that we just created and paste it into "Provide the above Enrollment Ticket" form field in the "Configure app" setup screen in Snowflake.

Grant privileges

To be able to authenticate with Ockam Orchestrator and then discover the route to our outlet, the Snowflake app needs to allow outbound connections to your Ockam project.

Toggle the Grant access to egress and reach your Project and approve the connection by pressing Connect.

Toggle the Grant access to your Postgres database and enter the username and password for your PostgreSQL database and store it as a secret in snowflake.

Toggle the Grant access to the tables that will store the replicated data and select the table(s) that will store the replicated data from Snowflake.

Map Snowflake stream to PostgreSQL table

Snowflake receives each stream of changes to a table in PostgreSQL database, and we need to define the database and mapping between each. Enter the name of the tables in PostgreSQL matching with the tables in Snowflake.

Tables with same schema should exist in both the PostgreSQL and Snowflake databases.

Create Snowflake ticketGrant egressMap Snowflake tables to PostgreSQL tables

With that, we've completed the last step in the setup. We've now got a complete point-to-point connection that allows our Snowflake warehouse to securely receive table replication data through to our private PostgreSQL database.

Snowflake push to PostgreSQL setup complete

Next steps

Any updates to your data in your PostgreSQL table will now sync those changes to your Snowflake table, with all changes sent over your end-to-end encrypted Ockam portal. To see it in action insert a row into your PostgreSQL table, then use your usual tooling (for example, a Snowflake Worksheet) to query the corresponding table in Snowflake.

If you'd like to explore some other capabilities of Ockam I'd recommend:

Previous Article

Run Queries on Remote PostgreSQL from Snowflake

Next Article

Sync tables from Snowflake to PostgreSQL

Edit on Github

Build Trust

Learn

Get Started

Ockam Command

Programming Libraries

Cryptographic & Messaging Protocols

Documentation

Blog

© 2024 Ockam.io All Rights Reserved