Redshift tips

I have been using Redshift for two years and, like every database, it has its own SQL dialect and its secrets. I will write down here everything I discover that is worth noting.

Sep 28 2017

AWS

SQL

psql

Use psql! It is a great tool: I use it both for scheduled ETL processes and, on a daily basis, to interact with Redshift.

See the [AWS Redshift compatible PostgreSQL client](/2016/05/aws-redshift-compatible-psql/) article for how to install it.

Temporary tables

Prepend a hash (#) to the table name to create a temporary table.

CREATE TABLE #mytable AS SELECT 1;

Table #mytable is dropped automatically when the session ends.

Two concurrent sessions can each create a temporary table with the same name without conflicts.

You can also create a tmp schema by running the following statements:

/**
 * The statements below must be executed by an admin (superuser).
 *
 * NOTE: replace <myuser> with the user that will actually
 *       use the tmp schema.
 */
CREATE SCHEMA tmp;
GRANT CREATE, USAGE ON SCHEMA tmp TO <myuser>;

The following bash script can be scheduled (for example with cron) to clean up the tmp schema.

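#!/bin/bash
# Drop every table in the tmp schema.
# Connection settings come from the standard PG* environment
# variables (PGHOST, PGPORT, PGDATABASE, PGUSER) and ~/.pgpass.
set -o pipefail

# The first psql generates one DROP statement per table (-A unaligned,
# -t tuples only, -q quiet); the second psql executes them.
# Views are skipped, since DROP TABLE would fail on them.
psql -A -t -q -c "
SELECT 'DROP TABLE tmp.' || quote_ident(table_name) || ';'
FROM information_schema.tables
WHERE table_schema = 'tmp'
  AND table_type = 'BASE TABLE'
;
" | psql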

yyyymmdd

I use an integer field to store dates in the YYYYMMDD format. It is really handy because it sorts naturally, even in other contexts such as folder or file naming conventions. The first two digits (20xxxxxx) are admittedly redundant, but they make it easy for humans to recognize a date, like 20170928, the day this article was published. I use this format in many other databases too, since it crosses contexts easily; in this section I collect the snippets I use to convert and manipulate it in Redshift. However…

Consider also using the YYYY-MM-DD string format: it has the same benefits, plus the advantage of being easier to cast in Redshift. For instance, SELECT SYSDATE::DATE returns the current date, displayed in this format.
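Outside Redshift, moving between the two formats is plain string manipulation. A minimal bash sketch using only parameter expansion; `yyyymmdd_to_iso` and `iso_to_yyyymmdd` are hypothetical helper names, and they do not validate their input:

```bash
#!/bin/bash
# Hypothetical helpers: convert between YYYYMMDD and YYYY-MM-DD.
# They only reshuffle characters; they do not check the date is valid.

yyyymmdd_to_iso() {
  local d="$1"
  # ${var:offset:length} substring expansion: year, month, day
  echo "${d:0:4}-${d:4:2}-${d:6:2}"
}

iso_to_yyyymmdd() {
  # Remove every dash
  echo "${1//-/}"
}

yyyymmdd_to_iso 20170928    # prints 2017-09-28
iso_to_yyyymmdd 2017-09-28  # prints 20170928
```

Since both formats have fixed width and go from the most to the least significant field, a plain `sort` orders them chronologically either way.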

Convert timestamp to yyyymmdd

SELECT TO_NUMBER(TO_CHAR(SYSDATE, 'YYYYMMDD'), '99999999');
SELECT TO_CHAR(SYSDATE, 'YYYYMMDD')::INTEGER;

First day of last week

SELECT TO_CHAR(DATE_ADD('day', -7, DATE_TRUNC('week', SYSDATE)), 'YYYYMMDD');
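If the date is needed in the bash script rather than in the query, a rough equivalent can be computed with GNU date. This sketch assumes that DATE_TRUNC('week', ...) truncates to Monday, as it does in Redshift and PostgreSQL. `first_day_of_last_week` is a hypothetical helper name, and TZ=UTC is set only to keep daylight saving changes out of the day arithmetic:

```bash
#!/bin/bash
# Hypothetical helper: print the Monday of the week before the given
# date (YYYYMMDD, defaults to today) in YYYYMMDD format.
# Requires GNU date (-d option and "N days ago" syntax).
first_day_of_last_week() {
  local d="${1:-$(date +%Y%m%d)}"
  local iso="${d:0:4}-${d:4:2}-${d:6:2}"
  local dow
  # %u is the ISO day of week: 1 = Monday ... 7 = Sunday
  dow=$(TZ=UTC date -d "$iso" +%u)
  # Step back to this week's Monday (dow - 1 days), then 7 more days
  TZ=UTC date -d "$iso $(( dow - 1 + 7 )) days ago" +%Y%m%d
}

first_day_of_last_week 20170928  # prints 20170918
```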

To read a given date from the environment, set it first in your bash script; for example, set YYYYMMDD to yesterday (GNU date syntax):

YYYYMMDD=$(date --date="yesterday" +%Y%m%d)

Then, in your psql session:

\set yyyymmdd `echo \'$YYYYMMDD\'`
SELECT CAST(:yyyymmdd AS INTEGER) AS yyyymmdd;
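Setting YYYYMMDD to yesterday always processes the same relative day. When you need to rerun a past day, one option is to accept the date as an argument and validate it before it reaches psql, since the \set trick pastes the value verbatim into the SQL. A minimal sketch assuming bash and GNU date; `set_yyyymmdd` is a hypothetical helper name:

```bash
#!/bin/bash
# Hypothetical helper: export YYYYMMDD from the first argument,
# defaulting to yesterday. Requires GNU date.
set_yyyymmdd() {
  local d="${1:-$(date --date="yesterday" +%Y%m%d)}"
  # Accept only eight digits that form a real calendar date
  # (GNU date rejects e.g. 2017-02-30).
  if [[ ! "$d" =~ ^[0-9]{8}$ ]] ||
     ! date -d "${d:0:4}-${d:4:2}-${d:6:2}" >/dev/null 2>&1; then
    echo "invalid date: $d" >&2
    return 1
  fi
  export YYYYMMDD="$d"
}
```

Call `set_yyyymmdd` with no argument for the daily run, or `set_yyyymmdd 20170915` to reprocess a specific day, then start psql as before.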
