The Twelve Days of AWS: Redshift

Saturday, January 04, 2020

[Image: "12 Days of AWS, Day 12" written around snowflakes, with a penguin wearing a candy cane sweater]

To talk about Redshift we first need to talk about data warehousing, as Redshift is a fully managed data warehouse service. A data warehouse is not a ‘digital warehouse’ where your data gathers virtual dust at the back of a virtual shelf somewhere; it is more akin to a single source of truth for the state of a business and its information. Whilst Redshift is based on PostgreSQL and can be queried with standard SQL, it is not meant to be used as an operational database. Data in a warehouse is usually less normalized than in an operational database and is not used for the daily operations of a system. Instead, it brings together data from multiple different sources and is meant for analytics, giving the business a view of what is happening without having to query each of those sources separately.
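As a rough illustration of what "less normalized" means in practice, here is a minimal sketch that uses Python's built-in sqlite3 module as a local stand-in (it is not Redshift, and no Redshift-specific features are shown). The table and column names (customers, orders, sales_flat) are hypothetical and chosen only for this example.

```python
import sqlite3


def build_flat_sales(conn):
    """Create two normalized tables, then flatten them into one wide table."""
    conn.executescript(
        """
        -- Normalized, operational-style tables (hypothetical names).
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

        INSERT INTO customers VALUES (1, 'Acme', 'EU'), (2, 'Globex', 'US');
        INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0);

        -- Denormalized, warehouse-style table: customer attributes are
        -- repeated on every order row so reports need no join.
        CREATE TABLE sales_flat AS
            SELECT o.id AS order_id,
                   c.name AS customer_name,
                   c.region AS region,
                   o.amount AS amount
            FROM orders o
            JOIN customers c ON c.id = o.customer_id;
        """
    )


def revenue_by_region(conn):
    """Aggregate directly over the flattened table."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales_flat GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_flat_sales(connection)
    print(revenue_by_region(connection))
```

The flattened table trades some duplication for simpler reporting queries, which is the usual trade-off when data is kept for analysis rather than for running the application.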

Data warehousing solutions are designed to run complex queries over terabytes or even petabytes of information, and to consolidate that information for operations such as data mining and market research. They are not optimized for high concurrency or transactional workloads.
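To give a feel for the kind of consolidating query described here, the sketch below combines two separate sources in a single report. It again uses sqlite3 as a small local stand-in rather than Redshift, and the sources (page_views from web analytics, orders from a product database), their columns, and the sample rows are all made up for illustration.

```python
import sqlite3


def load_sources(conn):
    """Load two hypothetical source tables into one database."""
    conn.executescript(
        """
        CREATE TABLE page_views (product TEXT, view_date TEXT, views INTEGER);
        CREATE TABLE orders (product TEXT, order_date TEXT, quantity INTEGER, unit_price REAL);

        INSERT INTO page_views VALUES
            ('widget', '2019-12-01', 100),
            ('widget', '2019-12-02', 300),
            ('gadget', '2019-12-01', 200);
        INSERT INTO orders VALUES
            ('widget', '2019-12-01', 2, 10.0),
            ('widget', '2019-12-02', 6, 10.0),
            ('gadget', '2019-12-01', 1, 25.0);
        """
    )


def product_report(conn):
    """Summarize each source per product, then join the summaries."""
    query = """
        WITH v AS (
            SELECT product, SUM(views) AS views
            FROM page_views
            GROUP BY product
        ),
        o AS (
            SELECT product,
                   SUM(quantity) AS units,
                   SUM(quantity * unit_price) AS revenue
            FROM orders
            GROUP BY product
        )
        SELECT v.product, v.views, o.units, o.revenue
        FROM v
        JOIN o ON o.product = v.product
        ORDER BY o.revenue DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_sources(connection)
    for row in product_report(connection):
        print(row)
```

On a real warehouse the same shape of query would scan far more rows, which is why these systems favour a small number of heavy analytical queries over many small transactional ones.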

Whilst an offline copy of the operational RDBMS might be good enough for a small business, it is a different matter when there is a large amount of historical data from various product databases, web analytics, search logs, and pricing data. There, a solution like Redshift, which is fully managed and does not require you to buy or maintain your own servers, can be a lot more cost-effective.