Migrating from MySQL to PostgreSQL

Jeroen Ruigrok van der Werven

Copyright information

MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.

2019-11-07

Abstract

This reference guide describes how to migrate from MySQL to PostgreSQL on all levels.


Table of Contents

Work in Progress

Please note that this document is undergoing revision. Be advised that some information might be outdated or not fully fleshed out. And the structure absolutely sucks right now.

2019-11-07 update: After a long time I am updating this document by first overhauling the HTML. No content updates just yet.

1. Thanks

Thanks goes out to these individuals for their comments and suggestions (alphabetical order) :

  • David Fetter

  • Robert Treat

Preface

This guide hopes to detail the changes a person must make to his/her database design in order to switch from MySQL to PostgreSQL. The database versions used in the comparison/conversion testing are MySQL 5.x and PostgreSQL 8.3.x. No doubt a lot of the information presented here also applied and applies to past and future versions.

2. Audience

This guide is aimed at people who are migrating from MySQL to PostgreSQL.

3. Prerequisite Knowledge

You should be familiar with the operating system(s) on which you are running MySQL and PostgreSQL. You should also be familiar with relational database concepts.

4. Conventions Used

Abstract

This section describes the conventions used within this guide for text and code examples. It describes:

  • Text Conventions Used

  • Code Examples Conventions Used

4.1. Text Conventions Used

Convention Meaning Example
Bold The bold typeface indicates terms that are defined in the text, that appear in the glossary, or both. When you use this argument, you create a unique index.
Italics The italics typeface indicates book titles, emphasis, placeholders, or syntax clauses. PostgreSQL 8.3 Users' Guide
UPPERCASE monospace The uppercase monospace typeface indicates system-supplied elements, such as SQL keywords, commands, datatypes, and privileges. The NUMERIC datatype allows for arbitrary precision.
lowercase monospace The lowercase monospace typeface indicates user-supplied elements and executables. Such elements include database names, column names, user names, and parameter values. Use psql to start the command line client to PostgreSQL

4.2. Code Examples Conventions Used

The code examples show typical use of SQL, PL/pgSQL, or other command-line statements. They are displayed using a monospace typeface and clearly separated from the normal text.

SELECT firstname FROM customers WHERE lastname = 'Smith';

The following is a list of conventions used in the code examples, their meaning, and an example.

Convention Meaning Example
[] Indicates that the enclosed arguments are optional. CREATE SCHEMA schemaname [ AUTHORIZATION username ]
{} Indicates that one of the enclosed arguments is required. CREATE { TEMPORARY | TEMP } TABLE table_name
| Separates alternative items that may be optional or required. [ ENCRYPTED | UNENCRYPTED ] { BEFORE | AFTER }
... Indicates that the preceding item can be repeated. In code fragments an ellipsis means that code not relevant to the discussion has been omitted. CREATE USER username [ [ WITH ] option [ ... ] ] - SELECT column1, column2, ..., column/n/ FROM table;
Italics Indicates variables that you must supply values for. column/n/
UPPERCASE    
lowercase    

5. Overview

5.1. Features

5.1.1. MySQL Features

TBD

5.1.2. PostgreSQL Features

  • inheritance

  • data types

  • functions

Other features provide additional power and flexibility:

  • constraints

  • triggers

  • rules

  • transactional integrity

5.2. So, exactly why are you changing?

There are some reasons a person might want to switch from MySQL to PostgreSQL. Some are based on technical merit, others on personal feelings.

A big difference is that PostgreSQL is BSD-licensed. This makes a lot of difference for some people. This is but a small reason to make a change. The true reasoning for switching databases comes from the technical side. PostgreSQL, unlike MySQL, is a full-fledged relational database management system (RDBMS). This means PostgreSQL is pushing its way up there with the big names such as: Oracle, IBM, and Sybase.

The reason MySQL took off over PostgreSQL is that it was blazingly fast on the SELECT query side. And given this and the average deployment of MySQL as a backend for websites allowed for a big boost in use. There comes a time, however, when a (beginning) DBA starts to feel the shortcomings of MySQL and desires/needs a real RDBMS.

Both MySQL and PostgreSQL are nowadays available as a native Windows package as well.

6. Differences

6.1. Differences that stand out

PostgreSQL makes a difference between single quotes (') and double quotes (") as the ANSI standard dictates. MySQL does not follow this behaviour and allows you to use both single and double quotes to quote values and tends to favour the backtick (`), whereas it should only be single quotes. Double quotes are meant to be used on quoting field names, system identifiers, table names, and so on.

MySQL tends to ship with a database locale/collation set to be case insensitive. This means that doing a SELECT query on a lastname like: WHERE lastname='Johnson' would match 'Johnson', 'JOHNSON', or 'johnson'. This is solved easily, but depends on what you want to do:

  • use the correct case in the query, like in the example above,

  • convert the string to lowercase, e.g.: WHERE lower(lastname)='johnson',

  • use a case-insensitive query operator, e.g. ILIKE or ~*.

ANSI only allows '--' as a comment delimiter, whereas MySQL supports the shell/perl's '#' use of a comment delimiter. Just use '--' since both understand this.

6.2. Data Types

PostgreSQL has 3 integer types, namely SMALLINT (2 bytes big), INTEGER (INT, 4 bytes big) and BIGINT (8 bytes big). MySQL supports 5, namely: TINYINT (1 byte big), SMALLINT (2 bytes big), MEDIUMINT (3 bytes big), INT (4 bytes big), BIGINT (8 bytes big). So just take the equivalent or the one closest to it to contain the other in. Per the standard, PostgreSQL does not have unsigned integer types, so you'll need to go up to the next-larger type if you have unsigned data in MySQL big enough to overflow the signed types.

You can use PostgreSQL's TEXT type to replace all of MySQL's *TEXT types (TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT).

PostgreSQL's DECIMAL and NUMERIC are equivalent to each other, just like MySQL's.

MySQL supports, as an extension to the SQL92 standard, a synonym for DOUBLE PRECISION using just DOUBLE.

6.3. What's The Time?

MySQL's DATETIME is an ISO-8601-like date and timestamp format in the form of: YYYY-MM-DD HH:MM:SS. This excludes any timezone information whatsoever. Within PostgreSQL use TIMESTAMP. This automatically makes it TIMESTAMP WITHOUT TIME ZONE.

To confuse matters MySQL also has a date/time type called TIMESTAMP. This type is updated under a number of conditions. PostgreSQL does not support something like this. Most often you can get away with using the PostgreSQL TIMESTAMP type with a default of now() to have the field receive the current date and time. To get the creation time of the table you would need to use DEFAULT TIMESTAMP.

6.4. Constraints -- Hold What?

MySQL's schema (and Sybase too) often includes declarations like:

CREATE TABLE blah (
  id SMALLINT NOT NULL PRIMARY KEY
);

PostgreSQL (and Oracle) also support this syntax. Another way to have this working is to define a constraint, e.g.:

CREATE TABLE blah (
  id SMALLINT NOT NULL
);
ALTER TABLE ONLY blah
  ADD CONSTRAINT blah_pk PRIMARY KEY (id);

MySQL also knows a concept of enumerators (enum). Basically you specify an enumerated list of allowed values, e.g.:

CREATE TABLE blah (
  sentence enum('This', 'is', 'a', 'test')
);

With PostgreSQL 8.3 or later, you can use enums, but you need to create the type first. The bright side is, you can then use that type elsewhere without repeating yourself.

CREATE TYPE sentence as ENUM('This', 'is', 'a', 'test');
CREATE TABLE blah (
  foo sentence
);

6.5. More specialized differences

MySQL has an argument to fields, auto_increment, which automatically increases the value of the field whenever a new entry is made. PostgreSQL does not work like this. Within PostgreSQL (and Oracle) you normally use a SERIAL. See below for an example:

MySQL definition:

CREATE TABLE test (
  id    AUTO_INCREMENT,
  name  TEXT
);
INSERT INTO test (name) VALUES ('a name');

PostgreSQL definition:

CREATE SEQUENCE test_sequence;
CREATE TABLE test (
  id    SERIAL PRIMARY KEY
  name  TEXT
);
INSERT INTO test (name) VALUES ('a name');

MySQL also support the SERIAL datatype nowadays.

6.6. Portability reminders

ANSI SQL-92 reserves a lot of words that cannot/should not be used as table or column names. Please avoid using these names as this increases portability between different database backends. Different RDBMSes treat the reserved keyword list differently though, so check with the available documentation.

6.7. Random notes

MySQL's IFNULL(x, y) is definitely not equivalent with PostgreSQL's NULLIF(x, y). The first returns x if x is not NULL, otherwise it returns y. PostgreSQL's NULLIF returns a NULL only if both x and y are equal. MySQL also knows a NULLIF which is the same as PostgreSQL's. The equivalent to MySQL's IFNULL would be COALESCE(x, y).

Do note that while MySQL supports N/0 to be equivalent to NULL, apparently, PostgreSQL reports this as a floating point operation division by zero.

MySQL does not do any date checking. February is allowed to have 30 or 31 days and even leap years are not checked so you could also have a day 29 every year. PostgreSQL enforces correct dates.

MySQL has a UNIX_TIMESTAMP('date') which converts date back to seconds since the Unix epoch. PostgreSQL can also do this through an EXTRACT(EPOCH from TIMESTAMP 'date').

MySQL's LENGTH(string) operator is the same as PostgreSQL's CHAR_LENGTH(string);

PostgreSQL does not really have anything like MySQL's LAST_INSERT_ID(). Instead you need to use the sequences. If you have inserted in the same session you can use SELECT currval(pg_get_serial_sequence('your_table','your_column')); or otherwise you can use SELECT last_value from sequence;.