Migrating from MySQL to PostgreSQL ---------------------------------- By: Jeroen Ruigrok van der Werven NOTE: This is a pre-DocBook draft. There's some initial markup which is my own personal style. Abstract -------- This reference guide describes how to migrate from MySQL to PostgreSQL on all levels. Copyright information --------------------- MySQL is a registered trademark of MySQL AB in the United States, the European Union and other countries. Preface [preface] ======= This guide hopes to detail the changes a person must make to his/her database design in order to switch from MySQL to PostgreSQL. The database versions used in the comparison/conversion testing are MySQL 4.x and PostgreSQL 7.3.x. No doubt a lot of the information presented here also applied to past and future versions. This chapter has the following sections: - Audience - Prerequisite Knowledge - Structure of this Guide - Conventions Used Audience [section] -------- This guide is aimed at people who are migrating from MySQL to PostgreSQL. Prerequisite Knowledge [section] ---------------------- You should be familiar with the operating system(s) on which you are running MySQL and PostgreSQL. You should also be familiar with relational database concepts. Structure of this Guide [section] ----------------------- This reference guide is organized as follows: Chapter 1, "Overview" Describes MySQL and PostgreSQL and their respective features. Chapter 2?, "Data Types" Describes the data types from MySQL to PostgreSQL. Also describes reserved words. Conventions Used [section] ---------------- This section describes the conventions used within this guide for text and code examples. It describes: - Text Conventions Used - Code Examples Conventions Used Text Conventions Used [subsection] - - - - - - - - - - - Convention Meaning Example ------------------------------------------------------------------------------ Bold The bold typeface When you use this argument, you create indicates terms that a -unique index-. are defined in the text, that appear in the glossary, or both. /Italics/ The italic typeface PostgreSQL 7.3 User's Guide indicates book titles, emphasis, placeholders, or syntax clauses. UPPERCASE The uppercase monospace The N U M E R I C datatype allows for monospace typeface indicates arbitrary precision. system-supplied elements, such as: SQL keywords, commands, datatypes, and privileges. lowercase The lowercase monospace Use p s q l to start the command line monospace typeface indicates client to PostgreSQL. user-supplied elements and executables. Such elements include database names, column names, user names, and parameter values. Code Examples Conventions Used [subsection] - - - - - - - - - - - - - - - The code examples show typical use of SQL, PL/pgSQL, or other command-line statements. They are displayed using a monospace typeface and clearly separated from the normal text. SELECT firstname FROM customers WHERE lastname = 'Smith'; The following is a list of conventions used in the code examples, their meaning, and an example. Convention Meaning Example ------------------------------------------------------------------------------ [] Indicates that the CREATE SCHEMA schemaname enclosed arguments are [ AUTHORIZATION username ] optional. {} Indicates that one of CREATE { TEMPORARY | TEMP } the enclosed arguments TABLE table_name is required. | Separates alternative [ ENCRYPTED | UNENCRYPTED ] items that may be optional or required. { BEFORE | AFTER } ... Indicates that the CREATE USER username preceding item can be [ [ WITH ] option [ ... ] ] repeated. In code fragments an ellipsis SELECT column1, column2, ... , means that code not column/n/ FROM table; relevant to the discussion has been omitted. Italics Indicates variables that you must supply values for. UPPERCASE lowercase Chapter 1, Overview [chapter] =================== Features [section] -------- MySQL Features [subsection] - - - - - - - PostgreSQL Features [subsection] - - - - - - - - - - heritance data types functions Other features provide additional power and flexibility: constraints triggers rules transactional integrity So, exactly why are you changing? --------------------------------- There are some reasons a person might want to switch from MySQL to PostgreSQL. Some are based on technical merit, others on personal feelings. A big difference is that PostgreSQL is BSD-licensed. This makes a whole of a lot difference for some people. This is but a small reason to make a change. The true reasoning for switching databases comes from the technical side. PostgreSQL, unlike MySQL, is a full-fledged relational database management system (RDBMS). This means PostgreSQL is slowly pushing its way up there with the big names such as: Oracle, IBM, and Sybase. The reason MySQL took off over PostgreSQL is that it is (was?) blazingly fast on the select query side. And given this and the average deployment of MySQL (as a backend for websites) allowed for a big boost in use. There comes a time however when a (beginning) DBA starts to feel the shortcomings of MySQL and desires/needs a real RDBMS. One of the big advantages MySQL has is the availability of a native Windows binary. This means a lot of people can get a pretty good database for no cost, whatsoever. Chapter 2 ========= Differences that stand out -------------------------- PostgreSQL makes a difference between single quotes (') and double quotes (") as the ANSI standard dictates. MySQL does not follow this behaviour and allows you to use both single and double quotes to quote values, whereas it should only be single quotes. Double quotes are meant to be used on quoting field names, system identifiers, table names, and so on. MySQL is case insensitive. This means that doing a SELECT query on a lastname like: WHERE lastname='Johnson' would match 'Johnson', 'JOHNSON', or 'johnson'. This is solved easily, but depends on what you want to do: - use the correct case in the query, like in the example above, - convert the string to lowercase, e.g.: WHERE lower(lastname)='johnson', - use a case-insensitive query operator, e.g. ILIKE or *~. ANSI only allows '--' as a comment delimiter, whereas MySQL supports the shell/perl's '#' use of a comment delimiter. Just use '--' since both understand this. Data Types ---------- PostgreSQL has only 3 integer types, namely SMALLINT (2 bytes big), INTEGER (INT, 4 bytes big) and BIGINT (8 bytes big). MySQL supports 5, namely: TINYINT (1 byte big), SMALLINT (2 bytes big), MEDIUMINT (3 bytes big), INT (4 bytes big), BIGINT (8 bytes big). So just take the equivalent or the one closest to it to contain the other in. MySQL's TINYTEXT can be substituted by a VARCHAR(255). For TEXT, MEDIUMTEXT, and LONGTEXT just use PostgreSQL's TEXT type. PostgreSQL's DECIMAL and NUMERIC are equivalent to eachother just like MySQL's. MySQL supports, as an extension to the SQL92 standard, a synonym for DOUBLE PRECISION using just DOUBLE. What's The Time? ---------------- MySQL's DATETIME is an ISO-8601-like date and timestamp format in the form of: YYYY-MM-DD HH:MM:SS. This excludes any timezone information whatsoever. Within PostgreSQL use TIMESTAMP. This automatically makes it TIMESTAMP WITHOUT TIME ZONE. To confuse matters MySQL also has a date/time type called TIMESTAMP. This type is updated under a number of conditions. PostgreSQL does not support something like this. Most often you can get away with using the PostgreSQL TIMESTAMP type with a default of 'now' or now() to have the field received the current date and time. To get the creation time of the table you would need to use DEFAULT TIMESTAMP 'now'. Constraints -- Hold What? ------------------------- MySQL's schema (and Sybase too) often includes declarations like: CREATE TABLE blah ( id SMALLINT NOT NULL PRIMARY KEY); PostgreSQL (and Oracle) also support this syntax. Another way to have this working is to define a constraint, e.g.: CREATE TABLE blah ( id SMALLINT NOT NULL); ALTER TABLE ONLY blah ADD CONSTRAINT blah_pk PRIMARY KEY (id); MySQL also knows a concept of enumerators (enum). Basically you specify an enumerated list of allowed values, e.g.: CREATE TABLE blah ( sentence enum('This', 'is', 'a', 'test')); Within PostgreSQL you would solve this like this: CREATE TABLE blah ( sentence CHARACTER VARYING(50) CONSTRAINT blah_sentence_cstr CHECK (sentence IN ('This', 'is', 'a', 'test'); More specialized differences ---------------------------- MySQL has an argument to fields, auto_increment, which automatically increases the value of the field whenever a new entry is made. PostgreSQL does not work like this. Within PostgreSQL (and Oracle) you normally use a 'sequence' where you get the next value from. See below for an example: MySQL definition: CREATE TABLE test ( id AUTO_INCREMENT, name VARCHAR(30)); INSERT INTO test (name) VALUES ('a name'); PostgreSQL definition: CREATE SEQUENCE test_sequence; CREATE TABLE test ( id int not null default nextval('test_sequence'), name VARCHAR(30)); INSERT INTO test (name) VALUES ('a name'); PostgreSQL also allows you to use SERIAL to shorten the code above: CREATE TABLE test ( id SERIAL, name CARCHAR(30)); MySQL also support the SERIAL datatype nowadays. Portability reminders --------------------- ANSI SQL-92 reserves a lot of words that cannot/should not be used as table or column names. Please avoid using these names as this increases portability between different database backends. Different RDBMSes treat the reserved keyword list differently. Random notes ------------ MySQL's IFNULL(x, y) is definately not equivalent with PostgreSQL's NULLIF(x, y). The first returns x if x is not NULL, otherwise it returns y. PostgreSQL's NULLIF returns a NULL only if both x and y are equal. MySQL also knows a NULLIF which is the same as PostgreSQL's. The equivalent to MySQL's IFNULL would be COALESCE(x, y). Do note that while MySQL supports N/0 to be equivalent to NULL, apparently, PostgreSQL reports this as a floating point operation division by zero. MySQL does not do any date checking. February is allowed to have 30 or 31 days and even leap years are not checked so you could also have a day 29 every year. PostgreSQL enforces correct dates. MySQL has a UNIX_TIMESTAMP('date') which converts date back to seconds since the Unix epoch. PostgreSQL can also do this through a EXTRACT(EPOCH from TIMESTAMP 'date'). MySQL's LENGTH(string) operator is the same as PostgreSQL's CHAR_LENGTH(string); PostgreSQL does not really know anything like MySQL's LAST_INSERT_ID(). Instead you need to use the sequences. If you have inserted in the same session you can use SELECT currval('sequence'); or otherwise you can use SELECT last_value from sequence;.