Photo on foter.com

A Suggestion on Managing Data Changes in Spring Boot Using Flyway

How to apply global schema changes and environment-specific data changes with Flyway’s versioned and repeatable migrations?

Mohamed Al Sayadi
Published in
16 min readAug 17, 2020

--

Introduction

Database migrations are essential for the development of an application to progress and scale in any meaningful way.

Among other things, migrations keep changes to the database recorded in version control, automated, stateful ¹, and, perhaps most importantly, reproducible.

[1]: I’m not sure if stateful is a common term for this. For me, migrations being stateful means that looking at any database, you can tell what ‘state’ the database is in; that is, what migrations have been applied to it so far. Flyway accomplishes this by creating a history table to record all attempted migrations.

The Issue

Photo by Arthur Yeti on Unsplash

While schema changes are “respected” enough by developers to be put into migrations, data changes are often not that lucky; data changes are more likely to be performed on an ad-hoc basis on individual environments.

The downsides of such an approach are, in essence, the opposite of the migration approach’s benefits; these data changes are [often] not committed to version control, not part of the automated deployment pipeline, stateless ², and hard to track and reproduce.

These disadvantages are especially apparent when such data changes pertain to what’s often referred to as master data; that is, records that the application assumes, depends on, and might not have a graceful way of dealing with their absence. Such data may include user types (user, admin, etc…), content categories, or localization-related records, to name some.

Data changes are as essential to the end result as the application code or schema changes and, thus, should be treated equally.

[2]: In the sense that looking at the history table of migrations, there’s no way to tell if certain data changes were applied.

What We Can Do

A quick workaround could be to commit any data changes done through SQL to a dedicated location in the codebase and apply them manually.

While this is not precisely stateful or automatically reproducible, it does help to get such changes into version control and might make tracking and reproducing them a bit more manageable.

But can we do better?

The Suggestion

A cleaner approach would be to treat data changes similar to schema changes; as first-class database migrations. After all, data changes are as essential to the end result as the application code or schema changes and, thus, should be treated equally.

In the context of Flyway, this translates into putting data changes in SQL files alongside other migrations, either versioned or repeatable.

Photo by William Rouse on Unsplash

A Crash Course on Flyway Migrations

In Flyway, a versioned migration is one that 1- starts with a V, 2- gets applied once, and 3- cannot be modified after it’s been applied. This type is the most common and is often used for schema migrations.

A repeatable migration, on the other hand, is one that 1- starts with an R, 2- may be applied multiple times, and 3- can be modified after it’s been applied. This type is less know and is suggested for creating views, procedures, functions, etc.

The challenge with repeatable migrations is writing the SQL code. Code in such migrations should be safe to rerun multiple times without producing unintended side effects such as erroring or duplicating records.

The Choice for Shared Data Migrations

For shared data migrations across all environments, the choice is largely practical and personal.

Writing versioned migrations means that your master data is potentially going to be split across multiple files with schema migration files in between. It does free you from having to write repeatable code, though.

On the other hand, writing repeatable migrations means that your master data can all sit in one file or a number of data files. It also means that data changes stay out of the way of, and are quite distinguishable from, schema changes. However, it does require you to write repeatable code (which for insert statements is quite manageable as we’ll see later for MySQL at least).

The Choice for Environment-specific Data Migrations

For data migrations that should only be applied to a specific environment (think test data for a test environment or dummy login credentials for dev), the same tradeoffs between versioned and repeatable migrations apply. However, versioned migrations have one more disadvantage in this case; environments other than the one receiving the versioned migration will seem to have “skipped” some versions.

Photo on foter.com

My Suggestion

If I haven’t already given it away, I prefer repeatable migrations for all data changes to keep them visibly different, maintain continuous versioning for schema changes across all environments, and keep things consistent for both shared and environment-specific data migrations.

The Details

The specifics of how to, exactly, accomplish this suggestion for three environments; dev, staging, and prod in Spring Boot 2.x can be summarized as:

1- Create the following properties files:
* application.properties
* application-non-prod.properties
* application-dev.properties
* application-staging.properties
* application-prod.properties

2- Create the following directories at the classpath. In Spring Boot as you’re probably aware, anything under /src/main/resources is added to the classpath:
* db/migration/shared
* db/migration/non-prod
* db/migration/dev
* db/migration/staging
* db/migration/prod

3- In the properties file for each environment, override the spring.flyway.locations property, which accepts a comma-separated list of locations. In Spring Boot, the default location is db/migration.

We want to modify the migrations locations in such a way that:
dev reads migrations from the shared, non-prod, and dev directories.
staging reads migrations from the shared, non-prod, and staging directories.
prod reads migrations from the shared, and prod directories only.

This can be accomplished by setting the locations property as follows:

* In application-dev.properties, set:

spring.flyway.locations=classpath:/db/migration/shared,classpath:/db/migration/non-prod,classpath:/db/migration/dev

* In application-staging.properties , set:

spring.flyway.locations=classpath:/db/migration/shared,classpath:/db/migration/non-prod,classpath:/db/migration/staging

* In application-prod.properties , set:

spring.flyway.locations=classpath:/db/migration/shared,classpath:/db/migration/prod

4- For any database change, follow these conventions:
* Is it a schema change? Create a versioned migration in the shared directory.
* Is it a master data change? Create a repeatable migration or update an existing one in the shared directory.
* Is it a data change that shouldn’t make its way to production but is ok on other environments? Create a repeatable migration or update an existing one in the non-prod directory.
* Is it a data change specific to the dev, staging, or prod environment? Create a repeatable migration or update an existing one in the dev, staging, or prod directory, respectively.

Photo on foter.com

An Example Implementation

To create a minimal application to test this setup, we can start with an empty Spring Boot project with at least the following dependencies; spring-boot-starter-web, spring-boot-starter-data-jpa, flyway-core, and mysql-connector-java and walk through the following steps ³:

1- Make sure you have a MySQL server running locally on the default port 3306 with a username of root and no password or change values across this tutorial according to your individual setup.

1.1- Access the MySQL server:

mysql -u root

1.2- Then create the dev, staging, and prod databases with:

CREATE DATABASE managing_flyway_migrations_dev;
CREATE DATABASE managing_flyway_migrations_staging;
CREATE DATABASE managing_flyway_migrations_prod;

2- Create properties files under /src/main/resources.
2.1- Create a file called application.properties with the following content:

spring.jpa.open-in-view=false# Change this value to 1.2 for the Third Run or remove all together
spring.flyway.target=1.1

2.2- Create an empty file called application-non-prod.properties

2.3- Create a file called application-dev.properties with the following content:

spring.profiles.include=non-prodserver.port=8081spring.datasource.url=jdbc:mysql://localhost:3306
spring.datasource.username=root
spring.datasource.password=
spring.datasource.name=managing_flyway_migrations_dev
spring.flyway.locations=classpath:/db/migration/shared,classpath:/db/migration/non-prod,classpath:/db/migration/dev
spring.flyway.schemas=${spring.datasource.name}

2.4- Create a file called application-staging.properties with the following content:

spring.profiles.include=non-prodserver.port=8082spring.datasource.url=jdbc:mysql://localhost:3306
spring.datasource.username=root
spring.datasource.password=
spring.datasource.name=managing_flyway_migrations_staging
spring.flyway.locations=classpath:/db/migration/shared,classpath:/db/migration/non-prod,classpath:/db/migration/staging
spring.flyway.schemas=${spring.datasource.name}

2.5- Create a file called application-prod.properties with the following content:

server.port=8083spring.datasource.url=jdbc:mysql://localhost:3306
spring.datasource.username=root
spring.datasource.password=
spring.datasource.name=managing_flyway_migrations_prod
spring.flyway.locations=classpath:/db/migration/shared,classpath:/db/migration/prod
spring.flyway.schemas=${spring.datasource.name}

3- Create the migrations files under src/main/resources/db/migration.
3.1- Under shared, create a file called V1_1__Initial_Schema.sql with the following content:

CREATE TABLE roles (
id BIGINT NOT NULL AUTO_INCREMENT,
name VARCHAR(100),
PRIMARY KEY (id)
);
CREATE TABLE users (
id BIGINT NOT NULL AUTO_INCREMENT,
name VARCHAR(255),
role_id BIGINT NOT NULL,
PRIMARY KEY (id),FOREIGN KEY (role_id) REFERENCES roles (id)
);
CREATE TABLE content_categories (
id BIGINT NOT NULL,
name VARCHAR(255) NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE content (
id BIGINT NOT NULL AUTO_INCREMENT,
name VARCHAR(255),
category_id BIGINT NOT NULL,
PRIMARY KEY (id),FOREIGN KEY (category_id) REFERENCES content_categories(id)
);

3.2- Under shared, create a file called V1_2__Add_content_topic.sql with the following content:

CREATE TABLE content_topics (
id BIGINT NOT NULL,
name VARCHAR(255) NOT NULL,
PRIMARY KEY (id)
);
ALTER TABLE content
ADD COLUMN topic_id BIGINT,
ADD CONSTRAINT FOREIGN KEY (topic_id) REFERENCES content_topics(id);

3.3- Under shared, create a file called R__1_Master_Data.sql with the following content:

INSERT INTO
roles (id, name)
VALUES (1, 'System'), (2, 'Admin'), (3, 'User')
ON DUPLICATE KEY UPDATE id=id, name=VALUES(`name`);
INSERT INTO
content_categories (id, name)
VALUES (1, 'Category 1'), (2, 'Category 2')
ON DUPLICATE KEY UPDATE id=id, name=VALUES(`name`);
# INSERT statements for the First Run
# Comment out for the Third Run
####################################################################
INSERT INTO
content (id, name, category_id)
VALUES (1, 'Content 1.1', 1), (2, 'Content 1.2', 1),
(3, 'Content 2.1', 2), (4, 'Content 2.2', 2)
ON DUPLICATE KEY UPDATE id=id, name=VALUES(`id`), category_id=VALUES(`category_id`);
####################################################################
# INSERT statements for the Third Run
# Uncomment for the Third Run
####################################################################
# INSERT INTO
# content_topics (id, name)
# VALUES (1, 'Topic 1'), (2, 'Topic 2')
# ON DUPLICATE KEY UPDATE id=id, name=VALUES(`name`);
#
# INSERT INTO
# content (id, name, category_id, topic_id)
# VALUES (1, 'Content 1.1', 1, 1), (2, 'Content 1.2', 1, 2),
# (3, 'Content 2.1', 2, 2), (4, 'Content 2.2', 2, 1)
# ON DUPLICATE KEY UPDATE id=id, name=VALUES(`name`), category_id=VALUES(`category_id`), topic_id=VALUES(`topic_id`);
####################################################################

Note: Notice that this migration starts with an R. This is what makes it a repeatable migration.

Note 2: Notice that the number in the name of the migration is neither necessary nor does it mean a specific version. I’m using numbers in the description of data migrations simply to order their execution. Flyway executes versioned migrations according to their versions first, then repeatable migrations alphabetically according to their descriptions.

3.4- Under non-prod, create a file called R__2_Non_prod_login.sql with the following content:

INSERT INTO
users (id, name, role_id)
VALUES (201, 'SuperAdmin', 1)
ON DUPLICATE KEY UPDATE id=id, name=VALUES(`name`), role_id=VALUES(`role_id`);

3.5- Under dev, create a file called R__3_Dev_login.sql with the following content:

INSERT INTO
users (id, name, role_id)
VALUES (301, 'Developer 1', 2), (302, 'Developer 2', 2)
ON DUPLICATE KEY UPDATE id=id, name=VALUES(`name`), role_id=VALUES(`role_id`);

3.6- Under staging, create a file called R__3_Staging_login.sql with the following content:

INSERT INTO
users (id, name, role_id)
VALUES (301, 'QA 1', 3), (302, 'QA 2', 3)
ON DUPLICATE KEY UPDATE id=id, name=VALUES(`name`), role_id=VALUES(`role_id`);

By now, we have all the code and configurations we need for our first run.

Photo by Fabio Comparelli on Unsplash

First Run

In this first run, we’ll execute the 1.1 versioned migration alongside the other repeated ones to simulate a deployment with the initial database schema and data.

The two important locations to differentiate between runs in these tests are the Flyway target in application.properties and the INSERT statements in R__1_Master_Data.sql. If you’re using the accompanying GitHub repository, make sure you’re on the first-run branch.

Testing the `dev` Profile

Run the application with the dev profile active:

mvn spring-boot:run -Dspring.profiles.active=dev

By now, we expect a number of things:
* The versioned migration V1_1__Initial_Schema.sql was applied.
* The versioned migration V1_2__Add_content_topic.sql was not applied (since we have the target set to 1.1).
* The shared repeated migration R__1_Master_Data.sql was applied.
* The non-prod repeated migration R__2_Non_prod_login.sql was applied.
* The dev repeated migration R__3_Dev_login.sql was applied.

Use the Dev Database

USE  managing_flyway_migrations_dev;

Check the History Table

To verify this, we check the flyway_schema_history that Flyway would’ve created by running the following inside our database server:

SELECT installed_rank, version, description, success FROM flyway_schema_history;

The output of the SELECT statement above should look as follows:

+----------------+---------+------------------+---------+
| installed_rank | version | description | success |
+----------------+---------+------------------+---------+
| 1 | 1.1 | Initial Schema | 1 |
| 2 | NULL | 1 Master Data | 1 |
| 3 | NULL | 2 Non prod login | 1 |
| 4 | NULL | 3 Dev login | 1 |
+----------------+---------+------------------+---------+

Check the Content Table

The content table should contain the records we inserted in the Master Data as follows:

SELECT * FROM content;

This should output:

+----+-------------+-------------+
| id | name | category_id |
+----+-------------+-------------+
| 1 | Content 1.1 | 1 |
| 2 | Content 1.2 | 1 |
| 3 | Content 2.1 | 2 |
| 4 | Content 2.2 | 2 |
+----+-------------+-------------+

Check the Users Table

In the users table, we should see records from both the Non-production Login (SuperAdmin) as well as the Dev Login migrations we defined. Run the command:

SELECT * FROM users;

This should output:

+-----+-------------+---------+
| id | name | role_id |
+-----+-------------+---------+
| 201 | SuperAdmin | 1 |
| 301 | Developer 1 | 2 |
| 302 | Developer 2 | 2 |
+-----+-------------+---------+

I leave testing the staging profile to the reader as it’s almost identical to the tests described in this section.

Now, let’s move on to testing production!

Testing the `prod` Profile

Run the application with the prod profile active:

mvn spring-boot:run -Dspring.profiles.active=prod

By now, we expect a number of things:
* The versioned migration V1_1__Initial_Schema.sql was applied.
* The versioned migration V1_2__Add_content_topic.sql was not applied.
* The shared repeated migration R__1_Master_Data.sql was applied.

Use the Prod Database

USE  managing_flyway_migrations_prod;

Check the History Table

To verify this, we check the flyway_schema_history that Flyway would’ve created by running the following inside our database server:

SELECT installed_rank, version, description, success FROM flyway_schema_history;

The output of the SELECT statement above should look as follows:

+----------------+---------+----------------+---------+
| installed_rank | version | description | success |
+----------------+---------+----------------+---------+
| 1 | 1.1 | Initial Schema | 1 |
| 2 | NULL | 1 Master Data | 1 |
+----------------+---------+----------------+---------+

Notice that in production, we don’t have any of the test credentials defined in non-prod, dev, or staging.

Check the Content Table

The content table should contain the records we inserted in the Master Data similar to what we saw in dev:

SELECT * FROM content;

This should output:

+----+-------------+-------------+
| id | name | category_id |
+----+-------------+-------------+
| 1 | Content 1.1 | 1 |
| 2 | Content 1.2 | 1 |
| 3 | Content 2.1 | 2 |
| 4 | Content 2.2 | 2 |
+----+-------------+-------------+

Check the Users Table

The users table should be empty:

SELECT * FROM users;

This should output:

Empty set (0.00 sec)
Photo by Chander R on Unsplash

Second Run

In this run, we simulate an application restart or a deployment that contains no migration changes. Here, we’re really testing Flyway’s behavior.

Testing the `dev` Profile

Stop and rerun the dev application:

mvn spring-boot:run -Dspring.profiles.active=dev

Testing the `prod` Profile

Stop and rerun the prod application as well:

mvn spring-boot:run -Dspring.profiles.active=prod

Results

In the stream of logs that Spring generates in each run, look for a line containing the following:

Schema `managing_flyway_migrations_dev` is up to date. No migration necessary.

This tells us that Flyway detected that no changes were made to the migrations, and therefore it took no action. Notice that even repeatable migrations were’s re-applied by Flyway; this is because the files, and therefore their checksums, didn’t change.

Feel free to explore records in the tables after the restart to verify that nothing actually changed before moving on to the Third Run.

Photo by Andrew Tanglao on Unsplash

Third Run

Now, let’s simulate a deployment with changes made to both versioned and repeatable migrations.

In the application.propertiesfile, change the spring.flyway.target property to 1.2 or delete it altogether. This will cause Flyway to apply the second versioned migration we have; V1_2__Add_content_topic.sql.

In the R__1_Master_Data.sql, comment out the block for the First Run, and uncomment the one for the Third Run, as mentioned. Notice that in an actual application, you’d really be modifying the existing code instead of the commenting stuff.

If you’re using the GitHub repository, simply checkout the third-run branch.

Testing the `dev` Profile

Similar to the previous steps, stop and rerun the dev application:

mvn spring-boot:run -Dspring.profiles.active=dev

By now, we expect a number of things:
* The versioned migration V1_1__Initial_Schema.sql was not re-applied.
* The versioned migration V1_2__Add_content_topic.sql was applied.
* The shared repeated migration R__1_Master_Data.sql was re-applied (since it was changed).
* The non-prod repeated migration R__2_Non_prod_login.sql was not re-applied (since it was not changed).
* The dev repeated migration R__3_Dev_login.sql was not re-applied.

Use the Dev Database

USE  managing_flyway_migrations_dev;

Check the History Table

Similar to the previous tests, run:

SELECT installed_rank, version, description, success FROM flyway_schema_history;

The output of the SELECT statement above should look as follows:

+----------------+---------+-------------------+---------+
| installed_rank | version | description | success |
+----------------+---------+-------------------+---------+
| 1 | 1.1 | Initial Schema | 1 |
| 2 | NULL | 1 Master Data | 1 |
| 3 | NULL | 2 Non prod login | 1 |
| 4 | NULL | 3 Dev login | 1 |
| 5 | 1.2 | Add content topic | 1 |
| 6 | NULL | 1 Master Data | 1 |
+----------------+---------+-------------------+---------+

Notice the 1.2 versioned migration at the rank 5 as well as the re-application of the modified Master Data migration at 6.

Check the Content Table

The content table should contain the records we inserted in the modified Master Data, including the new topic_id column introduced by the new migrations:

SELECT * FROM content;

This should output:

+----+-------------+-------------+----------+
| id | name | category_id | topic_id |
+----+-------------+-------------+----------+
| 1 | Content 1.1 | 1 | 1 |
| 2 | Content 1.2 | 1 | 2 |
| 3 | Content 2.1 | 2 | 2 |
| 4 | Content 2.2 | 2 | 1 |
+----+-------------+-------------+----------+

Check the Users Table

The users table wouldn’t have changed since the last run:

SELECT * FROM users;

This should output:

+-----+-------------+---------+
| id | name | role_id |
+-----+-------------+---------+
| 201 | SuperAdmin | 1 |
| 301 | Developer 1 | 2 |
| 302 | Developer 2 | 2 |
+-----+-------------+---------+

Testing the `prod` Profile

Similarly, stop and rerun the prod application:

mvn spring-boot:run -Dspring.profiles.active=prod

By now, we expect the following:
* The versioned migration V1_1__Initial_Schema.sql was not re-applied.
* The versioned migration V1_2__Add_content_topic.sql was applied.
* The shared repeated migration R__1_Master_Data.sql was re-applied.

Use the Prod Database

USE  managing_flyway_migrations_prod;

Check the History Table

Two new records should show in the flyway_schema_historytable:

SELECT installed_rank, version, description, success FROM flyway_schema_history;

The output of the SELECT statement above should look as follows:

+----------------+---------+-------------------+---------+
| installed_rank | version | description | success |
+----------------+---------+-------------------+---------+
| 1 | 1.1 | Initial Schema | 1 |
| 2 | NULL | 1 Master Data | 1 |
| 3 | 1.2 | Add content topic | 1 |
| 4 | NULL | 1 Master Data | 1 |
+----------------+---------+-------------------+---------+

Check the Content Table

Similar to dev, the content records should contain the new topic_id column:

SELECT * FROM content;

This should output:

+----+-------------+-------------+----------+
| id | name | category_id | topic_id |
+----+-------------+-------------+----------+
| 1 | Content 1.1 | 1 | 1 |
| 2 | Content 1.2 | 1 | 2 |
| 3 | Content 2.1 | 2 | 2 |
| 4 | Content 2.2 | 2 | 1 |
+----+-------------+-------------+----------+

Check the Users Table

The users table should still be empty:

SELECT * FROM users;

This should output:

Empty set (0.00 sec)
Photo by Aw Creative on Unsplash

Clean Run

Finally, let’s simulate a fresh installation of the application at this stage. This simulates how migrations would work in a fresh environment with an empty database or when the project is freshly set up on a new local development machine.

Let’s use the staging environment here for a change. We’ll pretend that we’re just now setting up staging with the current code and a fresh database:

DROP DATABASE managing_flyway_migrations_staging;
CREATE DATABASE managing_flyway_migrations_staging;

Now run the staging application:

mvn spring-boot:run -Dspring.profiles.active=staging

By this, we expect the following:
* The versioned migration V1_1__Initial_Schema.sql was applied.
* The versioned migration V1_2__Add_content_topic.sql was applied.
* The shared repeated migration R__1_Master_Data.sql was applied.
* The non-prod repeated migration R__2_Non_prod_login.sql was applied.
* The staging repeated migration R__3_Staging_login.sql was applied.

Use the Staging Database

USE  managing_flyway_migrations_staging;

Check the History Table

Two new records should show in the flyway_schema_historytable:

SELECT installed_rank, version, description, success FROM flyway_schema_history;

The output of the SELECT statement above should look as follows:

+----------------+---------+-------------------+---------+
| installed_rank | version | description | success |
+----------------+---------+-------------------+---------+
| 1 | 1.1 | Initial Schema | 1 |
| 2 | 1.2 | Add content topic | 1 |
| 3 | NULL | 1 Master Data | 1 |
| 4 | NULL | 2 Non prod login | 1 |
| 5 | NULL | 3 Staging login | 1 |
+----------------+---------+-------------------+---------+

Note: Notice that the Master Data migration is applied once. Also, notice that this works fine because repeated migrations are always run after all versioned migrations have been applied.

Check the Content Table

The content table should already have all changes, including the topic_id column:

SELECT * FROM content;

This should output:

+----+-------------+-------------+----------+
| id | name | category_id | topic_id |
+----+-------------+-------------+----------+
| 1 | Content 1.1 | 1 | 1 |
| 2 | Content 1.2 | 1 | 2 |
| 3 | Content 2.1 | 2 | 2 |
| 4 | Content 2.2 | 2 | 1 |
+----+-------------+-------------+----------+

Check the Users Table

The users table should have records from both the Non-production Login and Staging Login migrations:

SELECT * FROM users;

This should output:

+-----+------------+---------+
| id | name | role_id |
+-----+------------+---------+
| 201 | SuperAdmin | 1 |
| 301 | QA 1 | 3 |
| 302 | QA 2 | 3 |
+-----+------------+---------+

The Code

If you’d like a test drive with minimal effort, I’ve put together the same setup handy on GitHub: https://github.com/sayadi/managing-flyway-migrations-spring-boot.

A Final Word

While this setup works, it is something I dreamt up for this article as a solution to a repeated problem I see. However, we haven’t had a chance to test this on a production application yet.

If you do end up using it in a real setup, do let me know in the comments below what works for you, what doesn’t, and what changes you might have had to do. Thanks for reading!

--

--