If you’re given the task of preparing a Docker database image with pre-populated data, it seems pretty easy. It actually is, but you need to know how to preserve data between runs. If your database stores its data in a directory that is declared as a volume (I’m sure this is true for MySQL/MariaDB and Postgres), that directory will be empty in the committed image. That was my first mistake.
After that I started the database image using docker-compose, ran the import script, and after a few hours I had the https://www.imdb.com database in my Docker image.
Then I committed the image and created a tag. However, after killing that container and starting a new one from the tagged image, I lost all my work because the database was empty. Why? Because of the volume mechanism I wasn’t aware of.
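The volume mechanism bites here because the official MariaDB/MySQL (and Postgres) images declare the data directory as a volume in their Dockerfiles. Everything written to that path lands in an anonymous volume outside the image layers, and docker commit only captures image layers. Simplified, the relevant line in the base image looks like this:

```dockerfile
# Simplified fragment of the official MariaDB image's Dockerfile:
# data written under this path goes to an anonymous volume,
# so `docker commit` will not pick it up.
VOLUME /var/lib/mysql
```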
Now let’s do it right.
Steps to create a prepopulated MariaDB/MySQL image:
- Prepare a Dockerfile that configures MariaDB/MySQL to store data in a different directory (see the code listing below)
- Start the container and import the data; you can, for example, mount your import scripts into the container as a volume
- Obtain the ID of the running container
- Commit the container with your tag; for me it was:
docker commit <ID> djagielo/mariadb-employees:latest
- Login to your Docker registry:
docker login -u <USER>
- Push image:
docker push djagielo/mariadb-employees:latest
Dockerfile:
FROM mariadb:latest
# /var/lib/mysql is declared as a volume in the base image, so keep the data elsewhere
RUN cp -r /var/lib/mysql /var/lib/mysql-no-volume
CMD ["--datadir", "/var/lib/mysql-no-volume"]
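Since the container was started with docker-compose, a minimal compose file for the import run might look like the sketch below; the service name, password, and script directory are assumptions, not taken from the original setup. Files mounted into /docker-entrypoint-initdb.d are executed by the official image’s entrypoint on first start, which is one way to run the import scripts mentioned above:

```yaml
# Sketch of a docker-compose.yml for the import run; names are assumptions.
services:
  db:
    build: .                       # builds the Dockerfile shown above
    environment:
      MYSQL_ROOT_PASSWORD: secret  # placeholder password
    volumes:
      # *.sql and *.sh files in this directory run on the first start
      - ./import-scripts:/docker-entrypoint-initdb.d
```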
If you’re looking for a reasonably large dataset, you may be interested in the datasets of the IMDb service. The dumps are available for free right here: https://datasets.imdbws.com
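Those dumps are gzipped TSV files with a header row and \N as the null marker. A small Python sketch of reading that format (the column names follow title.basics.tsv; the sample rows are made up for illustration):

```python
import csv
import io

# Sample in the shape of IMDb's title.basics.tsv; rows are made up.
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\n"
    "tt0000001\tshort\tCarmencita\t1894\n"
    "tt9999999\tmovie\tUnknown Year\t\\N\n"
)

def parse_imdb_tsv(text):
    """Parse an IMDb-style TSV dump, turning the \\N marker into None."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return [
        {key: (None if value == r"\N" else value) for key, value in row.items()}
        for row in reader
    ]

rows = parse_imdb_tsv(SAMPLE)
print(rows[0]["primaryTitle"])  # Carmencita
print(rows[1]["startYear"])     # None
```

From here the rows can be fed to whatever import mechanism you use, e.g. generated INSERT statements in a script mounted into the container.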