MariaDB

From Cheaha
Jump to navigation Jump to search


Attention: Research Computing Documentation has Moved
https://docs.rc.uab.edu/


Please use the new documentation url https://docs.rc.uab.edu/ for all Research Computing documentation needs.


As a result of this move, we have deprecated use of this wiki for documentation. We are providing read-only access to the content to facilitate migration of bookmarks and to serve as an historical record. All content updates should be made at the new documentation site. The original wiki will not receive further updates.

Thank you,

The Research Computing Team

Summary

MariaDB is a free and open-source database engine intended to mirror the functionality of MySQL. Server daemon entities are generally not allowed to be installed and run directly on Cheaha nodes. The proper way to run an SQL server is through a container. The user can then connect to the server using an appropriate connection method.

Optimization for Single User Access on Cheaha

While MariaDB is a mature and useful system, the default configuration of MariaDB can be inefficient for large, not-in-memory databases, leading to poor performance and longer runtimes. The primary intended use-case for MariaDB is serving database information to many remote users, like with UAB Oracle AdminSys. It is assumed for this page that science-related SQL jobs on Cheaha serve only a single user. That assumption allows us to configure the system for higher performance when querying large databases. There are a few potential fixes for these issues that are available to Cheaha users. Options include changing the default buffer pool size to reduce paging, storing the buffer state on disk to reduce warmup time, leaving a MariaDB terminal open on a long-running job, and requesting access to the large memory nodes on Cheaha. The latter can be done by sending a ticket to [support@listserv.uab.edu].

The flags listed below can also be modified at runtime at the MariaDB terminal. A flag formatted as --flag-name=value can be changed using SET GLOBAL flag_name=value. Note that the hyphens have changed to underscores. The information below is only helpful if the user intends to access the same large database multiple times over multiple sessions. If more complicated queries need to be run on large databases, it might be more helpful to think about using strategies and tools developed specifically for big data, such as Apache Spark, or a high-level language with SQL interfaces like Python. Python can be onnected to MariaDB using the associated MySQL Connector Conda package.

Modifying the Buffer Pool

Having a small buffer pool means that flushing occurs more frequently. Flushing is when the machine moves data to and from the buffer, which involves moving data between memory and storage. Communication with storage is orders of magnitude slower than communication between processor and memory, so more frequent flushing can reduce performance. MariaDB uses a collection of variables to control how much of a database is in memory at one time, in a buffer pool. Fine control of the InnoDB buffer pool involves a number of variables. Two are relevant for the typical scientific use-case of MariaDB. The flags below are recommendations for use with the mysqld command. More information can be found here.

--innodb_buffer_pool_instances=1
--innodb_buffer_pool_size=[80% of total allocated memory]
--innodb_buffer_pool_chunk_size=[80% of total allocated memory]

The number of instances is set to one using the innodb-buffer-pool-instances flag because there is only one user. Multiple instances are useful when there is concurrent access by many users. The buffer pool size is set to 80% of total allocated memory using the innodb-buffer-pool-size because MariaDB has an additional 10% for internal buffers. It is important to stay below the memory allocation to avoid accidentally killing the job and destroying hard work.

The MariaDB server must be run inside of a container on Cheaha. If the container can't be modified, then the three above flags can't be used when starting the server daemon mysqld. Instead they must be modified at runtime in the MariaDB terminal. See the note in the summary for how to change the flags into SQL queries that modify the global variables.

Storing the Buffer Pool State

The buffer pool is stored in memory, so it is destroyed when the MariaDB server is shut down. When it is restarted at a later time, the buffered indices stored in the pool must be rebuilt. The nature of shared computing resources means that the job, and the server, are shut down periodically. This means the cache must be rebuilt for each new job, even with the same database. For large databases, this can result in significant and repeated warmup time. It is possible to mitigate this by writing the buffer pool state to disk on shutdown and reading it on startup. The fraction of buffer pool to store may also be controlled, with a default of 40%. The following flags enable loading and saving of the buffer pool. Loading and saving are enabled by default.

--innodb_buffer_pool_load_at_startup=ON
--innodb_buffer_pool_dump_at_shutdown=ON
--innodb_buffer_pool_dump_pct=100

The buffer pool file may be a significant fraction of the size of the original database, which could increase storage requirements. Depending on the use-case and sizes involved, it may not be a good choice to store the entire buffer pool. An alternative is to simply request a VNC job on a partition with a long time, and then leave the MariaDB terminal open.