The S3 storage engine has been available since MariaDB 10.5.4.
The S3 storage engine is read-only. It lets you archive MariaDB tables in Amazon S3, or in any third-party public or private cloud that implements the S3 API (of which there are many), while keeping them accessible for reading in MariaDB.
As of MariaDB 10.5.7, the S3 storage engine is gamma maturity, so the following step can be omitted.
On earlier releases, when the engine was alpha maturity, it does not load by default on a stable release of the server because of the default value of the plugin_maturity variable. Set plugin_maturity to alpha (or below) in your config file to permit installation of the plugin:
[mysqld]
plugin-maturity = alpha
and restart the server.
Now install the plugin library, for example:
INSTALL SONAME 'ha_s3';
If the library is not available, for example:
INSTALL SONAME 'ha_s3';
ERROR 1126 (HY000): Can't open shared library '/var/lib/mysql/lib64/mysql/plugin/ha_s3.so' (errno: 13, cannot open shared object file: No such file or directory)
you may need to install a separate package for the S3 storage engine, for example:
shell> yum install MariaDB-s3-engine
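Once the package and plugin are installed, it can be useful to confirm that the engine is actually loaded. The following is a minimal sketch, assuming the mariadb command-line client is on the PATH and can log in with default credentials, and assuming the plugin is registered under the name S3. The function names are illustrative helpers, not part of MariaDB.

```shell
# Print the SQL used to look up the S3 plugin in information_schema.
# Assumption: the plugin is registered under the name 'S3'.
s3_plugin_status_sql() {
  printf '%s\n' "SELECT PLUGIN_NAME, PLUGIN_STATUS, PLUGIN_MATURITY FROM information_schema.PLUGINS WHERE PLUGIN_NAME = 'S3';"
}

# Run the lookup against the local server.
# An empty result means the plugin is not installed.
s3_plugin_status() {
  mariadb --batch --skip-column-names -e "$(s3_plugin_status_sql)"
}
```

Calling s3_plugin_status after INSTALL SONAME should print one row for the plugin if it loaded.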
To move data from an existing table to S3, run:
ALTER TABLE old_table ENGINE=S3 COMPRESSION_ALGORITHM=zlib;
To get the data back into a 'normal' table, run:
ALTER TABLE s3_table ENGINE=INNODB;
S3_BLOCK_SIZE: Set to 4M as default. This is the block size for all index and data pages stored in S3.
COMPRESSION_ALGORITHM: Set to 'none' as default. Which compression algorithm to use for blocks stored in S3. Options are: none or zlib.

ALTER TABLE can be used on S3 tables as normal to add columns or change column definitions.
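The two table options can be combined in a single ALTER TABLE statement. The sketch below builds such a statement and rejects compression values other than the two documented ones. It assumes that S3_BLOCK_SIZE accepts a plain byte count (4194304 for the 4M default) and that the table name needs no quoting. s3_archive_sql is a hypothetical helper, not a MariaDB tool.

```shell
# Build an ALTER TABLE statement that moves a table to S3.
# Usage: s3_archive_sql TABLE [BLOCK_SIZE_BYTES] [none|zlib]
# Assumption: S3_BLOCK_SIZE is given as a whole number of bytes.
s3_archive_sql() {
  if [ -z "$1" ]; then
    echo "usage: s3_archive_sql TABLE [BLOCK_SIZE_BYTES] [none|zlib]" >&2
    return 1
  fi
  table=$1
  block_size=${2:-4194304}   # 4M, the documented default
  compression=${3:-none}     # the documented default
  # Only the two documented algorithms are accepted.
  case $compression in
    none|zlib) ;;
    *) echo "unsupported COMPRESSION_ALGORITHM: $compression" >&2; return 1 ;;
  esac
  # Reject anything that is not a plain non-negative integer.
  case $block_size in
    ''|*[!0-9]*) echo "block size must be a whole number of bytes" >&2; return 1 ;;
  esac
  printf 'ALTER TABLE %s ENGINE=S3 S3_BLOCK_SIZE=%s COMPRESSION_ALGORITHM=%s;\n' \
    "$table" "$block_size" "$compression"
}
```

For example, the output of s3_archive_sql old_table 8388608 zlib could be piped to the mariadb client with a database name of your choice.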
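To be able to use S3 for storage one *must* define how to access S3 and where data are stored in S3. The variables used for this in the configuration examples are:
s3_access_key
s3_secret_key
s3_bucket
s3_region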
If you are using an S3 service that connects over HTTP (like https://min.io/), you also need to set the following variables:
s3_port
s3_use_http
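If you are going to use a primary-replica setup, you should look at the following variables:
s3_replicate_alter_as_create_select: When converting an S3 table to a local table, log all rows in the binary log. Defaults to TRUE. This allows the replica to replicate CREATE TABLE .. SELECT FROM s3_table even if the replica doesn't have access to the original s3_table.
s3_slave_ignore_updates: Should be set if the primary and replica share the same S3 instance, so that the replica ignores updates to the S3 tables that were already applied on the primary. Defaults to FALSE.
The above defaults assume that the primary and replica don't share the same S3 instance.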
Other, less critical options are:
Finally, some options you will probably never have to touch:
[mariadb]
s3=ON
s3-bucket=mariadb
s3-access-key=xxxx
s3-secret-key=xxx
s3-region=eu-north-1
s3-host-name=s3.amazonaws.com
# The following is useful if you want to use minio as a S3 server. (https://min.io/)
#s3-port=9000
#s3-use-http=ON
# Primary and replica share same S3 tables.
s3-slave-ignore-updates=1

[aria_s3_copy]
s3-bucket=mariadb
s3-access-key=xxxx
s3-secret-key=xxx
s3-region=eu-north-1
s3-host-name=s3.amazonaws.com
# The following is useful if you want to use minio as a S3 server. (https://min.io/)
#s3-port=9000
#s3-use-http=ON

[mariadb]
s3=ON
s3-host-name="127.0.0.1"
s3-bucket=storage-engine
s3-access-key=minio
s3-secret-key=minioadmin
s3-port=9000
s3-use-http=ON

[aria_s3_copy]
s3=ON
s3-host-name="127.0.0.1"
s3-bucket=storage-engine
s3-access-key=minio
s3-secret-key=minioadmin
s3-port=9000
s3-use-http=ON
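Because the server section and the aria_s3_copy section repeat the same connection settings, one way to keep them in sync is to generate both from a single set of values. This is only a sketch: s3_cnf is a hypothetical helper, and it follows the Amazon S3 example in enabling s3=ON only in the server section.

```shell
# Print matching [mariadb] and [aria_s3_copy] sections from one set of values.
# Usage: s3_cnf BUCKET ACCESS_KEY SECRET_KEY REGION HOST_NAME
# (s3_cnf is a hypothetical helper, not part of MariaDB.)
s3_cnf() {
  if [ "$#" -ne 5 ]; then
    echo "usage: s3_cnf BUCKET ACCESS_KEY SECRET_KEY REGION HOST_NAME" >&2
    return 1
  fi
  for section in mariadb aria_s3_copy; do
    printf '[%s]\n' "$section"
    # Only the server section enables the engine, as in the Amazon S3 example.
    if [ "$section" = mariadb ]; then
      printf 's3=ON\n'
    fi
    printf 's3-bucket=%s\n' "$1"
    printf 's3-access-key=%s\n' "$2"
    printf 's3-secret-key=%s\n' "$3"
    printf 's3-region=%s\n' "$4"
    printf 's3-host-name=%s\n\n' "$5"
  done
}
```

The output can be redirected into a config file of your choice. Since it contains the secret key, that file should be readable only by the users who need it.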
The typical use case is a set of tables that become fairly inactive after some time but are still too important to be removed. In that case, one option is to move such tables to an archiving service that is accessible through an S3 API.
Note that S3 means the Cloud Object Storage API defined by Amazon AWS. Often the whole of Amazon’s Cloud Object Storage is referred to as S3. In the context of the S3 archive storage engine, it refers to the API itself, which defines how to store objects in a cloud service, be it Amazon’s or someone else’s. OpenStack, for example, provides an S3 API for storing objects.
The main benefit of storing things in S3-compatible storage is that the cost of storage is much lower than that of many alternatives. Many S3 implementations also provide reliable long-term storage.
The S3 storage engine supports full MariaDB discovery. This means that if you have the S3 storage engine enabled and properly configured, tables stored in S3 are automatically discovered when they are accessed with SHOW TABLES, SELECT or any other operation that tries to access them. In the case of SELECT, the .frm file from S3 is copied to local storage to speed up future accesses.
When an S3 table is opened for the first time (it's not in the table cache) and there is a local .frm file, the S3 engine will check if it's still relevant, and if not, update or delete the .frm file.
This means that if the table definition changes on S3 and it's in the local cache, one has to execute FLUSH TABLES to get MariaDB to notice the change and update the .frm file.
If partitioned S3 tables are used, the partition definitions are also stored in S3 and will be discovered by other servers.
Discovery of S3 tables is not done for tables in the mysql database, so that mysqld boots faster and more securely.
S3 works with replication. One can use replication in two different scenarios:
aria_s3_copy is an external tool that one can use to copy Aria tables to and from S3. Use aria_s3_copy --help to get the options of how to use it.
If mariadb-dump is run with the --copy-s3-tables option, the resulting file will contain a CREATE statement for a similar Aria table, followed by the table data and ending with an ALTER TABLE xxx ENGINE=S3.

As of MariaDB 10.5.14, ANALYZE TABLE is supported for S3 tables. As S3 tables are read-only, a normal ANALYZE TABLE will not do anything. However, ANALYZE TABLE table_name PERSISTENT FOR... will now work.
As of MariaDB 10.5.14, CHECK TABLE will work. As S3 tables are read-only, it is very unlikely that they can become corrupted. The only known way an S3 table could be corrupted is if either the original table copied to S3 was corrupted or the process of copying the original table to S3 was somehow interrupted.
All ALTER PARTITION operations are supported on partitioned S3 tables except:
Depending on your connection speed to your S3 provider, there can be noticeable slowdowns in some operations.
Because S3 supports discovery (automatically making tables that are in S3 available), enabling the S3 engine can cause some small performance problems. Partitioned S3 tables also support discovery.
There is no performance degradation when accessing existing tables on the server. Accessing an S3 table for the first time copies the .frm file from S3 to the local disk, which speeds up future accesses to the table.
If you have performance problems with the S3 engine, here are some things you can try:
Also try executing the query twice to check whether the problem is that the data was not yet cached. When data is cached locally, performance should be excellent.
If you get errors such as:
ERROR 3 (HY000): Got error from put_object(bubu/produkt/frm): 5 Couldn't connect to server
one reason could be that your system doesn't allow MariaDB to connect to ports other than 3306. The procedure to enable other ports is as follows:
Search for the ports allowed for MariaDB:
$ sudo semanage port -l | grep mysqld_port_t
mysqld_port_t tcp 1186, 3306, 63132-63164
Say you want to allow MariaDB to connect to port 32768:
$ sudo semanage port -a -t mysqld_port_t -p tcp 32768
You can verify that the new port, 32768, is now allowed for MariaDB:
$ sudo semanage port -l | grep mysqld_port_t
mysqld_port_t tcp 32768,1186, 3306, 63132-63164
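Before adding a port, it may be worth checking whether it is already covered, since the listing mixes single ports and ranges. The following sketch parses one line of semanage port -l output in the format shown above. port_in_semanage_line is a hypothetical helper, not part of semanage or MariaDB.

```shell
# Return success if PORT appears in a "semanage port -l" line, either as a
# single port or inside a range such as 63132-63164.
# Usage: port_in_semanage_line "LINE" PORT
port_in_semanage_line() {
  line=$1
  port=$2
  # Drop the first two fields (type name and protocol), remove spaces,
  # then turn the comma-separated list into a space-separated one.
  ports=$(printf '%s\n' "$line" | awk '{ $1 = ""; $2 = ""; print }' | tr -d ' ' | tr ',' ' ')
  for entry in $ports; do
    case $entry in
      *-*)
        low=${entry%-*}
        high=${entry#*-}
        if [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]; then
          return 0
        fi
        ;;
      *)
        if [ "$port" -eq "$entry" ]; then
          return 0
        fi
        ;;
    esac
  done
  return 1
}

# Example use (commented out because it needs sudo and SELinux tools):
# line=$(sudo semanage port -l | grep '^mysqld_port_t')
# port_in_semanage_line "$line" 9000 || sudo semanage port -a -t mysqld_port_t -p tcp 9000
```

Port 9000 in the commented example is the minio port from the configuration examples. Substitute whichever port your S3 service listens on.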
© 2023 MariaDB
Licensed under the Creative Commons Attribution 3.0 Unported License and the GNU Free Documentation License.
https://mariadb.com/kb/en/using-the-s3-storage-engine/