Joins with data stored in a Cassandra database are only possible on the MariaDB side. That is, to compute a join between two tables, MariaDB reads the relevant rows from the first table and then reads the matching records from the second table.
Either of the tables can be an InnoDB table or a Cassandra table. If the second table is a Cassandra table, the Cassandra Storage Engine can read the matching records efficiently.
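The join strategy described above can be sketched roughly as follows. This is an illustration only, not MariaDB's actual implementation: the function `nested_loop_join`, the helper `lookup_by_key`, and the sample data are all hypothetical, and a plain Python dict stands in for a Cassandra table accessed by its key.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]


def nested_loop_join(
    first_table: Iterable[Row],
    lookup_second: Callable[[object], Optional[Row]],
    join_column: str,
) -> Iterator[Row]:
    """For each row of the first table, fetch the matching row of the second."""
    for row in first_table:
        match = lookup_second(row[join_column])
        if match is not None:
            # Combine the columns of both rows into one result row.
            yield {**row, **match}


# Hypothetical sample data: a regular table and a key-addressed store.
user_accounts: List[Row] = [
    {"user_id": 1, "username": "joe"},
    {"user_id": 2, "username": "ann"},
]
cassandra_rows: Dict[object, Row] = {
    1: {"user_id": 1, "some_more_fields": "extra data for joe"},
    2: {"user_id": 2, "some_more_fields": "extra data for ann"},
}

lookups: List[object] = []  # records which keys were requested


def lookup_by_key(user_id: object) -> Optional[Row]:
    lookups.append(user_id)
    return cassandra_rows.get(user_id)


# Only the rows selected from the first table trigger lookups in the second.
selected = [r for r in user_accounts if r["username"] == "joe"]
result = list(nested_loop_join(selected, lookup_by_key, "user_id"))
```

In this sketch the second table is touched once per selected row, which is why the approach suits joins that read only a small part of the data.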
All this is targeted at running joins that touch a small fraction of the tables. The expected typical use case looks like this:
select * from user_accounts where username='joe'
The Cassandra SE makes it possible to fetch some Cassandra data as well. One can write things like this:
select user_accounts.*, cassandra_table.some_more_fields from user_accounts, cassandra_table where user_accounts.username='joe' and user_accounts.user_id=cassandra_table.user_id
which is much easier than using the Thrift API.
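To see what such a query returns, here is a minimal runnable sketch. It rests on loud assumptions: Python's built-in `sqlite3` module stands in for MariaDB, both tables are ordinary in-memory tables (nothing here talks to Cassandra), the second table is named `cassandra_table`, and the column types and sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table user_accounts (user_id integer primary key, username text);
    -- Ordinary table standing in for a Cassandra-backed table.
    create table cassandra_table (user_id integer primary key, some_more_fields text);
    insert into user_accounts values (1, 'joe'), (2, 'ann');
    insert into cassandra_table values (1, 'extra data for joe'), (2, 'extra data for ann');
    """
)

# The join selects one account and pulls the extra fields for that user only.
rows = conn.execute(
    """
    select user_accounts.*, cassandra_table.some_more_fields
    from user_accounts, cassandra_table
    where user_accounts.username = 'joe'
      and user_accounts.user_id = cassandra_table.user_id
    """
).fetchall()
conn.close()
```

The application issues a single SQL statement and gets the combined row back, with no separate client code for the second data store.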
If the user wants to run huge joins that touch a large fraction of a table's data, for example:
"What are the top 10 countries that my website had visitors from in the last month?"
or
"Go through last month's orders and give me the top 10 selling items"
then the Cassandra Storage Engine is not a good answer. Queries like this are answered in two ways:
It is possible to run Hive/Pig on Cassandra.
© 2019 MariaDB
Licensed under the Creative Commons Attribution 3.0 Unported License and the GNU Free Documentation License.
https://mariadb.com/kb/en/handling-joins-with-cassandra/