Table pullout is an optimization for Semi-join subqueries.
Sometimes, a subquery can be re-written as a join. For example:
select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000);
If we know that there can be, at most, one country with with a given value of Country.Code
(we can tell that if we see that table Country has a primary key or unique index over that column), we can re-write this query as:
select City.* from City, Country where City.Country=Country.Code AND Country.Population < 100*1000;
If one runs EXPLAIN
for the above query in MySQL 5.1-5.6 or MariaDB 5.1-5.2, they'll get this plan:
MySQL [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000); +----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+ | 1 | PRIMARY | City | ALL | NULL | NULL | NULL | NULL | 4079 | Using where | | 2 | DEPENDENT SUBQUERY | Country | unique_subquery | PRIMARY,Population | PRIMARY | 3 | func | 1 | Using where | +----+--------------------+---------+-----------------+--------------------+---------+---------+------+------+-------------+ 2 rows in set (0.00 sec)
It shows that the optimizer is going to do a full scan on table City
, and for each city it will do a lookup in table Country
.
If one runs the same query in MariaDB 5.3, they will get this plan:
MariaDB [world]> explain select * from City where City.Country in (select Country.Code from Country where Country.Population < 100*1000); +----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+ | 1 | PRIMARY | Country | range | PRIMARY,Population | Population | 4 | NULL | 37 | Using index condition | | 1 | PRIMARY | City | ref | Country | Country | 3 | world.Country.Code | 18 | | +----+-------------+---------+-------+--------------------+------------+---------+--------------------+------+-----------------------+ 2 rows in set (0.00 sec)
The interesting parts are:
select_type=PRIMARY
, and id=1
as if they were in one join. Indeed, if one runs EXPLAIN EXTENDED; SHOW WARNINGS, they will see that the subquery is gone and it was replaced with a join:
MariaDB [world]> show warnings\G *************************** 1. row *************************** Level: Note Code: 1003 Message: select `world`.`City`.`ID` AS `ID`,`world`.`City`.`Name` AS `Name`,`world`.`City`.`Country` AS `Country`,`world`.`City`.`Population` AS `Population` from `world`.`City` join `world`.`Country` where ((`world`.`City`.`Country` = `world`.`Country`.`Code`) and (`world`.`Country`. `Population` < (100 * 1000))) 1 row in set (0.00 sec)
Changing the subquery into a join allows feeding the join to the join optimizer, which can make a choice between two possible join orders:
as opposed to the single choice of
which we had before the optimization.
In the above example, the choice produces a better query plan. Without pullout, the query plan with a subquery would read (4079 + 1*4079)=8158
table records. With table pullout, the join plan would read (37 + 37 * 18) = 703
rows. Not all row reads are equal, but generally, reading 10
times fewer table records is faster.
UNIQUE
/PRIMARY
key definitions. There is no separate @@optimizer_switch flag for table pullout. Table pullout can be disabled by switching off all semi-join optimizations with SET @@optimizer_switch='semijoin=off'
command.
© 2019 MariaDB
Licensed under the Creative Commons Attribution 3.0 Unported License and the GNU Free Documentation License.
https://mariadb.com/kb/en/table-pullout-optimization/