IP address proxy lookup optimization For MySQL

In this tutorial, we will do a comparison between a standard IP range query vs a more optimized query. In this case, there may be gaps in the ranges with no data. For this comparison, we will use the IP2Proxy PX2 database to lookup proxy data from the visitor’s IP address. See IP2Proxy PX2 database for more info.

Creating the database and the tables for the comparison

Below are the steps to set up the database for the 2 tables being used in the comparison.

Run the below SQL code in MySQL.

CREATE DATABASE ip2proxy;
 
USE ip2proxy;
 
CREATE TABLE `ip2proxy_px2`(
    `ip_from` INT(10) UNSIGNED,
    `ip_to` INT(10) UNSIGNED,
    `proxy_type` VARCHAR(3),
    `country_code` CHAR(2),
    `country_name` VARCHAR(64),
    PRIMARY KEY (`ip_from`, `ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
 
CREATE TABLE `ip2proxy_px2_faster`(
    `ip_from` INT(10) UNSIGNED,
    `ip_to` INT(10) UNSIGNED,
    `proxy_type` VARCHAR(3),
    `country_code` CHAR(2),
    `country_name` VARCHAR(64),
    PRIMARY KEY (`ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Creating the benchmarking codes

Save the following code into a PHP file and run it.

<?php
// MySQL connection information
$config['dbHost'] = 'myserver';
$config['dbUser'] = 'myuser';
$config['dbPass'] = 'mypassword';
$config['dbName'] = 'ip2proxy';
 
$pdo = new PDO('mysql:host=' . $config['dbHost'] . ';dbname=' . $config['dbName'] . ';charset=utf8', $config['dbUser'], $config['dbPass']);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
 
// configure performance schema for profiling & clear old log
$setup = ['TRUNCATE TABLE performance_schema.events_statements_history_long',
'UPDATE performance_schema.setup_instruments SET ENABLED = \'YES\', TIMED = \'YES\' WHERE NAME LIKE \'%statement/%\'',
'UPDATE performance_schema.setup_consumers SET ENABLED = \'YES\' WHERE NAME LIKE \'%events_statements_%\''
];
 
foreach ($setup as $sql) {
    $st = $pdo->prepare($sql);
    $st->execute();
}
 
$totalqueries = 1000; // number of random IPs we will benchmark
$total = [0, 0];
 
$list = [];
 
// generate list of random IPs
for ($x = 0; $x < $totalqueries; ++$x) {
    array_push($list, mt_rand(1, 254) . '.' . mt_rand(1, 254) . '.' . mt_rand(1, 254) . '.' . mt_rand(1, 254));
}
 
// template SQL
$sql1 = 'SELECT `country_code` FROM `ip2proxy_px2` WHERE INET_ATON(:ip_address) BETWEEN ip_from AND ip_to LIMIT 1';
$sql2 = 'SELECT `country_code` FROM (SELECT `country_code`, `ip_from` FROM `ip2proxy_px2_faster` WHERE ip_to >= INET_ATON(:ip_address) LIMIT 1) AS mytable WHERE ip_from <= INET_ATON(:ip_address)';
 
// querying the IPs using both SQL statements
foreach ($list as $ip_address) {
    $st = $pdo->prepare($sql1);
    $st->bindParam(':ip_address', $ip_address, PDO::PARAM_STR);
    $st->execute();
 
    $st = $pdo->prepare($sql2);
    $st->bindParam(':ip_address', $ip_address, PDO::PARAM_STR);
    $st->execute();
}
 
// get profile results
$sql = 'SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration, SQL_TEXT FROM performance_schema.events_statements_history_long WHERE SQL_TEXT LIKE \'%ip2proxy_px2%\' AND SQL_TEXT NOT LIKE \'%performance_schema%\'';
$st = $pdo->prepare($sql);
$st->execute();
 
// output profile results
while($rs = $st->fetchObject()) {
    $duration = $rs->Duration * 1000;
    if (strpos($rs->SQL_TEXT, 'ip2proxy_px2_faster') !== false) {
        $total[1] += $duration;
    }
    else if (strpos($rs->SQL_TEXT, 'ip2proxy_px2') !== false) {
        $total[0] += $duration;
    }
}
 
$average[0] = $total[0] / $totalqueries;
$average[1] = $total[1] / $totalqueries;
$improvement = (($average[0] - $average[1]) / $average[0]) * 100;
 
echo "\n\nAverage time (using between): " . $average[0] . " ms\n";
echo "\n\nAverage time (using subquery): " . $average[1] . " ms\n";
echo "\n\nImprovement: " . $improvement . " %\n";
 
?>

Details about the tables and how the optimization works

As shown in the codes, we are using 2 types of SQL queries. Each query is designed with specific table indexes in mind for the best performance in each case. Unfortunately, range queries are notoriously hard to optimize especially in cases where the IP ranges may have gaps.

For the table ip2proxy_px2, we employ a rather standard index in the form of a primary key containing both the ip_from and ip_to columns. For most people, this would be the obvious solution to improve range query performance. As for the SQL query in this case, we used the following:

SELECT `country_code` FROM `ip2proxy_px2` WHERE INET_ATON(IP_ADDRESS) BETWEEN ip_from AND ip_to LIMIT 1

Running our benchmark of 1000 random IP addresses, we managed to achieve an average query time of 57.033074 ms. For most people, this query time is certainly more than acceptable.

For the table ip2proxy_px2_faster, we employ a single column primary key which only contains the ip_to column. A quick glance will tell the most experienced database developers that this will NOT work well with the previous SQL query. As for the SQL query in this case, we used the following:

SELECT `country_code` FROM (SELECT `country_code`, `ip_from` FROM `ip2proxy_px2_faster` WHERE ip_to >= INET_ATON(IP_ADDRESS) LIMIT 1) AS mytable WHERE ip_from <= INET_ATON(IP_ADDRESS)

Running our benchmark of 1000 random IP addresses, we managed to achieve an average query time of 1.745722 ms.

That is a massive 96.939105894941% improvement in the query speed. While it may seem counter-intuitive to perform a sub-query when a simple SELECT statement would have suffice, the numbers don’t lie.

To explain how the optimization works, take a look at the first table. It may only have 1 primary key index but there are 2 columns in there. So when a query is executed, MySQL still has to basically perform 2 index lookups. Multi-index queries usually are slower compared to single-index queries.

In the second table, we only have a single column as the primary key. This means we can’t run the first query as it will be very slow. Hence the need for a different query. Take note that in the second query, there is a sub-query. This is the key optimization technique. We are getting only 1 record where the ip_to value is greater than or equal to our IP number. Then all we have to do is check that the ip_from is less than or equal to our IP number. If it is, then we have our record. Otherwise, the IP number does not exist in our proxy data.

Was this article helpful?

Related Articles