Advanced optimization for IP2Proxy database queries

IP2Proxy database queries

In this article, we’ll show IP2Proxy database users how to optimize the SQL queries to improve performance. The reason for this optimization is due to the fact that the IP2Proxy database does NOT contain contiguous ranges of IP addresses. There are gaps between the ranges whenever there is no proxy server IP found. Optimization can significantly enhance the speed and efficiency of database operations, leading to faster response times and improved overall system performance.

Our example code and benchmark below will be using PHP and MySQL 8 to store and query the PX11 database. This advanced optimization technique will come in handy if your IP2Proxy database experiences high number of queries or you just want to speed things up.

Normal way of creating table & querying the IP2Proxy data

Take a look at the PX11 FAQ page and you will see the SQL below to create the table for storing the PX11 data.

CREATE DATABASE ip2proxy;
USE ip2proxy;
CREATE TABLE `ip2proxy_px11`(
   `ip_from` INT(10) UNSIGNED,
   `ip_to` INT(10) UNSIGNED,
   `proxy_type` VARCHAR(3),
   `country_code` CHAR(2),
   `country_name` VARCHAR(64),
   `region_name` VARCHAR(128),
   `city_name` VARCHAR(128),
   `isp` VARCHAR(256),
   `domain` VARCHAR(128),
   `usage_type` VARCHAR(11),
   `asn` VARCHAR(10),
   `as` VARCHAR(256),
   `last_seen` INT(10),
   `threat` VARCHAR(128),
   `provider` VARCHAR(256),
   PRIMARY KEY (`ip_from`, `ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Then for queries, you will see the below example.

SELECT
   `ip_from`,
   `ip_to`,
   `proxy_type`,
   `country_code`,
   `country_name`,
   `region_name`,
   `city_name`,
   `isp`,
   `domain`,
   `usage_type`,
   `asn`,
   `as`,
   `last_seen`,
   `threat`,
   `provider` 
FROM
   `ip2proxy_px11` 
WHERE
   INET_ATON([IP ADDRESS]) >= ip_from 
   AND INET_ATON([IP ADDRESS]) <= ip_to LIMIT 1

Now, there is nothing wrong with the above SQL statements. They will work just fine and for most people, that is good enough.

Optimized SQL for the IP2Proxy table and queries

If you want your IP2Proxy database queries to be faster, then please read on. There are 2 things we need to modify to improve the performance for the IP2Proxy PX11 queries. Firstly, we need to remove the composite primary key (ip_from, ip_to) and replace with a single ip_to primary key. Having both ip_from and ip_to in the primary key for the range search is inefficient since MySQL needs to compare both the ip_from and ip_to fields during the queries.

Let’s change the table creation SQL to the below.

CREATE DATABASE ip2proxy;
USE ip2proxy;
CREATE TABLE `ip2proxy_px11_faster`(
   `ip_from` INT(10) UNSIGNED,
   `ip_to` INT(10) UNSIGNED,
   `proxy_type` VARCHAR(3),
   `country_code` CHAR(2),
   `country_name` VARCHAR(64),
   `region_name` VARCHAR(128),
   `city_name` VARCHAR(128),
   `isp` VARCHAR(256),
   `domain` VARCHAR(128),
   `usage_type` VARCHAR(11),
   `asn` VARCHAR(10),
   `as` VARCHAR(256),
   `last_seen` INT(10),
   `threat` VARCHAR(128),
   `provider` VARCHAR(256),
   PRIMARY KEY (`ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

With the use of the ip_to field as the only primary key index, the original query SQL will become super slow. So, we’ll need to modify the SELECT statement to become the below.

SELECT
   * 
FROM
   (
      SELECT
         * 
      FROM
         `ip2proxy_px11_faster` 
      WHERE
         ip_to >= INET_ATON([IP ADDRESS]) LIMIT 1
   )
   AS mytable 
WHERE
   ip_from <= INET_ATON([IP ADDRESS])

Since we are only relying on the ip_to field as the index, we will just query using that. Note the use of a subquery where we search for the first row that matches the ip_to criteria. Once we have that row, we just check if the ip_from field matches the search criteria. The SELECT statement may look a bit more complicated but the improvement in performance is worth it.

Let’s run the benchmark code to see the improvement

In this part, we’ll show you the codes that we use to perform our benchmark. Then we can see how much faster the query has become.

Firstly, create the 2 tables to store the IP2Proxy PX11 data

Run the SQL below to create the normal table and the optimized table.

CREATE DATABASE ip2proxy;
USE ip2proxy;
CREATE TABLE `ip2proxy_px11`(
   `ip_from` INT(10) UNSIGNED,
   `ip_to` INT(10) UNSIGNED,
   `proxy_type` VARCHAR(3),
   `country_code` CHAR(2),
   `country_name` VARCHAR(64),
   `region_name` VARCHAR(128),
   `city_name` VARCHAR(128),
   `isp` VARCHAR(256),
   `domain` VARCHAR(128),
   `usage_type` VARCHAR(11),
   `asn` VARCHAR(10),
   `as` VARCHAR(256),
   `last_seen` INT(10),
   `threat` VARCHAR(128),
   `provider` VARCHAR(256),
   PRIMARY KEY (`ip_from`, `ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

CREATE TABLE `ip2proxy_px11_faster`(
   `ip_from` INT(10) UNSIGNED,
   `ip_to` INT(10) UNSIGNED,
   `proxy_type` VARCHAR(3),
   `country_code` CHAR(2),
   `country_name` VARCHAR(64),
   `region_name` VARCHAR(128),
   `city_name` VARCHAR(128),
   `isp` VARCHAR(256),
   `domain` VARCHAR(128),
   `usage_type` VARCHAR(11),
   `asn` VARCHAR(10),
   `as` VARCHAR(256),
   `last_seen` INT(10),
   `threat` VARCHAR(128),
   `provider` VARCHAR(256),
   PRIMARY KEY (`ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

Now, import the PX11 data into both tables

Run the below SQL to perform the data import. Note that the import statement is identical for both tables.

USE ip2proxy;
LOAD DATA LOCAL
   INFILE 'IP2PROXY-IP-PROXYTYPE-COUNTRY-REGION-CITY-ISP-DOMAIN-USAGETYPE-ASN-LASTSEEN-THREAT-RESIDENTIAL-PROVIDER.CSV'
INTO TABLE
   `ip2proxy_px11`
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';

LOAD DATA LOCAL
   INFILE 'IP2PROXY-IP-PROXYTYPE-COUNTRY-REGION-CITY-ISP-DOMAIN-USAGETYPE-ASN-LASTSEEN-THREAT-RESIDENTIAL-PROVIDER.CSV'
INTO TABLE
   `ip2proxy_px11_faster`
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';

Create the benchmark PHP code

Copy and paste our code into a file called test.php and edit the code to put in your MySQL password. Then save the file.

<?php
// MySQL connection information
$config['dbHost'] = 'localhost';
$config['dbUser'] = 'root';
$config['dbPass'] = 'YOUR_MYSQL_PASSWORD';
$config['dbName'] = 'ip2proxy';

$pdo = new PDO('mysql:host=' . $config['dbHost'] . ';dbname=' . $config['dbName'] . ';charset=utf8', $config['dbUser'], $config['dbPass']);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// configure performance schema for profiling & clear old log
$setup = ['TRUNCATE TABLE performance_schema.events_statements_history_long',
'UPDATE performance_schema.setup_instruments SET ENABLED = \'YES\', TIMED = \'YES\' WHERE NAME LIKE \'%statement/%\'',
'UPDATE performance_schema.setup_consumers SET ENABLED = \'YES\' WHERE NAME LIKE \'%events_statements_%\''
];

foreach ($setup as $sql) {
               $st = $pdo->prepare($sql);
               $st->execute();
}

$totalqueries = 1000; // number of random IPs we will benchmark
$total = [0, 0];
$list = [];

// generate list of random IPs
for ($x = 0; $x < $totalqueries; ++$x) {
               array_push($list, mt_rand(1, 254) . '.' . mt_rand(1, 254) . '.' . mt_rand(1, 254) . '.' . mt_rand(1, 254));
}

// template SQL
$sql1 = 'SELECT * FROM `ip2proxy_px11` WHERE INET_ATON(:ip_address) BETWEEN ip_from AND ip_to LIMIT 1';
$sql2 = 'SELECT * FROM (SELECT * FROM `ip2proxy_px11_faster` WHERE ip_to >= INET_ATON(:ip_address) LIMIT 1) AS mytable WHERE ip_from <= INET_ATON(:ip_address)';

// querying the IPs using both SQL statements
foreach ($list as $ip_address) {
               $st = $pdo->prepare($sql1);
               $st->bindParam(':ip_address', $ip_address, PDO::PARAM_STR);
               $st->execute();
               $st = $pdo->prepare($sql2);
               $st->bindParam(':ip_address', $ip_address, PDO::PARAM_STR);
               $st->execute();
}

// get profile results
$sql = 'SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration, SQL_TEXT FROM performance_schema.events_statements_history_long WHERE SQL_TEXT LIKE \'%ip2proxy_px11%\' AND SQL_TEXT NOT LIKE \'%performance_schema%\'';
$st = $pdo->prepare($sql);
$st->execute();

// output profile results
while($rs = $st->fetchObject()) {
               $duration = $rs->Duration * 1000;
               if (strpos($rs->SQL_TEXT, 'ip2proxy_px11_faster') !== false) {
                              $total[1] += $duration
               }
               else if (strpos($rs->SQL_TEXT, 'ip2proxy_px11') !== false) {
                              $total[0] += $duration;
               }
}

$average[0] = $total[0] / $totalqueries;
$average[1] = $total[1] / $totalqueries;
$improvement = (($average[0] - $average[1]) / $average[0]) * 100;

echo "\n\nAverage time (using between): " . $average[0] . " ms\n";
echo "\n\nAverage time (using subquery): " . $average[1] . " ms\n";
echo "\n\nImprovement: " . $improvement . " %\n";
?>

In the above PHP code, we will configure the MySQL to enable profiling and clear old performance logs. Next, we’ll generate 1000 random IPv4 addresses for our test data. The test data is then queried via the 2 SELECT queries. Once the queries are done, we’ll check the profile results and calculate the performance gained.

Run the benchmark

Run the test.php on the command line

php test.php

The output we saw was the below:

Average time (using between): 586.4565 ms

Average time (using subquery): 11.965 ms

Improvement: 97.959780478177 %

That is a tremendous improvement over the normal query.

SQL Server and PostgreSQL optimizations

While our benchmark was specifically targeting MySQL, similar optimization will also benefit users of Microsoft SQL Server as well as PostgreSQL users. We won’t cover those here but essentially do the same thing as we did for the MySQL case. Just remove the ip_from from the index and modify the SELECT query to use a subquery.

Conclusion

As mentioned earlier, if you are happy with the performance of the normal table and query, then you don’t have to do anything. This article is geared toward users who prize performance. For them, making the changes to the table and query as we’ve shown will give them the huge boost in speed that they crave.

Was this article helpful?

Related Articles