MySQL is a common relational database being used by our customers to store the IP2Location Geolocation data. In this article, we’ll cover storing and querying IPv6 data in MySQL. In particular, we will look at the use of decimal and binary datatypes to store the IPv6 data for geolocation queries. We’ll also list out the pros and cons for both along with some benchmark numbers.
There are 2 types of CSV files being offered by IP2Location. One is the IPv4 CSV file which only contains the IPv4 address ranges. The other is the IPv6 CSV file which contains both IPv4 and IPv6 address ranges.
IP2Location CSV files store IP geolocation data as rows of IP ranges. The first 2 fields in a row are the IP_FROM and IP_TO which denote the range. These 2 fields are IP numbers and not IP addresses. The reason for this is that it is sortable as a number in relational databases like MySQL.
Tip: If you don’t know how to convert IP address to IP number, please see the IP2Location FAQs page.
Using UNSIGNED INT datatype to store the IP number in the IPv4 CSV
When we use the IP2Location IPv4 CSV files, the max IP number can be safely stored using an unsigned INT in a MySQL table. The table creation SQL will look like the below:
CREATE DATABASE ip2location; USE ip2location; CREATE TABLE `ip2location_db11_ipv4`( `ip_from` INT(10) UNSIGNED, `ip_to` INT(10) UNSIGNED, `country_code` CHAR(2), `country_name` VARCHAR(64), `region_name` VARCHAR(128), `city_name` VARCHAR(128), `latitude` DOUBLE, `longitude` DOUBLE, `zip_code` VARCHAR(30), `time_zone` VARCHAR(8), PRIMARY KEY (`ip_to`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
To import data, we use the below SQL.
LOAD DATA INFILE 'DB11LITEIPV4.CSV' INTO TABLE `ip2location_db11_ipv4` CHARACTER SET 'utf8' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';
To query for the IP address 8.8.8.8 (IP number 134744072), we run the following SQL.
select * from `ip2location_db11_ipv4` where 134744072 <= ip_to limit 1;
You can also perform the same query using the below SQL.
select * from `ip2location_db11_ipv4` where inet_aton(‘8.8.8.8’) <= ip_to limit 1;
The MySQL INET_ATON function easily converts the IPv4 address to an integer that represents the numeric value of the address in network byte order (big endian).
Using DECIMAL(39,0) datatype for IP number in IPv6 CSV
On the other hand, the IP2Location IPv6 CSV files actually contain both IPv4 and IPv6 ranges. Note, the IPv4 ranges in the IPv6 CSV files are using IPv4-mapped IPv6 addresses. The max IP number contained in the IPv6 CSV files will exceed the maximum value of the MySQL’s INT or BIGINT data type.
In the IP2Location FAQs, our standard data type to be used for the IPv6 CSV files is a DECIMAL(39, 0) field in MySQL. This will be able to store the required IP numbers without loss of data. Using a decimal field allows the end users to query the table in a similar fashion to the IPv4 style. The table creation SQL will look like the below:
CREATE DATABASE ip2location; USE ip2location; CREATE TABLE `ip2location_db11_ipv6`( `ip_from` DECIMAL(39,0) UNSIGNED, `ip_to` DECIMAL(39,0) UNSIGNED, `country_code` CHAR(2), `country_name` VARCHAR(64), `region_name` VARCHAR(128), `city_name` VARCHAR(128), `latitude` DOUBLE, `longitude` DOUBLE, `zip_code` VARCHAR(30), `time_zone` VARCHAR(8), PRIMARY KEY (`ip_to`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
To import data, we use the below SQL.
LOAD DATA INFILE 'DB11LITEIPV6.CSV' INTO TABLE `ip2location_db11_ipv6` CHARACTER SET 'utf8' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n';
To query for the IP address 8.8.8.8, firstly convert the IPv4 address to IPv4-mapped IPv6 by prefixing “::ffff:” to the IP to become ::ffff:8.8.8.8 (IP number 281470816487432). Then, we run the following SQL.
select * from `ip2location_db11_ipv6` where 281470816487432 <= ip_to limit 1;
NOTE: Even though there is a INET6_ATON function in MySQL for working with IPv6, it does not return a number like the INET_ATON case. Given an IPv6 or IPv4 network address as a string, INET6_ATON returns a binary string that represents the numeric value of the address in network byte order (big endian). Binary string is not able to match against the decimal field, therefore it won’t work in our query.
Using a BINARY(16) field for the IP number in the IPv6 CSV
While the decimal datatype works for the IPv6 files, there is another option which is using a BINARY(16) field. As mentioned above, INET6_ATON does not work with decimals so we are using the BINARY datatype to be able to utilize INET6_ATON in our queries.
Below is what the modified table creation SQL will look like.
CREATE TABLE `ip2location_db11_binary`( `ip_from` BINARY(16), `ip_to` BINARY(16), `country_code` CHAR(2), `country_name` VARCHAR(64), `region_name` VARCHAR(128), `city_name` VARCHAR(128), `latitude` DOUBLE, `longitude` DOUBLE, `zip_code` VARCHAR(30), `time_zone` VARCHAR(8), PRIMARY KEY (`ip_to`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Before you can import data from the IP2Location IPv6 CSV files into the ip2location_db11_binary table, you will need to modify the IP_FROM and IP_TO in the CSV from decimal numbers into hexadecimals. For this, we’ll use the IP2Location Python CSV Converter command line.
NOTE: Python and pip are prerequisites for the converter.
After you’ve installed the IP2Location Python CSV Converter, you can just run the below command to convert the DB11 IPv6 CSV file to one that uses hexadecimals for the IP_FROM and IP_TO fields.
ip2location-csv-converter -hex6 -replace DB11LITEIPV6.CSV DB11LITEIPV6_HEX.CSV
The new file will contain the converted IP ranges in hexadecimal form, left padded with zeros till the string is 32-char in length.
Now, run the below to import the data into the binary table.
LOAD DATA INFILE 'DB11LITEIPV6_HEX.CSV' INTO TABLE `ip2location_db11_binary` CHARACTER SET 'utf8' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' (@var1, @var2, `country_code`, `country_name`, `region_name`, `city_name`, `latitude`, `longitude`, `zip_code`, `time_zone`) SET ip_from = UNHEX(@var1), ip_to = UNHEX(@var2);
Take note of the differences in the import SQL above compared to the previous ones. We will use the MySQL UNHEX function to convert the hexadecimal string into a binary string.
To query for the IP address 8.8.8.8, firstly convert the IPv4 address to IPv4-mapped IPv6 by prefixing “::ffff:” to the IP to become ::ffff:8.8.8.8 and there is no need to convert the IP address to IP number. Then, we run the following SQL.
select * from `ip2location_db11_binary` where inet6_aton(‘::ffff:8.8.8.8’) <= ip_to limit 1;
As you can see, querying IPv4 (IPv4-mapped IPv6 format) or IPv6 is more simplified without needing to convert the IP number first.
Using the binary field also means that you can use INET6_NTOA function to convert the binary IP_FROM and IP_TO back into human-readable IPv6 address form.
select inet6_ntoa(ip_from), inet6_ntoa(ip_to) from `ip2location_db11_binary` where inet6_aton(‘::ffff:8.8.8.8’) <= ip_to limit 1;
Pros and cons of using decimal vs. binary fields for the IPv6 data
For our benchmarking, we used the DB11 LITE IPv6 CSV file. The IP2Location Python CSV Converter took about 40 seconds to perform the conversion of the CSV data.
There was not much difference in the times taken to import the decimal and hexadecimal CSV data into the respective MySQL tables. Decimal data import took 69 seconds compared to the binary data import which took 59 seconds.
Our benchmark test data file contained a mix of 50,000 IPv4 and 50,000 IPv6 addresses in randomized order. Querying this 100,000 IPs using the decimal table took about 32 minutes while the binary table took about 18 minutes.
When comparing the table size, the decimal table used up 575MB while the binary one took up 509MB.
To summarize:
Decimal pros
- No need to convert the CSV data prior to import.
Decimal cons
- IPv4 address needs to be in IPv4-mapped IPv6 format.
- Need to convert IP address to IP number before querying.
- Queries took slightly longer for a large number of IPs.
- Disk space usage is a lot higher vs. the binary table.
Binary pros
- No need to convert IP address to IP number before querying.
- Queries are slightly faster when doing a large number of IPs.
- Disk space usage is a lot less compared to the decimal table.
Binary cons
- IPv4 address needs to be in IPv4-mapped IPv6 format.
- Need to convert CSV data to hexadecimal before import.
Conclusion
While using either the decimal or binary fields works for the IP2Location IPv6 CSV data, choosing which one to use boils down to the performance speed and disk usage. For this, the binary field option is best. Large number of queries can be completed faster than the decimal case. Not to mention, the huge disk space saved by using the binary field.
THE POWER OF IP GEOLOCATION
Find a solution that fits.