Importing IP2Proxy data into Cassandra and querying with PHP (IPv4)

Intro

This guide demonstrates how to import IP2Proxy proxy detection data (PX11) in CSV format into Apache Cassandra and then query the data from a PHP page.

First, download the IP2Proxy PX11 CSV file.

Download the commercial version at https://ip2location.com/download?code=PX11

Extract the IP2PROXY-IP-PROXYTYPE-COUNTRY-REGION-CITY-ISP-DOMAIN-USAGETYPE-ASN-LASTSEEN-THREAT-RESIDENTIAL-PROVIDER.CSV file from the downloaded ZIP archive and store it in the /mydata folder (the location used in this example; yours may differ).

Important Note

This guide does not cover the installation of Cassandra or PHP. We assume you have already set up Cassandra and PHP on the localhost and are serving PHP through Apache (also on the localhost). This example uses a Debian machine.

You will also need to install the PHP Cassandra driver from https://pecl.php.net/package/cassandra

Pre-process the CSV data

Before we import the CSV data, we have to insert a dummy column into the data to serve as the partition key. Cassandra keeps rows sorted by the clustering column only within a single partition, so to perform an ordered search on ip_to, all of the rows must have the same partition key.

In Bash, run the following command to prefix every row in the CSV file with the dummy column and output the results into a new CSV file.

sed -e 's/^/"px11",/' /mydata/IP2PROXY-IP-PROXYTYPE-COUNTRY-REGION-CITY-ISP-DOMAIN-USAGETYPE-ASN-LASTSEEN-THREAT-RESIDENTIAL-PROVIDER.CSV > /mydata/IP2PROXY-IP-PROXYTYPE-COUNTRY-REGION-CITY-ISP-DOMAIN-USAGETYPE-ASN-LASTSEEN-THREAT-RESIDENTIAL-PROVIDER.CSV2
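
Before running the command on the full file, it can help to see the prefix step on a small sample. The following is a minimal sketch, not part of the original procedure: the two sample rows are made-up, truncated placeholders, and the temporary file names are hypothetical. It applies the same sed expression and checks that no rows were lost and that every row now starts with the dummy column.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so the real data is untouched.
workdir=$(mktemp -d)
sample="$workdir/sample.CSV"

# Two made-up placeholder rows in the quoted CSV layout (truncated to five fields).
printf '%s\n' \
  '"16778241","16778241","PUB","AU","Australia"' \
  '"16778497","16778497","PUB","AU","Australia"' > "$sample"

# Same sed expression as in the guide: prefix every row with "px11",
sed -e 's/^/"px11",/' "$sample" > "${sample}2"

# The row count must be unchanged and the first field must be the dummy value.
src_lines=$(wc -l < "$sample" | tr -d ' ')
out_lines=$(wc -l < "${sample}2" | tr -d ' ')
first_field=$(head -n 1 "${sample}2" | cut -d',' -f1)

echo "source rows: $src_lines, output rows: $out_lines, first field: $first_field"

rm -rf "$workdir"
```

For the real data, the same check amounts to comparing the wc -l output of the .CSV and .CSV2 files and looking at the first line of the .CSV2 file with head.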

Importing the CSV data into Cassandra

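In cqlsh, run the following command to create the keyspace (the equivalent of a database). The replication factor of 3 used below suits a multi-node cluster; on a single-node localhost setup, a replication factor of 1 is sufficient.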

CREATE KEYSPACE IF NOT EXISTS ip2proxy WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

After creating the keyspace, select it by running the command below.

USE ip2proxy;

Next, run the following commands to drop any existing ip2proxy_px11 table and create a new one.

DROP TABLE IF EXISTS ip2proxy_px11;

CREATE TABLE IF NOT EXISTS ip2proxy_px11 (
   dummy varchar,
   ip_from bigint,
   ip_to bigint,
   proxy_type varchar,
   country_code varchar,
   country_name varchar,
   region_name varchar,
   city_name varchar,
   isp varchar,
   domain varchar,
   usage_type varchar,
   asn varchar,
   as varchar,
   last_seen varchar,
   threat varchar,
   provider varchar,
   PRIMARY KEY (dummy, ip_to)
)
WITH CLUSTERING ORDER BY (ip_to ASC);
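
Because the rows are clustered by ip_to in ascending order, a query for the first row with ip_to greater than or equal to an IP number returns the candidate range for that address. The sketch below imitates that lookup in the shell on three made-up placeholder ranges (the ranges variable and the first_match helper are hypothetical and exist only for illustration); it does not talk to Cassandra.

```shell
#!/bin/sh
set -eu

# Made-up placeholder ranges as "ip_from,ip_to,label", already sorted by ip_to
# in the same ascending order the clustering key maintains.
ranges='100,199,range-a
300,399,range-b
500,599,range-c'

# Print the first row whose ip_to is >= the given number,
# which is what "ip_to >= n ORDER BY ip_to LIMIT 1" selects.
first_match() {
  printf '%s\n' "$ranges" | awk -F',' -v n="$1" '$2 >= n { print; exit }'
}

echo "150 -> $(first_match 150)"   # inside range-a
echo "250 -> $(first_match 250)"   # in a gap: the next range (range-b) is returned
echo "700 -> $(first_match 700)"   # above every range: nothing is returned
```

Note the second case: when a number falls in a gap between ranges, the lookup still returns the next range. If your data contains such gaps, comparing the returned ip_from against the IP number tells you whether the address really lies inside the row.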

Now that we have a table, we can import the data from our new CSV file into it by running the following command in cqlsh.

COPY ip2proxy_px11 (dummy, ip_from, ip_to, proxy_type, country_code, country_name, region_name, city_name, isp, domain, usage_type, asn, as, last_seen, threat, provider)
FROM '/mydata/IP2PROXY-IP-PROXYTYPE-COUNTRY-REGION-CITY-ISP-DOMAIN-USAGETYPE-ASN-LASTSEEN-THREAT-RESIDENTIAL-PROVIDER.CSV2';
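
The COPY command maps CSV fields to the listed columns by position, so each row needs exactly as many fields as there are column names. As a hedged pre-import check, the sketch below counts both in the shell. It assumes every field in the file is wrapped in double quotes, and the row shown is a made-up placeholder, not real IP2Proxy data.

```shell
#!/bin/sh
set -eu

# The column names from the COPY statement.
columns='dummy, ip_from, ip_to, proxy_type, country_code, country_name, region_name, city_name, isp, domain, usage_type, asn, as, last_seen, threat, provider'
expected=$(printf '%s\n' "$columns" | awk -F', ' '{ print NF }')

# Made-up placeholder row; for the real file, use the output of: head -n 1 <your .CSV2 file>
row='"px11","16778241","16778241","PUB","AU","Australia","Victoria","Melbourne","Example ISP, Inc.","example.com","DCH","64500","Example AS","7","-","-"'

# Split on the "," sequence between quoted fields so that commas
# inside a field (as in "Example ISP, Inc.") are not counted.
actual=$(printf '%s\n' "$row" | awk -F'","' '{ print NF }')

echo "columns in COPY: $expected, fields in row: $actual"
```

If the two numbers differ, the import would either fail or load values into the wrong columns, so it is worth resolving before running COPY on the full file.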

Querying the data in PHP

Now, create a PHP file called test.php in your website's folder.

Paste the following PHP code into it and then run it in the browser:

<?php
$ip = '8.8.8.8';

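function queryIP2Proxy($myip) {
	$keyspace  = 'ip2proxy';
	$cluster   = Cassandra::cluster()->build(); // connects to localhost by default
	$session   = $cluster->connect($keyspace);
	
	// Reject anything that is not a valid IPv4 address before building the query.
	$iplong = ip2long($myip);
	if ($iplong === false)
		die('Invalid IPv4 address' . "<br>\n");
	
	// Convert to an unsigned decimal string; ip2long() can return a negative number on 32-bit PHP.
	$myipnum = sprintf('%u', $iplong);
	
	// All rows share the 'px11' partition and are clustered by ip_to, so the first row
	// with ip_to >= the IP number is the candidate range for this address.
	$statement = new Cassandra\SimpleStatement('SELECT * FROM ip2proxy_px11 WHERE dummy = \'px11\' AND ip_to >= ' . $myipnum . ' ORDER BY ip_to LIMIT 1');
	
	$future = $session->executeAsync($statement);
	$result = $future->get();
	
	if ($result->count() == 0)
		die('No record found' . "<br>\n");
	
	return $result[0];
}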

$myresult = queryIP2Proxy($ip);

echo 'proxy_type: ' . $myresult['proxy_type'] . "<br>\n";
echo 'country_code: ' . $myresult['country_code'] . "<br>\n";
echo 'country_name: ' . $myresult['country_name'] . "<br>\n";
echo 'region_name: ' . $myresult['region_name'] . "<br>\n";
echo 'city_name: ' . $myresult['city_name'] . "<br>\n";
echo 'isp: ' . $myresult['isp'] . "<br>\n";
echo 'domain: ' . $myresult['domain'] . "<br>\n";
echo 'usage_type: ' . $myresult['usage_type'] . "<br>\n";
echo 'asn: ' . $myresult['asn'] . "<br>\n";
echo 'as: ' . $myresult['as'] . "<br>\n";
echo 'last_seen: ' . $myresult['last_seen'] . "<br>\n";
echo 'threat: ' . $myresult['threat'] . "<br>\n";
echo 'provider: ' . $myresult['provider'] . "<br>\n";
?>
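
The PHP code turns the dotted IPv4 address into a single number before querying. To see what value ends up in the query, the same arithmetic can be reproduced in the shell. This is a minimal sketch for illustration only; the ip_to_num helper is a hypothetical name, and it performs no validation of its input.

```shell
#!/bin/sh
set -eu

# Convert a dotted IPv4 address to its numeric form:
# a.b.c.d -> a * 2^24 + b * 2^16 + c * 2^8 + d
ip_to_num() {
  old_ifs=$IFS
  IFS=.
  # Split the address on dots into the positional parameters $1..$4.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

echo "8.8.8.8 -> $(ip_to_num 8.8.8.8)"   # prints: 8.8.8.8 -> 134744072
```

The printed number is the value compared against ip_to, so it can be pasted into a cqlsh query when checking a lookup by hand.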
