
Intro #
The rapid rise of AI-powered crawlers, used to train large language models and power search, summarization, and recommendation systems, has introduced a new kind of strain on websites. Unlike traditional web crawlers, which typically follow predictable patterns and respect rate limits, many AI crawlers aggressively scrape vast amounts of content at high frequency, often requesting entire archives rather than incremental updates.
This surge in automated traffic can overwhelm servers, significantly increase bandwidth costs, and degrade performance for legitimate users. What was once manageable background activity has evolved into a persistent operational challenge.
As a result, website operators increasingly struggle to distinguish legitimate human traffic from AI-driven requests, which forces them to rethink caching strategies, rate limiting, and access controls. For smaller organizations in particular, the added complexity can strain both financial resources and technical capacity, turning routine web management into an ongoing battle for stability and efficiency.
Blocking AI Crawlers #
Using the IP2Location and IP2Proxy databases #
To address this growing challenge, IP2Location now delivers AI crawler detection through the IP2Location DB26 database and the IP2Proxy PX12 database. Both help businesses identify and flag traffic from major AI platforms such as OpenAI, Gemini, and Anthropic, so that website operators can better protect their content, optimize resource allocation, and implement more effective access control strategies.
Option 1: Using IP2Location DB26 usage type to block AI crawlers #
For our example, we’ll import the DB26 IPv4 CSV data into a MySQL table called ip2location_db26. Then, we’ll query the usage_type field in the table for the IP address of the website visitor. If the usage type is SES/AIC, that IP address belongs to an AI crawler. It is up to you whether to block all traffic from this IP address or to rate limit its requests to save bandwidth and server resources.
NOTE: Only DB26 or higher contains the SES/AIC usage type. Lower packages will just show SES for the usage type.
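If you prefer rate limiting to outright blocking, the idea can be sketched as a fixed-window counter keyed by IP address. This is only an illustration under stated assumptions: allowRequest() is a hypothetical helper, not part of any IP2Location library, and the in-memory $state array stands in for shared storage (a cache or database), which a real site would need because PHP starts each request with fresh variables.

```php
<?php
// Hypothetical helper (assumption, not an IP2Location API).
// Fixed-window limiter: allows at most $limit requests per $window seconds per key.
// $state is passed by reference; in production it would live in shared storage.
function allowRequest(array &$state, string $key, int $now, int $limit = 10, int $window = 60): bool
{
    // Start a new window when none exists or the current one has expired
    if (!isset($state[$key]) || $now - $state[$key]['start'] >= $window) {
        $state[$key] = ['start' => $now, 'count' => 0];
    }
    $state[$key]['count']++;

    return $state[$key]['count'] <= $limit;
}

// Usage: answer 429 instead of 403 once an AI crawler exceeds its budget
$state = [];
$usageType = 'SES/AIC';  // value returned by the usage_type lookup
$ip = '203.0.113.7';     // placeholder address from a documentation range
if ($usageType === 'SES/AIC' && !allowRequest($state, $ip, time(), 2, 60)) {
    http_response_code(429);
    header('Retry-After: 60');
    exit('Too many requests.');
}
```

The limit and window values here are arbitrary placeholders. Tune them to the amount of crawler traffic your server can comfortably absorb.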
First, let’s create the MySQL table and import the DB26 data into it by following these steps:
- If you don’t have a paid subscription to the DB26 database, you will need to subscribe to it first.
- Once you have an active subscription to the DB26 database, log in to your user dashboard and download the DB26 IPv4 CSV zipped file.
- Extract the file IP-COUNTRY-REGION-CITY-LATITUDE-LONGITUDE-ZIPCODE-TIMEZONE-ISP-DOMAIN-NETSPEED-AREACODE-WEATHER-MOBILE-ELEVATION-USAGETYPE-ADDRESSTYPE-CATEGORY-DISTRICT-ASN.CSV from the zipped file and save it somewhere on your computer.
- Create the MySQL table called ip2location_db26 using the below SQL:
CREATE DATABASE ip2location;
USE ip2location;
CREATE TABLE `ip2location_db26`(
    `ip_from` INT(10) UNSIGNED,
    `ip_to` INT(10) UNSIGNED,
    `country_code` CHAR(2),
    `country_name` VARCHAR(64),
    `region_name` VARCHAR(128),
    `city_name` VARCHAR(128),
    `latitude` DOUBLE,
    `longitude` DOUBLE,
    `zip_code` VARCHAR(30),
    `time_zone` VARCHAR(8),
    `isp` VARCHAR(256),
    `domain` VARCHAR(128),
    `net_speed` VARCHAR(8),
    `idd_code` VARCHAR(5),
    `area_code` VARCHAR(30),
    `weather_station_code` VARCHAR(10),
    `weather_station_name` VARCHAR(128),
    `mcc` VARCHAR(256),
    `mnc` VARCHAR(256),
    `mobile_brand` VARCHAR(128),
    `elevation` INT(10),
    `usage_type` VARCHAR(11),
    `address_type` CHAR(1),
    `category` VARCHAR(10),
    `district` VARCHAR(128),
    `asn` VARCHAR(10),
    `as` VARCHAR(256),
    `as_domain` VARCHAR(128),
    `as_usage_type` VARCHAR(11),
    `as_cidr` VARCHAR(43),
    PRIMARY KEY (`ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
- Import the CSV data into the table using the below SQL:
LOAD DATA LOCAL INFILE 'IP-COUNTRY-REGION-CITY-LATITUDE-LONGITUDE-ZIPCODE-TIMEZONE-ISP-DOMAIN-NETSPEED-AREACODE-WEATHER-MOBILE-ELEVATION-USAGETYPE-ADDRESSTYPE-CATEGORY-DISTRICT-ASN.CSV'
    INTO TABLE `ip2location_db26`
    FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
    LINES TERMINATED BY '\r\n';
Now that the DB26 database is ready to be queried, let’s copy the below PHP code into a text file called test.php. Update the MySQL credentials in the code to match your server’s settings.
<?php
// Database config
$host = 'localhost';
$db = 'ip2location';
$user = 'your_db_user';
$pass = 'your_db_password';

// Resolve the visitor IP. The proxy headers are client-supplied, so rely on
// them only when the server really sits behind that proxy or CDN.
function getVisitorIP() {
    if (!empty($_SERVER['HTTP_CF_CONNECTING_IP'])) {
        return $_SERVER['HTTP_CF_CONNECTING_IP']; // Cloudflare
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        // The first entry in the comma-separated list is the original client
        return trim(explode(',', $_SERVER['HTTP_X_FORWARDED_FOR'])[0]);
    } else {
        return $_SERVER['REMOTE_ADDR'];
    }
}

try {
    // Create PDO connection
    $pdo = new PDO("mysql:host=$host;dbname=$db;charset=utf8", $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
    ]);

    // Get visitor IP
    $ip = getVisitorIP();

    // Convert IP to unsigned integer (IPv4 only, matching the imported IPv4 data)
    $ipLong = sprintf('%u', ip2long($ip));

    // Query database: the first range ending at or after the IP, in ip_to order
    $stmt = $pdo->prepare('SELECT `usage_type` FROM `ip2location_db26` WHERE `ip_to` >= :ip ORDER BY `ip_to` LIMIT 1');
    $stmt->execute(['ip' => $ipLong]);
    $result = $stmt->fetch(PDO::FETCH_ASSOC);

    // Check usage_type
    if ($result && $result['usage_type'] === 'SES/AIC') {
        // Block request
        header('HTTP/1.1 403 Forbidden');
        echo 'Access denied.';
        exit;
    }

    // Continue normal page execution
    echo 'Access allowed.';
} catch (PDOException $e) {
    die('Database error: ' . $e->getMessage());
}
Upload the test.php to your web server and whenever an AI crawler hits that page, it will be blocked.
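The sample above converts the visitor's IP address to an unsigned integer before querying. As a rough illustration of that step, the sketch below wraps it in a hypothetical helper, ipv4ToUnsigned() (the name is an assumption, not part of any library), which also returns null for anything that is not a valid IPv4 address, such as an IPv6 visitor that the IPv4 table cannot match.

```php
<?php
// Hypothetical helper (assumption): returns the IPv4 address as an unsigned
// decimal string, or null when the input is not a valid IPv4 address.
function ipv4ToUnsigned(string $ip): ?string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null;
    }

    // ip2long() can return a negative int on 32-bit builds; %u prints it unsigned
    return sprintf('%u', ip2long($ip));
}

var_dump(ipv4ToUnsigned('8.8.8.8'));      // string(9) "134744072"
var_dump(ipv4ToUnsigned('2001:db8::1'));  // NULL
```

The value for 8.8.8.8 follows from 8 × 16777216 + 8 × 65536 + 8 × 256 + 8 = 134744072. When the helper returns null, one reasonable choice is to skip the lookup and let the request through.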
Option 2: Using IP2Proxy PX12 proxy type to block AI crawlers #
For our example, we’ll import the PX12 IPv4 CSV data into a MySQL table called ip2proxy_px12. Then, we’ll query the proxy_type field in the table for the IP address of the website visitor. If the proxy type is AIC, that IP address belongs to an AI crawler. It is up to you whether to block all traffic from this IP address or to rate limit its requests to save bandwidth and server resources.
NOTE: Only PX12 or higher contains the AIC proxy type. Lower packages will just show SES for the proxy type.
First, let’s create the MySQL table and import the PX12 data into it by following these steps:
- If you don’t have a paid subscription to the PX12 database, you will need to subscribe to it first.
- Once you have an active subscription to the PX12 database, log in to your user dashboard and download the PX12 IPv4 CSV zipped file.
- Extract the file IP2PROXY-IP-PROXYTYPE-COUNTRY-REGION-CITY-ISP-DOMAIN-USAGETYPE-ASN-LASTSEEN-THREAT-RESIDENTIAL-PROVIDER-FRAUDSCORE.CSV from the zipped file and save it somewhere on your computer.
- Create the MySQL table called ip2proxy_px12 using the below SQL:
CREATE DATABASE ip2proxy;
USE ip2proxy;
CREATE TABLE `ip2proxy_px12`(
    `ip_from` INT(10) UNSIGNED,
    `ip_to` INT(10) UNSIGNED,
    `proxy_type` VARCHAR(3),
    `country_code` CHAR(2),
    `country_name` VARCHAR(64),
    `region_name` VARCHAR(128),
    `city_name` VARCHAR(128),
    `isp` VARCHAR(256),
    `domain` VARCHAR(128),
    `usage_type` VARCHAR(11),
    `asn` VARCHAR(10),
    `as` VARCHAR(256),
    `last_seen` INT(10),
    `threat` VARCHAR(128),
    `provider` VARCHAR(256),
    `fraud_score` INT(10),
    PRIMARY KEY (`ip_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
- Import the CSV data into the table using the below SQL:
LOAD DATA LOCAL INFILE 'IP2PROXY-IP-PROXYTYPE-COUNTRY-REGION-CITY-ISP-DOMAIN-USAGETYPE-ASN-LASTSEEN-THREAT-RESIDENTIAL-PROVIDER-FRAUDSCORE.CSV'
    INTO TABLE `ip2proxy_px12`
    FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
    LINES TERMINATED BY '\n';
Now that the PX12 database is ready to be queried, let’s copy the below PHP code into a text file called test.php. Update the MySQL credentials in the code to match your server’s settings.
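NOTE: The SQL query for PX12 differs slightly from the DB26 one because the IP ranges inside the PX12 database are not contiguous. The subquery keeps the lookup fast by using the index to find the first range that ends at or after the visitor's IP, and the outer query then confirms that this range also starts at or before it.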
<?php
// Database config
$host = 'localhost';
$db = 'ip2proxy';
$user = 'your_db_user';
$pass = 'your_db_password';

// Resolve the visitor IP. The proxy headers are client-supplied, so rely on
// them only when the server really sits behind that proxy or CDN.
function getVisitorIP() {
    if (!empty($_SERVER['HTTP_CF_CONNECTING_IP'])) {
        return $_SERVER['HTTP_CF_CONNECTING_IP']; // Cloudflare
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        // The first entry in the comma-separated list is the original client
        return trim(explode(',', $_SERVER['HTTP_X_FORWARDED_FOR'])[0]);
    } else {
        return $_SERVER['REMOTE_ADDR'];
    }
}

try {
    // Create PDO connection
    $pdo = new PDO("mysql:host=$host;dbname=$db;charset=utf8", $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
        PDO::ATTR_DEFAULT_FETCH_MODE => PDO::FETCH_ASSOC,
    ]);

    // Get visitor IP
    $ip = getVisitorIP();

    // Convert IP to unsigned integer (IPv4 only, matching the imported IPv4 data)
    $ipLong = sprintf('%u', ip2long($ip));

    // Query database. Two distinct placeholder names are used because PDO only
    // allows reusing a named placeholder when prepares are emulated.
    $stmt = $pdo->prepare('SELECT `proxy_type` FROM (SELECT `ip_from`, `proxy_type` FROM `ip2proxy_px12` WHERE `ip_to` >= :ip_to ORDER BY `ip_to` LIMIT 1) AS mytable WHERE `ip_from` <= :ip_from');
    $stmt->execute(['ip_to' => $ipLong, 'ip_from' => $ipLong]);
    $result = $stmt->fetch(PDO::FETCH_ASSOC);

    // Check proxy_type
    if ($result && $result['proxy_type'] === 'AIC') {
        // Block request
        header('HTTP/1.1 403 Forbidden');
        echo 'Access denied.';
        exit;
    }

    // Continue normal page execution
    echo 'Access allowed.';
} catch (PDOException $e) {
    die('Database error: ' . $e->getMessage());
}
Upload the test.php to your web server and whenever an AI crawler hits that page, it will be blocked.
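To see why non-contiguous ranges call for the extra ip_from check, here is a small in-memory sketch of the same two-step lookup. The $ranges array and its numbers are made-up stand-ins for the PX12 table, and lookupProxyType() is a hypothetical helper written only for this illustration.

```php
<?php
// Hypothetical stand-in for the PX12 table (assumption), sorted by ip_to.
// The gap between 200 and 300 represents addresses that are not listed.
$ranges = [
    ['ip_from' => 100, 'ip_to' => 200, 'proxy_type' => 'VPN'],
    ['ip_from' => 300, 'ip_to' => 400, 'proxy_type' => 'AIC'],
];

// Mirrors the two-step query: take the first row with ip_to >= $ip,
// then keep it only if ip_from <= $ip as well.
function lookupProxyType(array $ranges, int $ip): ?string
{
    foreach ($ranges as $row) {
        if ($row['ip_to'] >= $ip) {
            return $row['ip_from'] <= $ip ? $row['proxy_type'] : null;
        }
    }

    return null; // $ip lies beyond the last range
}

var_dump(lookupProxyType($ranges, 350)); // string(3) "AIC"
var_dump(lookupProxyType($ranges, 250)); // NULL
```

The address 250 falls into the gap. The first row with ip_to >= 250 is the AIC range, so without the ip_from check it would be wrongly reported as an AI crawler.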
API alternative #
Using the IP2Location.io API to block AI crawlers #
If maintaining an up-to-date database is too much of a hassle, an alternative is to call the IP2Location.io REST API. No database maintenance is required, which makes this a good option for smaller organizations with fewer IT personnel or resources.
NOTE: Only the Security plan or higher has access to the is_ai_crawler field.
If you don’t have a subscription to the IP2Location.io Security plan, please subscribe before you proceed further.
After your Security plan subscription is activated, log in to the dashboard and retrieve your API key, which we’ll use in our sample code below.
Now, copy the below PHP code into a text file called test.php. Update the API key in the code to match your API key from above.
<?php
$ch = curl_init();
$key = 'YOUR_API_KEY'; // IP2Location.io API key

// Resolve the visitor IP. The proxy headers are client-supplied, so rely on
// them only when the server really sits behind that proxy or CDN.
function getVisitorIP() {
    if (!empty($_SERVER['HTTP_CF_CONNECTING_IP'])) {
        return $_SERVER['HTTP_CF_CONNECTING_IP']; // Cloudflare
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        // The first entry in the comma-separated list is the original client
        return trim(explode(',', $_SERVER['HTTP_X_FORWARDED_FOR'])[0]);
    } else {
        return $_SERVER['REMOTE_ADDR'];
    }
}

// Get visitor IP
$ip = getVisitorIP();

curl_setopt($ch, CURLOPT_URL, 'https://api.ip2location.io/?' . http_build_query([
    'ip' => $ip,
    'key' => $key,
    'format' => 'json',
]));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Keep TLS certificate verification on so the API key is only sent to a verified host
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);

$response = curl_exec($ch);
$curl_errno = curl_errno($ch);
$curl_error = curl_error($ch);
curl_close($ch);

// On a cURL error or an unexpected response, fall through and allow the request
if ($curl_errno == 0) {
    $myobj = json_decode((string) $response, true);
    if ($myobj !== null) {
        if (isset($myobj['proxy']) && isset($myobj['proxy']['is_ai_crawler'])) {
            if ($myobj['proxy']['is_ai_crawler'] === true) {
                // Block request
                header('HTTP/1.1 403 Forbidden');
                echo 'Access denied.';
                exit;
            }
        }
    }
}

// Continue normal page execution
echo 'Access allowed.';
Upload the test.php to your web server and whenever an AI crawler hits that page, it will be blocked.
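The decision logic in the sample above can also be pulled into a small function, which makes it easier to test without calling the API. The sketch below is one possible shape. isAiCrawlerResponse() is a hypothetical name, and the only response field it relies on is proxy.is_ai_crawler, as used in the sample code. It treats anything unexpected as "not a crawler" so that an API hiccup does not lock out regular visitors.

```php
<?php
// Hypothetical helper (assumption): decides from the raw API response body
// whether the visitor should be blocked. Returns false (fail open) when the
// body is not valid JSON or does not contain proxy.is_ai_crawler.
function isAiCrawlerResponse(string $body): bool
{
    $data = json_decode($body, true);
    if (!is_array($data)) {
        return false;
    }

    // Only a strict boolean true counts as a positive detection
    return ($data['proxy']['is_ai_crawler'] ?? false) === true;
}

var_dump(isAiCrawlerResponse('{"proxy":{"is_ai_crawler":true}}')); // bool(true)
var_dump(isAiCrawlerResponse('not json'));                         // bool(false)
```

Whether to fail open or fail closed is a policy choice. Failing open favors availability for human visitors, at the cost of letting some crawlers through while the API is unreachable.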
Conclusion #
We’ve outlined three practical options above to help you block AI crawlers on your website. Choose the approach that best fits your technical setup and requirements.
The sample code provided is intentionally kept minimal to make it easy to understand and adapt. You can readily customize and extend it to suit your real-world implementation needs.
