Shady economics of proxy services
Residential proxies are the most in-demand type of proxy on the market, and their price increases every year.
In this article, I want to write down my understanding of the economics of proxy services. In particular, I describe types of proxy offerings, their typical clients, and why the majority of the market is supplied by malware.
There are two distinct types of proxies on the market:
- Data center (server) proxies
- Residential proxies (broadband and mobile)
Data center proxies
As the name implies, such proxies reside in data centers. To create a pool of proxies, proxy providers lease or buy IPv4 subnets. Data center proxies are usually cheap compared to residential proxies.
Due to IPv4 address exhaustion, providers of such proxies have a limited number of IP addresses and fixed geographical locations. Buying an IPv4 subnet is also tricky nowadays, so proxy providers usually lease subnets from small owners. Leasing also gives them flexibility: they can replace or abandon old subnets.
The limited number of such proxies and their nature allow large companies such as Google, Facebook, LinkedIn, and Twitter to detect and block them.
For example, this can be done by:
- Checking the owner of a subnetwork
- Analyzing activity from a particular subnetwork. If all traffic from a particular IPv4 subnetwork looks fishy, that subnetwork can be blocked. The more clients a particular proxy service has, the higher the chance that some proxies are already blacklisted.
Most clients of such services do not want to run into that. Because of this, data center proxies are mostly used for scraping websites that do not have advanced protection.
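The two detection checks listed above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the subnet list, the `flagged` signal, and the thresholds are hypothetical, and the addresses come from ranges reserved for documentation. A real system would build the ownership list from WHOIS or ASN data and use far richer traffic signals.

```python
import ipaddress
from collections import defaultdict

# Hypothetical list of subnets owned by hosting providers (documentation ranges).
DATACENTER_SUBNETS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(ip: str) -> bool:
    """Check 1: does the address belong to a subnet with a known hosting owner?"""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_SUBNETS)


def suspicious_subnets(events, prefix_len=24, min_requests=3, threshold=0.8):
    """Check 2: find subnets where most of the observed traffic looks suspicious.

    `events` is an iterable of (ip, flagged) pairs, where `flagged` is a boolean
    produced by some upstream heuristic (assumed to exist elsewhere).
    """
    totals = defaultdict(int)
    flagged_counts = defaultdict(int)
    for ip, flagged in events:
        # Map every address to the subnet that contains it.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        totals[net] += 1
        if flagged:
            flagged_counts[net] += 1
    return {
        net
        for net, total in totals.items()
        if total >= min_requests and flagged_counts[net] / total >= threshold
    }
```

The second function also shows why a busy proxy service is a liability for its clients: one noisy customer can push the whole subnet over the threshold for everyone who shares it.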
Another way to get data center proxies that some criminals use is vulnerability exploitation. Mass scanning of servers or websites can yield tens of thousands of vulnerable targets. Infecting them with malware that can act as a proxy server allows them to create pretty large proxy botnets.
WordPress is the best-known target for such attacks. There are more than 30M self-hosted WordPress installations currently running, and there are more than 50,000 potentially vulnerable plugins. To create a proxy from a hacked WordPress site, attackers don't even need to get inside the operating system. A simple PHP proxy uploaded to the WordPress installation is sufficient for most needs. Typically, you won't find this kind of proxy on the public market.
Residential proxies
Here comes the interesting part.
Unlike data center proxies, residential proxies reside on the networks of home ISPs, usually on real devices such as home routers and mobile devices. Proxy providers use various approaches to obtain such IPs.
For example, they:
- Encourage users to install proxy server software and give them monetary compensation
- Encourage users to download something that silently installs proxy software along with legitimate software
- Rent unused bandwidth and IPs from ISPs (it's possible in some countries)
- Find vulnerabilities in routers or IoT devices and infect them with malware
- Hack popular websites and infect their visitors by using an exploit kit
- Inject proxy software into free Android apps
- Buy a lot of SIM cards and put them into a SIM server
I think the majority of proxy services pretend that they use the first approach, but in reality they usually rely on the illegal options. Recruiting millions of users who are willing to opt in and share their devices for small benefits is very challenging. Many customers of such services are not aware that residential proxies mostly come from infected devices and that, by using them, they become part of an illegal scheme. There is an excellent study (Resident Evil: Understanding Residential IP Proxy as a Dark Service) on this matter that provides many insights and supports this point. Running a proxy service is now the most common way to monetize a botnet.
Residential proxy services use backconnect (reverse) gateways to hide that fact. Instead of connecting to the proxy endpoints directly, clients connect to a special gateway on assigned ports, and the gateway relays all requests to the infected devices in its pool. Each port relays to a particular device and automatically switches to another device after some time (1-15 minutes) or when the current device becomes unavailable.
Such gateways also manage proxy rotation, check that infected devices are still online, provide geographical filtering, and manage access to them.
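The port-to-device mapping described above can be modeled with a toy in-memory sketch. It opens no sockets and relays nothing; it only shows the rotation rule (switch after a timeout or when the device goes offline). The class name, the round-robin selection, and the 300-second interval are my assumptions for illustration, not how any particular service works.

```python
import itertools


class BackconnectGateway:
    """Toy in-memory model of port-to-endpoint rotation; it opens no sockets."""

    def __init__(self, endpoints, ports, rotation_seconds=300):
        if not endpoints:
            raise ValueError("the pool needs at least one endpoint")
        self.online = {endpoint: True for endpoint in endpoints}
        self.rotation_seconds = rotation_seconds
        self._pool_size = len(endpoints)
        self._cycle = itertools.cycle(endpoints)  # round-robin over the pool
        self.assignments = {}  # port -> (endpoint, time of assignment)
        for port in ports:
            self._assign(port, now=0)

    def _next_online(self):
        # Look at each pool member at most once per call.
        for _ in range(self._pool_size):
            candidate = next(self._cycle)
            if self.online[candidate]:
                return candidate
        raise RuntimeError("no endpoint in the pool is online")

    def _assign(self, port, now):
        endpoint = self._next_online()
        self.assignments[port] = (endpoint, now)
        return endpoint

    def resolve(self, port, now):
        """Return the endpoint behind `port`, rotating if it expired or went offline."""
        endpoint, assigned_at = self.assignments[port]
        expired = now - assigned_at >= self.rotation_seconds
        if expired or not self.online[endpoint]:
            endpoint = self._assign(port, now)
        return endpoint
```

From the client's point of view only the gateway address and port are visible, which is why the client never learns which device actually served a request.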
Combined across all services, there are more than 70 million residential proxies. That is a huge number that can't be achieved with data center proxies.
The monthly cost of such proxies/ports varies from $30 to $10,000 depending on the quantity, quality, number of concurrent connections, geolocation, and number of clients. Some services charge on a traffic basis, in the range from $5 to $100 per GB of traffic.
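To get a feel for these numbers, here is a small sketch that compares the two billing models. The $5 and $100 per-GB bounds come from the range above; the $300 flat fee, the $15 per-GB rate, and the idea that a flat plan includes unlimited traffic are hypothetical values picked only for the example.

```python
def traffic_cost(gigabytes: float, price_per_gb: float) -> float:
    """Cost of a traffic-based plan for a given monthly volume."""
    if gigabytes < 0 or price_per_gb < 0:
        raise ValueError("volume and price must be non-negative")
    return gigabytes * price_per_gb


def break_even_gb(monthly_fee: float, price_per_gb: float) -> float:
    """Volume at which a flat monthly fee costs the same as paying per GB."""
    if price_per_gb <= 0:
        raise ValueError("price per GB must be positive")
    return monthly_fee / price_per_gb


# Per-GB price bounds quoted in the article.
LOW_PRICE, HIGH_PRICE = 5, 100

# 50 GB per month at both ends of the quoted range.
print(traffic_cost(50, LOW_PRICE), traffic_cost(50, HIGH_PRICE))

# Hypothetical plan: a $300 flat fee versus $15 per GB.
print(break_even_gb(300, 15))
```

With these example values, 50 GB costs between $250 and $5,000 per month, and the flat plan only pays off above 20 GB of monthly traffic.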
Residential proxies are much harder to detect and usually have a good history since legitimate users use the same IPs for their needs. That is why they are popular.
Who can afford such expensive proxies, and why do they need them?
When it comes to the most expensive residential proxies, the number one use case for them is ad fraud. That is a huge market that causes billions in losses for advertisers each year.
Explaining all fraud schemes would take a few articles, so here are the most popular ad fraud schemes:
- Click fraud
- Views fraud (ad views, page views, video views)
- Mobile app install fraud
Fraudsters use real browsers to mimic real users. Having access to IPs that have a prior, real history in ad systems helps them avoid detection.
Some techniques they use to make a profit:
- Buying small websites with history, putting ads on them, and driving fake ad impressions to them
- Creating fake ad agencies
- Partnering with small ad agencies, websites, and mobile app owners
- Offering services to waste ad budgets of competitors by clicking on their ads
According to recent research that I've read, desktop fraud is slowly dying, but mobile ad fraud is growing.
Social network bots
Almost every social network is targeted by various bots, and they need a lot of IPs to bypass antispam and other security systems.
The most common purposes of bots:
- Bulk account registration that drives fake likes, views, and follows
- Spam. This is becoming obsolete because of modern antispam techniques.
Another popular market is bots for limited goods. They find limited goods and speed up the checkout process. For example, people buy tens or hundreds of sneakers from limited collections and resell them at a 100-1000% markup. Big brands are trying hard to stop this, so such bots implement sophisticated systems to mimic legitimate transactions. Since the profits can be big, fraudsters use different identities, physical addresses, credit cards, and IPs.
During the peak of the pandemic, gaming consoles and graphics cards were popular targets too.
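To put the 100-1000% resale figure in numbers, here is a small sketch. It reads the percentage as a markup over the retail price, which is my assumption, and the $200 retail price is a made-up example.

```python
def resale_figures(retail_price: float, markup_percent: float):
    """Return (resale price, profit per item, profit as a percentage of the resale price)."""
    resale_price = retail_price * (1 + markup_percent / 100)
    profit = resale_price - retail_price
    margin_percent = profit / resale_price * 100
    return resale_price, profit, margin_percent


# Both ends of the quoted range for a hypothetical $200 pair of sneakers.
for markup in (100, 1000):
    print(markup, resale_figures(200, markup))
```

At a 100% markup the $200 pair resells for $400, and at 1000% it resells for $2,200, so even a few dozen successful checkouts can justify paying for expensive residential proxies.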
Web scraping and other markets
Since residential proxies are pretty expensive, web scrapers use them only for sophisticated crawlers. They usually crawl popular and well-protected resources such as LinkedIn, Facebook, Twitter, AngelList, Crunchbase, and Google SERP.
There are also a lot of small use cases. For example:
- Ad and price intelligence. Marketing companies analyze competitors' ads and prices from different geographical locations.
- Online poll fraud