Site slow due to mass scraping / DDOS from Alibaba Singapore datacenter

overscan (PaulMM)
Administrator, Staff member
Hi all - noticed site was sluggish and have discovered thousands of open connections emanating from Alibaba's Singapore datacentre.

Not sure if it's intended to take down the forum, or if they are cloning the forum posts or something.

I have brought forward a planned transition to Cloudflare WAF/DDoS protection for the site, earlier than originally intended. This will progressively kick in over the next 24 hours.

As a temporary measure users from Singapore will have to solve a challenge to view the site. This is hopefully temporary while I figure out what is going on.
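
For anyone curious, the country challenge is essentially a Cloudflare WAF custom rule along the lines sketched below. This is a rough sketch from memory rather than the exact rule in use, so check Cloudflare's current Rules language documentation for the field names.

Code:
# Illustrative Cloudflare WAF custom rule (set in the dashboard), not the exact rule in use here
Expression: (ip.geoip.country eq "SG")
Action:     Managed Challenge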
 
The Starship Modeler forum has been having what I expect is a similar issue.

If y'all will allow me the liberty of sharing what I posted there, in an attempt to perhaps increase general understanding:

Fri May 09, 2025 3:50 pm

...

That 'All' brings to mind things such as:

Google and OpenAI Are Slowing Down Your Website
A Smarter Hosting Fix
March 1, 2025

The Silent Strain of AI-powered web crawlers.

Artificially intelligent web crawlers, also called bots or spiders, search the web to index content and gather data. Search engine bots like Googlebot have always existed on the internet, but AI giants like OpenAI have pushed up crawler activity. These AI bots don't just skim the surface – they scrape large amounts of data to feed machine learning models. This deep crawling can unintentionally cause problems.

Increased Bandwidth Consumption

Each visit by a crawler uses bandwidth. Occasional visits from search engine bots are manageable, but AI crawlers are far more aggressive. This excess traffic can quickly exhaust your bandwidth limits – especially on shared hosting plans – and cause throttled performance or overage charges.

Overloading servers and performance lag

AI bots may send thousands of requests in a row and overload servers. That leads to slower page load times, timeouts, and sometimes full site outages. For small to mid-sized websites on shared or virtual private server hosting, this is disastrous.


‘Grey bots’ inundating websites to feed genAI
Real users crowded out as automated agents swarm.
By David Braue on Apr 10 2025 11:43 AM

Between December and the end of February alone, Barracuda reports, the Anthropic-owned ClaudeBot lodged up to 2.5 million requests for data in a day – with one application seeing more than 9.7 million requests in a month, and another fielding over 500,000 in a day.

The volume of requests remained relatively consistent over the course of a day, with an average of around 17,000 requests per hour, suggesting that the grey bots were running at a steady pace to find and download whatever data they could come across.

This is far more measured than the traffic floods created by bad bots, which a recent analysis found account for over a third of Australia’s Internet traffic as they pummel popular sites to snag choice concert tickets, harvest personal data, and commit ad fraud.

Yet “both scenarios – constant bombardment or unexpected, ad hoc traffic surges – present challenges for web applications,” Barracuda senior principal software engineer Rahul Gupta noted, with copyright, privacy and other legal issues only the tip of the iceberg.
...
With AI giants and startups introducing new genAI scrapers on a regular basis – and many using sneaky tactics to avoid detection – managing their impact remains a cat-and-mouse game.

Grey bots threaten to crowd out legitimate users as they inundate servers with requests for data, often ignoring website owners' requests to move on – requests made using a widespread method called the Robots Exclusion Protocol (REP) and its robots.txt file.
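
For reference, the robots.txt file mentioned there is just a plain text file at the site root. A minimal example asking a couple of the crawlers named above to stay away would look something like this; the bot names and the disallowed path are only illustrative, and compliance is entirely voluntary, which is exactly the article's point.

Code:
# Example robots.txt served at https://example.com/robots.txt (illustrative only)
User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /

# Everyone else may crawl, except a hypothetical members-only area
User-agent: *
Disallow: /members/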


Bots now account for over half of all internet traffic
News
By Sead Fadilpašić
published 16 April 2025
With the proliferation of Generative AI, things are only going to get worse, Imperva further states. ByteSpider Bot alone is apparently responsible for more than half (54%) of all AI-enabled attacks. Other significant contributors include AppleBot (26%), ClaudeBot (13%), and ChatGPT User Bot (6%).

Not all bot traffic is malicious, though. There are many useful, and often essential bots, such as search engine crawlers, monitoring bots, social media bots, or data scraping bots. They are used to index websites for search engines, check websites for performance or downtime, schedule posts or respond automatically, or to aggregate sites and scrape valuable data.

Still, bad bots take up a hefty portion of all bot traffic, presenting a real challenge for the cybersecurity community.

These tools, whose popularity exploded roughly three years ago with the introduction of ChatGPT, have simplified the creation and scaling of malicious bots, Imperva noted.


by Mike Elgan, Contributing Columnist

Inside the war between genAI and the internet
opinion
Mar 28, 2025

While much digital ink has been spilled decrying the taking of content, it’s also important to know that the chatbot companies are overwhelming many of the sites they’re copying content from, much like a daily DDOS attack.
 
So - they were able to evade my protection by accessing the site directly via the old IP. I've now closed this loophole down (hopefully most people have switched to the Cloudflare IPs by now).

It's quite possible it's a new AI scraping data rather than a deliberate attack, but the quantity of requests was overwhelming.
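
For anyone else fronting a server with Cloudflare, the usual way to close that loophole is to allow only Cloudflare's published address ranges to reach the origin's web ports. Below is a rough sketch of one way to generate the firewall rules, assuming a Linux host managed with ufw; the list URLs are the ones Cloudflare publishes, but verify them and review the output before applying anything.

Code:
# Sketch: emit ufw rules allowing only Cloudflare's published ranges to reach the web ports.
# Assumes ufw already denies 80/443 by default; review the printed commands before running them.
from urllib.request import urlopen

CLOUDFLARE_LISTS = [
    "https://www.cloudflare.com/ips-v4",
    "https://www.cloudflare.com/ips-v6",
]

for url in CLOUDFLARE_LISTS:
    with urlopen(url) as resp:
        cidrs = resp.read().decode().split()
    for cidr in cidrs:
        for port in (80, 443):
            print(f"ufw allow from {cidr} to any port {port} proto tcp")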
 
Hi all - noticed site was sluggish and have discovered thousands of open connections emanating from Alibaba's Singapore datacentre.

Not sure if it's intended to take down the forum, or if they are cloning the forum posts or something.

Hmm, wonder if that Alibaba is the same as this Alibaba, who appear to run the Alibaba sales website?

Building a Custom Chatbot with Web-Scraping and Alibaba Cloud Model Studio
JwdShah
December 16, 2024


Chatbots are transforming how businesses interact with customers, offering instant and accurate responses tailored to specific needs. With Alibaba Cloud Model Studio and web-scraping techniques, creating a domain-specific chatbot has never been easier. This blog walks you through the process of building a chatbot that uses web-scraped data and advanced prompt engineering to deliver precise and relevant answers.
...
Step 1: Web-Scraping for Data Collection

In this blog, we will use web scraping, which allows us to extract information directly from websites. This data forms the chatbot's knowledge base. Using a simple Python script, you can scrape text content and save it for later use.

View the web-scraping code here.

All you need to do is replace the "target_url" with your desired website URL. Once you run this file, it will create a text file containing the website's data.

The extracted data ensures that the chatbot's responses are not only accurate but also relevant to the latest information available online.
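
The linked code isn't reproduced above, but a single-page scraper of the kind the blog describes typically looks something like the sketch below. This is not their actual code: requests and BeautifulSoup are assumptions on my part, and target_url is the placeholder the blog refers to.

Code:
# Minimal sketch of a scraper that saves a page's visible text to a file.
# Not the blog's actual code; requires the requests and beautifulsoup4 packages.
import requests
from bs4 import BeautifulSoup

target_url = "https://example.com"  # replace with the website you want to scrape

response = requests.get(target_url, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
text = soup.get_text(separator="\n", strip=True)

with open("website_data.txt", "w", encoding="utf-8") as out:
    out.write(text)

print(f"Saved {len(text)} characters from {target_url}")

Run at the scale this forum has been seeing, tens of thousands of requests rather than a single page, that is exactly the kind of load described in the articles quoted earlier in the thread.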

See for connection reference:

 
Every Perplexity Pro search related to obscure aerospace and defense subjects hits this site. Would be interesting to know if the posts from really high-impact / knowledgeable users get a heavier weight in the models.
 
Every Perplexity Pro search related to obscure aerospace and defense subjects hits this site. Would be interesting to know if the posts from really high-impact / knowledgeable users get a heavier weight in the models.

Lol, good luck to the AI trying to make sense of my posts. It will blow a few gaskets, I tell you.
 
Overall I'm pretty pleased with the transition. I did lose access to the forum from my mobile for a few hours until the DNS changes propagated.

36,000+ requests from Singapore, only 0.17% of CAPTCHAs solved.

According to Google, that's 62-something. Do they represent real people?
 
I lost access to the forum for most of the day. I've only just been able to log in. Every other attempt I made got a 'server is down' message.
Yeah, the nameserver records take up to 24 hours to update in the worst case. You must have been one of those unfortunate people.

I wouldn't have closed down access to the previous IP addresses this morning, except the bad traffic was still arriving and seriously overloading the server, so I decided it was better for 75% of people to get good speed access than 100% of people to get super slow access.

I actually think it's improved the site speed generally, but more importantly I now have the ability to rate limit, block, and otherwise defend the forum from such traffic.
 
Hi all - noticed site was sluggish and have discovered thousands of open connections emanating from Alibaba's Singapore datacentre.

Not sure if it's intended to take down the forum, or if they are cloning the forum posts or something.

I have brought forward a planned transition to Cloudflare WAF/DDoS protection for the site, earlier than originally intended. This will progressively kick in over the next 24 hours.

As a temporary measure users from Singapore will have to solve a challenge to view the site. This is hopefully temporary while I figure out what is going on.
Good to know you're appreciated. ;)
 
Hi all - noticed site was sluggish and have discovered thousands of open connections emanating from Alibaba's Singapore datacentre.

Not sure if it's intended to take down the forum, or if they are cloning the forum posts or something.

I have brought forward a planned transition to Cloudflare WAF/DDoS protection for the site, earlier than originally intended. This will progressively kick in over the next 24 hours.

As a temporary measure users from Singapore will have to solve a challenge to view the site. This is hopefully temporary while I figure out what is going on.
Yikes!
 
Yeah, the nameserver records take up to 24 hours to update in the worst case. You must have been one of those unfortunate people.

I wouldn't have closed down access to the previous IP addresses this morning, except the bad traffic was still arriving and seriously overloading the server, so I decided it was better for 75% of people to get good speed access than 100% of people to get super slow access.
I lost access from 19:45 UK time yesterday til midday-ish today, but if that let you keep the forum up in the face of a DDOS, intentional or not, then that works fine for me.
 
Many, many years ago I was working on a very busy website that had a similar problem, before the terms "DDoS" or "botnet" were coined. It got to the point where the clients were difficult to block by IP; they would just start coming from someplace else. I realized that all of the clients were coming from Windows machines and appeared to be using some of the built-in software to make the requests.

After getting approval from the company owner, I wrote a script that reliably identified the pattern of an attack. It then sent the client an HTTP redirect that went to a file URL on the local machine - \LPT1\LPT1, the printer port. Their software would "open" the printer port like it was a file, which triggered a Windows Protection Error (crashing their Windows machines).

That stopped the attacks quickly and got the site running again. It took the bad guys a few months to figure out what was happening to them.
 
Then what happened?

They changed tactics; so did I. Eventually I found that they were coordinating through IRC and was able to join them there. Users in the channel could request attacks, so I requested they attack us, watched the attacks roll in, and developed countermeasures. Eventually they blacklisted our sites: if you requested an attack on them in IRC, a bot would deny the request and not trigger it.
 
Many, many years ago I was working on a very busy website that had a similar problem, before the terms "DDoS" or "botnet" were coined. It got to the point where the clients were difficult to block by IP; they would just start coming from someplace else. I realized that all of the clients were coming from Windows machines and appeared to be using some of the built-in software to make the requests.

After getting approval from the company owner, I wrote a script that reliably identified the pattern of an attack. It then sent the client an HTTP redirect that went to a file URL on the local machine - \LPT1\LPT1, the printer port. Their software would "open" the printer port like it was a file, which triggered a Windows Protection Error (crashing their Windows machines).

That stopped the attacks quickly and got the site running again. It took the bad guys a few months to figure out what was happening to them.
I worked for Yellow Pages, and an engineer there used to block users scraping the White Pages website by IP. He got tired of them swapping IPs in the same subnet and started putting bans on entire network ranges. When he left, entire ISPs were blocked and we had loads of users complaining they couldn't access the site. It seemed like we were blocking a significant portion of NZ IP addresses.

We got rid of all the IP blocks and went with a more sophisticated Web Application Firewall approach instead.
 
