Site slow due to mass scraping / DDOS from Alibaba Singapore datacenter

overscan (PaulMM) · Thursday at 07:31

Hi all - noticed site was sluggish and have discovered thousands of open connections emanating from Alibaba's Singapore datacentre.

Not sure if its intended to take down the forum, or they are cloning the forum posts or something,

I have completed a planned future transition to using Cloudflare WAF/DDOS protection for the site earlier than intended. This will progressively kick in over the next 24 hours.

As a temporary measure users from Singapore will have to solve a challenge to view the site. This is hopefully temporary while I figure out what is going on.

Grey Havoc · Thursday at 07:41

Ow. Good luck with the changeover!

overscan (PaulMM) · Thursday at 07:43

Its actually stabilised and back to okay speed even with 2500 connections open, though the Cloudflare protection isn't in effect yet. I'm still going ahead with it, because this kind of thing is likely to reoccur and I need to have tools to restrict traffic if needed.

lancer21 · Thursday at 08:06

I was wondering why the site was loading so slowly today.

Archibald · Thursday at 08:18

Guess what ? the PRC is downloading every single bit of this forum, for obvious reasons - lot of cool military stuff.

Rosdivan · Thursday at 10:24

Archibald said:
Guess what ? the PRC is downloading every single bit of this forum, for obvious reasons - lot of cool military stuff.

No, it’s been a general issue with forums lately that bad bots are just wreaking havoc with unintentional DDOS, probably from LLM training scraping.

southwestforests · Thursday at 10:34

The Starship Modeler forum has been having what I expect is a similar issue.

If y'all will allow the liberty of sharing what I posted there in an attempt to perhaps increase general understanding,

Fri May 09, 2025 3:50 pm

...

That 'All' brings to mind things such as,

Google and OpenAI Are Slowing Down Your Website
A Smarter Hosting Fix
March 1, 2025
3 minutes read

Google and OpenAI Are Slowing Down Your Website

Increasing website performance is a critical factor for user experience, search engine rankings and business success in today's digital ecosystem

aijourn.com

The Silent Strain of AI-powered web crawlers.

Artificially intelligent web crawlers are also called bots or spiders and search the web to index content and gather data. Search engine bots like Googlebot have always existed on the internet but AI giants like OpenAI have pushed up crawler activity. These AI bots skim the surface too – they also scrape large amounts of data to feed machine learning models. This deep crawling can unintentionally cause problems.

Increased Bandwidth Consumption

Each visit by a crawler uses bandwidth. Some visits from search engine bots are manageable, AI crawlers are more aggressive. This excess traffic may soon exhaust your bandwidth limits – especially on shared hosting plans – and cause throttled performance or overage charges.

Overloading servers and performance lag

AI bots may send thousands of requests in a row and overload servers. That leads to slower page load times / timeouts / and sometimes full site outages. For small to mid-sized websites on shared or virtual private server hosting, this is disastrous.

‘Grey bots’ inundating websites to feed genAI
Real users crowded out as automated agents swarm.
By David Braue on Apr 10 2025 11:43 AM

‘Grey bots’ inundating websites to feed genAI

Real users crowded out as automated agents swarm.

ia.acs.org.au

Between December and the end of February alone, Barracuda reports, the Anthropic-owned ClaudeBot lodged up to 2.5 million requests for data in a day – with one application seeing more than 9.7 million requests in a month, and another fielding over 500,000 in a day.

The volume of requests remained relatively consistent over the course of a day, with an average of around 17,000 requests per hour, suggesting that the grey bots were running at a steady pace to find and download whatever data they could come across.

This is far more measured than the traffic floods created by bad bots, which a recent analysis found account for over a third of Australia’s Internet traffic as they pummel popular sites to snag choice concert tickets, harvest personal data, and commit ad fraud.

Yet “both scenarios – constant bombardment or unexpected, ad hoc traffic surges – present challenges for web applications,” Barracuda senior principal software engineer Rahul Gupta noted, with copyright, privacy and other legal issues only the tip of the iceberg.

...

With AI giants and startups introducing new genAI scrapers on a regular basis – and many using sneaky tactics to avoid detection – managing their impact remains a cat-and-mouse game.

Grey bots threaten to crowd out legitimate users as they inundate servers with requests for data, often ignoring website owners’ requests, using a widespread method called the Robots Exclusion Protocol (REP) and its robots.txt file, to move the bots on.

Bots now account for over half of all internet traffic
News
By Sead Fadilpašić
published 16 April 2025

Bots now account for over half of all internet traffic

Plenty of the traffic falls on bad bots

www.techradar.com

With the proliferation of Generative AI, things are only going to get worse, Imperva further states. ByteSpider Bot alone is apparently responsible for more than half (54%) of all AI-enabled attacks. Other significant contributors include AppleBot (26%), ClaudeBot (13%), and ChatGPT User Bot (6%).

Not all bot traffic is malicious, though. There are many useful, and often essential bots, such as search engine crawlers, monitoring bots, social media bots, or data scraping bots. They are used to index websites for search engines, check websites for performance or downtime, schedule posts or respond automatically, or to aggregate sites and scrape valuable data.

Still, bad bots take up a hefty portion of all bot traffic, presenting a real challenge for the cybersecurity community.

These tools, whose popularity exploded roughly three years ago with the introduction of Chat-GPT, have simplified the creation and scaling of malicious bots, Imperva noted.

Mike Elgan
by Mike Elgan
Contributing Columnist

Inside the war between genAI and the internet
opinion
Mar 28, 20256 mins

Inside the war between genAI and the internet

Generative AI companies are not only taking data without permission, they're also sabotaging the sites they're stealing from.

www.computerworld.com

While much digital ink has been spilled decrying the taking of content, it’s also important to know that the chatbot companies are overwhelming many of the sites they’re copying content from, much like a daily DDOS attack.

overscan (PaulMM) · Thursday at 15:49

So - they were able to evade my protection by accessing the site directly by the old IP. I've now closed this loophole down (hopefully most people switched to Cloudflare IPs by now).

Its quite possible its a new AI scraping data rather than a deliberate attack but the quantity of requests was overwhelming.

GTX · Thursday at 16:04

And here I was thinking it was my problem.

southwestforests · Thursday at 16:22

overscan (PaulMM) said:
Hi all - noticed site was sluggish and have discovered thousands of open connections emanating from Alibaba's Singapore datacentre.

Not sure if its intended to take down the forum, or they are cloning the forum posts or something,

Hmm, wonder if that Alibaba are the same as this Alibaba who appearo to have the Alibaba sales website?

Building a Custom Chatbot with Web-Scraping and Alibaba Cloud Model Studio
JwdShah
December 16, 2024

Building a Custom Chatbot with Web-Scraping and Alibaba Cloud Model Studio

This blog describes the process of building a chatbot that uses web-scraped data and advanced prompt engineering to deliver precise and relevant answers.

www.alibabacloud.com

Chatbots are transforming how businesses interact with customers, offering instant and accurate responses tailored to specific needs. With Alibaba Cloud Model Studio and web-scraping techniques, creating a domain-specific chatbot has never been easier. This blog walks you through the process of building a chatbot that uses web-scraped data and advanced prompt engineering to deliver precise and relevant answers.

...

Step 1: Web-Scraping for Data Collection

In this blog, we will use web-scrapping that allow us to extract information directly from websites. This data forms the chatbot’s knowledge base. Using a simple Python script, you can scrape text content and save it for later use.

View the web-scraping code here.

All you need is to replace the "target_url" with your desire website URL. Once you run this file, it will create a text file containing the data of website.

The extracted data ensures that the chatbot's responses are not only accurate but also relevant to the latest information available online.

See for connection reference:

About Alibaba Cloud: The Pulse of Digitalization

Alibaba Cloud is a global leader in cloud computing and AI with reliable and secure cloud services for customers in more than 200 countries and regions. Come and explore the Digital World with Us!

www.alibabacloud.com

overscan (PaulMM) · Thursday at 22:28

Alibaba is a Chinese owned cloud infrastructure provider like Amazon AWS or Microsoft Azure. Anyone with a credit card can build stuff there.

Training_Dummy · Thursday at 22:37

Every Perplexity Pro search related to obscure aerospace and defense subjects hits this site. Would be interesting to know if the posts from really high impact / knowledgeable users gets a heavier weight in the models.

Archibald · 2025-05-16T03:19:14-0400

Training_Dummy said:
Every Perplexity Pro search related to obscure aerospace and defense subjects hits this site. Would be interesting to know if the posts from really high impact / knowledgeable users gets a heavier weight in the models.

Lol good luck to the A.I trying to make sense of my posts. It will blow a few gaskets, I tell you.

overscan (PaulMM) · 2025-05-16T04:09:32-0400

Overall I'm pretty pleased with the transition. I did lose access to the forum from my mobile for a few hours until the DNS changes propagated.

36,000 +requests from Singapore, only 0.17% CAPCHA's solved.

Grey Havoc · 2025-05-16T04:34:59-0400

Deploy the Black ICE!

Archibald · 2025-05-16T05:06:27-0400

overscan (PaulMM) said:
Overall I'm pretty pleased with the transition. I did lose access to the forum from my mobile for a few hours until the DNS changes propagated.

View attachment 770263
36,000 +requests from Singapore, only 0.17% CAPCHA's solved.

According to Google that's 62 something. Do they represent real people ?

overscan (PaulMM) · 2025-05-16T08:10:51-0400

Archibald said:
According to Google that's 62 something. Do they represent real people ?

Most likely yes.

Graham1973 · 2025-05-16T08:16:36-0400

I lost access to the forum for most of the day. I've only just been able to log in. Every other attempt I made got a 'server is down' message.

overscan (PaulMM) · 2025-05-16T08:16:38-0400

I'm seeing a lot of traffic from Brazil, but little registered users. Seems suspicious.

Apparently bots account for more than half of all internet traffic now.

overscan (PaulMM) · 2025-05-16T08:20:16-0400

Graham1973 said:
I lost access to the forum for most of the day. I've only just been able to log in. Every other attempt I made got a 'server is down' message.

Yeah the nameserver records take up to 24 hours to update in worst case. You must have been one of those unfortunate people.

I wouldn't have closed down access to the previous IP addresses this morning, except the bad traffic was still arriving and seriously overloading the server, so I decided it was better for 75% of people to get good speed access than 100% of people to get super slow access.

I actually think its improved the site speed generally, but more importantly I now have the ability to rate limit, block and otherwise defend the forum from such traffic.

sublight_ · 2025-05-16T10:06:10-0400

Paul, next time you change DNS set the TTL to five minutes a week in advance.

overscan (PaulMM) · 2025-05-16T10:09:57-0400

sublight_ said:
Paul, next time you change DNS set the TTL to five minutes a week in advance.

Sure, that was my plan, but I wasn't prepared to do it yet - this was an emergency response.

I've done DNS migrations many times at work without downtime.

Sferrin · 2025-05-16T10:12:26-0400

overscan (PaulMM) said:
Hi all - noticed site was sluggish and have discovered thousands of open connections emanating from Alibaba's Singapore datacentre.

Not sure if its intended to take down the forum, or they are cloning the forum posts or something,

I have completed a planned future transition to using Cloudflare WAF/DDOS protection for the site earlier than intended. This will progressively kick in over the next 24 hours.

As a temporary measure users from Singapore will have to solve a challenge to view the site. This is hopefully temporary while I figure out what is going on.

Good to know you're appreciated.

MarcusStarkiller · 2025-05-16T12:13:17-0400

overscan (PaulMM) said:
Hi all - noticed site was sluggish and have discovered thousands of open connections emanating from Alibaba's Singapore datacentre.

Not sure if its intended to take down the forum, or they are cloning the forum posts or something,

I have completed a planned future transition to using Cloudflare WAF/DDOS protection for the site earlier than intended. This will progressively kick in over the next 24 hours.

As a temporary measure users from Singapore will have to solve a challenge to view the site. This is hopefully temporary while I figure out what is going on.

Yikes!

martinbayer · 2025-05-16T12:55:05-0400

Archibald said:
Lol good luck to the A.I trying to make sense of my posts. It will blow a few gaskets, I tell you.

Keep on fighting the good fight

!

DWG · 2025-05-16T12:58:04-0400

overscan (PaulMM) said:
Yeah the nameserver records take up to 24 hours to update in worst case. You must have been one of those unfortunate people.

I wouldn't have closed down access to the previous IP addresses this morning, except the bad traffic was still arriving and seriously overloading the server, so I decided it was better for 75% of people to get good speed access than 100% of people to get super slow access.

I lost access from 19:45 UK time yesterday til midday-ish today, but if that let you keep the forum up in the face of a DDOS, intentional or not, then that works fine for me.

Scott Kenny · 2025-05-16T22:18:25-0400

Grey Havoc said:
Deploy the Black ICE!

The hell with Black ICE, let's start dropping the RFGs onto the sources of this crap...

quellish · 2025-05-16T22:27:19-0400

Many many years ago I was working on a very busy website that had a similar problem, before the terms "DDoS" or "botnet" were coined. It got to the point where the clients were difficult to block by IP, they would just start coming from someplace else. I realized that all of the clients were coming from Windows machines and appeared to be using some of the built-in software to make the requests.

After getting approval from the company owner I wrote a script that reliably identified the pattern of an attack. It then sent the client an HTTP redirect that went to a file URL on the local machine - \LPT1\LPT1 , the printer port. Their software would "open" the printer port like it was a file, which triggered a Windows Protection Error (crashing their Windows machines).

That stopped the attacks quickly and got the site running again. It took the bad guys a few months to figure out what was happening to them.

EmoBirb · 2025-05-16T23:01:05-0400

quellish said:
That stopped the attacks quickly and got the site running again. It took the bad guys a few months to figure out what was happening to them.

Modern problems require modern solutions lol

aim9xray · 2025-05-16T23:19:06-0400

quellish said:
It took the bad guys a few months to figure out what was happening to them.

Then what happened?

quellish · 2025-05-16T23:22:51-0400

aim9xray said:
Then what happened?

They changed tactics, so did I. Eventually I found that they were coordinating through IRC and was able to join them there. Users in the channel could request attacks. So I requested they attack us, watched the attacks roll in, developed countermeasures. Eventually they blacklisted our sites for attacks. If you requested it in IRC a bot would deny the request and not trigger the attack.

overscan (PaulMM) · 2025-05-17T00:12:13-0400

quellish said:
Many many years ago I was working on a very busy website that had a similar problem, before the terms "DDoS" or "botnet" were coined. It got to the point where the clients were difficult to block by IP, they would just start coming from someplace else. I realized that all of the clients were coming from Windows machines and appeared to be using some of the built-in software to make the requests.

After getting approval from the company owner I wrote a script that reliably identified the pattern of an attack. It then sent the client an HTTP redirect that went to a file URL on the local machine - \LPT1\LPT1 , the printer port. Their software would "open" the printer port like it was a file, which triggered a Windows Protection Error (crashing their Windows machines).

That stopped the attacks quickly and got the site running again. It took the bad guys a few months to figure out what was happening to them.

I worked for Yellow Pages and an engineer used to block by IP for users scraping the White Pages website. He got tired of them swapping IPs in the same subnet and started using IP bans on entire network addresses. When he left, entire ISPs were blocked and we had loads of users complaining they couldn't access the site. It seemed like we were blocking a significant portion of NZ IP addresses.

We got rid of all the IP blocks and went with a more sophisticated Web Application Firewall approach instead.

Site slow due to mass scraping / DDOS from Alibaba Singapore datacenter

Administrator

ACCESS: USAP

Administrator

ACCESS: Top Secret

ACCESS: USAP

ACCESS: Confidential

ACCESS: Top Secret

Administrator

All hail the God of Frustration!!!

ACCESS: Top Secret

Administrator

ACCESS: Confidential

ACCESS: USAP

Administrator

ACCESS: USAP

ACCESS: USAP

Administrator

ACCESS: Top Secret

Administrator

Administrator

ACCESS: Top Secret

Administrator

ACCESS: USAP

ACCESS: Restricted

ACCESS: Top Secret

ACCESS: Top Secret

ACCESS: USAP

I don’t read The Drive. The Drive reads me.

Don't drink the drone kool-aid | Certified Meanie

ACCESS: Top Secret

I don’t read The Drive. The Drive reads me.

Administrator

Similar threads