Using Zip Bombs to Protect My Server
The author protects his server from malicious bots by serving them zip bombs. The method has its limitations, but it remains an effective defense against most simple malicious bots.
The majority of the traffic on the web is from bots. For the most part, these bots are used to discover new content: RSS feed readers, search engines crawling your content, or, nowadays, AI bots crawling content to power LLMs. But then there are the malicious bots. These come from spammers, content scrapers, or hackers. At my old employer, a bot discovered a WordPress vulnerability and inserted a malicious script into our server. It then turned the machine into part of a botnet used for DDoS attacks. One of my first websites was yanked off Google search entirely because of bots generating spam. At some point, I had to find a way to protect myself from these bots. That's when I started using zip bombs.
A zip bomb is a relatively small compressed file that can expand into a very large file that can overwhelm a machine.
One feature developed early on the web was compression with gzip. With the Internet being slow and information being dense, the idea was to compress data as much as possible before transmitting it over the wire. So a 50 KB HTML file, composed of text, can be compressed to 10 KB, saving you 40 KB in transmission. On dial-up Internet, this meant downloading the page in 3 seconds instead of 12.
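The size saving above is easy to observe locally. The following bash sketch builds a roughly 50 KB page out of repeated markup and prints its size before and after gzip. The file name sample.html, its contents, and the helper compression_report are illustrative assumptions. Markup this repetitive compresses much better than a real page, so expect a larger saving than the 50 KB to 10 KB in the example.

```shell
#!/usr/bin/env bash
# Sketch: compare a file's size before and after gzip compression.

# Print "<original bytes> <gzipped bytes>" for the given file.
compression_report() {
  local file=$1 original compressed
  original=$(( $(wc -c < "$file") ))                 # arithmetic strips wc's padding
  compressed=$(( $(gzip -c "$file" | wc -c) ))       # size of the gzip stream
  echo "$original $compressed"
}

# Hypothetical sample page: 1000 similar paragraphs, roughly 50 KB of HTML.
for i in $(seq 1 1000); do
  echo "<p class=\"post\">Paragraph $i of a sample page.</p>"
done > sample.html

compression_report sample.html
```

The script prints two byte counts, original then gzipped; the exact numbers vary with the gzip version.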
The same compression can be used to serve CSS, JavaScript, or even images. Gzip is fast, simple, and drastically improves the browsing experience. When a browser makes a web request, it includes a header that signals to the target server that it supports compression. If the server also supports it, it returns a compressed version of the expected data.
Accept-Encoding: gzip, deflate
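On the server side, that header has to be parsed before deciding whether to compress. Below is a minimal bash sketch of the check. The function name accepts_gzip is hypothetical, and the parser is deliberately simplified: it only understands comma-separated codings, an optional ;q= weight, and the * wildcard. It is not a full implementation of the HTTP header grammar.

```shell
#!/usr/bin/env bash
# Sketch: decide whether an Accept-Encoding value allows a gzip response.
# Simplified parser (assumption): comma-separated codings, optional ";q=",
# "*" wildcard. An explicit gzip entry overrides the wildcard.

accepts_gzip() {
  local header=$1 token coding q star=""
  local -a tokens
  IFS=',' read -r -a tokens <<< "$header"
  for token in "${tokens[@]}"; do
    token=${token//[[:space:]]/}                  # drop all whitespace
    coding=${token%%";"*}                         # coding name before any ";"
    q=1
    [[ $token == *";q="* ]] && q=${token##*";q="} # quality weight, if present
    if [[ $coding == [Gg][Zz][Ii][Pp] ]]; then
      # q=0 (or 0.0, 0.00, ...) means "not acceptable".
      [[ $q =~ ^0(\.0*)?$ ]] && return 1
      return 0
    elif [[ $coding == "*" ]]; then
      star=$q                                     # remember the wildcard weight
    fi
  done
  # No explicit gzip entry: fall back to a nonzero-weight wildcard, if any.
  [[ -n $star && ! $star =~ ^0(\.0*)?$ ]]
}

accepts_gzip "gzip, deflate" && echo "send gzip" || echo "send identity"  # prints "send gzip"
```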
Bots that crawl the web also support this feature. Since their job is to ingest data from all over the web, they make the most of their bandwidth by using compression. And we can take full advantage of that.
On this blog, I often get bots that scan for security vulnerabilities, which I mostly ignore. But when I detect that they are trying to inject malicious payloads or are probing for a response, I return a 200 OK response and serve them a gzip response. The file varies from 1 MB to 10 MB, and they are happy to ingest it. For the most part, when they do, I never hear from them again. Why? Because they crash right after ingesting the file.
Content-Encoding: gzip
What happens is that they receive the file and read the header telling them it is compressed. So they try to decompress the 1 MB file to find whatever content they are looking for. But the file expands, and expands, and expands, until they run out of memory and their server crashes. The 1 MB file decompresses into 1 GB. This is more than enough to break most bots. However, for those pesky scripts that won't stop, I serve the 10 MB file. This one decompresses into 10 GB and instantly kills the script.
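The 1 MB to 1 GB figure comes from how well deflate, the algorithm inside gzip, compresses long runs of identical bytes. Its best case is roughly 1000:1 (about 1032:1 in theory). This bash sketch checks that ratio on a harmless 10 MiB input streamed from /dev/zero, so nothing large is written to disk or kept in memory. The helper name expansion_ratio is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: measure how much a run of zero bytes shrinks under gzip,
# using a small input so the experiment itself is harmless.

# Illustrative helper: print input bytes / gzipped bytes for N zero bytes.
expansion_ratio() {
  local n=$1 compressed
  compressed=$(( $(head -c "$n" /dev/zero | gzip -c | wc -c) ))
  echo $(( n / compressed ))
}

# 10 MiB of zeros; the result should land near deflate's ~1000:1 ceiling.
expansion_ratio $(( 10 * 1024 * 1024 ))
```

Expect a number close to 1000. Scaling up by the same ratio gives the 1 MB to 1 GB and 10 MB to 10 GB pairs above.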
Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device. Continue at your own risk. So here is how we create the zip bomb:
dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz
Here is what the command does:
dd: The dd command is used to copy or convert data.
if=/dev/zero: The input file. /dev/zero is a special file that produces an infinite stream of zero bytes.
bs=1G: The block size, set to 1 gigabyte, meaning dd reads and writes data in chunks of 1 GB at a time.
count=10: This tells dd to process 10 blocks, each 1 GB in size. So, this will generate 10 GB of zeroed data.
We then pipe dd's output to gzip, whose -c flag writes the compressed stream to standard output; the shell redirects it into the file 10GB.gz. In this case, the resulting file is about 10 MB.
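One detail of the command above: with bs=1G, dd allocates a 1 GB buffer in memory. If that is a concern, you can produce the same file with a smaller block size and a proportionally larger count. The bash sketch below does that and then checks the decompressed size by streaming it through wc -c, so the expanded data is never held in memory or written to disk. The helper names make_zip_bomb and decompressed_bytes are hypothetical. The demo builds a 32 MiB example; passing 10240 instead would match the 10 GB file.

```shell
#!/usr/bin/env bash
# Sketch: build a gzip bomb with a small dd buffer, then verify its
# decompressed size by streaming. Helper names are hypothetical.

# make_zip_bomb SIZE_MIB OUTPUT
make_zip_bomb() {
  local size_mib=$1 out=$2
  # bs=1048576 (1 MiB) keeps dd's buffer small; bs=1G would allocate 1 GiB.
  dd if=/dev/zero bs=1048576 count="$size_mib" 2>/dev/null | gzip -9 -c > "$out"
}

# Count decompressed bytes without storing them anywhere.
decompressed_bytes() {
  echo $(( $(gzip -dc "$1" | wc -c) ))
}

make_zip_bomb 32 bomb-32M.gz        # 10240 would give the article's 10 GB file
ls -l bomb-32M.gz
decompressed_bytes bomb-32M.gz      # prints 33554432
```

You might be tempted to use gzip -l for the size check. However, the gzip trailer stores the original size modulo 2^32, so gzip -l misreports anything of 4 GiB or more, including the 10 GB file.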
On my server, I've added a middleware that checks whether the current request is malicious. I keep a list of blacklisted IPs that repeatedly try to scan the whole website. I have other heuristics in place to detect spammers. A lot of spammers try to spam a page, then come back to see if the spam has made it onto the page. I use this pattern to detect them. It looks something like this:
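if (ipIsBlackListed() || isMalicious()) {
    // The file is a single gzip stream, so declare exactly that encoding.
    header("Content-Encoding: gzip");
    // "." is PHP's string concatenation operator ("+" would do arithmetic).
    header("Content-Length: " . filesize(ZIP_BOMB_FILE_10G)); // 10 MB on disk
    readfile(ZIP_BOMB_FILE_10G); // send the compressed bytes as-is
    exit;
}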
That's all it takes. The only price I pay is that I now serve a 10 MB file on some occasions. If I have an article going viral, I switch to the 1 MB file, which is just as effective.
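The post-then-revisit heuristic from the middleware paragraph can be approximated as an offline scan of request logs. This is not the isMalicious() check itself, which isn't shown here. It is a bash and awk sketch that assumes a simplified, hypothetical log format of one "IP METHOD PATH" entry per line, and flag_post_then_get is an illustrative name. Legitimate visitors who comment and then reload would also match, so a check like this works better as one signal among several than as a verdict.

```shell
#!/usr/bin/env bash
# Sketch: flag IPs that POST to a path and later GET that same path.
# Assumption: input lines are "IP METHOD PATH" (hypothetical format,
# not a real server log layout). Each flagged IP is printed once.

flag_post_then_get() {
  awk '
    $2 == "POST" { posted[$1, $3] = 1; next }          # remember (ip, path) posts
    $2 == "GET" && (($1, $3) in posted) && !($1 in flagged) {
      flagged[$1] = 1                                   # report each IP only once
      print $1
    }
  ' "$@"
}

printf '%s\n' \
  "203.0.113.7 POST /blog/post-1" \
  "198.51.100.2 GET /blog/post-1" \
  "203.0.113.7 GET /blog/post-1" \
  | flag_post_then_get              # prints 203.0.113.7
```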
One more thing: a zip bomb is not foolproof. It can easily be detected and circumvented; after all, a client could read only part of the content. But for unsophisticated bots that blindly crawl the web and disrupt servers, it is a good enough tool for protecting your server.
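The partial read mentioned above is also easy to sketch. A client that decompresses as a stream and stops after a fixed byte budget never holds the expanded data, so the bomb costs it almost nothing. In the bash sketch below, read_capped, the zeros-8M.gz stand-in file, and the 1 MiB budget are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: stream-decompress a gzip file but stop after a byte budget,
# so a bomb never expands fully. Names and sizes are illustrative.

# read_capped FILE LIMIT_BYTES -> prints "ok <bytes>" or "too large"
read_capped() {
  local file=$1 limit=$2 n
  # head stops after limit + 1 bytes; gzip then exits on the closed pipe
  # (its "broken pipe" complaint is discarded).
  n=$(( $(gzip -dc "$file" 2>/dev/null | head -c "$(( limit + 1 ))" | wc -c) ))
  if (( n > limit )); then
    echo "too large"
  else
    echo "ok $n"
  fi
}

head -c 8388608 /dev/zero | gzip -c > zeros-8M.gz   # small stand-in bomb
read_capped zeros-8M.gz 1048576                      # 1 MiB budget
```

Here the 8 MiB stand-in exceeds the 1 MiB budget, so the call prints "too large".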