About all those -k.html backlinks to your website

Since the end of last year, most of my backlinks have originated from websites without any relevant connection to my content. However, they’re suspiciously similar: in every case, the referring URL ends with the suffix -k.html. Another distinct feature of the backlinks is that they only target images (as in hotlinking).

Ahrefs' listing of *-k.html backlinks.
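If you want to check your own backlink export for the same pattern, a filter along these lines should do. This is only a minimal sketch: the function names, the list of image extensions, and the example.com-style URLs are my own assumptions for illustration, and the input is assumed to be simple (referring URL, target URL) pairs rather than any particular tool's export format.

```python
from urllib.parse import urlparse

# Assumed list of image extensions; extend it to match your own site.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp")


def is_suspicious_backlink(referring_url: str, target_url: str) -> bool:
    """True when the referrer path ends in -k.html and the target is an image."""
    referrer_path = urlparse(referring_url).path.lower()
    target_path = urlparse(target_url).path.lower()
    return referrer_path.endswith("-k.html") and target_path.endswith(IMAGE_EXTENSIONS)


def filter_suspicious(backlinks):
    """Keep only the (referring_url, target_url) pairs that match the pattern."""
    return [pair for pair in backlinks if is_suspicious_backlink(*pair)]


if __name__ == "__main__":
    # Hypothetical rows, not taken from a real report.
    rows = [
        ("https://shop.example.com/blue-widget-k.html", "https://my-site.example/img/photo.jpg"),
        ("https://blog.example.com/review.html", "https://my-site.example/img/photo.jpg"),
    ]
    for referrer, target in filter_suspicious(rows):
        print(referrer, "->", target)
```

Matching on the URL path rather than the whole URL keeps query strings from hiding the suffix.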

Should you follow the backlink, you’ll end up on the referring website without seeing any trace of your content. It’s like the backlink never existed in the first place. What’s going on? Are your competitors trying to hurt your website’s reputation by generating toxic backlinks, or is there another sinister scheme at play here?

The websites showing up in your statistics with those -k.html backlinks are compromised. They are now used by scammers to drive traffic to their fake online stores. After a website has been compromised, it will be used to generate fake product listings.

The reason you won’t see your content on those referring websites is that those pages are only for Google and other search engines. They are not meant to be viewed by actual people. Any attempt to access those resources in your browser will only return the “legitimate” website and not the scammer’s dynamically generated content.

To get a look at the actual page behind the backlink, I used cURL to impersonate Googlebot and trick the website into returning what was only meant for search engines. By examining the returned document, I could see the actual product listing with my image on it:

A *-k.html page targeted at search engines

This is what the *-k.html referrer page looks like, if you happen to be Googlebot.
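I used cURL for this, but the same idea can be sketched in Python with nothing but the standard library. Treat the following as an illustration under assumptions, not as the exact command I ran: the helper names and the browser user-agent string are made up for the example, and the Googlebot string is the commonly documented one. A site that also checks the visitor's IP address may still serve you the "legitimate" page.

```python
import urllib.request

# Commonly documented Googlebot user-agent string.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# An arbitrary desktop browser string, used only for comparison.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that announces the given user-agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Download the page body as text while presenting the given user-agent."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def looks_cloaked(browser_html: str, crawler_html: str, marker: str) -> bool:
    """True when the marker (say, your image's file name) only shows up for the crawler."""
    return marker in crawler_html and marker not in browser_html
```

To try it, fetch the referring URL once with BROWSER_UA and once with GOOGLEBOT_UA, then pass both results to looks_cloaked together with the file name of the hotlinked image.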

What’s the point of making content only available to search engines?

This is quite a popular approach among criminals. For instance, compromising a WordPress-based website allows you to manipulate and control visitor traffic. You might then redirect inbound traffic based on the user-agent (as in this case) or visitor IP addresses, and serve content based on a predefined set of rules. Imagine how easily you could identify and subvert popular online safety scanners like Sucuri’s SiteCheck, as can be seen below.

A compromised website easily bypasses Sucuri's SiteCheck scanner.
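To make it clearer why a scanner comes back clean, here is a toy model of that kind of rule. I haven't seen the code running on the compromised sites, so everything in it (the crawler tokens, the function name, and the returned labels) is an assumption made purely for illustration.

```python
# Hypothetical substrings a cloaking script might look for in the user-agent.
CRAWLER_TOKENS = ("googlebot", "bingbot")


def choose_response(user_agent: str) -> str:
    """Toy decision rule: crawlers get the fake listing, everyone else the real site."""
    ua = user_agent.lower()
    if any(token in ua for token in CRAWLER_TOKENS):
        return "fake product listing"
    return "legitimate page"


if __name__ == "__main__":
    print(choose_response("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(choose_response("SomeSecurityScanner/1.0"))  # made-up scanner name
```

A scanner that announces itself with its own user-agent falls into the second branch, so it only ever gets the untouched website.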

It comes down to this

I believe we have established that these compromised websites are being used to dynamically generate content that the perpetrator only wants to be seen and browsed (indexed) by search engines. And that finally brings us to the point of the matter. The main objective here is to inject malicious links into search engine result pages. Whenever someone enters a product query in Google or other search engines, the scammer wants their product links to show up in the result pages.

When our innocent and unknowing visitor clicks on a link to the compromised website, they get redirected to the scammer’s online store instead. Interestingly, this webshop follows the same pattern of dynamically generating content based on the query string in the URL. Anyhow, should you be unfortunate enough to complete a purchase from the store, PayPal will happily collect your money on behalf of the scammer.

Let’s see how it’s all supposed to work with this little animated GIF:

Search redirected from the compromised website to the scammer's online store.

One detail worth noting is that the original *-k.html backlink suffix has since been randomized.

Who’s behind it all?

Who knows? There are lots of people wanting to make a quick buck on the interwebs. However, the scammer’s webshop injects a JavaScript snippet that tries to identify any Chinese visitors. If found, they’re redirected to a suspended page instead of being exposed to the scam.

I guess we could also look into who’s providing infrastructure and domains for these scammers, but maybe that’s something for another day.