from Hacker News

The mermaid is taking over Google search in Norway

by oarth on 7/29/21, 12:12 AM with 346 comments

by Ueland on 7/29/21, 5:33 AM
I have some experience in this field. Around two years ago I was a DevOps engineer for the company running Dagbladet, Norway's #2 newspaper. One of the things I did was keep an eye on mysterious traffic.
I managed to find a huge spam network that set up a proxy service that delivered normal content, but injected "you can win an iPhone!" spam for all users visiting them.
Since I was in a position to monitor their proxy traffic towards many of the sites I managed, I could easily document their behaviour.
At the same time, I wrote a crawler that visited their sites over a long, long time. I learned that they kept injecting hidden links to other sites in their network, so I let my bot follow those too.
By this time, I also had a journalist working with me who started to follow the money to try and find the organisation behind it.
My bot found in excess of 100K domains being used for this operation, targeting all of western Europe. All of the 100K sites contained proxied content and were hidden behind Cloudflare, but thanks to the position I had, I managed to find their backend anyway.
We reported the sites to both CF and Google, and to my knowledge, not a single site was removed before the people behind it took it down.
Oh, and the journalist? He did find a Dutch company that was not happy to see either him or the photographer :)
by keyme on 7/29/21, 6:58 AM
Google search has progressively deteriorated in quality over the last 10 years, to the point where I see it becoming useless in the relatively near future. And it's mainly not even their fault.
I've been using Google search for all kinds of research for 15 years. There used to be a time when you could find the answer to pretty much anything. I could find leaked source code on public FTP servers, links to pirated software and keygens, detailed instructions for a variety of useful things. That was the golden age of the web.
These days, all the "interesting" data on the Internet is inside closed Telegram chats, Facebook groups, Discords, or the rare public website here and there that Google doesn't want to index (like sci-hub, or other piracy sites).
The data that remains on SERPs is now also heavily censored for arbitrary reasons. "For your health", "For your protection". Google search is done.
by janmo on 7/29/21, 1:45 AM
I've seen the same here in Germany, but they only appear if you use the "past 24 hours" filter. It looks like the German content is generated with GPT-2 or GPT-3. It makes no real sense if you read it. If you go to the page you are immediately redirected to a scam, just like the article mentions. Interestingly, they use ".it" domains here. It also looks like the domains might have been hacked, or are expired domains that have been bought.
For example if you check havfruen4220.dk on archive.org you can see that it appears to have been a legitimate business website before. https://web.archive.org/web/20181126203158/https://havfruen4...
How do they rank so well?
I've checked the domain on Ahrefs and it has almost no backlinks. But if you look closely you will see that all the results that rank very well were added very recently. In the screenshots in the article you can see things like "for 2 timer siden", which means "2 hours ago". It looks like Google is ranking pages with a very recent publishing date higher.
Edit: Here is what the content of such a site looks like: https://webcache.googleusercontent.com/search?q=cache:Bk0VsM...
by ricardo81 on 7/29/21, 10:10 AM
Poor man's cloaking
curl -A 'Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0' 'https://havfruen4220.dk' > 1.html
curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://havfruen4220.dk' > 2.html
diff 1.html 2.html
7d6
< <script>var b="https://havfruen4220.dk/3_5_no_14_-__1627553323/gotodate"; ( /google|yahoo|facebook|vk|mail|alpha|yandex|search|msn|DuckDuckGo|Boardreader|Ask|SlideShare|YouTube|Vimeo|Baidu|AOL|Excite/.test(document.referrer) && location.href.indexOf(".") != -1 ) && (top.location.href = b); </script>
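The injected script only redirects when `document.referrer` matches its search-engine regex, which is why a plain `curl` fetch (no referrer) never triggers it. A minimal sketch that reproduces the same test in shell, assuming `grep -E` treats this simple alternation the same way as the case-sensitive JavaScript `.test()` call; `is_search_referrer` and `SPAM_REFERRER_RE` are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: mirrors the referrer test from the injected <script>.
# Like the original JS regex (no "i" flag), the match is case-sensitive.
SPAM_REFERRER_RE='google|yahoo|facebook|vk|mail|alpha|yandex|search|msn|DuckDuckGo|Boardreader|Ask|SlideShare|YouTube|Vimeo|Baidu|AOL|Excite'

# Returns 0 (true) when the referrer would trigger the redirect.
# The script's second condition (location.href contains a ".") holds for
# any ordinary hostname, so it is not modelled here.
is_search_referrer() {
  printf '%s' "$1" | grep -Eq "$SPAM_REFERRER_RE"
}

# Usage sketch: an empty referrer (direct visit, plain curl) never redirects.
for ref in "https://www.google.com/" "https://example.org/" ""; do
  if is_search_referrer "$ref"; then
    echo "redirect:    '$ref'"
  else
    echo "no redirect: '$ref'"
  fi
done
```

Taken literally, the alternation is case-sensitive, so a referrer of `https://duckduckgo.com/` would not match the `DuckDuckGo` alternative (or any other one).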
by NorwegianDude on 7/29/21, 1:07 AM
I've noticed this daily.
Would be interesting to see the actual content. Based on the small snippets in the search results, it takes content from other sites, like large Norwegian news sites, and somehow outranks them hard.
I wonder what the Google Search Console looks like for that domain, considering that it's probably getting millions worth of free traffic.
EDIT: After looking more at it, it's insane how much it ranks for and how well. Straight-up brand names seem to be the hardest to compete with, at least for larger brands. Those seem to be around pages 4-5 for me.
Some brands I was unable to find at all, but ironically another .dk domain showed up in its place that did the same thing. There are also some .it domains using the same content.
I've found that it takes contents from multiple sources and glues it together in sometimes great ways. Like one sentence from this page, another thing from that page.
Maybe this is some ML that collects content and pieces sentences or half-sentences from a lot of it together into one large article? It's clearly from completely different sources, but about the same thing.
Example: "wash car"
Result in google: "A dark winter with snow and salt is hard on the car, and it's extra important to wash the car" - Collected from one article.
<some other text>
"Keep the pressure washer at 30-50 cm from the car..." - From another article.
Ironically, there are something like 11 results all tied to this thing outranking the original articles (those come last), even though the originals come from medium-to-large, well-known companies with billions of dollars in annual sales in Norway.
Sometimes it goes from one thing and switches to something completely unrelated, so I guess the spammers still have something to improve.
Weird.
by weird-eye-issue on 7/29/21, 2:32 AM
Some data on their traffic from some SEO tools I pay for:
Ahrefs: 230k organic traffic valued at $124k
SEMRush: 558k organic traffic valued at $355k
These are estimates and can be wildly under- or overestimated, but they show that this is happening on a very large scale.
For a quick idea of how this is possible, I looked at their top pages (according to Ahrefs). Their top page is ranking #2 for the keyword "interia", which has 207k searches per month in Norway and a keyword difficulty of 0 (out of 100), i.e. it is rated as trivially easy to rank for. Usually a keyword with that many searches would be incredibly hard to rank for; I've never seen anything like this. So it looks like they are just taking advantage of a market with very low-competition keywords.
by gnyman on 7/29/21, 6:35 AM
Pet theory (disclaimer: I know very little about SEO): the website with the cloned content loads fast and does not load 4 MiB of JavaScript, so it beats the original content in ranking mostly because of speed, which I believe is an important factor in Google rankings (and getting more important).
Add to that some link spam, and preventing visitors from returning to the results page so there is no bounce-back...
Either way, I can't help being a bit impressed by the SEO spammers outsmarting the people at Google. (Edit: and I don't mean to say they are smarter or anything, just that they only need to find one weakness in the algorithm, while the people working to improve it need to make it work for everything.)
by monday_ on 7/29/21, 5:43 AM
Not sure how relevant this is, but the animal characters in the top image are from a hit Russian children's cartoon, "The Smesharicks" (literally "The Laughballs").
by Schnurpel on 7/29/21, 1:47 PM
If I ran a global infrastructure company like Cloudflare, I also would not take any sides, and would leave my service open to anyone. The world is full of people who get upset about something. However, if I declare a hands-off policy, it must be truly hands-off. Cloudflare kicked off Switter https://www.theverge.com/2018/4/19/17256370/switter-cloudfla..., it banned 8Chan https://blog.cloudflare.com/terminating-service-for-8chan/ , it banned The Hacker News https://mobile.twitter.com/thehackersnews/status/66900183605... . That’s not how hands-off works.
by bigpeopleareold on 7/29/21, 5:22 AM
I hate dealing with this, and I stopped using Google after I saw these patterns in search results while researching common things (like housing) in Norwegian, here in Norway. I rarely use Google these days, but for a second I thought Google might give better search results than DDG in Norwegian; this stuff is aggravating, though. It's one of those cases where they screw around with your history, so you have to start over on whatever you were doing instead of just going back.
Edit: one other thing I have seen, though it doesn't always mean spam: All The Words In A Title Are Capitalized. It's something to pay attention to when judging whether a result is spam. Conventionally, titles in Norwegian are not written like that.
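The capitalization heuristic can be sketched as a small shell function. The thresholds here are illustrative assumptions, not something from the comment: it only judges titles of three or more words, and because it forces the C locale, only ASCII A-Z counts as uppercase, so a word starting with Æ, Ø or Å is treated as not capitalized. `looks_title_cased` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical heuristic: succeed (exit 0) when every word in a title of
# at least three words starts with an ASCII capital letter.
# LC_ALL=C keeps [A-Z] an ASCII range, so Æ, Ø and Å are not counted as capitals.
looks_title_cased() {
  printf '%s\n' "$1" | LC_ALL=C awk '
    NF < 3 { exit 1 }                 # too short to judge
    { for (i = 1; i <= NF; i++)
        if ($i !~ /^[A-Z]/) exit 1 }  # any non-capital start: conventional title
  '
}

looks_title_cased "Slik Vasker Du Bilen Riktig" && echo "suspicious"
looks_title_cased "Slik vasker du bilen riktig" || echo "conventional"
```

As the comment says, this is a signal to pay attention to, not proof of spam.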
by the_biot on 7/29/21, 6:48 AM
For all that Google search has been utterly crap for going on a decade now, I have to admit part of the reason is that they get targeted relentlessly by SEO spam operations like this. I like DuckDuckGo for now, but I imagine as they get bigger they're going to be a target for this kind of spam just the same.
by dhosek on 7/29/21, 2:42 AM
The one thing I want more than anything from Google or DuckDuckGo or anyone, really, is the ability to give a list of domains and never have their results show up in my searches. I know I can do this on a per-search basis, but I want it to be a configurable setting.
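Until a persistent setting exists, the per-search workaround is the `-site:` exclusion operator. A minimal sketch, assuming you keep your blocklist as a list of domains; `exclude_domains` is a hypothetical helper and `example.it` is a placeholder:

```shell
#!/bin/sh
# Hypothetical helper: append a -site: exclusion for each blocked domain.
# Usage: exclude_domains "query words" domain1 domain2 ...
exclude_domains() {
  query=$1
  shift
  for domain in "$@"; do
    query="$query -site:$domain"
  done
  printf '%s\n' "$query"
}

exclude_domains "pes anserinus bursitt" havfruen4220.dk example.it
```

This prints `pes anserinus bursitt -site:havfruen4220.dk -site:example.it`, which can be pasted into the search box; it is still per search, which is exactly the limitation being complained about.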
by matsemann on 7/29/21, 7:22 AM
Yeah, I've seen this domain a lot lately. But I've complained about the Norwegian results for years [0]. For most searches there will be a result that's just keyword spam ranking high. I retried my "pes anserinus bursitt" search now, 2 years later, and two results are spam from havfruen, and there are some other results from https://no.amenajari.org, which is also just translated and scraped content in every language, and which Google seems to love; I've seen it for years. A third domain I often see as well is "nem-varmepumper". Apparently a site about heat pumps has content on everything.
Can't fathom Google not catching this.
[0]: https://news.ycombinator.com/item?id=21621099
by bash-j on 7/29/21, 2:13 AM
The last time I accidentally installed malware on my computer was when the top Google result pointed me to a site masquerading as the official site for the software. That taught me a lesson to pay attention to the domain name.
by hayksaakian on 7/29/21, 12:52 AM
Interesting because it shows that bounce-back is a more significant ranking factor than before.
It seems like they've manipulated rankings by locking people in to reduce their bounce-back stats (in addition to keyword-stuffed content)
by rwmj on 7/29/21, 8:01 AM
I've also seen this, but from a different side. I have Google Alerts for many open source projects that I run, but in the past few years these alerts have become all but useless. Spammers scrape genuine pages from all over the place (including ones containing references to my projects) and put them into scammy ".it" domains. These appear both in Google Alerts and high up in Google Search. So alerts and search both become useless. The scam appears to be that when you visit these web pages they say you're the billionth (or whatever) visitor to Google and you've won a prize, just type in your bank details.
This has been going on for years now, so I don't have much confidence that Google is able or willing to fix it.
by l0b0 on 7/29/21, 12:46 AM
WHOIS shows it was registered four weeks ago by someone in Riga, Latvia.
by hoppla on 7/29/21, 9:53 AM
The reCAPTCHA process should be reversed. Sites should prove to humans that their content is not generated by bots.
by wdrw on 7/29/21, 1:45 AM
Interesting, the image seems to contain characters from a Russian children's cartoon ( https://en.wikipedia.org/wiki/Kikoriki )
by fny on 7/29/21, 1:46 AM
Somewhat related: has anyone else noticed a massive change in breadth of results? I was searching for reviews for diving equipment and some less niche items and I feel like I'm being spoonfed results from the same comparison engines. Since when did algo content become king?
by mmaunder on 7/29/21, 1:25 AM
Catch-22 though. If you eliminate bounce-back, you still have to rank to get the ranking signal into Google. So how did they rank in the first place? I haven’t tried to reverse-engineer what they’re doing, but I don’t think the author quite figured it out. Interesting phenomenon though.
by nolito on 7/29/21, 8:20 AM
According to DK-hostmaster (https://www.dk-hostmaster.dk/da/find-domaenenavn) it's registered to Ance Dzerina, Ieriku iela 37, dz. 32, LV-1084 Riga, Latvia,
on 2 July 2021.
That's pretty fast to work so well. But I see lots of this with other domains when searching, and have for years, so nothing new here, I think.
by Matsta on 7/29/21, 3:51 PM
I had a look at this, and it looks to me like it's a 301 from another domain. Typically when domains get a manual penalty (primarily for spam), they drop in rankings overnight. So to counter this, you register a new domain and redirect it and overnight, your rankings bounce back. This technique is super common for blackhat sites like illegal streaming sites.
If the redirect is done as a meta refresh, then you can use robots.txt to keep it from being picked up by SEO tools like Ahrefs, SEMRush etc.
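As an illustration of the robots.txt point: the crawlers behind commercial SEO tools state that they honor robots.txt, so a fragment like the following would keep them away from the page while leaving Googlebot unaffected. `AhrefsBot` and `SemrushBot` are the user-agent tokens those two tools publish; whether the operators discussed here did exactly this is an assumption.

```text
# Sketch: block two SEO-tool crawlers; other bots (including Googlebot) are unaffected
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /
```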
These types of sites are called doorway pages and have been around for ages. They are most popular in Russia and on Yandex, but you do see them on Google for super longtail keywords with 0 competition.
The other important thing to remember is that doing SEO in any language that's not English is a walk in the park. Lots of SEO influencer types have case studies showing how much extra traffic they get by translating their content. [1]
[1] https://neilpatel.com/blog/seo-trend/
by belter on 7/29/21, 10:57 AM
The mermaid mentioned in the article seems to be either a terribly amateurish operation or a very sophisticated sting.
They can be easily traced to a block of flats in Latvia, but since their registered phone number belongs to a toy store in Riga... I am going to go with a probable stolen-identity operation and a sense of humour on their part, rather than the real operation of some 12-year-old in Riga...
by agency on 7/29/21, 2:00 AM
This is only tangentially related, but has anyone else started getting more obvious spam emails in their Gmail inbox lately? I feel like for a long time I never got spam in my inbox, but lately I’ll get ones that seem like they should be easy to detect, talking about gifts and stuff and uSiNg wEirD capitals or s p a c i n g. Is it just me?
by kostecki on 7/29/21, 1:20 AM
Interesting that Latvians picked a Danish domain for Norwegian content, especially since you can't just hide behind domain privacy protection.
by ocdtrekkie on 7/29/21, 3:27 PM
My guess is they get away with it because it's a non-English query and most of the people working on these problems aren't looking at their localization. A big issue in general for global tech companies is that they don't usually handle things outside the US/English context particularly well. This often crops up in the political space, where, for instance, something contentious like gun sales might get pulled from Google globally even though the political concern with them is mostly limited to the US.
An SEO-fighting Googler might, at a glance, have no reason to doubt that it could be a genuinely relevant or popular site in your country.
by rapind on 7/29/21, 6:31 AM
> I think that Google uses stats on whether the user continued checking more results for that specific search query to determine if the visited result answered the user.
God I hope not. If Google does do this, it sounds like a really dumb idea, which will ultimately create widespread usability issues. I can already envision SEO consultants recommending this for their clients if this is believed.
Doesn’t look like it according to https://www.seroundtable.com/google-browser-back-button-rank...
by evolve2k on 7/29/21, 1:49 AM
Before I accessed the article I was hopeful from the title that “The Mermaid” was some hot new search engine out of Norway.
by franze on 7/29/21, 11:40 AM
On a similar note: https://www.autosuggest.net/ is currently approaching a lot of websites in the German market.
"We help you to receive high-quality visitors from search engines, generate conversions and build your brand. To achieve these results, we ensure your website / company is recommended for specific keywords by the search engine's autocomplete function."
by pope_meat on 7/29/21, 1:02 AM
Gotta give it to these folks, good hustle.
by gonab on 7/29/21, 12:01 PM
Google has a problem when HN becomes an issue tracker
by zulrah on 7/29/21, 11:55 AM
I've noticed another trend recently where some websites seem to write content optimized for Google SEO instead of for human readability. E.g., I've seen my exact search phrase repeated multiple times and then a very long article about the topic, when what I searched for was a simple question with a few-word answer.
by kristofferR on 7/29/21, 3:48 AM
Yeah, I experienced this same spam domain for some searches I did yesterday. It's everywhere.
by knolax on 7/29/21, 3:16 AM
More reasons why a global search monopoly is suboptimal. Smaller markets like this are just going to get neglected and maintained just enough that a better alternative can't compete. Google search is basically useless for any language other than English.
by yfkar on 7/29/21, 5:22 AM
I've lately noticed that searching Google for topics related to gardening in Finnish often gives me some scraped and machine translated pages from Russia. Really annoying that totally useless content is so high up in the results.
by tvirosi on 7/29/21, 8:20 AM
Google search seems to have gotten significantly worse lately (sometimes to the point that it's barely usable). From scams like these (I've seen others) somehow getting a foothold, to a lot of internal "unbiasing" skewing the results towards Google's political stance (usually totally irrelevant to my query). It's gotten to the point that I barely google anymore other than for things I already know what the results will be.
by Crazyontap on 7/29/21, 4:50 AM
Can somebody else in Norway confirm this? It could simply be malware injecting this. It would be great to rule out that possibility.
by fleddr on 7/29/21, 2:41 PM
Makes you wonder what happens when AI can write "passing" articles. Useless to the reader, but too close to tell for the crawler.
by nkozyra on 7/29/21, 1:57 AM
> The simple solution would be to test sites regularly with an unknown IP and common user agent to check that a site isn’t just showing content to Google and gives real users something completely different. That would stop this.
Surely Google does this, right? Given that - in theory - showing different content to Google versus non-Google should result in a penalty, anyway ...
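The check described in the quote, which is what the curl comparison earlier in the thread does by hand, boils down to fetching the same URL as a browser and as Googlebot and comparing the bodies. A minimal sketch; `fetch_as` and `looks_cloaked` are hypothetical helpers, and a real check would also need to run from a non-Google IP and tolerate pages that legitimately vary per request (ads, nonces, timestamps):

```shell
#!/bin/sh
# User-agent strings taken from the curl example earlier in the thread.
BROWSER_UA='Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'
GOOGLEBOT_UA='Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

# Fetch a URL with a given user agent and save the body to a file.
fetch_as() {  # usage: fetch_as UA URL OUTFILE
  curl -s -A "$1" "$2" > "$3"
}

# Succeed only when both files exist and differ (possible cloaking).
# cmp exit codes: 0 = identical, 1 = different, >1 = error (e.g. missing file).
looks_cloaked() {  # usage: looks_cloaked BROWSER_FILE BOT_FILE
  cmp -s "$1" "$2"
  [ $? -eq 1 ]
}

# Usage sketch (needs network access):
# fetch_as "$BROWSER_UA" 'https://havfruen4220.dk' 1.html
# fetch_as "$GOOGLEBOT_UA" 'https://havfruen4220.dk' 2.html
# looks_cloaked 1.html 2.html && diff 1.html 2.html
```

Note that this only varies the user agent; a site that checks the requester's IP, as suspected further down the thread, would need the fetch to come from a verified Google address to reveal its bot-only content.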
by qwerty456127 on 7/29/21, 8:09 AM
For every country/market, somebody should really make a search engine to compete with Google. Now this is a chance for Norway.
by cnxsoft on 7/29/21, 2:02 PM
Google is garbage. I once complained that a website stealing my content and other people's content was ranking very highly in Google. I was told I'd better fix my website before looking at "competitors". Part of that was true, but at the time the person did not seem to care at all about the spammy content delivered by Google.
by StreamBright on 7/29/21, 6:44 AM
Same in Hungarian. Google is full of spam and nobody cares. The top hits are auto-translated garbage for many searches.
by qwerty456127 on 7/29/21, 8:13 AM
I'm surprised to find out people actually return to the search results page using the back button. Whenever I am serious enough (enough to keep looking after the first link I click does not satisfy me) about finding something I always Middle-Click or Ctrl+Click the links to open them in new tabs.
by algismo on 7/29/21, 10:58 AM
Just tried Google.no from my computer (Norwegian IP, Larvik area). Nothing similar; I see “normal” search results. In any case, I stopped using Google stuff 5 years ago and have never looked back, so my search history is kind of clean; maybe that changes how their algorithm behaves.
I recommend switching to DuckDuckGo :)
by golergka on 7/29/21, 8:30 AM
This image features characters from Smeshariki animation series, hugely popular in Russia in the last 15 years.
by tikiman163 on 7/29/21, 2:39 PM
I'm kind of curious why he's so concerned about this. They've never managed better than ninth most relevant, and in most cases they didn't even make the first page of results. Any advertising person will tell you that if you aren't in the top 3 results (basically the top result now that paid ads automatically get the top 2 spots on nearly all searches), your odds of being seen and clicked on drop to almost nothing.
Are they potentially doing harm? Sure. Have they successfully managed to trick anybody with this? I'd be extremely surprised if they're getting more than a dozen people a day clicking through from the ninth result, and when people see they've been redirected to an advertisement, the majority immediately click away.
This isn't like clicking on a fake porn site that redirects to cam girls with viruses hidden in all the downloads. It's random unrelated searches redirecting you to blatant ads for cryptocurrency. The kind of people who are young enough to know what cryptocurrency is and how to buy it also know how to spot a redirect to a fake website.
by onepunchedman on 7/29/21, 3:24 AM
The language in those scam articles is actually perfect, first time I've seen that.
by sublimefire on 7/29/21, 1:06 PM
It is interesting, as you cannot see the content that is being indexed. I suspect only the bot does. If I understand correctly, this is the sequence of events from the bot's perspective:
## read robots.txt `curl 'https://havfruen4220.dk/robots.txt'`
## use pointer to a sitemap.xml
curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://havfruen4220.dk/sitemap-no.xml' > sitemap.xml
## read more sitemaps
curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://havfruen4220.dk/sitemap-no-1.xml' > sitemap1.xml
Other sitemaps contain a pointer to a "webpage" eg: https://havfruen4220.dk/no/7a28855e4714dd14
## read web pages
Each location in a sitemap has a "lastmod" of today/yesterday, so the bot returns there every day. In addition, each webpage has a "<meta name="robots" content="noarchive">" tag.
But if you visit each of those pages then it shows you a cartoon image. It seems the actual indexed content is visible only to the bot.
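To check the "lastmod is always today or yesterday" pattern yourself, a saved sitemap can be reduced to its `<lastmod>` values. A rough text-based sketch, not a real XML parser, assuming each `<lastmod>` element is not split across lines; `sitemap_lastmods` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print each <lastmod> value from a sitemap read on stdin.
# grep -o emits every match on its own line; sed strips the surrounding tags.
sitemap_lastmods() {
  grep -o '<lastmod>[^<]*</lastmod>' | sed 's/<[^>]*>//g'
}

# Usage sketch, on a sitemap saved with the Googlebot curl command above:
# sitemap_lastmods < sitemap1.xml | sort | uniq -c
```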
## But how is actual content being rendered?
The question is, what conditions (request params/headers) result in the actual content being rendered? The bot needs to evaluate it. I suspect it is some combination of checks on whether the requester is an actual Googlebot, maybe by looking up the IP https://developers.google.com/search/docs/advanced/crawling/...
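Google's crawler-verification documentation describes a reverse DNS lookup on the requesting IP, a check that the hostname is under googlebot.com or google.com, and a forward lookup confirming the hostname resolves back to the same IP. Whether this site does exactly that is the commenter's guess; the following is a minimal sketch of the documented procedure, assuming the BIND `host` utility and its usual output format. The function names and `CLIENT_IP` are hypothetical.

```shell
#!/bin/sh
# Succeed when a hostname is under googlebot.com or google.com.
# A trailing dot (as printed by `host`) is stripped first.
is_google_crawler_host() {
  name=${1%.}
  case $name in
    *.googlebot.com|*.google.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Reverse lookup, domain check, then forward lookup back to the same IP.
# Assumes `host` prints "... domain name pointer NAME." and "NAME has address IP".
verify_googlebot_ip() {  # usage: verify_googlebot_ip IPv4
  ptr=$(host "$1" | awk '/domain name pointer/ { print $NF; exit }')
  is_google_crawler_host "$ptr" || return 1
  host "${ptr%.}" | awk '/has address/ { print $NF }' | grep -Fxq "$1"
}

# Usage sketch (needs DNS access):
# verify_googlebot_ip "$CLIENT_IP" && echo "genuine Googlebot"
```

The suffix check requires a dot before the domain, so a look-alike such as `evilgooglebot.com` is rejected.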
by cratermoon on 7/29/21, 3:22 PM
The Norwegian pinterest
by tapland on 7/29/21, 2:28 AM
I imagine it's done in a similar way to how reddit circumvents searching for results from certain dates. I don't like anyone messing with google results.
by techaddict009 on 7/29/21, 6:36 AM
Someone has probably found some kind of SEO hack or some zero-day in Google's SERPs. There are plenty of .it domains doing similar things in Google USA SERPs.
by fergie on 7/29/21, 6:08 AM
Norwegian here- I haven't seen this at all- maybe the author has been somehow "fingerprinted" and targeted?
by siproprio on 8/1/21, 3:50 AM
I've seen this trend in other places too:
For example, Microsoft routinely deletes negative feedback from GitHub issues for VS Code.
by ubercore on 7/29/21, 9:46 AM
FWIW, I just tried these searches (am in Norway) and didn't see that domain in the results.
by onepunchedman on 7/29/21, 3:23 AM
Wow, the Norwegian on those scam web sites is actually perfect. Never seen that before.
by punnerud on 7/29/21, 4:37 AM
I live in Norway and don’t have this problem now. I had a similar problem about a year ago on my MacBook Air because of some software that altered my Google results in all of my browsers. Don’t remember the name of it, but something smelled fishy when the results were different from the ones on my phone.
by classified on 7/29/21, 11:33 AM
If the mermaid took it, does that mean Google search is resting with the fishes?
by manceraio on 7/29/21, 2:13 PM
They will probably get outranked in the next big Google update.
by claroclinic on 7/29/21, 6:21 AM
Well, this is happening in all countries.
by rataata_jr on 7/29/21, 6:42 AM
Havfruen, brought to you by mountain trolls from Finnmark.
by mlang23 on 7/29/21, 6:45 AM
It seems Google has lost its ability to block spam effectively. Over the past few months, I've noticed an increasing amount of outright scams being promoted on YT. I even got an ad for a fake Musk telling people to invest in a shady bitcoin scheme. Knowing that Google is willing to let these slip through just to maximize their ad revenue is really a warning sign that this company, no matter how large it might be by now, should not be trusted anymore.
by chovybizzass on 7/29/21, 12:14 AM
I've been using https://search.brave.com for a few weeks. Most of the time I find what I need.
by Goety on 7/29/21, 2:49 AM
I will remain steadfast in my support for Google forever and always.
by jessaustin on 7/29/21, 1:12 AM
TFA talks about Google testing with "unknown IP", but doesn't mention any testing done by the author with cookies cleared or in incognito mode. This seems basic.
by londons_explore on 7/29/21, 7:39 AM
It's the hooking of the browser back button in a way that Google does not detect which is the real 'trick'.
Anyone who can do that can rank as high as they like for any search query.
by paxys on 7/29/21, 1:18 AM
I doubt it's some crazy sophisticated SEO hijacking operation. Probably a result of a small data set (Norwegian language web pages), specific search terms (Norwegian brands, companies), and lots of keyword stuffing. Most of the examples the author pointed out were from pages 5-10 of Google results, which are probably worthless for ad revenue anyways.