by oarth on 7/29/21, 12:12 AM with 346 comments
by Ueland on 7/29/21, 5:33 AM
I managed to find a huge spam network that set up a proxy service that delivered normal content, but injected "you can win an iPhone!" spam to all users visiting them.
Since I was in a position to monitor their proxy traffic towards many sites I managed, I could easily document their behaviour.
At the same time, I wrote a crawler that visited their sites over a long, long time. I learned that they kept injecting hidden links to other sites in their network, so I let my bot look at those too.
By this time, I had also brought in a journalist, who started to look at the money flow to try and find the organisation behind it.
My bot found in excess of 100K domains being used for this operation, targeting all of western Europe. All the 100K sites contained proxied content and were hidden behind Cloudflare, but thanks to the position I had, I managed to find their backend anyway.
We reported the sites to both CF and Google, and to my knowledge, not a single site was removed before the people behind it took it down.
Oh, and the journalist? He did find a Dutch company that was not happy to see either him or the photographer :)
by keyme on 7/29/21, 6:58 AM
I've been using Google search for all kinds of research for 15 years. There used to be a time when you could find the answer to pretty much anything. I could find leaked source code on public FTP servers, links to pirated software and keygens, detailed instructions for a variety of useful things. That was the golden age of the web.
These days, all the "interesting" data on the Internet is inside closed Telegram chats, Facebook groups, Discords or the rare public website here and there that Google doesn't want to index (like sci-hub, or other piracy sites).
The data that remains on SERPs is now also heavily censored for arbitrary reasons. "For your health", "For your protection". Google search is done.
by janmo on 7/29/21, 1:45 AM
For example if you check havfruen4220.dk on archive.org you can see that it appears to have been a legitimate business website before. https://web.archive.org/web/20181126203158/https://havfruen4...
How do they rank so well?
I've checked the domain on Ahrefs and it has almost no backlinks. But if you look closely you will see that all the results that rank very well have been added very recently. On the screenshots in the article you can see things like "for 2 timer siden", which means 2 hours ago. It looks like Google is ranking pages that have a very recent publishing date higher.
Edit: Here is what the content of such a site looks like: https://webcache.googleusercontent.com/search?q=cache:Bk0VsM...
by ricardo81 on 7/29/21, 10:10 AM
curl -A 'Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0' 'https://havfruen4220.dk' > 1.html
curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://havfruen4220.dk' > 2.html
diff 1.html 2.html
7d6
< <script>var b="https://havfruen4220.dk/3_5_no_14_-__1627553323/gotodate"; ( /google|yahoo|facebook|vk|mail|alpha|yandex|search|msn|DuckDuckGo|Boardreader|Ask|SlideShare|YouTube|Vimeo|Baidu|AOL|Excite/.test(document.referrer) && location.href.indexOf(".") != -1 ) && (top.location.href = b); </script>
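To make the condition in that injected script easier to read, here is a minimal Python sketch of the same check. The regular expression is copied from the diff; the function name `would_redirect` is made up for illustration, and the sketch only models the condition, not the redirect itself.

```python
import re

# Pattern copied from the injected <script> in the diff above.
# JavaScript's /.../.test() is an unanchored, case-sensitive search,
# which re.search mirrors here.
REFERRER_PATTERN = re.compile(
    r"google|yahoo|facebook|vk|mail|alpha|yandex|search|msn|DuckDuckGo|"
    r"Boardreader|Ask|SlideShare|YouTube|Vimeo|Baidu|AOL|Excite"
)


def would_redirect(referrer: str, page_url: str) -> bool:
    """Return True when the script's condition for redirecting holds.

    Hypothetical helper: it models
    pattern.test(document.referrer) && location.href.indexOf(".") != -1
    """
    came_from_listed_site = REFERRER_PATTERN.search(referrer) is not None
    url_contains_dot = "." in page_url
    return came_from_listed_site and url_contains_dot


if __name__ == "__main__":
    page = "https://havfruen4220.dk/no/7a28855e4714dd14"
    for referrer in ("https://www.google.com/", "", "https://www.youtube.com/"):
        print(repr(referrer), would_redirect(referrer, page))
```

A visitor arriving from a search result has a matching referrer and gets sent to the `gotodate` URL, while a direct visit (empty referrer) does not. Since the match is case-sensitive, a lowercase `youtube.com` referrer would not trigger it either.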
by NorwegianDude on 7/29/21, 1:07 AM
Would be interesting to see the actual content. Based on the small snippets in the search results, it takes content from other sites, like large Norwegian news sites, and somehow outranks them hard.
I wonder what the Google Search Console looks like for that domain, considering that it's probably getting millions worth of free traffic.
EDIT: After looking more at it, it's insane how much it ranks for and how well. Straight-up brand names seem to be the hardest to compete with, at least larger ones. Those seem to be around page 4-5 for me.
Some brands I was unable to find at all, but ironically another .dk domain showed up in its place that did the same thing. There are also some .it domains using the same content.
I've found that it takes content from multiple sources and glues it together in sometimes great ways. Like one sentence from this page, another thing from that page.
Maybe this is some ML that collects content and pieces sentences or half sentences together into one large article? It's clearly from completely different sources, but about the same thing.
Example: "wash car"
Result in Google: "A dark winter with snow and salt is hard on the car, and it's extra important to wash the car" - Collected from one article.
<some other text>
"Keep the pressure washer at 30-50 cm from the car..." - From another article.
Ironically, there are about 11 results all tied to this thing outranking the original articles (those are last), even though they come from medium to large, well-known companies selling for billion(s) of dollars each year in Norway.
Sometimes it goes from one thing and switches to something completely unrelated, so I guess the spammers still have something to improve.
Weird.
by weird-eye-issue on 7/29/21, 2:32 AM
Ahrefs: 230k organic traffic valued at $124k
SEMRush: 558k organic traffic valued at $355k
These are estimates and can be wildly under- or overestimated, but they show that this is happening on a very large scale.
For a quick idea of how this is possible, I looked at their top pages (according to Ahrefs). Their top page ranks #2 for the keyword "interia", which has 207k searches per month in Norway and a difficulty rating of 0 (out of 100), meaning it is rated as easy to rank for. Usually a keyword with that many searches would be incredibly hard to rank for; I've never seen anything like this. So it looks like they are just taking advantage of a market with really low-competition keywords.
by gnyman on 7/29/21, 6:35 AM
And add to that some link spam and preventing visitors from returning, so they do not get any bounce-backs...
Either way, I can't help but be a bit impressed by the SEO spammers outsmarting the people at Google. (Edit: and I don't mean to say they are smarter or anything, just that they only need to find one weakness in the algorithm, while the people working to improve it need to make it work for everything.)
by monday_ on 7/29/21, 5:43 AM
by Schnurpel on 7/29/21, 1:47 PM
by bigpeopleareold on 7/29/21, 5:22 AM
edit: one other thing I have seen, but it doesn't mean it is always spam. All The Words In A Title Are Capitalized - it's something to pay attention to whether it is spam or not. Conventionally, titles are usually not like that in Norwegian.
by the_biot on 7/29/21, 6:48 AM
by dhosek on 7/29/21, 2:42 AM
by matsemann on 7/29/21, 7:22 AM
Can't fathom Google not catching this..
by bash-j on 7/29/21, 2:13 AM
by hayksaakian on 7/29/21, 12:52 AM
It seems like they've manipulated rankings by locking people in to reduce their bounce-back stats (in addition to keyword-stuffed content)
by rwmj on 7/29/21, 8:01 AM
This has been going on for years now, so I don't have much confidence that Google is able or willing to fix it.
by l0b0 on 7/29/21, 12:46 AM
by hoppla on 7/29/21, 9:53 AM
by wdrw on 7/29/21, 1:45 AM
by fny on 7/29/21, 1:46 AM
by mmaunder on 7/29/21, 1:25 AM
by nolito on 7/29/21, 8:20 AM
On 2 July 2021
That's pretty fast to work so well. But I see lots of this, with other domains, when searching, and have done for years, so nothing new here I think.
by Matsta on 7/29/21, 3:51 PM
If the redirect is done as a meta refresh, then you can use your robots.txt to block it from being picked up by SEO tools like Ahrefs, SEMRush etc.
These types of sites are called doorway pages and have been around for ages. They are most popular in Russia and on Yandex, but you do see them on Google for super longtail keywords with 0 competition.
The other important thing to remember is that doing SEO in any language that's not English is a walk in the park. Lots of SEO influencer types have case studies showing how much extra traffic they get by translating their content. [1]
by belter on 7/29/21, 10:57 AM
They can easily be traced to a block of flats in Latvia, but since their registered phone number is that of a toy store in Riga... I am going to go with probably a stolen-identity operation and a sense of humour on their part, rather than the real operation of some 12-year-old in Riga...
by agency on 7/29/21, 2:00 AM
by kostecki on 7/29/21, 1:20 AM
by ocdtrekkie on 7/29/21, 3:27 PM
An SEO-fighting Googler might at a glance have no reason not to think that could be a really relevant or popular site in your country.
by rapind on 7/29/21, 6:31 AM
God I hope not. If Google does do this, it sounds like a really dumb idea, which will ultimately create widespread usability issues. I can already envision SEO consultants recommending this for their clients if this is believed.
Doesn’t look like it according to https://www.seroundtable.com/google-browser-back-button-rank...
by evolve2k on 7/29/21, 1:49 AM
by franze on 7/29/21, 11:40 AM
"We help you to receive high-quality visitors from search engines, generate conversions and build your brand. To achieve these results, we ensure your website / company is recommended for specific keywords by the search engine's autocomplete function."
by pope_meat on 7/29/21, 1:02 AM
by gonab on 7/29/21, 12:01 PM
by zulrah on 7/29/21, 11:55 AM
by kristofferR on 7/29/21, 3:48 AM
by knolax on 7/29/21, 3:16 AM
by yfkar on 7/29/21, 5:22 AM
by tvirosi on 7/29/21, 8:20 AM
by Crazyontap on 7/29/21, 4:50 AM
by fleddr on 7/29/21, 2:41 PM
by nkozyra on 7/29/21, 1:57 AM
Surely Google does this, right? Given that - in theory - showing different content to Google versus non-Google should result in a penalty, anyway ...
by qwerty456127 on 7/29/21, 8:09 AM
by cnxsoft on 7/29/21, 2:02 PM
by StreamBright on 7/29/21, 6:44 AM
by qwerty456127 on 7/29/21, 8:13 AM
by algismo on 7/29/21, 10:58 AM
I recommend switching to DuckDuckGo :)
by golergka on 7/29/21, 8:30 AM
by tikiman163 on 7/29/21, 2:39 PM
Are they potentially doing harm? Sure. Have they successfully managed to trick anybody with this? I'd be extremely surprised if they're getting more than a dozen people a day clicking through from being the ninth result, and when people see they've been redirected to an advertisement, the majority immediately click away.
This isn't like clicking on a fake porn site that redirects to cam girls with viruses hidden in all the downloads. It's random unrelated searches redirecting you to blatant ads for cryptocurrency. The kind of people who are young enough to know what cryptocurrency is and how to buy it also know how to spot a redirect to a fake website.
by onepunchedman on 7/29/21, 3:24 AM
by sublimefire on 7/29/21, 1:06 PM
## read robots.txt
curl 'https://havfruen4220.dk/robots.txt'
## use pointer to a sitemap.xml
curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://havfruen4220.dk/sitemap-no.xml' > sitemap.xml
## read more sitemaps
curl -A 'Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' 'https://havfruen4220.dk/sitemap-no-1.xml' > sitemap1.xml
Other sitemaps contain a pointer to a "webpage", e.g.: https://havfruen4220.dk/no/7a28855e4714dd14
## read web pages
Each location in a sitemap has a "lastmod" of today/yesterday, so the bot returns there every day. In addition, each webpage has a `<meta name="robots" content="noarchive">` tag.
But if you visit each of those pages then it shows you a cartoon image. It seems the actual indexed content is visible only to the bot.
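For anyone who wants to repeat the "lastmod" check on a downloaded sitemap, the following is a rough Python sketch. The sample XML is invented (the real sitemap contents are not reproduced in this thread), and `list_entries` is a hypothetical helper name; only the sitemaps.org namespace is standard.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/no/page-1</loc>
    <lastmod>2021-07-29</lastmod>
  </url>
  <url>
    <loc>https://example.com/no/page-2</loc>
    <lastmod>2021-07-28</lastmod>
  </url>
</urlset>
"""


def list_entries(sitemap_xml: str) -> List[Tuple[str, str]]:
    """Return (loc, lastmod) pairs for every <url> entry in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        entries.append((loc.strip(), lastmod.strip()))
    return entries


if __name__ == "__main__":
    for loc, lastmod in list_entries(SAMPLE_SITEMAP):
        print(lastmod, loc)
```

Running it against the `sitemap1.xml` file saved by the curl command above (read it with `open(...).read()`) would list how fresh each entry claims to be.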
## But how is actual content being rendered?
The question is, what conditions (request params/headers) result in the actual content being rendered? The bot needs to evaluate it. I suspect it is some combo of checking whether the requester is an actual Googlebot, maybe by looking up the IP https://developers.google.com/search/docs/advanced/crawling/...
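If the site really does check the requester's IP, one way it could do so is the reverse-then-forward DNS lookup that Google documents for verifying Googlebot. The sketch below is an assumption about how such a check might look, not something observed on this site: the function names are made up, the hostname suffixes are the `googlebot.com` and `google.com` domains, and `socket.gethostbyname_ex` only covers IPv4.

```python
import socket
from typing import Callable, List

# Domains that a genuine Googlebot reverse-DNS name is expected to end with.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def looks_like_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Hypothetical check: reverse lookup, suffix test, forward confirmation."""
    try:
        hostname = reverse(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The hostname must resolve back to the same IP, otherwise anyone
        # controlling their own reverse DNS could fake the name.
        return ip in forward(hostname)
    except OSError:
        return False


if __name__ == "__main__":
    # Offline demonstration with stand-in resolvers (no network access).
    fake_reverse = lambda ip: "crawl-66-249-66-1.googlebot.com"
    fake_forward = lambda host: ["66.249.66.1"]
    print(looks_like_googlebot("66.249.66.1", fake_reverse, fake_forward))
```

A check like this would explain why changing only the User-Agent with curl does not reveal the indexed content: the header is trivial to spoof, the DNS round trip is not.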
by cratermoon on 7/29/21, 3:22 PM
by tapland on 7/29/21, 2:28 AM
by techaddict009 on 7/29/21, 6:36 AM
by fergie on 7/29/21, 6:08 AM
by siproprio on 8/1/21, 3:50 AM
For example, Microsoft routinely deletes negative feedback from GitHub issues for VS Code.
by ubercore on 7/29/21, 9:46 AM
by onepunchedman on 7/29/21, 3:23 AM
by punnerud on 7/29/21, 4:37 AM
by classified on 7/29/21, 11:33 AM
by manceraio on 7/29/21, 2:13 PM
by claroclinic on 7/29/21, 6:21 AM
by rataata_jr on 7/29/21, 6:42 AM
by mlang23 on 7/29/21, 6:45 AM
by chovybizzass on 7/29/21, 12:14 AM
by Goety on 7/29/21, 2:49 AM
by jessaustin on 7/29/21, 1:12 AM
by londons_explore on 7/29/21, 7:39 AM
Anyone who can do that can rank as high as they like for any search query.
by paxys on 7/29/21, 1:18 AM