from Hacker News

We should promote more personal indexing, rather than algorithmic indexing

by dfps on 10/25/23, 3:25 PM with 22 comments

There are high-quality original sources writing information today, just like there were before Web 2.0, when people would go on the internet to learn things and spend actual quality time reading blogs and personal content as well (which wasn't written for virality, but for expression).

However, while original sources (NASA, Reuters, bloggers, authors, scholarly journals) still write and publish (including to social media), the viral content makers, a second tier of people who write about what the original author wrote about, specializing themselves for social media, just write the grabbiest headline and the most engaging (infuriating, polarizing, salacious) version of a piece of the original content, and when we go to our platforms, this is the content we see and read.

The original source's publications (posts) become almost invisible. The source becomes almost unknown as a source.

Engagement algorithms can't help this, because this is actually their purpose, which works against original content and quality content.

We should design platforms and systems that allow people to index their own content again (as was the case before Web 2.0 when people manually put links to other websites, blogs, organizations, and articles on their own websites). This would make original and quality content writers become visible and indexed (even just in people's worldviews, not just indexed on the internet) and make viral content makers more invisible.

It wouldn't be total, because many people prefer the emotionality of viral content, but it would at least create an internet where there was more value available.

by jasode on 10/25/23, 4:38 PM
>(NASA, Reuters, bloggers, authors, scholarly journals) still write and publish (including to social media), the viral content makers, a second tier of people who write about what the original author wrote about,
I like original primary sources for some topics and secondary sources for others. For programming topics, I'm ok reading the original papers.
But for other topics ... say "civil engineering" ... I prefer a "popularizer" like Grady Hillhouse's "Practical Engineering". His 15-minute presentations are the right amount of depth for exposing me to various city infrastructure topics. I'm not going to pretend I'd be interested in reading original scholarly journals from civil engineers. I deliberately outsource that to Grady. Hardcore engineers may complain that infotainment/edutainment is "shallow learning" but people have to strategically limit themselves to "shallow" explanations of some topics so they can spend more time to deep dive into other specialized areas of interest.
The "viral content makers" serve a useful purpose in the ecosystem to satisfy varying levels of interest. Therefore, a search engine that optimized for original academic papers instead of Grady blog posts when I ask "How does a city manage stormwater runoff?" -- would not be helpful to me in most cases. I dare say a "general" search engine that didn't put academic papers on page 1 of search results would be preferred by most people.
by rsync on 10/25/23, 5:18 PM
I'm open minded but ...
Comparing the atrocious search landscape of 2023 to personal indexing is to compare poorly.
Instead: those who lived through it should compare personal indexing to the golden years of altavista(.digital.com) and the extremely powerful and unpolluted results it produced and modified with boolean operators.
I never once thought that there was some utility left to be mined from the Yahoo approach to things once I switched to Altavista. I think we're only pining for personal indexing in comparison to the garbage that is Google in 2023.
by sharemywin on 10/25/23, 4:07 PM
The problem isn't black and white. Discussion adds value; if you didn't think so you wouldn't have posted here. So if it's not black and white, then you're talking about subtlety and degree, which is hard for a machine to determine. Also, most people engage with narratives/stories, not facts.
I'm not trying to downplay your comments, it's just hard to start a social media company. There's a reason it's considered a tarpit idea.
by nh23423fefe on 10/25/23, 4:30 PM
Proposing a fake moralizing solution to a non problem is pointless. You ignore incentives and structures that make the world the way it is and falsely assert a superior past that never existed.
Inventing a fake metric like "quality" and deprioritizing "engagement" is just messaging.
by eneuman on 10/25/23, 5:23 PM
There have been a few attempts at a crowdsourced-rank search engine (which is similar to what you're suggesting - people indexing the content), but it seems to be a hard cookie, most of the examples of similar ideas I could find on ProductHunt or ShowHN seem dead:
https://payperrun.com/%3E/search?displayParams={%22q%22:%22c...
(btw, I just launched this llm-embedding based search service that lets you check if a startup idea has already been tried/failed).
I don't know if this idea has a higher death rate than the baseline, but my guess is Google/PageRank is good enough for most use-cases, and then if you want quality sources, you can just follow them on YouTube, Twitter, Instagram, etc. Wait, maybe I shouldn't try to compete with Google?
by Tomte on 10/26/23, 7:35 AM
Something that academia has (but doesn't adequately "reward") is literature surveys. A fantastic way to get a short abstract on pretty much all extant studies, papers, books and conference proceedings in a very specific field (or niche thereof) from experts, that you would never find in your limited time.
by renegat0x0 on 10/25/23, 9:14 PM
People often claim that Google broke the internet, but I do not think it did. If any other company had succeeded, it would also rely on some sort of algorithmic SEO. Most often the results show Google services and news sites. Sure, you can play with your searches to find what you need. People started using social platforms, so forums became less attractive. Hobbyists found Facebook groups. It is not viable to host a fan page any more. You will not receive any traffic. Google also became lazy with their search. The Internet became walled gardens, and I am not sure if everything within walled gardens can be easily ranked right now.
It was not Google who destroyed the Search, it was the Internet that has changed.
by h2odragon on 10/25/23, 3:48 PM
> people manually put links to other websites, blogs, organizations, and articles on their own websites.
i have near 3 years worth of this at https://snafuhall.com/
doing my part.
by ghaff on 10/25/23, 5:30 PM
See shareable bookmarking services, tagging/folksonomies, etc. (e.g. pinboard.in). It was one of those Web 2.0 ideas that is still around but never took off in a big way. (Also as others have noted: web rings, blogrolls, etc.)
by raytopia on 10/25/23, 5:12 PM
Web rings and web directories are also good solutions to this; they make it easy to find content you actually like.
Also maybe hackernews should do a monthly "post your blog/website" thread, just like the monthly job posts.
by bratbag on 10/25/23, 4:19 PM
So once I have auto-generated 10k accounts using modern generative AI and then used them to make my content go viral through apparently personal indexing, what's your next step?
by jasfi on 10/25/23, 5:26 PM
They game the system because it's worth a lot of money to win. That is, to rank as high as possible for the keywords they optimize for.
But I think you have a good point.
by jruohonen on 10/25/23, 3:28 PM
Seconded.
by PaulHoule on 10/25/23, 5:56 PM
It's a big question on my mind now. I am thinking about adding something to my YOShInOn RSS reader that picks up articles from, say, ScienceX, looks up original sources, automatically tests whether the papers are open access and gives me the tools to quickly make a judgement call about what kind of link to share on what platform. In the immediate term this will be done after the system chooses articles to show me (about 5-10% of the articles it ingests) but someday it might be done before that.
There is the question of what my judgement is and the question of what your judgement is. The primary selection process (that shows me articles) is great right now, I am upvoting maybe 220 articles out of 300 in a cycle. I pick out links to post to HN and I am currently adding links to the queue faster than I am posting them, which means I can definitely raise the quality of what I post, but the definition of "quality" is where I get stuck. It has all sorts of factors such as a lack of annoyingness (I hate those cookie popups but there is a lot of good news behind them) but there are also articles that look really good to me at first (I like what they set out to do) but then when I look at them again I realize they didn't accomplish what they set out to do.
I do think votes and comments are worth something, but I also know that I could get more of both by posting clickbait articles. On one level I want to post things that are enlightening, boy I get frustrated that y'all just don't care about robotics or chemical recycling of polymers or Arduino projects. (Though my real secret ambition is to get a #1 post about sports...)
Somehow I want to pose the problem of posting to HN, Mastodon, etc. as a sequential recommendation problem, which means I have to go back and look at all the papers on the subject that YOShInOn has collected for me. Also I am likely to put some more work into "quality models", particularly a stacked model for predicting votes if not comments on HN articles, a broad topic model based on data from Tildes (is it sports? music? science?) and particularly sentiment models.
That last one is on my mind because I'm thinking about the emotional tone of what I post to Mastodon, some days I think I should just stick to posting flower photos because they get good engagement, but past the people who are calling everybody a "fascist" that get amplified there is a "silent majority" of people on Mastodon who try to avoid the news and other inflaming topics so I am torn between being unrelentingly positive or trying to balance out positive and negative articles to make a more appealing feed to the good people of Mastodon. It would probably be 2-3 days of labeling work to make a sentiment model but if I had to find and categorize 5000 angry toots it would kill me, but I am thinking now about grading my own posts (what the system is going to do inference on anyway) and also grading high-engagement and predicted high-engagement submissions to HN to make a model that finds high-engagement posts that aren't clickbait.
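As a rough illustration of that last idea (finding high-engagement posts that aren't clickbait), here is a minimal sketch. It is not how YOShInOn works; the `Candidate` class, the `rank_non_clickbait` function, the two score fields, and the 0.3 threshold are all hypothetical, and the scores are assumed to come from separately trained engagement and clickbait models.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    predicted_engagement: float   # hypothetical model output in [0, 1]
    clickbait_probability: float  # hypothetical model output in [0, 1]


def rank_non_clickbait(candidates, max_clickbait=0.3):
    """Drop likely clickbait, then sort the rest by predicted engagement."""
    kept = [c for c in candidates if c.clickbait_probability <= max_clickbait]
    return sorted(kept, key=lambda c: c.predicted_engagement, reverse=True)


# Made-up example queue; the scores are illustrative, not real predictions.
queue = [
    Candidate("You won't believe this robot", 0.9, 0.8),
    Candidate("Chemical recycling of polymers", 0.4, 0.05),
    Candidate("Arduino weather station build log", 0.6, 0.1),
]

for c in rank_non_clickbait(queue):
    print(c.title)
```

A hard threshold is only one possible choice; a weighted penalty on the engagement score would trade off the two signals more smoothly.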